The University of Western Australia
Computer Science and Software Engineering
 
 


CITS3002 Computer Networks

Practical project 2020 - Getting Started

(due 5pm Fri 22nd May - end of week 11)

See also: the Project and Clarifications

There are two significant, and different, networking aspects to be addressed in this project. Once those aspects are designed, implemented, and tested, attention can move to designing, implementing, and testing the business logic of the project.

  1. Each station server needs to support queries from a web-browser, that is, to both receive and respond to queries, using the TCP/IP transport protocol. When using TCP, a long-term connection is established between two processes (between your web-browser and any of your "identical" station servers) and lasts until either end-point closes it (if either process crashes, the connection will be closed, but the other end may not know why).

    The project requires that you use the Hypertext Transfer Protocol (HTTP) and the HyperText Markup Language (HTML) within a Transmission Control Protocol (TCP) over Internet Protocol (IP) connection for communication between the web-browser and each station server. Very simple HTTP exchanges and simple HTML content are sufficient. Your HTML content (text) is embedded within the payload of HTTP, which is embedded within the payload of TCP, which is embedded within the payload of IP, which (normally) would be embedded within the payload of an Ethernet frame. However, our project's network traffic never leaves your single computer, so the IP packets are queued and exchanged through your computer's RAM.

    Network traffic across a TCP/IP connection is bi-directional and reliable - one end writes a query, the other end reads the (uncorrupted) arriving query, determines/calculates a response, writes that response, and the other (original) end reads it. Both ends can write data at the same time - the two messages do not interfere with each other when 'crossing'.

  2. Each station server also communicates with other station servers using the UDP/IP datagram-based protocol. While no long-lasting connection is established between two communicating partners using UDP/IP, they can communicate by writing and reading self-contained messages. Each message (the payload of a datagram) will arrive at only the intended destination zero or more times. If a datagram arrives, it will be uncorrupted. Each datagram is individually addressed, so the same message (payload) can be sent to multiple destinations in separate datagrams by (effectively) just changing the destination address.

Communication between web-browser and a station server

Let's see what a web-browser (acting as a client) sends to a server, sending its traffic using HTTP. Firstly, we'll need to know, or choose, a port-number for the communication. A port number is a small integer (on a server/computer) identifying a 'communication endpoint' able to receive some network communication from another computer. Both the sending and receiving processes may be on the same computer, and the operating system kernel (internally) uses the port-number to determine if there's a running process able/wanting to receive the communication and, if so, the kernel delivers the communication to that process.

OK, let's find an available port-number - meaning, one not currently in use by another process.
Download this Linux/macOS bash shellscript portsinuse.sh (which will probably download when you click on the link), make it executable, and run it:

shell>  chmod +x portsinuse.sh
shell>  ./portsinuse.sh

Choose a port-number not listed there (and above 1023), say 4444, and run the command:

shell>  nc -l 4444

You've just started a new server process, listening on port 4444, and expecting a connection and network traffic using TCP/IP (the default). Ahh, this project looks easy!
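As an aside, whether a port is free can also be tested from a program by simply trying to bind to it. The sketch below is not what portsinuse.sh does (its contents are not shown here); the function name is_tcp_port_free is a hypothetical helper, and it only checks the loopback address.

```python
import socket

def is_tcp_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can currently be bound to (host, port).

    Hypothetical helper: it only reports the situation at the moment of the
    call, so another process could still take the port afterwards.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically "address already in use": some process owns the port.
            return False
    return True
```

With nc still listening on 4444, is_tcp_port_free(4444) would be expected to return False.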

Now let's use your web-browser, acting as a TCP/IP-based client, to connect to that nc server. Because (assuming) your web-browser and the nc process are running on the same computer, each will be able to communicate using the 'generic' hostname of localhost (or, depending on your computer's configuration, you may need to use the IP address 127.0.0.1).

Open a new tab in your browser, and in the address bar enter the URL http://localhost:4444/ .
(that's http colon, not https colon)

In the shell window of your nc process you should immediately see what the browser has sent to nc, commencing with:

GET / HTTP/1.1
Host: localhost:4444
.....

The 8 or so lines you see are a basic HTTP header (version 1.1), followed by a blank line. Your web-browser is still waiting for a reply from nc, so we need to also reply using HTTP. Type (cut-and-paste) in your shell window:

HTTP/1.1 200 OK
Content-Type: text/html
Connection: close

<html>
<body>
<h1>Hello from nc!</h1>
</body>
</html>

and then control-C or control-D to terminate the TCP/IP connection between nc and web-browser. You've replied with an HTTP header (version 1.1) indicating that the 'request' was valid and understood (reply code 200) and that the type of your reply will be HTML (case is insignificant), and then sent 5 lines of HTML. You should see the reply rendered in your browser's tab.
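The reply typed by hand above can be assembled in code. The following is a minimal sketch, not a required design: build_http_response is a hypothetical name, and the added Content-Length header is an assumption beyond the hand-typed reply (it lets the browser know where the body ends without waiting for the connection to close). HTTP header lines are terminated by CR-LF, and a blank line separates the header from the body.

```python
def build_http_response(html, status="200 OK"):
    """Return a complete HTTP/1.1 response (header + body) as bytes."""
    body = html.encode("utf-8")
    header_lines = [
        f"HTTP/1.1 {status}",
        "Content-Type: text/html; charset=utf-8",
        f"Content-Length: {len(body)}",   # length of the body in bytes
        "Connection: close",
    ]
    # Each header line ends with CR-LF; an empty line ends the header.
    head = "\r\n".join(header_lines) + "\r\n\r\n"
    return head.encode("ascii") + body
```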

Now, repeat the whole exercise, re-running nc and this time entering the URL http://localhost:4444/?to=Perth_Stn into your browser. You should see this new request arrive with:

GET /?to=Perth_Stn HTTP/1.1
Host: localhost:4444
.....

Hmmm now, if only nc was actually our own station server program, written in Java, or C, or Python, we might know what to do with the request, and what to write back as our reply!
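A station server's first step with such a request is to pull the query out of its first line. Here is a small sketch in Python using the standard urllib.parse module; parse_request_line is a hypothetical name, and the sketch assumes each query parameter appears once.

```python
from urllib.parse import urlsplit, parse_qs

def parse_request_line(line):
    """Split 'GET /?to=Perth_Stn HTTP/1.1' into (method, path, query dict)."""
    method, target, _version = line.strip().split(" ", 2)
    parts = urlsplit(target)
    # parse_qs returns a list of values per key; keep the first of each.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return method, parts.path, query
```

For the request shown above, this yields the method GET, the path /, and a dictionary mapping to onto Perth_Stn.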

While this part of the project involves sending an HTTP request over a TCP/IP connection, and sending an HTTP reply carrying an HTML message back across the same TCP/IP connection, we've almost finished it without having written a line of code. Of course, we need to replicate some of the work that nc performed in establishing the server's connection on our chosen port-number (4444), and that's where our first bit of coding becomes programming-language specific.
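For Python, the part of nc's work described above might look like the following sketch. It is one possible starting point under stated assumptions, not the required structure: open_listener and serve_once are hypothetical names, the reply is a fixed page, and only one browser connection is handled at a time.

```python
import socket

def open_listener(port, host="127.0.0.1"):
    """Create a TCP socket listening on (host, port), much as 'nc -l' does."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts of the server on the same port.
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(5)
    return listener

def serve_once(listener, html="<html><body><h1>Hello from Python!</h1></body></html>"):
    """Accept one connection, read the HTTP header, send a fixed HTML reply.

    Returns the request text so the caller can inspect it.
    """
    conn, _addr = listener.accept()
    with conn:
        request = b""
        # The browser's header ends with a blank line (CR-LF CR-LF).
        while b"\r\n\r\n" not in request:
            chunk = conn.recv(4096)
            if not chunk:          # the browser closed the connection early
                break
            request += chunk
        body = html.encode("utf-8")
        head = (
            "HTTP/1.1 200 OK\r\n"
            "Content-Type: text/html\r\n"
            f"Content-Length: {len(body)}\r\n"
            "Connection: close\r\n\r\n"
        )
        conn.sendall(head.encode("ascii") + body)
    return request.decode("iso-8859-1")

# Possible usage (runs until interrupted):
#   listener = open_listener(4444)
#   while True:
#       print(serve_once(listener))
```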

Helpful language-specific tutorials on TCP and UDP

  1. Minimal tutorials on HTTP - Requests and HTTP - Responses, both from Tutorialspoint.
  2. For C and C++ developers - A Guide to Network Programming using Internet sockets, by Brian "Beej" Hall.
  3. For Python developers - Transports and Protocols, socket - Low-level networking interface, and socketserver - A framework for network servers, all from python.org.
  4. For Java developers - All About Sockets, from Oracle,
    and Java - Networking, from Tutorialspoint.

Defining and executing the transport network

Consider the following excellent diagram of a simple 4-station network:

 

  • Each station has a distinct name
  • Each station is directly connected to others by a (physical) bus or train route
  • Each station has a URL enabling a TCP/IP connection from a web-browser
  • Each URL specifies the unencrypted HTTP protocol, not encrypted HTTPS
  • Each station has a port able to receive UDP/IP datagrams
  • Stations only know other stations' UDP/IP ports if they're directly connected by a bus or train route
  • Because our project runs on a single computer, all URLs refer to the hostname localhost
  • Because our project runs on a single computer, all URLs, and TCP and UDP sockets will use distinct ports

The above network requires 4 distinct operating system processes to execute, each written in your choice of two programming languages. We can invoke all of these processes from the command-line, or from a shellscript, as below. Each process receives a small number of command-line arguments, providing its station's name, its unique port for TCP/IP-based communication with a web-browser, its unique port for UDP/IP-based communication with other stations using datagrams, and the UDP/IP-based port(s) of directly connected (neighbour) stations. Notice, also, that all processes have been 'started in the background', because none needs to remain connected to the invoking keyboard.

shell>  ./station North_Terminus 2210 2608 2606 &
shell>  ./station East_Station 2230 2606 2608 2602 2605 &
shell>  ./station.py West_Station 2220 2602 2605 2606 &
shell>  ./station.py South_Busport 2240 2605 2606 2602 &
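The argument order used in these commands can be unpacked as in the following sketch. The names StationConfig and parse_station_args are hypothetical, and the usage message is only illustrative.

```python
from typing import List, NamedTuple

class StationConfig(NamedTuple):
    name: str
    tcp_port: int                   # port for the web-browser (TCP/IP)
    udp_port: int                   # this station's own datagram port
    neighbour_udp_ports: List[int]  # datagram ports of directly connected stations

def parse_station_args(argv):
    """Interpret argv as: program NAME TCP_PORT UDP_PORT [NEIGHBOUR_UDP_PORT ...]"""
    if len(argv) < 4:
        raise SystemExit("usage: station NAME TCP_PORT UDP_PORT [NEIGHBOUR_UDP_PORT ...]")
    name, tcp_port, udp_port, *neighbours = argv[1:]
    return StationConfig(name, int(tcp_port), int(udp_port),
                         [int(port) for port in neighbours])
```

In a real station program the function would typically be called with sys.argv.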

When invoked, each station process first initialises its TCP and UDP ports, and then reads a comma-separated textfile providing its own timetable information. While your servers will need to read in and parse the contents of the timetable files, you can assume that all their contents are correct (time-formats are correct, departure times precede arrival times, destination station names exist, etc). For example, the file  tt-North_Terminus  may contain lines of the form:

North_Terminus,-31.8448,115.7963
07:15,12,Stop1,07:54,East_Station
08:15,12,Stop1,08:46,East_Station
....
14:20,12,Stop1,14:50,East_Station

The first line contains the station's name (also forming the filename), and latitude and longitude (which you can ignore, though some students wanted to draw a map on their webpages!). The second and subsequent lines each define a single bus connection leaving the station. The second line can be read as: "At 7.15am, bus number 12 leaves Stop1, arriving at 7.54am at East_Station". Note that there is no networking information (protocols or ports) in the file, and that North_Terminus does not know anything about West_Station or South_Busport.

The file  tt-East_Station  will have exactly the same format, but likely have more timetable entries because it has more direct connections.

East_Station,-31.9442,115.8771
05:15,Line1,PlatformB,06:10,West_Station
05:22,180,StopA,05:51,South_Busport
06:15,Line1,PlatformB,07:18,West_Station
06:16,13,StopC,06:51,North_Terminus
....
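Reading such a file takes only a few lines. The sketch below is one way to do it in Python, under these assumptions: the names Departure, read_timetable and next_departure are hypothetical, fields never contain commas, and times are always zero-padded HH:MM (as in the examples), so that comparing them as strings gives chronological order.

```python
from typing import NamedTuple

class Departure(NamedTuple):
    departs: str      # "HH:MM"
    route: str        # bus number or train line name
    stop: str         # stop or platform the service leaves from
    arrives: str      # "HH:MM" arrival at the neighbouring station
    destination: str  # name of the neighbouring station

def read_timetable(lines):
    """Parse timetable lines; return (station_name, lat, lon, departures)."""
    rows = [line.strip() for line in lines if line.strip()]
    name, lat, lon = rows[0].split(",")
    departures = [Departure(*row.split(",")) for row in rows[1:]]
    return name, float(lat), float(lon), departures

def next_departure(departures, destination, not_before):
    """Earliest departure to destination at or after not_before, else None."""
    candidates = [d for d in departures
                  if d.destination == destination and d.departs >= not_before]
    return min(candidates, key=lambda d: d.departs, default=None)
```

A station could call read_timetable on an open file object, since iterating over a file yields its lines.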

Simplifying the building and execution of your transport networks

Let's assume that we have a textfile named  adjacency  holding the adjacency list of our transport network, such as:

North_Terminus  East_Station
East_Station    North_Terminus South_Busport West_Station
West_Station    East_Station   South_Busport
South_Busport   West_Station   East_Station

The first word on each line provides a station name, and all following words are the names of neighbouring stations. In the above example, all links are bidirectional. Now, the shellscript assignports.sh (which will probably download when you click on the link), will produce a new shellscript with unused TCP and UDP ports assigned for each command. Running:

shell>  ./assignports.sh adjacency startstations.sh

will produce the shellscript named  startstations.sh :

./station North_Terminus 4001 4002 4004 &
./station East_Station 4003 4004 4002 4008 4006 &
./station West_Station 4005 4006 4004 4008 &
./station South_Busport 4007 4008 4006 4004 &
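The mapping from adjacency file to start-up commands can be illustrated in Python. This is not the assignports.sh script (its contents are not shown here): build_start_script is a hypothetical name, and the sketch simply hands out consecutive ports from 4001 upwards, assuming they are free, whereas the real script looks for unused ports.

```python
def build_start_script(adjacency_lines, first_port=4001, command="./station"):
    """Return one start-up command per station in the adjacency list."""
    stations = []
    for line in adjacency_lines:
        words = line.split()
        if words:                       # skip blank lines
            stations.append((words[0], words[1:]))

    # Give every station a TCP port and a UDP port, in file order.
    tcp, udp = {}, {}
    port = first_port
    for name, _neighbours in stations:
        tcp[name], udp[name] = port, port + 1
        port += 2

    commands = []
    for name, neighbours in stations:
        # Own TCP port, own UDP port, then each neighbour's UDP port.
        ports = [tcp[name], udp[name]] + [udp[n] for n in neighbours]
        commands.append(f"{command} {name} " + " ".join(map(str, ports)) + " &")
    return commands
```

Applied to the adjacency file above, this numbering reproduces the four commands shown.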

Now, with all commands to start each server in a single shellscript, we can use makeform.sh (it will probably download when you click on the link), to find the station names and TCP ports in the starting script, and produce a webpage that can be used to query each server. Running:

shell>  ./makeform.sh startstations.sh myform.html

Communication between station servers

Station servers communicate with each other using UDP/IP datagrams. Servers do not establish or maintain connections between each other, and send datagrams on-demand. Stations know the ports on which their neighbouring stations are expecting datagrams, but not the ports of other stations. Datagrams will be used to transmit both queries and replies between neighbouring stations.
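In Python, the datagram side of a station might start from a sketch like this one. The function names are hypothetical, and everything is assumed to run on the loopback address, as in this project.

```python
import socket

def open_udp_socket(port, host="127.0.0.1"):
    """Create a UDP socket bound to this station's own datagram port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock

def send_to_neighbours(sock, payload, neighbour_ports, host="127.0.0.1"):
    """Send the same payload (bytes) to each neighbour in its own datagram."""
    for port in neighbour_ports:
        # No connection is set up: each datagram is addressed individually.
        sock.sendto(payload, (host, port))
```

A receiving station would call recvfrom on its own socket, which returns both the payload and the sender's address, so a reply can be sent back to the port the query came from.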

The contents of each UDP datagram (in its payload) will need to be formatted so that they can be unambiguously understood by its receiver. Their format does not need to match that of the HTTP or HTML protocols (that would be overkill for our requirements). However, you will be defining a protocol understood by your stations.

Be warned that, while two Java programs can exchange Java objects across a network, programs written in other programming languages will not be able to parse/understand Java objects.
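One language-neutral option is a single line of delimited text, which C, Java and Python can all split easily. The following sketch is only an illustration of the idea, not a prescribed protocol: the field names (kind, origin, destination, visited), the '|' and ',' separators, and both function names are assumptions, and your own protocol will likely need more fields (times, for example).

```python
def encode_message(kind, origin, destination, visited):
    """Encode a message as 'kind|origin|destination|stn1,stn2,...' in UTF-8."""
    for field in (kind, origin, destination, *visited):
        if any(ch in field for ch in "|,\n"):
            raise ValueError(f"field contains a reserved character: {field!r}")
    return "|".join([kind, origin, destination, ",".join(visited)]).encode("utf-8")

def decode_message(payload):
    """Inverse of encode_message; returns (kind, origin, destination, visited)."""
    fields = payload.decode("utf-8").split("|")
    if len(fields) != 4:
        raise ValueError("malformed message")
    kind, origin, destination, visited = fields
    return kind, origin, destination, (visited.split(",") if visited else [])
```

Carrying a list of stations already visited is one possible way of stopping a query from circulating forever in a network with loops.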

Using the Transperth GTFS dataset

Perth's Public Transport Authority (PTA) provides public access to its scheduled times, stop locations, and route information from its webpage www.transperth.wa.gov.au/About/Spatial-Data-Access. You may download your own copy of the data (about 90MB) by clicking on the first link "By downloading the data you are agreeing to the terms of the License..."

The data is released as a collection of inter-related textfiles following the Google Transit Feed Specification (GTFS), which is also used by many other public transport companies, worldwide.

Finally, the shellscript buildtransperthtimetables.sh (it will probably download when you click on the link) reads the Transperth files and reduces the information to a set of more manageable timetable files, suitable for this project. Read the following points carefully:

  • Do not attempt to use the timetable files produced by this shellscript until you have tested your project with a dataset of far fewer, smaller files, such as the 4-station network, above.

  • Using the Transperth dataset from April 30th, the script runs for 6 minutes on a 2017 Linux/Ubuntu system, and for 14 minutes on a 2013 macOS/Catalina system - so grab that cup of coffee. You only need to run it once, and then keep the timetable files it generates.

  • The script produces a total of 95 timetable files, one file for each station. A station is a physical location where bus and train trips start and stop, or just pass through. A station is an 'umbrella' for at least one stop - for example, a typical train station may have 2 train platforms, and 2 bus stops. A number of stations have just one stop, while Elizabeth Quay Bus Station has 130 stops. This project only manages stations, such as physically large bus and train stations and shopping centres, with each just considered as a single 'big' stop. Because the generated files do not include every bus stop on every road, even if visited by many well-known bus routes, you may not find your familiar routes in the generated files.

  • The Transperth data includes all trips, scheduled 7 days per week, with different schedules each day (and accounting for public holidays). To significantly simplify things the buildtransperthtimetables.sh script only gathers data for Wednesdays, and we assume that every day uses the same (Wednesday) schedule.

  • The first line of every timetable file has 3 comma-separated fields - the station's name, and its latitude and longitude (not really required). Each following line has five comma-separated fields - the departure time of each bus or train, the number (or name) of each bus (or train), a description of the stop from where the bus leaves, the arrival time at the next (neighbouring) station, and the name of the next (neighbouring) station. This file format is identical to the  tt-East_Station  example, above. One stop has just 1 departing trip each day (a 100-byte file), while two stops in St Georges Terrace have 945 (90KB files).

  • The Transperth network has 95 connected stations in the Perth Metropolitan area (used on Wednesdays). It should be possible to execute the whole network on a single laptop computer (with station servers written in C or Python), but that would be a nightmare to debug and manage. You should approach the project using connected networks of increasing size: 1 station, 2 stations, 5 stations, 10 stations, 20 stations,....
    Running more than 10 Java virtual machines on a standard single computer will probably not be possible.
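To give a feel for the Wednesday-only simplification mentioned in the points above, here is a sketch of selecting Wednesday services from a GTFS calendar.txt file. How buildtransperthtimetables.sh actually does this is not shown here; the sketch assumes the feed provides calendar.txt with the standard service_id and wednesday columns, ignores the holiday exceptions held in calendar_dates.txt, and wednesday_service_ids is a hypothetical name.

```python
import csv

def wednesday_service_ids(calendar_file):
    """Return the set of service_id values that run on Wednesdays.

    calendar_file is an open text file (or any iterable of lines) in the
    GTFS calendar.txt format, whose first line names the columns.
    """
    reader = csv.DictReader(calendar_file)
    return {row["service_id"] for row in reader if row["wednesday"] == "1"}
```

Trips in trips.txt whose service_id is in this set would then be the ones to keep.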

 


Good luck,

Chris McDonald
updated 7th May 2020.
