The University of Western Australia
Computer Science and Software Engineering

CITS4407 Open Source Tools and Scripting

Assignment 2 - due 5pm Friday 29th May (week 12)


The goal of this assignment is to assess your understanding of the use of the shell and open-source tools to effectively filter and process textfiles and streams of text. You will be assessed on the clarity and quality of the shellscripts you write to examine and report on the data. While the efficiency of your shellscripts will not be assessed, you should take care to avoid any excessively slow practices.

  • The assignment contributes 30% towards your final mark in CITS4407 this semester. It is anticipated that this assignment will require 20 hours to complete.

  • The deadline for this assignment is 5pm Friday 29th May (week 12).

  • It's considered good practice to include comments in your shellscripts, to explain your design and logic. Include your name and student number in a comment near the top of each of your shellscripts, and (many) comments describing your approach (if not trivial).

  • The assignment is individual work. You may discuss general ideas with other students, but you may not share parts of your developed solutions. You may use material found in books or tutorials (either physical or online) but must cite the sources of such material.

  • Submit your assignment as one or more files (either in a single Zip archive, or as individual files), using cssubmit.
    DO NOT submit your scripts as Microsoft Word files.

  • The assignments will be marked on a Linux (Ubuntu) computer, using the standard bash shell. Add a comment to your scripts if you did not develop your work on Linux (either native or using Windows WSL), indicating, for example, that you used Apple's macOS.

  • Your solutions will be assessed on both their design/approach and their correctness. The precise format of any output is not specified, but you must ensure that it is clear and unambiguous.

The tasks ( /30 marks )

  1. [6 marks]
    The ZIP file contains a number of (text) files containing a program written in the C programming language (there's no need to understand C for this task). The program, when compiled using the provided Makefile, calculates the correlation-coefficient and line of best fit, for 2-column data (sample data in the file RESULTS). Download the ZIP file, and expand it into your working directory.

    A helpful programming practice is to place comments in source code files to indicate the whole program's version number and, possibly, the release date. You'll notice that the provided source code files all contain the C comment:

    // calcmarks, version 1, released Fri May 8 11:20:00 AWST 2020

    For this task you are required to write a shellscript, named updateversion, which updates the comment found in each C source code file so that it (now) contains the next version number, and the current date/time as the next release date. For example, after successfully running your shellscript, each C source file will contain a comment, such as:

    // calcmarks, version 2, released Fri May 8 13:44:19 AWST 2020

    Do not assume that the project (collection of source files) will always be called calcmarks (it won't!). Instead, your shellscript should receive the project's name as a command-line parameter.
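
    One possible way to perform the update is sketched below. It is an illustration under stated assumptions, not a model solution: the function name bump_version is hypothetical, the comment is assumed to start in the first column exactly as in the examples above, and the project name is assumed to contain no characters that are special to sed.

```shell
# bump_version PROJECT FILE...   (hypothetical helper, for illustration only)
# Replaces "// PROJECT, version N, released ..." in each FILE with
# version N+1 and the current date/time.
bump_version() {
    local project="$1"; shift
    local now file current
    # same field order as the example comment: weekday month day time zone year
    now=$(date '+%a %b %e %H:%M:%S %Z %Y')
    for file in "$@"; do
        # extract the current version number from the first matching comment
        current=$(sed -n "s|^// $project, version \([0-9][0-9]*\), released .*|\1|p" "$file" | head -n 1)
        if [ -z "$current" ]; then
            echo "$file: no version comment found" >&2
            continue
        fi
        # write to a temporary file first, so a failed edit cannot truncate the source
        sed "s|^// $project, version $current, released .*|// $project, version $((current + 1)), released $now|" \
            "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    done
}
```

    Called as bump_version calcmarks *.c, this rewrites only the version comment and leaves every other line untouched.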

    Your shellscript should also detect and report the problem of any file having a different version number from the others.
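
    A minimal sketch of such a consistency check follows. The function name check_versions is hypothetical, and the sketch assumes the same comment layout as the examples above. Files with no version comment at all are not counted here and would need separate handling.

```shell
# check_versions PROJECT FILE...   (hypothetical helper, for illustration only)
# Succeeds if all version comments agree; otherwise lists them and fails.
check_versions() {
    local project="$1"; shift
    local distinct
    # collect every version number, then count how many distinct values there are
    distinct=$(sed -n "s|^// $project, version \([0-9][0-9]*\), released .*|\1|p" "$@" |
               sort -u | awk 'END { print NR }')
    if [ "$distinct" -gt 1 ]; then
        echo "version numbers differ between files:" >&2
        grep -H "^// $project, version " "$@" >&2
        return 1
    fi
    return 0
}
```

    Running the check before any file is modified avoids leaving the project half-updated.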

    Finally, add a new target to the project's Makefile to update the version number in the C source files and in the Makefile itself. Be very careful with this last part (i.e. while testing, keep a backup copy of your Makefile in case you 'destroy' it).

  2. [6 marks]
    The Department of Computer Science and Software Engineering runs its own small web-server, named, to support teaching-related applications. As with most web-servers, each request is logged, one request per line, and each request's fields include: the requesting IP address, the date and time of the request, the URL requested, the web-server's integer return code (indicating success or error), and the number of bytes transferred.

    The text file secure-access.log-20200510 (9MB, 93000 lines) provides the access logfile for a recent and typical week of activity.
    Don't forget, you can select smaller datasets (subsets) by using head and tail.
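
    As an illustration of working on a subset, the sketch below tallies the return codes found in the first lines of a logfile. The function name status_counts is hypothetical, and the field position is an assumption: the sketch supposes the common Apache-style layout in which the return code is the 9th whitespace-separated field, so check a few lines of the real logfile first.

```shell
# status_counts LOGFILE [MAXLINES]   (hypothetical helper, for illustration only)
# Prints "code count" pairs for the first MAXLINES lines (default 1000),
# most frequent code first.
status_counts() {
    local logfile="$1" maxlines="${2:-1000}"
    head -n "$maxlines" "$logfile" |
        # ASSUMPTION: field 9 holds the integer return code
        awk '{ count[$9]++ } END { for (code in count) print code, count[code] }' |
        sort -k2,2nr -k1,1
}
```

    The two-column output is in a form that most plotting tools can read directly.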

    This task asks you to develop at least TWO distinct graphical representations of the data in the logfile. Each representation must employ a different visualisation (chart) type. Only one representation may be a 'simple' one, such as a histogram showing the distribution of bytes delivered. The other visualisations should present some more insightful information, such as any URLs that are 'trending' across the week, or more meaningful descriptions of the locations from which requests are made.

    For this task, you'll probably find it easiest to develop TWO distinct shellscripts, or TWO distinct shell functions in one shellscript, producing TWO distinct plots. Each shellscript, or shell function, should produce its own plot, and you may present the plots in TWO HTML webpages or in just one.

  3. [18 marks]
    Perth's Public Transport Authority (PTA) provides public access to its scheduled times, stop locations, and route information from its webpage. You may download your own copy of the data (about 90MB when uncompressed) by clicking on the first link "By downloading the data you are agreeing to the terms of the License..."

    The data is released as a collection of inter-related textfiles following the Google Transit Feed Specification (GTFS), which is also used by many other public transport companies, worldwide.

    Perth has a very good suburban train service. Unfortunately it is not very extensive and, if you need to reach a destination via train, you often need to first catch a bus (or walk) to a train station. Perth also has a very attractive tourist destination, Rottnest Island. Unfortunately you cannot reach Rottnest Island by train, but you can travel to the last station on the Fremantle Train Line (Stop No: 99352), which is right next to the Rottnest Island B-shed ferry terminal! Perfect.

    So, if you have an urge to visit Rottnest Island, and you live less than a kilometre or twenty minutes walk from a train station, you will walk from your current location to the nearest train station, and catch a train toward the ferry terminal. If you're not close to the Fremantle Train Line, you may first need to catch another train to Perth Station (Stop No: 99007) or Perth Underground Station (Stop No: 99601) and then catch a Fremantle Line train from Perth Station.
    See the Clarifications.

    This task asks you to write a shellscript accepting two command-line arguments representing the latitude and longitude of your current location. Using the Google Transit Feed Specification (GTFS) data, your shellscript should first determine if your location is within one kilometre of a train station, and then determine the sequence of times and train stations required to get you to the ferry terminal. You're ready to leave at the time you run the shellscript!
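
    The first step, finding the closest stop within one kilometre, might be sketched as follows. This is an illustration only: the function name nearest_stop is hypothetical, the sketch assumes a GTFS stops.txt whose header line names the columns stop_id, stop_name, stop_lat and stop_lon, and it assumes no field contains an embedded comma. It considers every stop in the file, so restricting the search to train stations is left as a separate step.

```shell
# nearest_stop LAT LON STOPS_FILE   (hypothetical helper, for illustration only)
# Prints "distance_m,stop_id,stop_name" for the closest stop within 1000 m,
# or fails if no stop is that close.
nearest_stop() {
    awk -F, -v lat="$1" -v lon="$2" '
    function radians(d) { return d * 3.14159265358979 / 180 }
    # haversine distance in metres, using an assumed mean Earth radius of 6371 km
    function dist(la1, lo1, la2, lo2,    a) {
        a = sin(radians(la2 - la1) / 2) ^ 2 +
            cos(radians(la1)) * cos(radians(la2)) * sin(radians(lo2 - lo1) / 2) ^ 2
        return 2 * 6371000 * atan2(sqrt(a), sqrt(1 - a))
    }
    # map header names to column numbers, so column order does not matter
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    {
        d = dist(lat, lon, $(col["stop_lat"]), $(col["stop_lon"]))
        if (best == "" || d < best) {
            best = d; id = $(col["stop_id"]); name = $(col["stop_name"])
        }
    }
    END {
        if (best == "" || best > 1000) exit 1
        printf "%.0f,%s,%s\n", best, id, name
    }' "$3"
}
```

    The exit status lets the calling script decide immediately whether the trip is possible at all.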

    The output of the shellscript will be an HTML (text) webpage, reporting whether your impulsive dash to Rottnest is possible, and the instructions/directions to get you there. Buses give you motion-sickness, so you can only travel by a combination of walking and train.

    Be warned that the last ferry to Rottnest Island leaves at 15:30, so you'll need to ensure that you can catch it!
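
    Comparing times is simplest once they are converted to seconds since midnight. The sketch below assumes times written as HH:MM:SS (the form used in GTFS stop_times.txt) or HH:MM. Both function names are hypothetical, and the sketch ignores the walk from the station to the terminal.

```shell
# to_seconds HH:MM[:SS]   (hypothetical helper, for illustration only)
# Prints the number of seconds since midnight.
to_seconds() {
    # awk treats a missing seconds field as 0 and is not confused by leading zeros
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# arrives_in_time ARRIVAL
# Succeeds if ARRIVAL is no later than the 15:30 ferry departure.
arrives_in_time() {
    [ "$(to_seconds "$1")" -le "$(to_seconds 15:30:00)" ]
}
```

    For example, arrives_in_time 15:29:59 succeeds, while arrives_in_time 15:30:01 fails.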

    Embed a Google Map into your webpage, showing the location and times of your starting location, and the train stations where you get on and off any trains.

    Have your script simply print out your starting location, the departure time from your starting location, and the train stations where you get on and off any trains; any clear format is suitable.

    To calculate the distance, in metres, between a pair of latitude/longitude coordinates, you'll need to employ the haversine formula [Wikipedia].
    You may wish to perform the calculation by invoking a single program or, if using AWK, by calling an AWK function. Here's the code for each:

    • haversine.c (which will require compiling - see comments in file), and
    • haversine.awk (which should be embedded in a larger AWK script).
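
    For orientation only, the haversine calculation can be sketched as a shell function wrapping a few lines of AWK. This is not the provided haversine.awk, whose contents may differ. The function names and the mean Earth radius of 6371 km are assumptions of this sketch.

```shell
# haversine LAT1 LON1 LAT2 LON2   (illustrative sketch, not the provided file)
# Prints the great-circle distance between the two points, rounded to whole metres.
haversine() {
    awk -v lat1="$1" -v lon1="$2" -v lat2="$3" -v lon2="$4" '
    function radians(deg) { return deg * 3.14159265358979 / 180 }
    # AWK has no built-in arcsine, so derive it from atan2
    function arcsin(x)    { return atan2(x, sqrt(1 - x * x)) }
    BEGIN {
        R    = 6371000                 # ASSUMPTION: mean Earth radius in metres
        dlat = radians(lat2 - lat1)
        dlon = radians(lon2 - lon1)
        a    = sin(dlat / 2) ^ 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ^ 2
        printf "%.0f\n", 2 * R * arcsin(sqrt(a))
    }'
}
```

    One degree of latitude comes out at roughly 111 km, which is a quick sanity check for any implementation.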

Please post requests for clarification about any aspect of the assignment to help4407, so that all students may remain equally informed.


Good luck.

Chris McDonald
May 2020.
