The University of Western Australia
Computer Science and Software Engineering
 
 

CITS4407 Open Source Tools and Scripting

Assignment 1 - due 9am Monday 20th April

Clarifications.

The goal of this assignment is to assess your understanding of the use of the shell and open-source tools to effectively filter and process textfiles and streams of text. You will be assessed on the clarity and quality of your shellscripts to examine and report on the data. While the efficiency of your shellscripts will not be assessed, you should take care to avoid any excessively slow practices.

  • The assignment contributes 20% towards your final mark in CITS4407 this semester. It is anticipated that this assignment will require 10 hours to complete.

  • The deadline for this assignment is 9am Monday 20th April.

  • It's considered good practice to include comments in your shellscripts, to explain your design and logic. Include your name and student number in a comment near the top of each of your shellscripts, and (many) comments describing your approach (if not trivial).

  • The assignment is individual work. You may discuss general ideas with other students, but you may not share parts of your developed solutions. You may use material found in books or tutorials (either physical or online) but must cite the sources of such material.

  • Submit your assignment as one or more files (either in a single Zip archive or as individual files), using cssubmit.
    DO NOT submit your scripts as Microsoft Word files.

  • The assignments will be marked on a Linux (Ubuntu) computer, using the standard bash shell. Add a comment to your scripts if you did not develop your work on Linux (either native or using Windows WSL), indicating, for example, that you used Apple's macOS.

  • Your solutions will be assessed on both their design/approach and their correctness. The precise format of any output is not specified, but ensure that it is clear and unambiguous.


The tasks ( /20 marks )

  1. [4 marks]
    The standard date command may be used to report dates and times in a number of formats. By default, it reports the current date and time.

    Write a shellscript which accepts two command-line arguments each of the form DD/MM/YYYY, such as 25/12/2020, and reports the number of days between the two dates. Successive dates are considered to be one day apart. A helper.

    shell> ./between 02/04/2020 20/04/2020
    18
    shell> ./between 20/04/2020 02/04/2020
    18
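
    One possible approach to this task, offered only as a minimal sketch: convert each date to seconds since the epoch and divide the difference by the number of seconds in a day. The sketch assumes GNU date (as found on Ubuntu), whose -d option is not available in the same form on macOS. The function names to_epoch and days_between are illustrative and not part of the assignment.

```shell
#!/usr/bin/env bash
# between: report the number of days between two DD/MM/YYYY dates.
# Assumption: GNU date is available (its -d option parses YYYY-MM-DD).

# Convert DD/MM/YYYY to seconds since the epoch at midnight UTC.
# Working in UTC avoids daylight-saving shifts, so every day is 86400 seconds.
to_epoch() {
    local d m y
    IFS=/ read -r d m y <<< "$1"
    date -u -d "$y-$m-$d" +%s
}

# Print the absolute number of days between two dates.
days_between() {
    local a b diff
    a=$(to_epoch "$1") || return 1
    b=$(to_epoch "$2") || return 1
    diff=$(( (b - a) / 86400 ))
    echo "${diff#-}"    # strip a leading minus sign, so argument order does not matter
}

# Run only when exactly two dates are supplied on the command line.
if [ "$#" -eq 2 ]; then
    days_between "$1" "$2"
fi
```

    Stripping the sign at the end is one simple way to give the same answer regardless of the order of the arguments, as in the sample execution above.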

  2. [4 marks]
    In Australia, a game of Saturday Lotto involves drawing eight numbered balls from a transparent barrel. Balls are uniquely numbered 1 through 45. The first six numbers drawn are termed the primary numbers and the final two are termed supplementary numbers. A standard Lotto ticket contains six unique numbers, and will win the main prize if they are the (six) primary numbers.

    Using bash's $RANDOM environment variable, write a shellscript to print the numbers on a valid Saturday Lotto ticket in numeric order. For example:

    shell> ./mylotto
    2 3 8 24 39 40
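
    A ticket could be generated along the following lines. This is a sketch rather than a model answer: it keeps drawing values from $RANDOM until six distinct numbers have been collected, using an associative array as a set (which assumes bash 4 or later, so not the bash shipped with macOS). The function name lotto_ticket is illustrative.

```shell
#!/usr/bin/env bash
# mylotto: print six unique numbers from 1..45 in ascending order.
# Assumption: bash 4+ for associative arrays.

lotto_ticket() {
    local -A picked=()    # keys are the numbers drawn so far
    local n
    while [ "${#picked[@]}" -lt 6 ]; do
        n=$RANDOM                       # uniform over 0..32767
        (( n < 32760 )) || continue     # 32760 = 45 * 728; reject the rest to avoid modulo bias
        picked[$(( n % 45 + 1 ))]=1     # a repeated number just overwrites its own key
    done
    # One number per line, sorted numerically, then joined with spaces.
    printf '%s\n' "${!picked[@]}" | sort -n | paste -sd' ' -
}

lotto_ticket
```

    Storing the numbers as array keys means duplicates are discarded automatically, so the loop simply continues until six distinct keys exist.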

  3. [6 marks]
    Consider a plain textfile holding just the words of a book. The text is written normally using sentences, and blank lines separate paragraphs. Within the text some words and phrases appear between angle-brackets, such as:
    <Linux> is a popular operating system
    and Companies use <open source> software.

    The purpose of identifying specific words and phrases this way is so that we may generate an index for the book. The textfile does not contain the index itself, but a shellscript will generate the index. At most one bracketed word or phrase appears on each line, but each bracketed word or phrase may appear multiple times in the file.

    Develop a shellscript to generate a book's index. The name of the book's textfile is passed as a command-line argument to the script, and the script's output will consist of the index items, and the line-number(s) on which they appear.

    A sample execution and its (possible) output appear below. Simply create your own textfile(s) for testing; there is no need to submit them.

    shell> ./bookindex modern-computing.txt
    .....
    linux: 12 130
    open source: 44
    .....
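
    The index could be built with awk, as in the sketch below. It relies on the stated guarantee of at most one bracketed item per line, and it assumes that items should be folded to lower case (as the sample output suggests) and listed alphabetically. The function name bookindex mirrors the sample command; everything else is illustrative.

```shell
#!/usr/bin/env bash
# bookindex: list each <bracketed> item with the line numbers on which it appears.
# Assumptions: at most one bracketed item per line; items are folded to lower case.

bookindex() {
    awk '
        # match() sets RSTART and RLENGTH to the position and length of "<...>"
        match($0, /<[^<>]+>/) {
            key = tolower(substr($0, RSTART + 1, RLENGTH - 2))   # text between the brackets
            lines[key] = (key in lines) ? lines[key] " " NR : NR # append this line number
        }
        END {
            for (key in lines) print key ": " lines[key]
        }
    ' "$1" | sort    # awk iterates arrays in no particular order, so sort the result
}

# Run only when a single filename is supplied on the command line.
if [ "$#" -eq 1 ]; then
    bookindex "$1"
fi
```

    Because awk reads the file from top to bottom, the line numbers for each item are already in ascending order; only the items themselves need sorting.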

  4. [6 marks]
    Consider the Titanic Dataset Observations which provides the manifest and fate of passengers aboard the Titanic cruise ship in 1912. The dataset is hosted on www.kaggle.com, a repository of thousands of datasets, supported by Google. To download the dataset (using your browser), you will first need to sign up for a free Kaggle membership (I've never received any spam from them), which you can immediately cancel after downloading the dataset.

    The second field of the dataset reports the fate of each passenger after the shipwreck - 0 signifies that they did not survive. We will make the (unrealistic) assumption that all passengers with the same surname were from the same family, and we'll define a family to comprise two or more people.

    Develop a shellscript which reports the surnames of the whole families that survived the shipwreck.
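
    A sketch of one approach follows. It makes assumptions about the file that should be checked against the actual download: that it is a comma-separated file with a single header line, and that the fourth field holds a quoted name of the form "Surname, Title. Given names", so that splitting on commas leaves the surname (with its opening quote) in field 4. The function name whole_families is illustrative.

```shell
#!/usr/bin/env bash
# Report surnames of families (two or more people) in which everyone survived.
# Assumptions (verify against the real file): CSV with one header line,
# field 2 is the survival flag, and field 4 begins with "Surname, ...

whole_families() {
    awk -F, '
        NR == 1 { next }                  # skip the header line
        {
            surname = $4
            gsub(/"/, "", surname)        # drop the opening quote of the name field
            total[surname]++              # people sharing this surname
            if ($2 != 0) alive[surname]++ # 0 signifies that they did not survive
        }
        END {
            for (s in total)
                if (total[s] >= 2 && alive[s] == total[s]) print s
        }
    ' "$1" | sort
}

# Run only when a single filename is supplied on the command line.
if [ "$#" -eq 1 ]; then
    whole_families "$1"
fi
```

    Counting both the members and the survivors of each surname in a single pass avoids reading the file twice; a family qualifies only when the two counts are equal and at least two.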


Please post requests for clarification about any aspect of the assignment to help4407, so that all students may remain equally informed.

 

Good luck.

Chris McDonald
April 2020.
