The University of Western Australia
Computer Science and Software Engineering
 
 

Department of Computer Science and Software Engineering

CITS4407 Open Source Tools and Scripting

Assignment 1 - Sample solutions and discussion

The goal of these tasks was to develop simple solutions using common commands and shell features, well supported by comments explaining their design.

Each of these sample solutions employs only shell features, such as control-flow using if and while, and common commands, such as grep, cut, expr, sort, and wc, that had been presented by the time the assignment was released. Each sample solution is a shellscript in a single textfile, which first checks that it was invoked with the correct number of arguments, and then either reports an error message or employs a default input file.
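The argument check that begins each script can be tried on its own. The following is a minimal sketch, not taken from any of the sample solutions: the function name pick_input is hypothetical, and it uses the parameter expansion ${1:-default}, which substitutes a default when the first argument is missing or empty, in place of an if/else.

```shell
#!/bin/bash
# pick_input is a hypothetical helper: it accepts at most one argument and
# prints the input filename to use, falling back to a default name.
pick_input() {
    if [ $# -gt 1 ]; then
        echo "Usage: pick_input [filename]" >&2
        return 1
    fi
    # ${1:-...} expands to $1, or to the default if $1 is unset or empty
    echo "${1:-titanicdata.csv}"
}

pick_input                # prints titanicdata.csv
pick_input mydata.csv     # prints mydata.csv
```

Note that -gt compares the argument count as an integer, whereas the scripts' != compares it as a string; both work for small counts.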

Of course, there are many different ways to complete each task, and more efficient solutions (often invoking fewer commands) can be developed with more advanced features, such as regular expressions, presented later in our unit.

The tasks ( /20 marks )

  1. [4 marks]
    The standard date command may be used to report dates and times in a number of formats. By default, it reports the current date and time.

    Write a shellscript which accepts two command-line arguments each of the form DD/MM/YYYY, such as 25/12/2020, and reports the number of days between the two dates. Successive dates are considered to be one day apart.

    This was a relatively easy task, requiring some quite common utilities, such as cut, date, and expr, and shell control-flow with if and while. The biggest challenge in this task was supporting dates in different years; we needed to sum the number of days until the end of the first year, the number of days in each intervening year (of which there may be none), and the number of days until the second date. Fortunately, date also accounts for leap years for us. You were only required to develop a solution for one operating system platform, but note that the command-line options for date differ between Linux and macOS. We can support both by using a shell function that checks which operating system is being used, and then invokes date with the appropriate options.
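    Because date accounts for leap years, the script never spells the rule out. For reference, here is a minimal sketch of the Gregorian leap-year rule written with shell arithmetic; days_in_year is a hypothetical name, and this is an illustration rather than part of the sample solution.

```shell
#!/bin/bash
# days_in_year YEAR: print 366 for a Gregorian leap year, otherwise 365.
# A year is a leap year if it is divisible by 4, except that century years
# are leap years only when also divisible by 400.
days_in_year() {
    local y=$1
    if (( (y % 4 == 0 && y % 100 != 0) || y % 400 == 0 )); then
        echo 366
    else
        echo 365
    fi
}

for y in 1900 2000 2020 2021; do
    echo "$y has $(days_in_year "$y") days"
done
```

    For 2020 this gives 366, the same value that dayofyear "12/31/2020" reports in the script that follows.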
    between.sh :

    #!/bin/bash

    # ENSURE THAT SCRIPT WAS INVOKED WITH 2 DATES
    if [ $# -ne 2 ]; then
        echo "Usage: $0 date1 date2" >&2
        exit 1
    fi

    # date ON LINUX AND macOS SUPPORTS DIFFERENT COMMAND-LINE OPTIONS
    # dayofyear MM/DD/YYYY PRINTS THE DAY-OF-THE-YEAR (001..366) OF THAT DATE
    function dayofyear() {
        if [ "`uname`" == "Linux" ]; then
            date -d "$1" '+%j'
        else
            date -j -f "%m/%d/%Y" "$1" "+%j"
        fi
    }

    # DETERMINE THE YEAR AND THE DAY-OF-THE-YEAR OF EACH DATE
    D1=`echo $1 | cut -c1-2`
    M1=`echo $1 | cut -c4-5`
    Y1=`echo $1 | cut -c7-10`
    J1=`dayofyear "$M1/$D1/$Y1"`

    D2=`echo $2 | cut -c1-2`
    M2=`echo $2 | cut -c4-5`
    Y2=`echo $2 | cut -c7-10`
    J2=`dayofyear "$M2/$D2/$Y2"`

    # ARE THE DATES IN THE SAME YEAR?
    # ([ ... -lt ... ] COMPARES DECIMAL INTEGERS, EVEN WITH LEADING ZEROS SUCH AS 008)
    if [ "$Y1" -eq "$Y2" ]; then
        # ENSURE THAT WE CALCULATE THE POSITIVE DIFFERENCE (ABSOLUTE VALUE)
        if [ "$J1" -lt "$J2" ]; then
            diff=`expr $J2 - $J1`
        else
            diff=`expr $J1 - $J2`
        fi

    # DATES ARE IN DIFFERENT YEARS
    else
        # ENSURE THAT Y1 IS LESS THAN Y2 (POSSIBLY SWAP THEM AROUND)
        if [ "$Y1" -gt "$Y2" ]; then
            t=$Y1 ; Y1=$Y2 ; Y2=$t
            t=$J1 ; J1=$J2 ; J2=$t
        fi

        # CALCULATE DAYS FROM DATE1 UNTIL 31 DECEMBER OF YEAR1
        # (PROGRESS MESSAGES GO TO STANDARD ERROR, LEAVING ONLY THE ANSWER ON STANDARD OUTPUT)
        daysinY1=`dayofyear "12/31/$Y1"`
        diff1=`expr $daysinY1 - $J1`
        echo "days until end of $Y1 = $diff1" >&2

        # SUM THE NUMBER OF DAYS OF INTERVENING YEARS
        y=`expr $Y1 + 1`
        middle=0
        while [ $y -lt $Y2 ]; do
            daysinyear=`dayofyear "12/31/$y"`
            middle=`expr $middle + $daysinyear`
            echo "  $y has $daysinyear days" >&2
            y=`expr $y + 1`
        done

        # CALCULATE DAYS FROM 31 DECEMBER OF THE PREVIOUS YEAR UNTIL DATE2
        # (1 JANUARY, DAY 1, IS ONE DAY AFTER 31 DECEMBER, SO THIS IS JUST J2;
        #  SUBTRACTING 1 HERE WOULD MAKE 31/12/2020 TO 01/01/2021 ZERO DAYS APART)
        diff2=`expr $J2 + 0`
        echo "days from start of $Y2 = $diff2" >&2

        diff=`expr $diff1 + $middle + $diff2`
    fi

    # REPORT THE DIFFERENCE BETWEEN DATES
    echo $diff
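    A different technique, not the one used in between.sh, avoids the year-by-year sum entirely: convert both dates to seconds since 1 January 1970 and divide the difference by 86400, the number of seconds in a day. The sketch below assumes GNU date (Linux), and the function names to_epoch and days_between are hypothetical. Working in UTC (-u) matters, because a local-time day that contains a daylight-saving change is not exactly 86400 seconds long.

```shell
#!/bin/bash
# to_epoch DD/MM/YYYY: print midnight UTC of that date as seconds since 1970-01-01.
# Assumes GNU date; macOS date needs different options, as discussed above.
to_epoch() {
    local d=${1:0:2} m=${1:3:2} y=${1:6:4}
    date -u -d "$y-$m-$d" '+%s'
}

# days_between DATE1 DATE2: print the (non-negative) number of days between them.
days_between() {
    local s1 s2 diff
    s1=$(to_epoch "$1") || return 1
    s2=$(to_epoch "$2") || return 1
    diff=$(( (s2 - s1) / 86400 ))
    echo $(( diff < 0 ? -diff : diff ))
}

days_between 25/12/2020 25/12/2021
```

    Running it prints 365, since the interval includes February 2021, which has no leap day.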

  2. [4 marks]
    In Australia, a game of Saturday Lotto involves drawing eight numbered balls from a transparent barrel. Balls are uniquely numbered 1 through 45. The first six numbers drawn are termed the primary numbers and the final two are termed supplementary numbers. A standard Lotto ticket contains six unique numbers, and will win the main prize if they are the (six) primary numbers.

    This (too easy?) task required the use of shell arithmetic to generate 6 random numbers, each between 1 and 45, ensuring that each random number is unique. We repeat the random generation process until we have 6 unique values. There are, of course, several ways to implement this; here we employ shell control-flow using if and while, and the common commands sort, wc, and grep. We also employ a temporary file, so we must remember to remove it when finished.
    mylotto.sh :

    #!/bin/bash

    # SET THE LIMITS OF OUR PROBLEM
    NNUMBERS=6
    MAXVALUE=45

    # DEFINE THE NAME OF A TEMPORARY FILE THAT WE'LL USE
    CHOSEN="tmp-chosen"

    # CONTINUE UNTIL WE HAVE A VALID SELECTION OF NUMBERS
    while true; do
        i=0
        # (( )) COMPARES NUMBERS; [[ $i < $NNUMBERS ]] WOULD COMPARE STRINGS
        while (( i < NNUMBERS )); do
            (( n = RANDOM % MAXVALUE + 1 ))
            echo $n
            (( i = i + 1 ))
        done > $CHOSEN

        # ENSURE THAT WE HAVE EXACTLY THE REQUIRED NUMBER OF (UNIQUE) NUMBERS
        if sort -u < $CHOSEN | wc -l | grep -q "$NNUMBERS\$"; then
            break
        fi
    done

    # REPORT THE CHOSEN NUMBERS, IN SORTED ORDER
    sort -n < $CHOSEN
    rm -f $CHOSEN
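    The sample solution draws all six numbers and starts again if any repeat. An alternative sketch under the same limits rejects a duplicate as soon as it is drawn, using a bash indexed array as the set of numbers already seen, so no temporary file is needed. The function name pick_unique is hypothetical.

```shell
#!/bin/bash
# pick_unique COUNT MAX: print COUNT distinct random integers in 1..MAX,
# one per line, in ascending order.
pick_unique() {
    local count=$1 max=$2 picked=0 n
    local -a seen=()          # seen[n] is set once n has been drawn

    if (( count > max )); then
        echo "pick_unique: cannot choose $count distinct values from 1..$max" >&2
        return 1
    fi

    while (( picked < count )); do
        n=$(( RANDOM % max + 1 ))
        if [ -z "${seen[n]}" ]; then    # reject a duplicate immediately
            seen[n]=1
            picked=$(( picked + 1 ))
        fi
    done

    # bash lists the indices of an indexed array in ascending order
    printf '%s\n' "${!seen[@]}"
}

pick_unique 6 45
```

    On systems with GNU coreutils, shuf -i 1-45 -n 6 | sort -n produces a similar selection with a single pipeline.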

  3. [6 marks]
    Consider a plain textfile holding just the words of a book. The text is written normally using sentences, and blank lines separate paragraphs.
    .....
    Develop a shellscript to generate a book's index. The name of the book's textfile is passed as a command-line argument to the script, and the script's output will consist of the index items, and the line-number(s) on which they appear.

    This task was made much easier because each line could contain only a single term to be indexed. Even though the input text is unstructured, we can use the < and > delimiters as if they separate fields, and keep the characters between them: the index terms. Each term is converted to lowercase, and all terms are sorted alphabetically. Because grep -n reports the matching lines of a file in order, the line numbers found for each term are already in ascending order, so we don't need to sort them. We employ temporary files, and must remove them before we exit.
    bookindex.sh :

    #!/bin/bash

    # ENSURE THAT SCRIPT WAS INVOKED WITH 1 FILENAME
    if [ $# -ne 1 ]; then
        echo "Usage: $0 filename" >&2
        exit 1
    fi

    # DEFINE THE NAMES OF TEMPORARY FILES THAT WE'LL USE
    TERMS="tmp-terms"
    LINES="tmp-lines"

    # FIND ALL LINES THAT HAVE BOTH THE < AND > DELIMITERS
    grep '<' < "$1" | grep '>' |
    # EXTRACT THE TERMS BETWEEN THE DELIMITERS
    cut '-d<' -f2 | cut '-d>' -f1 |
    # CONVERT ALL TERMS TO LOWERCASE, STORING ONE COPY OF EACH
    tr A-Z a-z | sort -u > $TERMS

    # READ EACH TERM FROM THE FILE (IFS= AND -r KEEP SPACES AND BACKSLASHES INTACT)
    while IFS= read -r eachterm; do
        # FIND THE LINES CONTAINING EACH TERM, KEEP THE LINE NUMBERS
        # (-i BECAUSE THE TERMS WERE LOWERCASED, -F BECAUSE A TERM IS NOT A PATTERN)
        grep -n -i -F "<$eachterm>" < "$1" | cut -d: -f1 > $LINES
        # DISPLAY EACH TERM AND ITS LINE NUMBERS
        echo "$eachterm:" `cat $LINES`
    done < $TERMS

    # REMOVE OUR TEMPORARY FILES
    rm -f $TERMS $LINES
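    With regular expressions, which our unit presents later, the index can be built in a single pass over the file. The sketch below is one possible approach rather than the sample solution: grep -n -o prints each <term> with its line number (so a line may even hold several terms), the pairs are sorted by term and then by line number, and a loop prints each term followed by its line numbers. The function name bookindex_onepass is hypothetical.

```shell
#!/bin/bash
# bookindex_onepass FILE: print "term: line line ..." for every <term> in FILE.
bookindex_onepass() {
    # -o prints only the matched text; with -n each match is prefixed "LINE:"
    grep -n -o '<[^<>][^<>]*>' -- "$1" |
        tr 'A-Z' 'a-z' |
        # sort by the term (field 2), then numerically by line number (field 1);
        # -u drops a term repeated on the same line; LC_ALL=C gives byte-wise order
        LC_ALL=C sort -t: -u -k2,2 -k1,1n |
        {
            prev=""
            while IFS=: read -r num term; do
                term=${term#<}          # strip the delimiters
                term=${term%>}
                if [ "$term" != "$prev" ]; then
                    # start a new index entry, ending the previous one
                    if [ -n "$prev" ]; then echo; fi
                    printf '%s:' "$term"
                    prev=$term
                fi
                printf ' %s' "$num"
            done
            if [ -n "$prev" ]; then echo; fi
        }
}
```

    For a file whose lines 1 and 3 contain <Shell> and <shell>, and whose lines 3 and 4 contain <grep>, it prints grep: 3 4 followed by shell: 1 3.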

  4. [6 marks]
    Consider the Titanic Dataset Observations, which provides the manifest and fate of passengers aboard the ocean liner Titanic in 1912.
    ....
    Develop a shellscript which reports the surnames of the whole families that survived the shipwreck.

    While the input file's field delimiter is a comma, commas also appear within the field storing each person's full name, so that field is enclosed in double-quotes. This standard convention is employed by Excel, and is seen in many CSV datafiles. We can negotiate it by first treating the double-quote character as the field delimiter to extract full names, and then using the comma to separate surnames from first names. We employ a temporary file, and must remove it before we exit.
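    The two-stage extraction can be seen on a single record. The record below is hypothetical, made up in the same column layout as the dataset rather than copied from it.

```shell
#!/bin/bash
# A hypothetical record: the quoted Name field itself contains a comma.
record='7,1,2,"Smith, Mrs. Jane",female,34'

# Stage 1: with '"' as the delimiter, field 2 is the text between the first pair of quotes.
fullname=$(echo "$record" | cut '-d"' -f2)
# Stage 2: with ',' as the delimiter, field 1 of the full name is the surname.
surname=$(echo "$fullname" | cut -d, -f1)

echo "$fullname"            # prints: Smith, Mrs. Jane
echo "$surname"             # prints: Smith

# Splitting the whole record on commas instead cuts the name in half:
echo "$record" | cut -d, -f4    # prints: "Smith
```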
    survivors.sh :

    #!/bin/bash

    # CHECK IF SCRIPT WAS INVOKED WITH A FILENAME, OR USE A DEFAULT NAME
    if [ $# -eq 0 ]; then
        inputfile="titanicdata.csv"
    else
        inputfile="$1"
    fi

    # DEFINE THE NAME OF THE TEMPORARY FILE THAT WE'LL USE
    SURNAMES="tmp-surnames"

    # Input file format:
    #   PassengerId,Survived,Pclass,Name,Sex,Age,....
    # FIND THE FULLNAMES (BETWEEN DOUBLE-QUOTES) FROM FIELD-4
    cut '-d"' -f2 < "$inputfile" |
    # KEEP JUST THE SURNAME (BEFORE THE COMMA)
    cut -d, -f1 |
    # COUNT HOW MANY TIMES EACH SURNAME APPEARS
    sort | uniq -c |
    # REMOVE SURNAMES THAT ONLY APPEAR ONCE
    # (ANCHOR THE COUNT: AN UNANCHORED " 1" WOULD ALSO REMOVE COUNTS SUCH AS 10 TO 19)
    grep -v '^ *1 ' > $SURNAMES

    # FOREACH SURNAME FOUND...
    while read -r count surname; do
        # DETERMINE IF ANY WITH THIS SURNAME DID NOT SURVIVE (FIELD-2 IS 0)
        # (FIELD-2 PRECEDES THE QUOTED NAME, SO THE NAME'S COMMA DOES NOT AFFECT cut)
        if grep -F "\"$surname," < "$inputfile" | cut -d, -f2 | grep -q 0; then
            continue    # some have not survived
        else
            echo "$surname - all survived"
        fi
    done < $SURNAMES

    # REMOVE OUR TEMPORARY FILES
    rm -f $SURNAMES
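    A different approach, sketched here as an alternative rather than the sample solution, compares two sorted lists instead of re-reading the file once per surname: the surnames shared by two or more passengers (uniq -d prints only repeated lines), and the surnames of passengers who did not survive. comm -23 then prints the lines found only in the first list. The sketch uses bash process substitution, <(...), in place of temporary files; the function names are hypothetical, and it assumes, as the script does, that every name is enclosed in double-quotes.

```shell
#!/bin/bash
# surnames_of: read passenger records on stdin, print each surname.
# -s skips lines with no double-quote, such as the header line.
surnames_of() {
    cut -s '-d"' -f2 | cut -d, -f1
}

# all_survived_families FILE: print surnames shared by 2+ passengers, none of whom died.
all_survived_families() {
    local input=$1
    # comm needs both inputs sorted in the same order, hence LC_ALL=C throughout;
    # '^[0-9]*,0,' selects records whose field 2 (Survived) is 0
    LC_ALL=C comm -23 \
        <(surnames_of < "$input" | LC_ALL=C sort | uniq -d) \
        <(grep '^[0-9]*,0,' "$input" | surnames_of | LC_ALL=C sort -u)
}
```

    Called as all_survived_families titanicdata.csv, it prints one surname per line, without the " - all survived" suffix that survivors.sh appends.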


Written by: [email protected]