Use the arrow keys to move to the last character on the last line.
Use the arrow keys to move to the word 'philosophy'.
Delete the word and its following comma.
Now save the modified file to disk, but do not exit or quit.
Use a cursor navigation method other than the arrow keys to move to the word 'pipes'.
Before 'pipes' insert the word 'communication'.
Now quit the editor without saving your changes.
You should now be back at the shell prompt.
[Refresher] Given a file of simple plain text,
develop a command sequence to uniquely list all words found in the
file (list each unique word just once).
We'll define a "word" to be any sequence of one or more alphabetic characters.
Case matters, so "Hello" and "hello" are distinct words,
and all non-alphabetic characters should be ignored.
The desired output is simply the list of words (each listed only once) in the
file. For the given file, the output would begin with:
ⓘ Helpful command for this exercise: tr.
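One possible approach, offered as a sketch rather than the expected answer: use tr to turn every run of non-alphabetic characters into a single newline, then let sort remove the duplicates. The function name unique_words and the sample text are made up for illustration, and LC_ALL=C is an assumption that makes the character ranges and the ordering purely byte-based.

```shell
# unique_words FILE: print each distinct alphabetic word in FILE, once.
unique_words() (
    export LC_ALL=C                 # byte-wise ranges and ordering
    tr -cs 'A-Za-z' '\n' < "$1" |   # each run of non-letters becomes one newline
        sed '/^$/d' |               # drop the blank line a leading non-letter leaves
        sort -u                     # sort, keeping one copy of each word
)

# Made-up sample text, not the exercise's file.
sample=$(mktemp)
printf 'Hello, hello world! Hello again.\n' > "$sample"
unique_words "$sample"
```

With byte-wise ordering, capitalised words sort before lowercase ones, so "Hello" is listed ahead of "again".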
For this exercise, you're asked to modify your previous solution,
and develop a (very) rudimentary spelling checker.
Firstly, we'll need a "dictionary" of valid words.
On Linux platforms, the file
provides a collection of words collated from (many old) newspaper articles.
There's also a copy here:
On macOS, the file
provides a collection of words found in the 1934 edition of
Webster's International Dictionary.
View the dictionary on your system using less.
For this exercise,
let's more rigorously define a "word" to be three or more lowercase characters.
Now, with reference to the dictionary on your system,
develop a command or shellscript that finds the words
in the textfile (from the first exercise)
that do not appear in the dictionary - potential spelling errors.
ⓘ Helpful commands for this exercise: tr, comm, sort.
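The idea can be sketched as follows: reduce the textfile to a sorted list of unique words, sort the dictionary the same way, and ask comm for the lines that appear only in the first list. This is one reading of the exercise, not a definitive solution. The function name misspelt is hypothetical, the dictionary is passed as an argument (use whichever file your system provides), and "three or more lowercase characters" is interpreted here as "the whole word is lowercase and at least three letters long".

```shell
# misspelt TEXTFILE DICTFILE
# Print words from TEXTFILE (3+ lowercase letters) that are not in DICTFILE.
misspelt() (
    export LC_ALL=C                       # comm needs both inputs in the same order
    sorted_dict=$(mktemp)
    sort -u "$2" > "$sorted_dict"
    tr -cs 'A-Za-z' '\n' < "$1" |         # one word per line
        grep -E '^[a-z]{3,}$' |           # keep all-lowercase words of length >= 3
        sort -u |
        comm -23 - "$sorted_dict"         # lines unique to the first input (stdin)
    rm -f "$sorted_dict"
)

# Made-up text and a tiny made-up dictionary, for illustration only.
text=$(mktemp)
dict=$(mktemp)
printf 'The quick brwon fox is ovr the lazy dog\n' > "$text"
printf '%s\n' the quick brown fox over lazy dog > "$dict"
misspelt "$text" "$dict"
```

Here "The" and "is" are skipped by the word definition, and the two remaining words not found in the dictionary are reported.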
Create a new textfile using vi,
and write a shellscript to execute the command sequence from the
You'll want a simple
vi Editor "Cheat Sheet"
When finished creating your shellscript,
exit the vi editor,
make your shellscript executable,
and test that it works.
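The last two steps can be sketched like this. The script here is written with printf only so that the example is self-contained; in the exercise you would type its lines in vi. The name hello.sh and its contents are hypothetical.

```shell
# Create a throwaway two-line script in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' '#!/bin/sh' 'echo "it works"' > "$workdir/hello.sh"

chmod +x "$workdir/hello.sh"   # add execute permission
"$workdir/hello.sh"            # run it by giving a path to the file
```

When the script sits in the current directory, the usual way to run it is ./hello.sh, because the current directory is normally not searched for commands.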
Now, extend the previous exercise so that your shellscript receives a
command-line argument informing it which textfile it should spellcheck.
Inside your shellscript (file),
you can access the provided command-line argument using the value of $1.
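A minimal sketch of a script that uses $1, assuming you also want it to complain when the argument is missing or unreadable. The script body is only a placeholder (it counts words rather than spellchecks), and both temporary files are made up for illustration.

```shell
# Write a small demonstration script to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/sh
# $# is the number of arguments; $1 is the first of them.
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 textfile" >&2
    exit 1
fi
if [ ! -r "$1" ]; then
    echo "$0: cannot read '$1'" >&2
    exit 1
fi
echo "checking $1 ..."
wc -w < "$1"          # placeholder for the real spellchecking pipeline
EOF

# Made-up input file.
sample=$(mktemp)
printf 'one two three\n' > "$sample"
sh "$script" "$sample"
```

Quoting "$1" everywhere keeps the script working when the filename contains spaces.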
Here are a few more textfiles:
🌶 When sorting a textfile
containing both a header-line and data in multiple columns,
we must be careful to not sort the header-line, too,
else the header-line may end up in the "middle" of the lines of output.
Consider the textfile
If we just sort it by its 1st field (state name),
then the initial header line will be incorrectly positioned in the
middle of the output.
Write a shellscript named sorttable to sort the textfile,
by the number of its international students,
while keeping the header-line at the top of the output.
ⓘ Helpful commands for this exercise: sort, head, tail.
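The technique can be sketched as: print the first line untouched with head, then sort everything from the second line onwards with tail and sort. The layout of the real textfile is not shown here, so this sketch assumes tab-separated columns and takes the column number as an argument. The function name sort_keep_header and the sample figures are invented, and sorting largest-first is a choice (drop the r for smallest-first).

```shell
# sort_keep_header FILE FIELD
# Print FILE's first line as-is, then the remaining lines sorted
# numerically (largest first) on tab-separated column FIELD.
sort_keep_header() {
    tab=$(printf '\t')
    head -n 1 "$1"                                       # the header line
    tail -n +2 "$1" | sort -t "$tab" -k "${2},${2}nr"    # the data lines
}

# Invented sample table; the numbers are not real enrolment data.
table=$(mktemp)
printf '%s\t%s\t%s\n' \
    State Domestic International \
    NSW   100      30 \
    WA    50       45 \
    VIC   80       10 > "$table"
sort_keep_header "$table" 3
```

For this sample the header stays first, followed by the WA, NSW and VIC lines in that order.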
Additional exercises involving filtering text data
A few students have asked for some additional exercises in filtering plain-text
data (similar to Exercise sheet 2).
ⓘ Helpful commands for these exercises:
cut, sort, uniq, grep, head, tail, wc.
[file required: UWA-ENROLMENTS.tsv]
The file UWA-ENROLMENTS.tsv is a 42,000-line textfile.
The first column provides (randomized) UWA student numbers,
and the second column presents the units in which they were enrolled
(data is not from this year).
How many distinct enrolments (lines) are there in the file?
How many distinct students are there in the file?
How many distinct units are there in the file?
How many distinct teaching periods (similar to semesters) are there?
How many (CITS) units are presented by
Computer Science and Software Engineering?
Which unit(s) have the largest enrolment?
🌶Which student(s) are taking the most units this year?
🌶🌶Which units are offered in more than one teaching period?
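Most of these questions reduce to two patterns, sketched below: "how many distinct values are in a column" and "which value occurs most often in a column". The function names and the sample data are invented, and the sketch assumes the second column holds a bare unit code, which may not match the real file.

```shell
# count_distinct FILE FIELD: number of different values in a tab-separated column.
count_distinct() {
    cut -f "$2" "$1" | sort -u | wc -l
}

# most_common FILE FIELD [N]: the N most frequent values (default 1), with counts.
most_common() {
    cut -f "$2" "$1" | sort | uniq -c | sort -rn | head -n "${3:-1}"
}

# Invented sample: student number, TAB, unit code.
enrol=$(mktemp)
printf '%s\t%s\n' \
    1001 CITS1001 \
    1001 MATH1011 \
    1002 CITS1001 \
    1003 CITS1001 \
    1003 CITS2002 > "$enrol"

count_distinct "$enrol" 1    # distinct students
count_distinct "$enrol" 2    # distinct units
most_common    "$enrol" 2    # unit with the largest enrolment
```

Because several values may tie for first place, it is worth asking most_common for a few extra lines and comparing the counts by eye.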
The shorter file wificapture-1.txt
provides details of 10,000 captured wireless Ethernet (WiFi) frames,
as does the
longer file wificapture-2.txt with 230,000 frames.
The contents of each frame have then been formatted to a textfile,
providing details of each frame (one frame per line).
Note that only the frame's header is captured,
and none of its data-payload (which is likely encrypted, anyway).
Thus, the only privacy concerns raised by this data
are which device was communicating with which other device,
and at what time.
No personal or private data is exposed.
The tab-separated fields of each line (frame) are:
time-of-day (in seconds and microseconds),
the transmitting device's distinct MAC (Media Access Control) address,
the receiving device's distinct MAC address,
the source device's distinct MAC address,
the destination device's distinct MAC address,
the length (in bytes) of the frame,
the signal strength with which the frame was received,
a short English description of the frame.
Different frame types will appear to have different numbers of fields.
Actually, all fields are present,
and an 'empty' field will be represented by two TAB characters in a row.
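A small illustration of why this matters when choosing a tool: cut treats every TAB as a separator, so empty fields keep their positions, whereas awk's default splitting collapses runs of whitespace and shifts later fields to the left. The three-field line below is made up.

```shell
line=$(printf 'a\t\tc')      # field 2 is empty: two TABs in a row

printf '%s\n' "$line" | cut -f3                    # prints: c
printf '%s\n' "$line" | awk -F '\t' '{print $3}'   # prints: c
printf '%s\n' "$line" | awk '{print $3}'           # prints an empty line
```

With awk, giving -F '\t' explicitly is therefore the safer habit for files like these.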
For the following exercises,
we are only interested in the source and the destination MAC addresses,
which will always be present.
While processing each frame (line)
ignore all MAC addresses of the form ff:ff:ff:ff:ff:ff -
the special broadcast address for
frames transmitted to any device that can hear it.
Develop a number of short command sequences to find:
The single source device sending traffic most frequently,
🌶The single source device sending the greatest volume of traffic,
🌶🌶The 5 pairs of source and destination devices which collectively
(considered pairwise) send the highest number of frames.
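The three searches might be approached along the following lines. This is a sketch under stated assumptions, not a model answer: the field numbers (4 = source, 5 = destination, 6 = length) assume the fields appear in the order listed above, the function names are hypothetical, a "pair" is treated as ordered (A to B is counted separately from B to A), and the five sample frames are invented.

```shell
BCAST='ff:ff:ff:ff:ff:ff'

# Source address appearing on the most frames.
busiest_source() {
    cut -f4 "$1" | grep -vxF "$BCAST" | sort | uniq -c | sort -rn | head -n 1
}

# Source address with the largest total of frame lengths (bytes).
largest_volume() {
    awk -F '\t' -v bcast="$BCAST" '
        $4 != bcast { bytes[$4] += $6 }                   # sum lengths per source
        END { for (src in bytes) print bytes[src], src }
    ' "$1" | sort -rn | head -n 1
}

# The N (default 5) most frequent source/destination pairs,
# skipping any frame where either address is the broadcast address.
busiest_pairs() {
    cut -f4,5 "$1" | grep -vF "$BCAST" | sort | uniq -c | sort -rn | head -n "${2:-5}"
}

# Invented capture with three devices; 8 tab-separated fields per frame.
A=aa:aa:aa:aa:aa:01 B=aa:aa:aa:aa:aa:02 C=aa:aa:aa:aa:aa:03
cap=$(mktemp)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
    0.000001 "$A" "$B"     "$A" "$B"     100  -50 Data \
    0.000002 "$A" "$B"     "$A" "$B"     100  -50 Data \
    0.000003 "$C" "$BCAST" "$C" "$BCAST" 1500 -60 Data \
    0.000004 "$A" "$C"     "$A" "$C"     50   -50 Data \
    0.000005 "$C" "$A"     "$C" "$A"     1500 -60 Data > "$cap"

busiest_source "$cap"
largest_volume "$cap"
busiest_pairs  "$cap"
```

In the sample, the device sending most often is not the one sending the most bytes, which is why the two questions need different command sequences. If pairs should instead be unordered, the two addresses on each line would need to be put into a consistent order before counting.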