The filename has two extensions and, for their meaning,
we read them from right to left.
The first (rightmost) is .zip, signifying that the file has been
compressed by the
zip command which appends the .zip extension.
The second extension .csv is an acronym for Comma Separated Values,
a common textfile format produced by Microsoft Excel and,
as we shall see,
many other applications.
So, our data is in a textfile that has been compressed by the zip command.
Without any options,
wc will report line, word, and character counts.
With the -l option, only the line count is reported.
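A quick sketch (using a tiny hypothetical CSV, since the real data file isn't shown here):

```shell
# Create a small sample CSV standing in for the real (unzipped) data:
printf 'brand,suburb,fuel,price\nBP,Crawley,PULP,145.9\n' > sample.csv

# Without options, wc reports line, word, and character counts:
wc sample.csv

# With -l, only the line count is reported:
wc -l sample.csv
```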
Fields are delimited by the comma character (remember 'csv').
We can use the
cut command to 'break' the file into its fields.
The default field delimiter for
cut is a tab,
so we need to override the default and specify the comma.
We also only require the 2nd field.
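For example (the field layout here is an assumption, with the value we want in the 2nd field):

```shell
# Sample comma-delimited data; field 2 holds the value we want:
printf 'BP,Crawley,PULP\nShell,Nedlands,ULP\n' > sample.csv

# cut's default field delimiter is a tab, so -d',' overrides it,
# and -f2 selects only the 2nd field:
cut -d',' -f2 sample.csv
```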
Consider a list of anything. The easiest way to list each distinct item and eliminate duplicates is to first sort the items - then all identical items will appear consecutively. It's now easy to report the distinct items, and immediately remove any repeats.
We can use the
sort and uniq commands
in combination to perform our task,
first by using a temporary file with input and output file redirection,
remembering to remove the temporary file afterwards:
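A minimal sketch of the temporary-file approach (the item list is hypothetical):

```shell
# Some items containing duplicates:
printf 'b\na\nb\nc\na\n' > items.txt

# Sort into a temporary file so that identical items become adjacent,
# let uniq drop the repeats, then remove the temporary file:
sort < items.txt > sorted.tmp
uniq < sorted.tmp
rm sorted.tmp
```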
Better still, we can avoid the use of those temporary files by directly connecting the output of each command to the input of the next command. We use a sequence of communication pipes to build a command pipeline. We visualise the data flowing from left-to-right between the commands, with typically less data flowing through each successive pipe.
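The same task as a pipeline, sketched with hypothetical data (no temporary files needed):

```shell
printf 'BP,Crawley\nShell,Crawley\nBP,Nedlands\n' > sample.csv

# Each '|' connects one command's output to the next command's input;
# typically less data flows through each successive pipe:
cut -d',' -f2 sample.csv | sort | uniq
```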
The command sequence
... | sort | uniq is so common,
that the actions of
uniq have been 'built in' to sort itself, as its -u option.
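For example:

```shell
printf 'b\na\nb\na\n' > items.txt

# sort's -u (unique) option sorts and removes duplicate lines in one step,
# replacing the longer '... | sort | uniq' pipeline:
sort -u items.txt
```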
We don't wish the 3 words of our required service-station name
to be interpreted (by the shell) as 3 distinct command arguments.
We can keep all words of the name 'together' by enclosing them in single-quotes.
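For example (the service-station name below is hypothetical):

```shell
printf 'Caltex Woolworths Crawley,PULP,145.9\n' > sample.csv

# Without quotes, the shell would pass grep three separate arguments;
# single-quotes keep the multi-word name together as one pattern:
grep 'Caltex Woolworths Crawley' sample.csv
```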
grep (standing for global regular expression print!)
will find all lines matching the pattern given as its first argument.
Once grep has found all matching lines,
we pass its output to
cut to extract just the 5th field (the prices).
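Sketched with hypothetical data (assuming the fuel type in field 4 and the price in field 5):

```shell
printf 'BP,Crawley,1 Main St,PULP,145.9\nBP,Crawley,1 Main St,ULP,135.9\n' > sample.csv

# grep selects the PULP lines; cut keeps just the 5th field (the price):
grep PULP sample.csv | cut -d',' -f5
```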
In the previous exercise we 'threw away' too much data by only reporting the prices - we need the fuel type (PULP) as well:
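cut accepts a comma-separated list of fields, so we can keep both (field numbers assumed as before):

```shell
printf 'BP,Crawley,1 Main St,PULP,145.9\nBP,Crawley,1 Main St,ULP,135.9\n' > sample.csv

# -f4,5 keeps both the fuel type and the price:
grep PULP sample.csv | cut -d',' -f4,5
```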
Getting closer; now we need to sort the output by price.
We also need to treat the (now) 2nd field as numeric, not just a string
(else '101' comes before '13').
We tell sort that the comma is our field-separator,
to use the 2nd field as the sort key,
and to sort numerically.
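Those three requirements map onto three sort options (sketched with hypothetical fuel-type,price lines):

```shell
printf 'PULP,145.9\nPULP,101.5\nPULP,13.0\n' > prices.csv

# -t',' sets the field separator, -k2,2 uses the 2nd field as the sort key,
# and the key's n modifier compares numerically (so 13 sorts before 101):
sort -t',' -k2,2n prices.csv
```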
There's the lowest price on the first line. We could finally extract it with:
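The original command isn't reproduced here; one common choice (an assumption) is head, which prints only the leading lines:

```shell
printf 'PULP,101.5\nPULP,145.9\n' > sorted.csv

# head -n1 prints just the first line -- the lowest price after sorting:
head -n1 sorted.csv
```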
Phew, fantastic! But we should really re-read the question:
There's no single correct answer to this exercise, but let's find the hotter month by examining minimum (field 3) and maximum (field 4) temperatures. Note that we're only interested in the lines providing data, and that they all include dates in a regular format. We'll ignore the fact that February 2020 had one extra day!
No-one likes hot nights; perhaps we could add together all minimum temperatures across the month. Unfortunately, there's no well-known command to add a column of numbers, so let's search the web for bash add column of numbers.
Many solutions employ a command which we'll investigate later in the unit;
one answer
provides a solution employing an uncommon command sequence (and new to me!)
So the sum of February 2020's minimums is 87 degrees more than 2019, even allowing for its extra day!