
  CITS4407 OPEN SOURCE TOOLS AND SCRIPTING
 
 

Lab 7: Regex, grep and sed

Questions

  1. Use grep (and wc) to answer the following questions about Alice_in_Wonderland.txt
    1. How many times does the word rabbit appear in Alice_in_Wonderland.txt? What about Rabbit?
    2. How could you search for both rabbit and Rabbit at once?
    3. How many times does the word Alice appear in Alice_in_Wonderland.txt? Note that it sometimes occurs more than once on the same line. You might want to look at grep -o.
    4. How many lines do not contain the word Caterpillar or caterpillar?
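Example (not a solution): the counting options used above are easy to confuse, so here is a minimal sketch of how they differ. It runs on a made-up three-line sample rather than on Alice_in_Wonderland.txt; the sample text and the variable names are assumptions chosen for illustration.

```shell
# Hypothetical sample text (an assumption, not part of the lab files).
sample='The cat sat. The cat ran.
A dog barked.
the Cat slept.'

# -c counts matching LINES: only line 1 contains lowercase "cat".
lines_with_cat=$(printf '%s\n' "$sample" | grep -c 'cat')

# -o prints every match on its own line, so wc -l counts OCCURRENCES.
cat_occurrences=$(printf '%s\n' "$sample" | grep -o 'cat' | wc -l)

# A bracket expression matches either capitalisation of the first letter.
either_case_lines=$(printf '%s\n' "$sample" | grep -c '[Cc]at')

# -i ignores case entirely; combined with -o it counts all occurrences.
any_case=$(printf '%s\n' "$sample" | grep -o -i 'cat' | wc -l)

# -v inverts the match: count the lines that do NOT mention a cat.
lines_without=$(printf '%s\n' "$sample" | grep -v -c -i 'cat')

echo "$lines_with_cat $cat_occurrences $either_case_lines $any_case $lines_without"
```

The difference between the first two results is the point: a line that contains the word twice is counted once by -c but twice by -o piped into wc -l.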
  2. Solve the following problems about arcade.csv with grep:
    1. Print all lines in arcade.csv where the team is GREEN
    2. What arcade score did Molly get?
    3. Print all lines in arcade.csv where the machine is b and the score began with the number 4
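Example (not a solution): when searching CSV data it usually helps to include the surrounding commas, or a ^ or $ anchor, so that the pattern can only match the intended field. The sketch below assumes a hypothetical column layout of name,team,machine,score; the real arcade.csv may be organised differently, so check it before adapting these patterns.

```shell
# Hypothetical rows in an ASSUMED layout: name,team,machine,score
rows='ANN,RED,a,712
BOB,BLUE,a,95
CY,RED,c,7
ALFRED,BLUE,c,730'

# A bare pattern also matches inside other fields (ALFRED contains RED).
loose_count=$(printf '%s\n' "$rows" | grep -c 'RED')

# Surrounding commas restrict the match to a whole field.
red_team=$(printf '%s\n' "$rows" | grep ',RED,')

# ^ anchors the match to the start of the line, i.e. the first field.
bob_score=$(printf '%s\n' "$rows" | grep '^BOB,' | cut -d, -f4)

# Two adjacent fields: machine is exactly "a" and the score starts with 7.
a_sevens=$(printf '%s\n' "$rows" | grep ',a,7')

printf '%s\n' "$loose_count" "$red_team" "$bob_score" "$a_sevens"
```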
  3. You've been sent a list of university enrolment data (australian-universities.csv) but the data is messy. Write a sed script to do the following:
    • Remove all lines that do not contain the word University (match case-insensitively, so lines containing UNIversity and similar variants should also be kept)
    • Remove all lines that contain letters after the name field
    • Replace all full stops (.) with commas (,)
    • Remove all trailing commas
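Example (not a solution): several sed commands can be chained with -e, and each one sees the output of the previous command, so their order matters. The sketch below uses made-up semicolon-separated data and a different keyword; the data, the keyword and the variable names are assumptions for illustration only.

```shell
# Hypothetical messy input (an assumption, not the lab data).
messy='Perth Zoo;12;
perth museum;7;;
Kings Park;30
PERTH Mint;4;'

cleaned=$(printf '%s\n' "$messy" | sed \
    -e '/[Pp][Ee][Rr][Tt][Hh]/!d' \
    -e 's/;/:/g' \
    -e 's/:*$//')
# 1st command: ! negates the address, so d deletes lines that do NOT match.
#              Bracket pairs give portable case-insensitive matching
#              (GNU sed also accepts an I modifier after the address).
# 2nd command: the g flag replaces every semicolon, not just the first.
# 3rd command: runs after the replacement, so it strips trailing colons.

printf '%s\n' "$cleaned"
```

Swapping the second and third commands would change the result, because the trailing characters would still be semicolons when the stripping command ran.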
  4. The following questions use sed to operate on json files in /lab/week8/aces
    1. Write a sed command to convert the json field name "seat" to "crew".
    2. Test your sed command on heroes.json. Pay particular attention to Mace Windu's caption. Adjust your sed command to only affect the field name and not his caption.
    3. villains.json was written in a rush and contains invalid json. There should be a comma at the end of every field in an object except the last one. Fortunately for us, the "keywords" field is always the last field in an object, so we can make use of that when adding the missing commas. Write a sed command to add a comma at the end of every line, except lines ending in one of the following characters:
      , { } ] [
      This is a challenging regex. Start by adding a comma on every line and then gradually ignore lines ending with the above characters, one by one. ] and [ are particularly tricky.
    4. villains.json is also missing quotes around some field names. Write a sed command to add quotes around any field names which do not have them. For example:
      name: "Grand Moff Tarkin"
      
      should be:
      "name": "Grand Moff Tarkin"
      
      Note: this is a challenging regex. You will probably need to use sed in extended mode with -r (GNU sed also accepts -E for the same purpose) so that you can use the + operator (match one or more occurrences of the preceding item). Note that in extended mode, capturing groups do not need to be escaped (so use ( ) instead of \( \)).
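Example (not a solution): two pieces of syntax from the hints above tend to cause trouble, namely putting ] and [ inside a bracket expression and using capturing groups in extended mode. The sketch below shows both on made-up input; the input lines, the key = value format and the variable names are assumptions, not the contents of villains.json.

```shell
# Hypothetical input lines (an assumption for illustration).
lines='alpha
beta;
gamma[
delta]'

# Inside a bracket expression a literal ] must come first (right after ^).
# [^];[] therefore means "any one character except ] ; or [".
# & in the replacement stands for whatever the pattern matched.
terminated=$(printf '%s\n' "$lines" | sed 's/[^];[]$/&;/')

# Hypothetical key = value settings, one key already quoted.
settings='  colour = "red"
  "size" = "large"'

# Extended mode (-E): unescaped ( ) capture, + means "one or more".
# \1 is the leading indentation, \2 is the bare key name.
quoted=$(printf '%s\n' "$settings" | sed -E 's/^( *)([a-z]+) =/\1"\2" =/')

printf '%s\n' "$terminated" "$quoted"
```

The second command leaves the already-quoted key alone because [a-z]+ cannot match a line whose first non-space character is a quote.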

Bonus

  1. See how many levels you can beat at https://alf.nu/RegexGolf. In particular, try to beat "It never ends"; you will need to use a negative lookahead.
  2. What regex would you write to save the day in this situation?
    [Image: "Regular Expressions"]
  3. vim supports regex-style search with / and sed-style replacement with :s. For example: /foo$ finds any line ending in "foo", and :%s/foo/bar/g replaces every occurrence of "foo" with "bar" throughout the file. Repeat this week's lab exercises using vim's search and replace commands. You will need to double-check the exact syntax for escaping brackets and using capturing groups, as it may vary slightly from the syntax used by sed.
  4. Somewhere in /lab/week6/mess is a diff file describing some changes to Alice_in_Wonderland.txt. You don't know what this file is called (it has a silly name). Play with grep to work out what sort of strings are common to all diff files. Then, use grep to find the diff file. Hint: a pattern that begins with a dash would normally be parsed as a grep option; use grep -- PATTERN to make grep treat it as the pattern instead.
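Example (not a solution): as a minimal sketch of the -- hint, the following searches a temporary directory for files containing a string that starts with a dash. The directory, file names and search string are all made up for illustration, and -r (recursive search) is assumed to be available, as it is in GNU and BSD grep.

```shell
# Build a small hypothetical directory to search (removed afterwards).
dir=$(mktemp -d)
printf 'usage: tool -v FILE\n' > "$dir/a.txt"
printf 'nothing to see here\n' > "$dir/b.txt"

# -r searches the directory recursively; -l prints only the names of
# files that contain a match. Without --, grep would parse -v as its
# own "invert match" option instead of as the pattern.
found=$(grep -rl -- '-v' "$dir")

echo "$found"
rm -r "$dir"
```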


Department of Computer Science & Software Engineering
The University of Western Australia
Last modified: 8 February 2022
Modified By: Daniel Smith
