CITS2002 Systems Programming  
 

Project 2 2020

The goal of this project is to write a command-line utility program in C99, named mergetars, which merges the contents of multiple tar archive files into a single tar archive file.
 
Successful completion of the project will develop your understanding of advanced features of the C99 programming language, and of many useful Linux system calls and POSIX functions.

Project Description

A medium-sized business has decided to migrate its files to cloud-based storage, requiring it to first identify all files to migrate. A critical disk failure at the worst possible time now requires all files to be recovered from recent backups. However, the business' IT wizard has recently left for a lucrative position at a cloud-based storage company.

Management has located the backups, but they have been poorly labeled, making it impossible to easily identify what is contained in each backup and when each was made. The decision has been made to migrate just the latest copy of each file to the cloud, which will require an 'intelligent merging' of the backups' contents.

The backups have been made using the widely available tar command, a well-defined file format whose name is a contraction of tape archive, reflecting the backup media with which the command was first used. While the tar command supports many actions to create, list, extract, and append tar archive files, it offers no support to merge archives together.

The business has located many backups each holding thousands of files. The task to identify all duplicate files, and to find the most recent version of similar files, is too large to be performed manually, and your team has been contracted to develop a new command-line utility program to intelligently merge all of the backups' contents into a single (large) tar archive.

Program invocation

The purpose of your mergetars command-line utility is to merge the contents of multiple tar archive files into a single tar archive. The program receives the names of one or more input files, followed by a single output filename. If only a single input filename is provided, then mergetars will act like a simple file-copying program, although there is no requirement to check for this special case. A typical program invocation is:

prompt> ./mergetars input_tarfile1 [input_tarfile2 ...] output_tarfile

Filenames will always end with the suffix .tar, indicating that the archive does not involve any compression, or with the suffix .tar.gz or .tgz, indicating that the archive is (or will be) compressed using the GZIP compression algorithm. The standard tar utility supports these cases using its -z command-line option. There is no requirement for mergetars to support any other compression schemes.
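The suffix rules above can be checked with a few string comparisons. The following is a minimal sketch only; the names has_suffix, classify_archive, valid_arguments, and the archive_kind enumeration are hypothetical and not part of the project specification.

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical classification of an archive filename by its suffix.
enum archive_kind { ARCHIVE_PLAIN, ARCHIVE_GZIP, ARCHIVE_UNKNOWN };

// Returns true if 'name' ends with 'suffix'.
bool has_suffix(const char *name, const char *suffix)
{
    size_t nlen = strlen(name);
    size_t slen = strlen(suffix);

    return nlen >= slen && strcmp(name + nlen - slen, suffix) == 0;
}

// .tar.gz and .tgz need tar's -z option; .tar does not.
enum archive_kind classify_archive(const char *filename)
{
    if (has_suffix(filename, ".tar.gz") || has_suffix(filename, ".tgz"))
        return ARCHIVE_GZIP;
    if (has_suffix(filename, ".tar"))
        return ARCHIVE_PLAIN;
    return ARCHIVE_UNKNOWN;
}

// Requires at least one input and one output filename (argc >= 3),
// each carrying one of the recognised suffixes.
bool valid_arguments(int argc, char *argv[])
{
    if (argc < 3)
        return false;
    for (int a = 1; a < argc; ++a) {
        if (classify_archive(argv[a]) == ARCHIVE_UNKNOWN)
            return false;
    }
    return true;
}
```

A main() function might call valid_arguments(argc, argv) first, and print a usage message and exit with a failure status if it returns false.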

The merging criteria

The inputs are merged to form the output according to the following definitions and rules:

  • Two or more files from different archives are considered the same file if they have the same relative pathname (including the name of the directory holding the file and the filename itself).

    For example, the files "project2/src/mergetars.c" and "project2/src/mergetars.c", found in different archives, are considered the same file.
    In contrast, "monday/project2/src/mergetars.c" and "wednesday/project2/src/mergetars.c" are considered different files.

  • Two or more files from different archives with the same relative pathname are considered different versions of the same file. The output archive should contain just the latest of all such versions.
    If two or more files have the same modification time, then the largest of these should be copied. If two or more files have the same modification time and size, the file from the latest tarfile (on the command-line) should be copied.

  • All other files with different relative pathnames are considered different files. The output archive should contain one copy of each different file.
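The tie-breaking rules above (latest modification time, then largest size, then latest archive on the command-line) can be expressed as a single comparison. This is a sketch under stated assumptions: the struct version type, its field names, and the function replaces are hypothetical, and archive_index is assumed to record the position of the source tarfile on the command-line.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <time.h>

// Hypothetical record of one version of a file with a given relative pathname.
struct version {
    time_t mtime;          // modification time of this version
    off_t  size;           // size in bytes
    int    archive_index;  // position of its tarfile on the command-line
};

// Returns true if 'candidate' should replace 'current' in the output.
bool replaces(const struct version *candidate, const struct version *current)
{
    // Rule 1: the latest modification time wins.
    if (candidate->mtime != current->mtime)
        return candidate->mtime > current->mtime;

    // Rule 2: with equal modification times, the largest file wins.
    if (candidate->size != current->size)
        return candidate->size > current->size;

    // Rule 3: with equal times and sizes, the later tarfile wins.
    return candidate->archive_index > current->archive_index;
}
```

If the input archives are processed in command-line order, each newly found version only needs to be compared against the best version seen so far for that pathname.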

Suggested approach

    The project can be completed by following these recommended (but not required) steps:

    1. Learn how to use the standard tar utility from the command-line.
    2. Create two or more 'similar' tar archives, and determine (by hand) which files should appear in a merged archive.
    3. Create 'skeleton' C source files containing 'empty' functions for each distinct responsibility. Create and test a Makefile to compile and link the files.
    4. Check the program's command-line arguments.
    5. Use your system's standard tar utility (called from your C99 code) to expand each input tar archive into a new directory.
    6. Identify and copy (to a new directory structure) all files that should be uniquely added to the output tar archive. Remember to set the modification time of each file, appropriately.
    7. Use your system's standard tar utility to create the new output tar archive.
    8. Clean up before exiting, removing any temporary files and directories that you have created.
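Step 5 above, expanding an archive into a new directory by calling the standard tar utility, might be sketched as follows. The function names make_workdir and extract_archive, the directory template, and the location /usr/bin/tar are all assumptions made for illustration; the tar options used (-x, -z, -p, -f, -C) are the common ones for extraction.

```c
#define _XOPEN_SOURCE 700   // expose mkdtemp() and strdup() under -std=c99

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define TAR_PATH "/usr/bin/tar"   // assumed location of tar on this system

// Creates a new, private temporary directory.
// Returns its name in allocated memory (caller frees), or NULL on failure.
char *make_workdir(void)
{
    char template[] = "/tmp/mergetars-XXXXXX";

    if (mkdtemp(template) == NULL) {
        perror("mkdtemp");
        return NULL;
    }
    return strdup(template);
}

// Runs tar in a child process to expand 'tarfile' into 'destdir'.
// Returns tar's exit status (0 on success), or -1 if it could not be run.
int extract_archive(const char *tarfile, const char *destdir, bool gzipped)
{
    pid_t pid = fork();

    if (pid == -1) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {                       // child: become tar
        if (gzipped)
            execl(TAR_PATH, "tar", "-x", "-z", "-p", "-f", tarfile,
                  "-C", destdir, (char *)NULL);
        else
            execl(TAR_PATH, "tar", "-x", "-p", "-f", tarfile,
                  "-C", destdir, (char *)NULL);
        perror(TAR_PATH);                 // only reached if execl() failed
        _exit(EXIT_FAILURE);
    }

    int status;                           // parent: wait for the child
    if (wait(&status) == -1) {
        perror("wait");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Creating the output archive in step 7 can follow the same fork, execl, wait pattern, with tar's -c option in place of -x.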

    It is anticipated (though not required) that a successful project will use (some of) the following system-calls, and standard C99 & POSIX functions: 

    • perror(), exit()
    • mkdtemp(), mkdir(), opendir(), readdir(), stat(), closedir()
    • fork(), execl(), wait()
    • open(), read(), write(), close(), utimes()
    • strcpy(), strcmp(), strdup()
    • malloc(), calloc(), realloc(), and free()
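As an illustration of how several of the listed functions fit together, the sketch below copies one file and then gives the copy the same modification time as the original, as step 6 of the suggested approach requires. The function name copy_file_keep_mtime and the buffer size are hypothetical choices, not requirements.

```c
#define _XOPEN_SOURCE 700   // expose utimes() under -std=c99

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

// Copies 'from' to 'to', then sets the copy's modification time
// to that of the original. Returns 0 on success, -1 on failure.
int copy_file_keep_mtime(const char *from, const char *to)
{
    struct stat sb;
    if (stat(from, &sb) != 0) {
        perror(from);
        return -1;
    }

    int in = open(from, O_RDONLY);
    if (in == -1) {
        perror(from);
        return -1;
    }
    int out = open(to, O_WRONLY | O_CREAT | O_TRUNC, sb.st_mode & 0777);
    if (out == -1) {
        perror(to);
        close(in);
        return -1;
    }

    char    buffer[8192];
    ssize_t got;
    int     result = 0;

    while (result == 0 && (got = read(in, buffer, sizeof buffer)) > 0) {
        ssize_t done = 0;
        while (done < got) {              // write() may write fewer bytes
            ssize_t put = write(out, buffer + done, (size_t)(got - done));
            if (put == -1) {
                perror(to);
                result = -1;
                break;
            }
            done += put;
        }
    }
    if (result == 0 && got == -1) {       // read() itself failed
        perror(from);
        result = -1;
    }
    close(in);
    if (close(out) == -1)
        result = -1;
    if (result != 0)
        return -1;

    // Set the times last, because writing the copy updates its mtime.
    struct timeval times[2] = {
        { .tv_sec = sb.st_atime, .tv_usec = 0 },   // access time
        { .tv_sec = sb.st_mtime, .tv_usec = 0 },   // modification time
    };
    return utimes(to, times);
}
```

Note that utimes() is called only after the output file has been closed; setting the time before the final write would be undone by that write.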


    Project requirements

    1. Your project must be developed in multiple source files and must be compiled and linked using a Makefile, containing appropriate variable definitions and automatic variables.

    2. The default target of your Makefile must be named mergetars, and its execution must produce a program named mergetars.

    3. Your program must 'clean up after itself'. If your program creates any temporary files or directories, then these must all be removed before your program exits. Your program does not have to free all of its dynamically allocated memory before it exits.
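One possible way to meet the clean-up requirement is to remove each temporary directory recursively before exiting. The sketch below uses opendir(), readdir(), and closedir() together with lstat(), unlink(), and rmdir(); the function name remove_tree is hypothetical, and this is only one of several reasonable designs.

```c
#define _XOPEN_SOURCE 700   // expose lstat() and PATH_MAX under -std=c99

#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

// Removes 'path' and, if it is a directory, everything beneath it.
// Returns 0 on success, -1 if anything could not be removed.
int remove_tree(const char *path)
{
    struct stat sb;
    if (lstat(path, &sb) != 0) {
        perror(path);
        return -1;
    }
    if (!S_ISDIR(sb.st_mode))             // plain file, symlink, etc.
        return unlink(path);

    DIR *dir = opendir(path);
    if (dir == NULL) {
        perror(path);
        return -1;
    }

    int result = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        // Skip the directory itself and its parent.
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;

        char child[PATH_MAX];
        int n = snprintf(child, sizeof child, "%s/%s", path, entry->d_name);
        if (n < 0 || (size_t)n >= sizeof child || remove_tree(child) != 0)
            result = -1;
    }
    closedir(dir);

    if (rmdir(path) != 0) {               // directory should now be empty
        perror(path);
        result = -1;
    }
    return result;
}
```

Because the program may exit from several places, it can help to keep the names of all temporary directories in one data structure and to call a single clean-up function on every exit path.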

    4. Your project must employ sound programming practices, including the use of meaningful comments; well-chosen identifier names; appropriate choice of basic data-structures, data-types, and functions; and appropriate choice of control-flow constructs.


    Good luck!

    Chris McDonald.

The University of Western Australia

Computer Science and Software Engineering
