The University of Western Australia
School of Computer Science and Software Engineering
 
 

School of Computer Science and Software Engineering

CITS2200 Data Structures and Algorithms Home

Algorithm Quotes

"The Feynman Problem Solving Algorithm: (1) write down the problem; (2) think very hard; (3) write down the answer." Richard Feynman

"Simplicity is a great virtue but it requires hard work to achieve it and education to appreciate it. And to make matters worse: complexity sells better." Edsger Dijkstra

"Beware of bugs in the above code. I have only proved it correct, not tried it." Donald Knuth

"There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.", Tony Hoare

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.", Donald Knuth

Project

Due: week 13, Wednesday, May 27, 2015, 5pm

Your task is to implement lossless data compression/decompression algorithms in java and to analyse their performance. Compression and decompression algorithms are used frequently in applications such as zip, gzip, winzip, compress, and well as in computer networks and various data formats.

This assignment can be done in pairs, although you may work on your own if you prefer.

This assignment is worth 20% of your final grade, and consists of:

  1. 10% - Java classes and source files, to be submitted to cssubmit
  2. 10% - Written report, to be submitted at the front office.

Your implementation should consist of a set of java classes including one the implements the compressor interface in the CITS2200 package. This class accepts an InputStream and OutputStream as parameters and will write a compressed version of the input stream to the output stream.

A basic example is available. The class RLE is an implementation of a very basic compression algorithm called Run-length Encoding, which hardly ever works (unless you combine it with the Burrows Wheeler Transform, or something similar). This should give you an idea how to handle IO, command line parameters etc. There is also an inner class called WriteBuffer, which can be used to buffer bits (so if you only want to write 3 bits at a time, it will store them until you have at least 8 bits to write, and then output a byte). This code has not been well tested, so please feel free to ask questions, or make suggestions on the discussion forum.

Your program should be able to be run from the command line, accepting input and output files as paramaters. For example,

java Compress -c file.txt file.z

may run your compression program on file.txt to produce file.z, and

java Compress -d file.z file2.txt

may run your decompression algorithm on file.z to output the file file2.txt which is identical to file.txt. Your program should also be designed to give good compression rates on common file formats, including text files, class files and pdfs.

Compression Algorithms

There is a research component to this project. You are required to investigateexisting lossless compression algorithms and choose one, or more to implement. In the workshops we will discuss the Huffman Compression algorithm, and the these notes cover Huffman and LZ algorithms.

Other potential algorithms include run-length encoding, range coding, and the Burrows Wheeler transform.

Assessment

The report is worth 10% of your final grade and should consist of the following parts:
  1. A description of the algorithm used (both in English and psuedo-code) (25%)
  2. A description of your implmentation, including private methods (25%)
  3. A theoretical analysis of the complexity of the algorithms and data structures used (25%)
  4. Experimental analysis of the efficiency of your programs in terms of the compression achieved and the time taken (25%)
The report should not be more than 5 pages.

The marking scheme for the software is as follows:

  1. Correctness (i.e. decompressing a compressed file returns the original file) 50%
  2. Efficiency (compression rate achieved, and time taken) 25%
  3. Programming Style (use of object oriented design, commenting, etc) 25%
Note, your report should also reference all resources used to complete your assignment. The project should be submitted using cssubmit, as well as submitting a hardcopy of the report to the CSSE office. As the semester proceeds, your ongoing marks will be updated regularly and stored in a database that you can check by using the csmarks program.

This Page