"The Feynman Problem Solving Algorithm: (1) write down the problem; (2) think very hard; (3) write down the answer." Richard Feynman
"Simplicity is a great virtue but it requires hard work to achieve it and education to appreciate it. And to make matters worse: complexity sells better." Edsger Dijkstra
"Beware of bugs in the above code. I have only proved it correct, not tried it." Donald Knuth
"There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult.", Tony Hoare
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.", Donald Knuth
Your task is to implement lossless data compression/decompression algorithms in java and to analyse their performance. Compression and decompression algorithms are used frequently in applications such as zip, gzip, winzip, compress, and well as in computer networks and various data formats.
This assignment can be done in pairs, although you may work on your own if you prefer.
This assignment is worth 20% of your final grade, and consists of:
Your implementation should consist of a set of java classes including one the implements the compressor interface in the CITS2200 package. This class accepts an InputStream and OutputStream as parameters and will write a compressed version of the input stream to the output stream.
A basic example is available. The class RLE is an implementation of a very basic compression algorithm called Run-length Encoding, which hardly ever works (unless you combine it with the Burrows Wheeler Transform, or something similar).
This should give you an idea how to handle IO, command line parameters etc. There is also an inner class called WriteBuffer
, which can be used to buffer bits (so if you only want to write 3 bits at a time, it will store them until you have at least 8 bits to write, and then output a byte).
This code has not been well tested, so please feel free to ask questions, or make suggestions on the discussion forum.
Your program should be able to be run from the command line, accepting input and output files as paramaters. For example,
java Compress -c file.txt file.z
may run your compression program on file.txt
to produce file.z
, and
java Compress -d file.z file2.txt
may run your decompression algorithm on file.z
to output the file file2.txt
which is identical to
file.txt
.
Your program should also be designed to give good compression rates on common file formats, including text files, class files and pdfs.
Other potential algorithms include run-length encoding, range coding, and the Burrows Wheeler transform.
The marking scheme for the software is as follows: