CITS4009: Project

Submission date

October 31, 11:59 pm

Overview

Your task is to implement a real data project from start to finish. This includes the whole data science process: loading data, cleaning data, manipulating and modelling the data, producing graphs that represent the data and then presenting the findings in a business-like document. The final product will be a document containing all of this information and the R code that produced it (if you would prefer to use another language, such as Python, that is acceptable).
The data set is from the UCI dataset repositories we have used in class, and contains information about individual household electric power consumption. All the information needed about the data is contained within the web page linked. The data contains measurements of electric power consumption in one household with a one-minute sampling rate over a period of almost 4 years. Different electrical quantities and some sub-metering values are available. Given how large this data set is, the only dates you need to be concerned with are the 8th of February to the 9th of February 2007, but you are welcome to explore further if you want.
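As a starting point, the loading and filtering step described above could be sketched in Python as follows. This is only a minimal sketch under stated assumptions: it assumes the file is semicolon-separated, stores dates as day/month/year, and marks missing values with "?"; check these against the data set's web page before relying on them. The inline SAMPLE text, the helper names (to_float, load_window) and the output keys are hypothetical and stand in for the real file.

```python
import csv
import io
from datetime import date, datetime

# Hypothetical stand-in for the real file. The layout (semicolon separator,
# dd/mm/yyyy dates, "?" for missing values) is an assumption to verify.
SAMPLE = """Date;Time;Global_active_power;Voltage
7/2/2007;23:59:00;1.520;240.10
8/2/2007;00:00:00;0.326;243.15
8/2/2007;00:01:00;?;?
9/2/2007;23:59:00;3.680;240.02
10/2/2007;00:00:00;1.100;241.00
"""

START, END = date(2007, 2, 8), date(2007, 2, 9)


def to_float(text):
    """Return None for the assumed missing-value marker, else a float."""
    return None if text in ("?", "") else float(text)


def load_window(handle, start=START, end=END):
    """Keep only the rows whose date falls in [start, end], inclusive."""
    rows = []
    for rec in csv.DictReader(handle, delimiter=";"):
        stamp = datetime.strptime(
            rec["Date"] + " " + rec["Time"], "%d/%m/%Y %H:%M:%S"
        )
        if start <= stamp.date() <= end:
            rows.append({
                "timestamp": stamp,
                "active_power_kw": to_float(rec["Global_active_power"]),
                "voltage": to_float(rec["Voltage"]),
            })
    return rows


rows = load_window(io.StringIO(SAMPLE))
```

For the real file, the same function could be given an open file handle instead of the in-memory sample. Filtering while reading keeps memory use small, which matters for a file covering almost 4 years of one-minute samples.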
Keep in mind that what you do with the data is very much up to you: some suggestions for probing would be power consumption by day, voltage over time, power by submeter over time, etc. It may seem daunting without having instructions to follow, but start somewhere and see what you can come up with. There will be an interactive lecture on the 5th of October that will show this process; it is in your best interest to attend.
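To illustrate one of these suggestions, power consumption by day could be summarised along the following lines. This is a sketch only, assuming the active power readings are minute-averaged values in kilowatts, so that each one-minute sample contributes kW / 60 kilowatt-hours; confirm the units on the data set's web page. The readings list and the function name daily_energy_kwh are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical (timestamp, kW) readings; None marks a missing measurement.
readings = [
    (datetime(2007, 2, 8, 0, 0), 1.2),
    (datetime(2007, 2, 8, 0, 1), None),
    (datetime(2007, 2, 8, 0, 2), 1.8),
    (datetime(2007, 2, 9, 0, 0), 3.0),
    (datetime(2007, 2, 9, 0, 1), 3.0),
]


def daily_energy_kwh(readings, minutes_per_sample=1):
    """Sum energy per calendar day, skipping missing samples.

    Assumes each reading is the average power (kW) over one sample
    interval, so its energy is kW * minutes / 60 kilowatt-hours.
    """
    totals = defaultdict(float)
    for stamp, kw in readings:
        if kw is None:
            continue  # missing values are skipped, not treated as zero
        totals[stamp.date()] += kw * minutes_per_sample / 60.0
    return dict(totals)


daily = daily_energy_kwh(readings)
```

Skipping missing samples understates the daily total slightly; whether to skip, interpolate or flag them is a data cleaning decision worth explaining in the final document.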
The document can be submitted as a PDF, PowerPoint, etc., and must be professional: it should read as though you are presenting this information to a boss or manager. There is no set format for this, as real-world data projects in businesses do not have set formats; think about how best to present the information. The document should be standalone (it can be emailed to someone and all the information they need will be in the document). It should be no more than 10 A4 pages, including diagrams and graphs. Good luck!

Marking Rubric

Data Processing (30%)

Data Modelling (10%)

Data Visualisations (10%)

Presentation Document (30%)

BONUS: Creativity and Complex Modelling / Graphs (Up to 20%)