School of Computer Science and Software Engineering

CITS3401 Data Warehousing and Data Mining — Midterm

Unit Coordinator

Dr. Wei Liu
wei.liu@uwa.edu.au
Rm: CSSE 2.18
Ext: 3095

Lecturer

Dr. Wei Liu

Lab Demonstrator

Dr. Syed Mohammed Shamsul Islam (Shams)

Help Forum

help3401

Consultation Time

Where: Room 2.18
Time: Tuesdays 2:00-3:00pm
No appointment needed.

News:

Mid-semester Test Protocol

The mid-semester test will be conducted at the lecture venue during the first hour of the lecture.

  • Date: Tuesday 26 April 2016
  • Time: 10:00am - 11:00am
  • Venue: CSSE Seminar Room 1.24
  • What to bring (must): Pen
  • What to bring (optional): Calculator, Correction Liquid, Water, lucky charm :-)

Format of Mid-semester Test

Like the final exam, the mid-semester test consists of short answer and long answer questions.

For short answer questions, an answer of 5-8 sentences is expected.

For long answer questions, an answer of half to one page is expected.

Mid-semester Test Sample Questions

[30 marks] Data Warehousing

This section covers material from the lecture notes lecture2.pdf and lecture3.pdf. It carries a total of 30 marks, made up of a combination of short and long answer questions.

Below is an illustration of what short answer and long answer questions look like. The same key learning points may be examined either in the short answer format or in the long answer format.

Short Answers
  1. (5 marks) What is a data cube? Use an example to explain the OLAP operations roll up and drill down in relation to a data cube.
  2. (5 marks) Explain the concept of a data warehouse and the main steps required for constructing a data warehouse.
  3. (5 marks) Explain the meaning of the star schema and the snowflake schema in relation to a data warehouse.
  4. (5 marks) Explain what multiway array aggregation means and how it achieves simultaneous aggregation.
  5. (5 marks) Explain the difference between distributive, algebraic and holistic measures, using example measures.
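For short answer question 4, a minimal sketch of the idea behind multiway array aggregation, assuming a small dense 3-D cube stored as a nested Python list `cube[a][b][c]`. The cube is visited chunk by chunk, and every cell that is read is added to the AB, AC and BC planes in that same visit, which is what "simultaneous aggregation" refers to. The function name `multiway_2d_aggregates`, the `chunk` parameter and the sample cube are hypothetical. The full method also chooses the chunk scan order so that as little of each plane as possible must be held in memory; that optimisation is not modelled here.

```python
from itertools import product


def multiway_2d_aggregates(cube, chunk=2):
    """Return the AB, AC and BC planes of a dense 3-D cube cube[a][b][c].

    Each cell is read exactly once and added to all three 2-D aggregates
    during that single read.
    """
    na, nb, nc = len(cube), len(cube[0]), len(cube[0][0])
    ab = [[0] * nb for _ in range(na)]  # dimension C aggregated away
    ac = [[0] * nc for _ in range(na)]  # dimension B aggregated away
    bc = [[0] * nc for _ in range(nb)]  # dimension A aggregated away
    # Visit chunks in a fixed order with A outermost and C innermost.
    for a0, b0, c0 in product(range(0, na, chunk), range(0, nb, chunk), range(0, nc, chunk)):
        # Scan one chunk; every cell updates all three planes at once.
        for a, b, c in product(range(a0, min(a0 + chunk, na)),
                               range(b0, min(b0 + chunk, nb)),
                               range(c0, min(c0 + chunk, nc))):
            v = cube[a][b][c]
            ab[a][b] += v
            ac[a][c] += v
            bc[b][c] += v
    return ab, ac, bc


# Hypothetical 2 x 3 x 4 cube of counts.
cube = [[[a * 12 + b * 4 + c for c in range(4)] for b in range(3)] for a in range(2)]
ab, ac, bc = multiway_2d_aggregates(cube)
print(ab)  # [[6, 22, 38], [54, 70, 86]]
```

Each 2-D plane can in turn be aggregated to obtain the 1-D cuboids and the apex cuboid, instead of rescanning the base cuboid.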
Long Answers
  1. (20 marks)

    Suppose that a data warehouse consists of three dimensions (time, doctor and patient) and two measures: count (the number of patients examined) and charge (the fee that a doctor charges a patient for a visit).

    Draw either a star or a snowflake schema for the above data warehouse.

    Starting with the base cuboid [day, doctor, patient], what specific OLAP operations should be performed in order to list the total fee collected by each doctor in 2010?

    Starting with the base cuboid [day, doctor, patient], what specific OLAP operations should be performed in order to list the total fee paid by patient John Citizen in the years 2009 and 2010 combined?
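One plausible sequence for the first query is: roll up on time from day to year, slice on time = 2010, then roll up on patient to "all". For the second: roll up on time from day to year, dice on patient = John Citizen and year in {2009, 2010}, then roll up on doctor and on the selected years to a single total. The sketch below walks through both sequences in Python; the base-cuboid records, doctor names and helper function names are all hypothetical, and a dictionary stands in for the data cube.

```python
from collections import defaultdict
from datetime import date

# Hypothetical base cuboid [day, doctor, patient] -> (count, charge).
BASE = {
    (date(2009, 3, 2), "Dr Smith", "John Citizen"): (1, 60.0),
    (date(2010, 5, 17), "Dr Smith", "Jane Doe"): (1, 80.0),
    (date(2010, 8, 9), "Dr Lee", "John Citizen"): (1, 75.0),
    (date(2010, 11, 23), "Dr Lee", "Jane Doe"): (1, 90.0),
    (date(2011, 1, 4), "Dr Smith", "John Citizen"): (1, 70.0),
}


def aggregate(cells, key_fn):
    """Group cells by key_fn(key) and sum both measures (count, charge)."""
    out = defaultdict(lambda: (0, 0.0))
    for key, (cnt, chg) in cells.items():
        k = key_fn(key)
        old_cnt, old_chg = out[k]
        out[k] = (old_cnt + cnt, old_chg + chg)
    return dict(out)


def roll_up_day_to_year(cells):
    """Roll up on time: day -> year, keeping doctor and patient."""
    return aggregate(cells, lambda k: (k[0].year, k[1], k[2]))


def fee_per_doctor(base, year):
    by_year = roll_up_day_to_year(base)
    # Slice on time = year.
    sliced = {k: v for k, v in by_year.items() if k[0] == year}
    # Roll up on patient to "all" (the year is constant after the slice).
    per_doctor = aggregate(sliced, lambda k: k[1])
    return {doctor: charge for doctor, (_, charge) in per_doctor.items()}


def fee_for_patient(base, patient, years):
    by_year = roll_up_day_to_year(base)
    # Dice on patient = patient and time in years.
    diced = {k: v for k, v in by_year.items() if k[2] == patient and k[0] in years}
    # Roll up on doctor and on the selected years to a single total.
    totals = aggregate(diced, lambda k: k[2])
    return totals.get(patient, (0, 0.0))[1]


print(fee_per_doctor(BASE, 2010))                           # {'Dr Smith': 80.0, 'Dr Lee': 165.0}
print(fee_for_patient(BASE, "John Citizen", {2009, 2010}))  # 135.0
```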

[20 marks] Data Exploration, Data Cleaning and Data Reduction

This section covers material from the lecture notes lecture4.pdf, lecture5.pdf and lecture6.pdf. It carries a total of 20 marks, made up of a combination of short and long answer questions.

Short Answers
  1. (5 marks) What are the common strategies for dealing with missing data?
  2. (5 marks) What is the five-number summary of a dataset? How is it related to a boxplot?
  3. (5 marks) How would you classify the various data reduction techniques we discussed in the lectures?
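For short answer question 2, the five-number summary is the minimum, first quartile Q1, median, third quartile Q3 and maximum. A boxplot draws a box from Q1 to Q3 with a line at the median, and whiskers extending towards the minimum and maximum; points lying more than 1.5 × IQR (IQR = Q3 - Q1) beyond the quartiles are commonly plotted individually as outliers. A minimal sketch, assuming Python 3.8+ `statistics.quantiles` with `method="inclusive"` for the quartiles. Textbooks use several slightly different quartile conventions, so hand-computed values can differ. The function names and sample data are hypothetical.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    xs = sorted(data)
    # Quartile convention: Python's "inclusive" method (one of several in use).
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return xs[0], q1, median, q3, xs[-1]


def boxplot_parts(data):
    """Box (Q1, median, Q3), whisker ends and outliers under the 1.5 x IQR rule."""
    _, q1, median, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    return {"box": (q1, median, q3), "whiskers": (min(inside), max(inside)), "outliers": outliers}


# Hypothetical data with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(five_number_summary(data))  # (1, 3.25, 5.5, 7.75, 100)
print(boxplot_parts(data))        # {'box': (3.25, 5.5, 7.75), 'whiskers': (1, 9), 'outliers': [100]}
```

Here the maximum (100) is far beyond Q3 + 1.5 × IQR = 14.5, so the whisker stops at 9 and 100 is drawn as a separate outlier point.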
Long Answers
  1. (10 marks) Given the following data tables and formulas, demonstrate the process of calculating the correlation coefficient for Table 1 and the chi-square statistic for Table 2.

    Formula for Correlation Calculation

    Formula for Chi-Square Calculation

    Table 1
    Time Point | AllElectronics | HighTech
    t1         | 6              | 20
    t2         | 5              | 10
    t3         | 4              | 14
    t4         | 3              | 5
    t5         | 2              | 5

    Table 2
    Student | Read Science Fiction | Play Chess
    s1      | Yes                  | No
    s2      | No                   | No
    s3      | No                   | Yes
    s4      | Yes                  | Yes
    s5      | Yes                  | Yes
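This sketch assumes the formulas in the question are the standard ones: the correlation coefficient $r_{A,B} = \frac{\sum_{i=1}^{n}(a_i-\bar{A})(b_i-\bar{B})}{n\,\sigma_A\sigma_B}$, with $\sigma_A, \sigma_B$ the population standard deviations, and the chi-square statistic $\chi^2 = \sum_{i}\sum_{j}\frac{(o_{ij}-e_{ij})^2}{e_{ij}}$, with expected counts $e_{ij} = \frac{\mathrm{count}(A=a_i)\times\mathrm{count}(B=b_j)}{n}$. Under that assumption, the hand calculation can be checked with the Python sketch below. The function names are illustrative.

```python
import math
from collections import Counter


def pearson_correlation(a, b):
    """r = sum((a_i - mean_a)(b_i - mean_b)) / (n * sigma_a * sigma_b),
    using population standard deviations."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov_sum = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sigma_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / n)
    sigma_b = math.sqrt(sum((y - mean_b) ** 2 for y in b) / n)
    return cov_sum / (n * sigma_a * sigma_b)


def chi_square(pairs):
    """Chi-square statistic for two nominal attributes, given a list of
    (value_of_A, value_of_B) observations."""
    n = len(pairs)
    observed = Counter(pairs)
    count_a = Counter(x for x, _ in pairs)
    count_b = Counter(y for _, y in pairs)
    chi2 = 0.0
    for x in count_a:
        for y in count_b:
            expected = count_a[x] * count_b[y] / n
            chi2 += (observed[(x, y)] - expected) ** 2 / expected
    return chi2


# Table 1
all_electronics = [6, 5, 4, 3, 2]
high_tech = [20, 10, 14, 5, 5]
# Table 2: (Read Science Fiction, Play Chess) for s1..s5
students = [("Yes", "No"), ("No", "No"), ("No", "Yes"), ("Yes", "Yes"), ("Yes", "Yes")]

print(round(pearson_correlation(all_electronics, high_tech), 4))  # 0.8674
print(round(chi_square(students), 4))                             # 0.1389
```

Table 1 gives r ≈ 0.867, a positive correlation. For Table 2 the observed counts are 2 (Yes, Yes), 1 (Yes, No), 1 (No, Yes) and 1 (No, No), against expected counts of 1.8, 1.2, 1.2 and 0.8, giving χ² = 5/36 ≈ 0.139. With (2 - 1)(2 - 1) = 1 degree of freedom, this is far below the critical value of 3.841 at the 0.05 significance level, so these five students give no evidence that the two attributes are related (and expected counts this small make the test unreliable in any case).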

Sample Answer

Sample Midterm's Sample Answer

This Page

Last Edited on:
Sunday 24th of April 2016 01:57:08 PM

Website Feedback:
wei.liu@uwa.edu.au