# School of Computer Science and Software Engineering

## Dr. Syed Mohammed Shamsul Islam (Shams)

help3401

#### Consultation Time

Where: Room 2.18
Time: Tuesdays 2:00-3:00pm
No appointment needed.

### Mid-semester Test Protocol

The mid-semester test will be conducted at the lecture venue during the first hour of the lecture.

• Date: Tuesday 26 April 2016
• Time: 10:00am - 11:00am
• Venue: CSSE Seminar Room 1.24
• What to bring (required): pen
• What to bring (optional): calculator, correction liquid, water, lucky charm :-)

### Format of Mid-semester Test

Similar to the final exam, the mid-semester test consists of short and long answer questions.

For short answer questions, an answer of 5-8 sentences is expected.

For long answer questions, an answer of half to one page is expected.

### Mid-semester Test Sample Questions

#### [30 marks] Data Warehousing

This section covers material from the lecture notes lecture2.pdf and lecture3.pdf. It is worth a total of 30 marks, made up of a combination of short and long answer questions.

Below is an illustration of what short answer and long answer questions look like. The same key learning points may be examined either in the short answer format or in the long answer format.

Short answer questions:

1. (5 marks) What is a data cube? Use an example to explain the OLAP operations roll-up and drill-down in relation to a data cube.
2. (5 marks) Explain the concept of a data warehouse and the main steps required for constructing a data warehouse.
3. (5 marks) Explain the meaning of star schema and snowflake schema in relation to a data warehouse.
4. (5 marks) Explain what multiway array aggregation means and how it achieves simultaneous aggregation.
5. (5 marks) Explain the difference between distributive, algebraic and holistic measures, using example measures.
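Short answer question 5 can be illustrated with a minimal Python sketch. The values and their split into three partitions are hypothetical. What matters is which global results can be assembled from per-partition results: SUM and COUNT are distributive, AVG is algebraic because it needs only two distributive values (sum and count), and MEDIAN is holistic because no fixed-size summary of each partition is enough.

```python
import statistics

# Hypothetical measure values, split across three partitions (e.g. three data chunks).
partitions = [[4, 8, 15], [16, 23], [42, 7, 1]]

# Distributive: combining the partial SUMs / COUNTs gives the global SUM / COUNT.
partial_sums = [sum(p) for p in partitions]
partial_counts = [len(p) for p in partitions]
total_sum = sum(partial_sums)
total_count = sum(partial_counts)

# Algebraic: AVG is computed from a bounded number (two) of distributive results.
average = total_sum / total_count

# Holistic: MEDIAN needs the full data; no fixed-size per-partition summary suffices.
all_values = [v for p in partitions for v in p]
median = statistics.median(all_values)

# Combining partial medians does NOT give the global median in general.
median_of_medians = statistics.median(statistics.median(p) for p in partitions)
```

Here the true median is 11.5, while the median of the three partial medians is 8, which is why a holistic measure has to go back to the full data (or settle for an approximation).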
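The core idea behind short answer question 4 can be sketched in Python. The function name `multiway_aggregate` and the chunk size are hypothetical. The 3-D array is split into chunks, each chunk is scanned once, and every cell is added to all three 2-D aggregates (AB, AC, BC) in the same pass, instead of scanning the array once per aggregate. This sketch keeps every plane fully in memory. It does not show the memory savings of the real method, which come from ordering the chunk scan so that only parts of the larger planes are held at a time, nor the 1-D and 0-D aggregates that are then computed from the 2-D planes.

```python
import itertools


def multiway_aggregate(cube, chunk):
    """Compute the AB, AC and BC planes of a 3-D array in a single scan.

    cube is a nested list indexed cube[a][b][c]; chunk is the (hypothetical)
    edge length of the cubic chunks the array is partitioned into.
    """
    na, nb, nc = len(cube), len(cube[0]), len(cube[0][0])
    ab = [[0] * nb for _ in range(na)]  # aggregated over C
    ac = [[0] * nc for _ in range(na)]  # aggregated over B
    bc = [[0] * nc for _ in range(nb)]  # aggregated over A
    # Visit the chunks in a fixed order: A outermost, C innermost.
    chunk_starts = itertools.product(
        range(0, na, chunk), range(0, nb, chunk), range(0, nc, chunk)
    )
    for a0, b0, c0 in chunk_starts:
        # Scan every cell of the current chunk exactly once ...
        for a in range(a0, min(a0 + chunk, na)):
            for b in range(b0, min(b0 + chunk, nb)):
                for c in range(c0, min(c0 + chunk, nc)):
                    v = cube[a][b][c]
                    # ... and add it to all three 2-D aggregates simultaneously.
                    ab[a][b] += v
                    ac[a][c] += v
                    bc[b][c] += v
    return ab, ac, bc


# Usage: a hypothetical 3 x 4 x 2 array split into 2 x 2 x 2 chunks.
cube = [[[a * 100 + b * 10 + c for c in range(2)] for b in range(4)] for a in range(3)]
ab, ac, bc = multiway_aggregate(cube, chunk=2)
```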

Long answer question:

1. (20 marks)

Suppose that a data warehouse consists of the three dimensions time, doctor and patient, and the two measures count (the number of patients examined) and charge (the fee that a doctor charges a patient for a visit).

Draw either a star or a snowflake schema for the above data warehouse.

Starting with the base cuboid [day, doctor, patient], what specific OLAP operations should be performed in order to list the total fee collected by each doctor in 2010?

Starting with the base cuboid [day, doctor, patient], what specific OLAP operations should be performed in order to list the total fee paid by patient John Citizen in the years 2009 and 2010 combined?
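A commonly given answer to the two OLAP parts is as follows. For the fees per doctor in 2010: roll up on time from day to year, slice on year = 2010, then roll up on patient to "all". For John Citizen: roll up time to year, dice on patient = John Citizen and year in {2009, 2010}, then roll up doctor and time to "all". The Python sketch below plays this out on a few hypothetical base-cuboid cells keyed by (day, doctor, patient). It stores only the charge measure, and the doctors, patients and fees are made up. The helper names `roll_up` and `select` are illustrative, not taken from the lecture notes.

```python
from collections import defaultdict
from datetime import date

# Hypothetical base-cuboid cells: (day, doctor, patient) -> charge.
# The count measure would be rolled up in exactly the same way.
BASE = {
    (date(2009, 3, 2), "Dr Lee", "John Citizen"): 60.0,
    (date(2010, 5, 14), "Dr Lee", "John Citizen"): 60.0,
    (date(2010, 7, 1), "Dr Lee", "Mary Smith"): 80.0,
    (date(2010, 8, 9), "Dr Patel", "John Citizen"): 90.0,
    (date(2011, 1, 20), "Dr Patel", "Mary Smith"): 90.0,
}


def roll_up(cells, coarser_key):
    """Roll-up: map each cell to a coarser key and sum the charge measure."""
    out = defaultdict(float)
    for key, charge in cells.items():
        out[coarser_key(key)] += charge
    return dict(out)


def select(cells, keep):
    """Slice / dice: keep only the cells whose coordinates satisfy `keep`."""
    return {key: charge for key, charge in cells.items() if keep(key)}


# Roll up time from day to year: [day, doctor, patient] -> [year, doctor, patient].
by_year = roll_up(BASE, lambda k: (k[0].year, k[1], k[2]))

# Query 1: slice on year = 2010, then roll up patient to "all".
fee_per_doctor_2010 = roll_up(select(by_year, lambda k: k[0] == 2010), lambda k: k[1])

# Query 2: dice on patient and year, then roll up doctor and year to "all".
john_2009_2010 = roll_up(
    select(by_year, lambda k: k[2] == "John Citizen" and k[0] in (2009, 2010)),
    lambda k: k[2],
)
```

With these made-up cells, Query 1 gives 140 for Dr Lee and 90 for Dr Patel, and Query 2 gives 210 for John Citizen. In a star schema, these cells would come from a fact table holding the time, doctor and patient keys together with count and charge, joined to one dimension table per dimension.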

#### [20 marks] Data Exploration, Data Cleaning and Data Reduction

This section covers material from the lecture notes lecture4.pdf, lecture5.pdf and lecture6.pdf. It is worth a total of 20 marks, made up of a combination of short and long answer questions.

Short answer questions:

1. (5 marks) What are the common strategies for dealing with missing data?
2. (5 marks) What is the five-number summary of a dataset? How is it related to a boxplot?
3. (5 marks) How would you classify the various data reduction techniques we discussed in the lectures?
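The usual textbook strategies for missing values in short answer question 1 can be sketched in a few lines of Python. The records, the class labels "A"/"B", and the sentinel constant 0.0 are all hypothetical. The sketch shows four strategies: ignoring the tuple, filling with a global constant, filling with the attribute mean, and filling with the mean of the same class. Filling in values manually and using the most probable value (for example, predicted by regression or a decision tree) are also commonly listed but are not shown here.

```python
from statistics import mean

# Hypothetical records: (class_label, income); None marks a missing income.
records = [("A", 30.0), ("A", None), ("A", 50.0), ("B", 70.0), ("B", None), ("B", 90.0)]

# Strategy 1: ignore (drop) tuples that have a missing value.
dropped = [r for r in records if r[1] is not None]

# Strategy 2: fill with a global constant (a hypothetical sentinel of 0.0).
constant_filled = [(c, v if v is not None else 0.0) for c, v in records]

# Strategy 3: fill with the attribute mean over all known values.
overall_mean = mean(v for _, v in dropped)
mean_filled = [(c, v if v is not None else overall_mean) for c, v in records]

# Strategy 4: fill with the mean of the known values in the same class.
class_means = {c: mean(v for k, v in dropped if k == c) for c in {c for c, _ in records}}
class_mean_filled = [(c, v if v is not None else class_means[c]) for c, v in records]
```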
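For short answer question 2, the five-number summary is the minimum, Q1, median, Q3 and maximum. A boxplot draws a box from Q1 to Q3 with a line at the median. Its whiskers extend either to the minimum and maximum or, in a common variant, only to the most extreme values within 1.5 × IQR of the quartiles, with points beyond those fences plotted individually as outliers. The Python sketch below uses hypothetical function names and data. It computes Q1 and Q3 as the medians of the lower and upper halves, which is one of several quartile conventions, so other tools may report slightly different quartiles.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); needs at least two values.

    Q1/Q3 are the medians of the lower/upper halves of the sorted data;
    when n is odd, the overall median is excluded from both halves.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


def boxplot_fences(summary):
    """Outlier fences used by many boxplots: 1.5 * IQR beyond Q1 and Q3."""
    _, q1, _, q3, _ = summary
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Usage with hypothetical values; 120 lies beyond the upper fence.
data = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 120]
summary = five_number_summary(data)
low, high = boxplot_fences(summary)
outliers = [x for x in data if x < low or x > high]
```

For these values, the summary is (6, 15, 40, 43, 120) and the fences are -27 and 85. A fence-based boxplot would therefore draw the upper whisker to 47 and mark 120 as an outlier.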

Long answer question:

1. (10 marks) Given the following data tables and the formulas, demonstrate the process of calculating the correlation coefficient for Table 1 and the chi-square statistic for Table 2.

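Table 1

| Time point | AllElectronics | HighTech |
|---|---|---|
| t1 | 6 | 20 |
| t2 | 5 | 10 |
| t3 | 4 | 14 |
| t4 | 3 | 5 |
| t5 | 2 | 5 |

Table 2

| Student | Read science fiction | Play chess |
|---|---|---|
| s1 | Yes | No |
| s2 | No | No |
| s3 | No | Yes |
| s4 | Yes | Yes |
| s5 | Yes | Yes |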
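The formula sheet is not reproduced here, so the Python sketch below assumes the standard textbook forms. For correlation it uses `r = sum((a_i - mean_A) * (b_i - mean_B)) / (n * sigma_A * sigma_B)`, with population standard deviations; this is algebraically the same as the form used in the code. For the chi-square statistic it uses `chi2 = sum((o_ij - e_ij)**2 / e_ij)` with `e_ij = row_total * column_total / n`. The function names are illustrative, and the data are copied from Tables 1 and 2.

```python
from math import sqrt

# Table 1: values of the two attributes at time points t1..t5.
all_electronics = [6, 5, 4, 3, 2]
high_tech = [20, 10, 14, 5, 5]


def pearson_r(a, b):
    """Pearson correlation: sum of cross-deviations over sqrt of the two sums of squares."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cross / sqrt(ss_a * ss_b)


# Table 2: (read science fiction, play chess) for students s1..s5.
students = [("Yes", "No"), ("No", "No"), ("No", "Yes"), ("Yes", "Yes"), ("Yes", "Yes")]


def chi_square(pairs):
    """Pearson chi-square statistic for the contingency table of two nominal attributes."""
    n = len(pairs)
    rows = sorted({row for row, _ in pairs})
    cols = sorted({col for _, col in pairs})
    observed = {(row, col): 0 for row in rows for col in cols}
    for pair in pairs:
        observed[pair] += 1  # build the observed contingency table
    row_tot = {row: sum(observed[(row, col)] for col in cols) for row in rows}
    col_tot = {col: sum(observed[(row, col)] for row in rows) for col in cols}
    stat = 0.0
    for row in rows:
        for col in cols:
            expected = row_tot[row] * col_tot[col] / n  # e_ij under independence
            stat += (observed[(row, col)] - expected) ** 2 / expected
    return stat


corr = pearson_r(all_electronics, high_tech)
chi2 = chi_square(students)
dof = (2 - 1) * (2 - 1)  # (rows - 1) * (columns - 1)
```

With these tables, the sketch gives r ≈ 0.867, a strong positive correlation (the population covariance is 35 / 5 = 7). It gives χ² = 5/36 ≈ 0.139 with one degree of freedom. This is far below 3.841, the critical value for one degree of freedom at the 0.05 level, so independence of the two attributes is not rejected. With only five students, every expected count is below 5, so the chi-square approximation is weak here in any case.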

Sample answers to the sample mid-semester test

# The University of Western Australia

## University information

CRICOS Code: 00126G