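CITS 4009 Lab 6 - Choosing and Evaluating Models - part 2

General Instructions

* Your labsheets will be structured with complementary information. The labs will closely follow the structure of the book "Practical Data Science with R" by Nina Zumel and John Mount.
* From each lab you are expected to answer all the questions that are presented with a question number.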

Aims of the Lab

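In this lab you will learn and practise how to apply data mining techniques to different applications. In particular, we will cover the following objectives:

* Prepare data for modelling
* Use models for outlier detection, association rule mining and regression
* Evaluate model quality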

Outlier Detection

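Outliers, or anomalies, are data points that deviate markedly from the rest of the data; they may be generated by errors in the data generation process. In statistical modelling, outliers are usually 'treated' (removed, or replaced with a representative statistic such as the mean). However, in data mining, outliers may be exactly the 'thing' that the data scientist is looking for. For example, fraudulent electronic transactions may occur only once in a million, yet it is important to detect them.

* Q1 Create a standard normal dataset of 1000 data points. You can use the 'rnorm()' function. About 99.7% of the data from a standard normal distribution lie within 3 standard deviations of the mean. How many outlier points (points outside this range) are there in your dataset? What are the values of these outliers?

* Q2 Download the mean monthly solar exposure from 1990 to 2017 for 'Perth' (station number 9225) from the Bureau of Meteorology website (http://www.bom.gov.au/climate/data/). Load the data into a variable called 'slrad'.

    2.1 Plot the histogram of the mean monthly solar exposure for all the years. Does the distribution look like a normal distribution? If not, how do you find the outliers in this data?

    2.2 We are interested in looking at anomalous years as well as the months of those years. You can use boxplots to visualise the answer to this question.

        2.2.1 Draw a boxplot diagram of solar exposure vs year, where the boxplot of each year comprises the solar exposure values of its months.

        2.2.2 Now draw a boxplot diagram of solar exposure vs month, where the boxplot of each month contains data from every year.

        2.2.3 In which month was the lowest solar exposure recorded? (Hint: use the boxplots.)

        2.2.4 Which year had the highest median solar exposure?

        2.2.5 In how many years did Perth receive solar exposure greater than 30?

        2.2.6 What anomalies can you find in the plots from 2.2.1 and 2.2.2?

    2.3 Calculate the

* Q3 Write a function 'stdnormAnoms' that takes a list of values and returns a list indicating TRUE/FALSE for each item. The function computes the mean and standard deviation of the input list. For each item in the list, check whether it is at least three standard deviations away from the mean; if it is, mark the item as TRUE, otherwise mark it as FALSE. In other words, this function implements the anomaly detection concept presented in Q1.

    3.1 Use stdnormAnoms() to find the anomalies in Q1 and Q2.2.6. Are the results the same as your previous attempts?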
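
The three-standard-deviation rule behind Q1 and Q3 might be sketched in R as follows. This is a minimal sketch under stated assumptions, not the official solution. The seed is arbitrary. stdnormAnoms() works on a numeric vector and uses the sample mean and standard deviation, as Q3 describes. iqrAnoms() is a hypothetical helper showing one possible answer to 2.1 for non-normal data: it uses the boxplot rule (points beyond 1.5 * IQR from the hinges) that base R's boxplot.stats() applies.

```r
# Q3: flag values at least three standard deviations away from the mean.
# Uses the sample mean and sample standard deviation of the input vector.
stdnormAnoms <- function(x) {
  mu <- mean(x)
  s <- sd(x)
  abs(x - mu) >= 3 * s
}

# Hypothetical helper for non-normal data (2.1): flag the points that
# boxplot.stats() reports as outliers (beyond 1.5 * IQR from the hinges).
iqrAnoms <- function(x) {
  x %in% boxplot.stats(x)$out
}

# Q1: 1000 draws from a standard normal distribution.
set.seed(4009)                 # arbitrary seed, only for reproducibility
x <- rnorm(1000)

# Theoretical rule: mean 0 and sd 1 are known, so outliers satisfy |x| >= 3.
x[abs(x) >= 3]

# Q3 / 3.1: the same idea with the estimated mean and sd.
flags <- stdnormAnoms(x)
sum(flags)                     # number of flagged points
x[flags]                       # their values
```

With 1000 points, roughly 0.003 * 1000, or about 3, outliers are expected, but the exact count depends on the random draw. The two approaches can disagree slightly because one uses the true parameters and the other uses estimates, which is the comparison that 3.1 asks about.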

Regression

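Regression analysis is used to find the relationship among variables. It has a rich history of well-developed procedures for predicting values and finding trends in data.

* Q4 We want to find whether there is a trend in the maximum solar exposure in Perth, and also whether there is a trend in the minimum solar exposure.

    4.1 Find the maximum solar exposure value for every year and save the result as 'maxslrad'. Your data frame (maxslrad) should have two columns: one for the year and another for the maximum solar exposure.

    4.2 Use linear regression to fit a linear model to maxslrad. What can you say about the maximum solar radiation in 2018?

    4.3 How do you measure the accuracy of the results? One way of assessing the accuracy of regression results is to compute the error between the observed and modelled values (e.g. the root-mean-square error, RMSE).

    4.4 Find the minimum solar exposure value for every year, save it as 'minslrad', and find its trend.

    4.5 What general conclusions can you draw from the trends of the minimum and maximum solar exposure values (4.2 and 4.4)?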
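
The Q4 steps (yearly extremes, a linear trend, a prediction for 2018 and the RMSE) could be sketched in R as below. The layout of 'slrad' is an assumption: a long data frame with one row per month and numeric columns named year, month and exposure. The Bureau of Meteorology download will probably need reshaping or renaming to match. yearlyExtreme() and fitTrend() are hypothetical helper names, and the RMSE here is computed on the fitted years only.

```r
# Assumed layout (hypothetical column names): 'slrad' has one row per month
# with numeric columns year, month and exposure.

# 4.1 / 4.4: one row per year holding that year's extreme value.
yearlyExtreme <- function(df, FUN) {
  aggregate(exposure ~ year, data = df, FUN = FUN)
}

# 4.2 / 4.3: fit exposure ~ year, predict a future year, and report the
# root-mean-square error of the fitted values.
fitTrend <- function(yearly, newYear = 2018) {
  model <- lm(exposure ~ year, data = yearly)
  rmse <- sqrt(mean(residuals(model)^2))
  pred <- predict(model, newdata = data.frame(year = newYear))
  list(model = model,
       slope = unname(coef(model)["year"]),   # estimated change per year
       rmse = rmse,
       prediction = unname(pred))
}

# Usage, once 'slrad' has been loaded and reshaped:
# maxslrad <- yearlyExtreme(slrad, max)
# maxFit <- fitTrend(maxslrad)
# summary(maxFit$model)          # slope estimate and its p-value
# maxFit$prediction; maxFit$rmse
# minslrad <- yearlyExtreme(slrad, min)
# minFit <- fitTrend(minslrad)
```

If summary() reports a large p-value for the slope, that suggests the data show no clear trend. The RMSE is in the same units as the exposure values, so it can be read directly against them.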

Association Rule Mining

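Association rule mining finds how frequently different items occur together. It is heavily used in market basket analysis.

* Q5 As a preliminary, load the 'arules' library. Then load the information in data/bookdata.tsv into 'bookdata'. The book data are in a special format called transactions, so you can use the read.transactions() function. The fields are separated by tabs. Use the column names 'userId' and 'title', and do not read any duplicate transactions.

    5.1 Inspect the number of transactions and the number of columns of 'bookdata'. The columns of this transaction matrix represent different book titles, and each row represents a single transaction. For example, in the record 100000...001 the first and last books are checked out (or in) in this transaction; the 0s represent the books that were not involved in the transaction.

    5.2 Explore the column names to find the book names.

    5.3 Find the five most frequent books. How many times do they occur together?

    5.4 Learn how to use the apriori() function.

        5.4.1 Find the distribution of transaction sizes. Hint: use the size() function. Save the results in 'basketSizes'.

        5.4.2 Find the subset of bookdata where basketSizes > 1.

        5.4.3 Use the apriori() function to find the rules/patterns in the book data. Specify confidence = 0.75 and support = 0.002 as parameters.

        5.4.4 Print the pairs of books sorted by their confidence. You can use the inspect() and sort() functions for this.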
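
The Q5 steps map onto arules functions roughly as in this sketch. It assumes that the arules package is installed and that the header row of data/bookdata.tsv names the columns 'userId' and 'title'. If the file uses different names, adjust 'cols' to match. loadBookData(), topBooks() and mineBookRules() are hypothetical helper names.

```r
# Assumes the arules package is installed: install.packages("arules")
library(arules)

# Q5: read tab-separated (userId, title) pairs as "single"-format
# transactions, dropping duplicate items within a transaction.
# The header names below are an assumption about the file.
loadBookData <- function(path = "data/bookdata.tsv") {
  read.transactions(path, format = "single", header = TRUE, sep = "\t",
                    cols = c("userId", "title"), rm.duplicates = TRUE)
}

# 5.3: the n most frequent titles with their absolute counts.
topBooks <- function(trans, n = 5) {
  head(sort(itemFrequency(trans, type = "absolute"), decreasing = TRUE), n)
}

# 5.4.1 to 5.4.4: keep baskets with more than one item, mine rules with
# the lab's thresholds, and sort them by decreasing confidence.
mineBookRules <- function(trans, support = 0.002, confidence = 0.75) {
  basketSizes <- size(trans)
  multi <- trans[basketSizes > 1]
  rules <- apriori(multi,
                   parameter = list(support = support, confidence = confidence),
                   control = list(verbose = FALSE))
  sort(rules, by = "confidence")
}

# Usage with the real file:
# bookdata <- loadBookData()
# dim(bookdata)                        # 5.1: transactions x distinct titles
# head(colnames(bookdata))             # 5.2: book names
# topBooks(bookdata)                   # 5.3
# summary(size(bookdata))              # 5.4.1: distribution of basket sizes
# inspect(head(mineBookRules(bookdata), 10))
```

By default, sort() on a rule set puts the highest confidence first. If only rules that involve at least two books are wanted, adding minlen = 2 to the parameter list would exclude rules with an empty left-hand side.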
