Fall 2024: Data Mining Lab
Revision as of 20:04, 4 September 2024
Instructions
- Please be on time to avoid the attendance penalty.
- Please sign the attendance register before you take a seat.
- Please put your mobile phone on silent mode.
- Each lab assignment must be submitted in Google Classroom for evaluation (each lab will be announced in Google Classroom; submit before the deadline).
- Turn off (shut down) your assigned computer and arrange the chair before you leave the lab.
Guidelines
- As per the DUCS guidelines for DSE: Data Mining
Lab 0: Getting Started (weeks of 5th & 12th August 2024)
| Q. NO. | Program | Practical No. | Remarks |
|---|---|---|---|
| 1 | https://www.cse.msu.edu/~ptan/dmbook/tutorials/tutorial1/tutorial1.html | Practice Set No. 1 | Introduction to Python |
| 2 | https://www.cse.msu.edu/~ptan/dmbook/tutorials/tutorial2/tutorial2.html | Practice Set No. 2 | Introduction to Numpy and Pandas |
| 3 | https://www.cse.msu.edu/~ptan/dmbook/tutorials/tutorial3/tutorial3.html | Practice Set No. 3 | Data Exploration |
Lab 1: (weeks of 19th & 26th August 2024)
| Q. NO. | Program | Practical No. | Remarks |
|---|---|---|---|
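| 1 | Apply data cleaning techniques on any dataset (e.g. the Chronic Kidney Disease dataset from the UCI repository). Techniques may include handling missing values, outliers, and inconsistent values. In addition, a set of validation rules may be specified for the particular dataset and validation checks performed. | Practical No. 1 | Dataset: abc.csv (http://mkbhandari.com/mkwiki/data/fall2024/dm/datasets/kidneyDisease.csv) Download from Kaggle: Chronic Kidney Disease dataset (https://www.kaggle.com/datasets/mansoordaku/ckdisease) Tutorial: Tutorial on Handling Missing Values (https://www.kaggle.com/code/alexisbcook/handling-missing-values#How-many-missing-data-points-do-we-have?) |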
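
The cleaning steps named in Practical No. 1 could be sketched along the following lines. This is only a minimal illustration on a toy in-memory table, not a solution for the actual CSV: the column names (`age`, `bp`, `label`), the plausible age range of 0 to 120, the median imputation, and the three validation rules are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Toy data standing in for a medical dataset. Column names and values are
# illustrative assumptions, not taken from the lab's CSV file.
raw = pd.DataFrame({
    "age": [48, 7, np.nan, 51, 60, 430],
    "bp": [80, 50, 80, np.nan, 90, 70],
    "label": ["ckd", "ckd\t", "notckd", " CKD", "notckd", "ckd"],
})


def clean(df):
    """Return a cleaned copy of df (hypothetical helper for this sketch)."""
    out = df.copy()
    # Inconsistent values: strip stray whitespace/tabs and unify the case.
    out["label"] = out["label"].str.strip().str.lower()
    # Outliers: treat ages outside an assumed plausible range as missing.
    out.loc[~out["age"].between(0, 120), "age"] = np.nan
    # Missing values: impute numeric columns with the column median.
    for col in ["age", "bp"]:
        out[col] = out[col].fillna(out[col].median())
    return out


def validate(df):
    """Return a dict mapping each rule name to its number of violating rows."""
    rules = {
        "age_in_0_120": df["age"].between(0, 120),
        "bp_positive": df["bp"] > 0,
        "label_known": df["label"].isin(["ckd", "notckd"]),
    }
    return {name: int((~ok).sum()) for name, ok in rules.items()}


cleaned = clean(raw)
violations = validate(cleaned)
print(cleaned)
print(violations)
```

Running `validate(raw)` before cleaning and `validate(cleaned)` afterwards gives a simple before/after check. Median imputation is just one option; dropping rows or using the mean or mode may suit a given column better.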
Lab 2: (weeks of 2nd & 9th September 2024)
| Q. NO. | Program | Practical No. | Remarks |
|---|---|---|---|
| 1 | Apply data pre-processing techniques such as standardization/normalization, transformation, aggregation, discretization/binarization, sampling, etc., on any dataset. | Practical No. 2 | Dataset: Download from Kaggle: abc.csv Tutorial: Tutorial on Preprocessing Techniques |
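
As a rough illustration of the techniques listed for Practical No. 2, the sketch below applies each one to a small toy table using only pandas and NumPy. The column names (`group`, `value`), the choice of three equal-width bins, the mean as the binarization threshold, and the 50% sample size are assumptions made for the example, not requirements of the practical.

```python
import numpy as np
import pandas as pd

# Toy data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b", "c"],
    "value": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

# Standardization: zero mean and unit (population) standard deviation.
df["z"] = (df["value"] - df["value"].mean()) / df["value"].std(ddof=0)

# Min-max normalization: rescale to the interval [0, 1].
vmin, vmax = df["value"].min(), df["value"].max()
df["minmax"] = (df["value"] - vmin) / (vmax - vmin)

# Transformation: natural logarithm to compress a wide range.
df["log_value"] = np.log(df["value"])

# Discretization: three equal-width bins with readable labels.
df["bin"] = pd.cut(df["value"], bins=3, labels=["low", "mid", "high"])

# Binarization: 1 if the value is above the mean, else 0.
df["above_mean"] = (df["value"] > df["value"].mean()).astype(int)

# Aggregation: mean value per group.
group_means = df.groupby("group")["value"].mean()

# Sampling: 50% of the rows without replacement, with a fixed seed.
sample = df.sample(frac=0.5, random_state=0)

print(df)
print(group_means)
print(sample)
```

Libraries such as scikit-learn offer equivalent transformers; plain pandas is used here only to keep each step visible.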
Projects
| Team No. | Project Title | Team Members | Outcomes/Remarks |
|---|---|---|---|
| 1 | Any Classification/Clustering Problem | | |
| 2 | Any Classification/Clustering Problem | | |