Difference between revisions of "Fall 2024: Data Mining Lab"

From MKWiki

Revision as of 20:05, 4 September 2024

Instructions

  • Please be on time to avoid the Attendance Penalty.
  • Please sign the Attendance Register before you take a seat.
  • Please put your mobile phone on silent mode.
  • Submit each lab assignment on Google Classroom for evaluation before the deadline (details will be announced lab-wise on Google Classroom).
  • Shut down your assigned computer and put your chair back in place before you leave the lab.

Guidelines

Lab 0: Getting Started (weeks of 5th & 12th August 2024)

Q. No. | Program | Practical No. | Remarks
1 | https://www.cse.msu.edu/~ptan/dmbook/tutorials/tutorial1/tutorial1.html | Practice Set No. 1 | Introduction to Python
2 | https://www.cse.msu.edu/~ptan/dmbook/tutorials/tutorial2/tutorial2.html | Practice Set No. 2 | Introduction to Numpy and Pandas
3 | https://www.cse.msu.edu/~ptan/dmbook/tutorials/tutorial3/tutorial3.html | Practice Set No. 3 | Data Exploration

Lab 1: (weeks of 19th & 26th August 2024)

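Q. No. | Program | Practical No. | Remarks
1 | Apply data cleaning techniques to any dataset (e.g. the Chronic Kidney Disease dataset from the UCI repository). Techniques may include handling missing values, outliers, and inconsistent values. In addition, a set of validation rules may be specified for the chosen dataset and the corresponding validation checks performed. | Practical No. 1 | Dataset: kidneyDisease.csv (http://mkbhandari.com/mkwiki/data/fall2024/dm/datasets/kidneyDisease.csv)
    Download from Kaggle: Chronic Kidney Disease dataset (https://www.kaggle.com/datasets/mansoordaku/ckdisease)
    Tutorial: Tutorial on Handling Missing Values (https://www.kaggle.com/code/alexisbcook/handling-missing-values#How-many-missing-data-points-do-we-have?)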
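
The cleaning steps named in Practical No. 1 can be sketched roughly as follows. This is only an illustrative outline on a small made-up table, not a solution for kidneyDisease.csv: the column names (age, bp, classification), the median imputation, the 1.5 * IQR outlier rule, and the three validation rules are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data. The column names and values are assumptions for
# illustration and do not reproduce the real kidneyDisease.csv schema.
df = pd.DataFrame({
    "age": [48, 62, np.nan, 51, 250, 55],
    "bp": [80, 50, 80, np.nan, 70, 90],
    "classification": ["ckd", "ckd\t", "notckd", " CKD", "notckd", "ckd"],
})

# 1. Missing values: count them per column, then impute numeric columns
#    with the column median (one simple strategy among several).
missing_before = df.isna().sum()
for col in ["age", "bp"]:
    df[col] = df[col].fillna(df[col].median())

# 2. Inconsistent values: strip stray whitespace/tabs and unify the case
#    so that "ckd\t" and " CKD" both become "ckd".
df["classification"] = df["classification"].str.strip().str.lower()

# 3. Outliers: flag values lying more than 1.5 * IQR outside the quartiles.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
df["age_outlier"] = (df["age"] < q1 - 1.5 * iqr) | (df["age"] > q3 + 1.5 * iqr)

# 4. Validation rules: each rule is a boolean Series (True = row passes).
#    The rules themselves are assumed examples.
rules = {
    "age_in_range": df["age"].between(0, 120),
    "bp_positive": df["bp"] > 0,
    "label_known": df["classification"].isin(["ckd", "notckd"]),
}
violations = {name: int((~ok).sum()) for name, ok in rules.items()}

print(missing_before)
print(df)
print(violations)
```

For the real dataset, the same pattern would start from pd.read_csv on the downloaded file, and the rules would need to be rewritten to match the actual columns and their valid ranges.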

Lab 2: (weeks of 2nd & 9th September 2024)

Q. No. | Program | Practical No. | Remarks
1 | Apply data pre-processing techniques such as standardization/normalization, transformation, aggregation, discretization/binarization, sampling, etc. to any dataset. | Practical No. 2 | Dataset: abc.csv
    Download from Kaggle: abc.csv
    Tutorial: Tutorial on Preprocessing Techniques
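
As a rough sketch of the techniques listed for Practical No. 2, the snippet below applies each of them once to a tiny made-up table. The data, the column names (group, value), the choice of three equal-width bins, the median threshold for binarization, and the sample size are all assumptions for illustration; other choices are equally valid.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; names and values are assumptions for illustration.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "b"],
    "value": [10.0, 20.0, 30.0, 50.0, 100.0],
})

# Standardization (z-score): zero mean, unit population standard deviation.
df["value_z"] = (df["value"] - df["value"].mean()) / df["value"].std(ddof=0)

# Min-max normalization: rescale to the interval [0, 1].
vmin, vmax = df["value"].min(), df["value"].max()
df["value_minmax"] = (df["value"] - vmin) / (vmax - vmin)

# Transformation: log(1 + x) compresses large values.
df["value_log"] = np.log1p(df["value"])

# Discretization: three equal-width bins with readable labels.
df["value_bin"] = pd.cut(df["value"], bins=3, labels=["low", "mid", "high"])

# Binarization: 1 if the value is above the median, else 0.
df["value_high"] = (df["value"] > df["value"].median()).astype(int)

# Aggregation: mean value per group.
per_group = df.groupby("group")["value"].mean()

# Sampling: simple random sample of 3 rows, seeded for reproducibility.
sample = df.sample(n=3, random_state=0)

print(df)
print(per_group)
print(sample)
```

On a real dataset, which columns to scale, how many bins to use, and whether to sample with or without stratification depend on the data and on the mining task that follows.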

Projects

Team No. | Project Title | Team Members | Outcomes/Remarks
1 | Any Classification/Clustering Problem | 1, 2, 3, 4 |
2 | Any Classification/Clustering Problem | 1, 2, 3, 4 |