Fall 2024: Data Mining Lab

1 Instructions
2 Guidelines
3 Lab 0: Getting Started (week of 5th & 12th August 2024)
4 Lab 1: (week of 19th & 26th August 2024)
5 Lab 2: (week of 2nd & 9th September 2024)
6 Lab 3: (week of 16th, 23rd & 30th September 2024)
7 Lab 4: (week of 7th October 2024)
8 Projects

Instructions

Please be on time to avoid the Attendance Penalty.
Please sign the Attendance Register before you take a seat.
Please put your mobile phone on silent mode.
Submit each lab assignment in Google Classroom (GC) for evaluation before the deadline; details for each lab will be posted in GC.
Turn off (shut down) your assigned computer and arrange your chair before you leave the lab.

Guidelines

As per the DUCS guidelines for DSE: Data Mining.

Lab 0: Getting Started (week of 5th & 12th August 2024)

Q. NO.	Program	Practical No.	Remarks
1	https://www.cse.msu.edu/~ptan/dmbook/tutorials/tutorial1/tutorial1.html	Practice Set No. 1	Introduction to Python
2	https://www.cse.msu.edu/~ptan/dmbook/tutorials/tutorial2/tutorial2.html	Practice Set No. 2	Introduction to Numpy and Pandas
3	https://www.cse.msu.edu/~ptan/dmbook/tutorials/tutorial3/tutorial3.html	Practice Set No. 3	Data Exploration

Lab 1: (week of 19th & 26th August 2024)

Q. NO.	Program	Practical No.	Remarks
1	Apply data cleaning techniques to any dataset (e.g. the Chronic Kidney Disease dataset from the UCI repository). Techniques may include handling missing values, outliers, and inconsistent values. A set of validation rules may also be specified for the chosen dataset, and validation checks performed against them.	Practical No. 1	Dataset: kidneyDisease.csv; Download from Kaggle: Chronic Kidney Disease dataset; Tutorial: Tutorial on Handling Missing Values
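The cleaning steps named in Practical No. 1 can be sketched roughly as follows. This is only a minimal illustration with pandas on a tiny made-up table: the column names, the values, the 0 to 120 age rule, and the 1.5 × IQR outlier fence are all assumptions for the example, not taken from kidneyDisease.csv or prescribed by the practical.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample; column names and values are illustrative only,
# not taken from kidneyDisease.csv.
raw = pd.DataFrame({
    "age": [48, 7, np.nan, 62, 51, 480],
    "bp": [80, 50, 80, np.nan, 90, 70],
    "classification": ["ckd", "ckd\t", " notckd", "ckd", None, "notckd"],
})


def clean(df):
    df = df.copy()
    # Inconsistent values: strip stray whitespace/tabs from category labels.
    df["classification"] = df["classification"].str.strip()
    # Validation rule (assumed): age must lie in [0, 120]; violations become missing.
    df.loc[~df["age"].between(0, 120), "age"] = np.nan
    # Missing values: median for numeric columns, mode for the categorical one.
    for col in ["age", "bp"]:
        df[col] = df[col].fillna(df[col].median())
    df["classification"] = df["classification"].fillna(df["classification"].mode()[0])
    # Outliers: flag bp values outside the 1.5 * IQR fences (flagged, not removed).
    q1, q3 = df["bp"].quantile([0.25, 0.75])
    fence = 1.5 * (q3 - q1)
    df["bp_outlier"] = (df["bp"] < q1 - fence) | (df["bp"] > q3 + fence)
    return df


cleaned = clean(raw)
print(cleaned)
```

For a real dataset, the same function shape could be kept while the rules are replaced by ones justified by the data dictionary; whether to impute, flag, or drop a value is a choice to defend in the lab report.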

Lab 2: (week of 2nd & 9th September 2024)

Q. NO.	Program	Practical No.	Remarks
1	Apply data pre-processing techniques such as standardization/normalization, transformation, aggregation, discretization/binarization, and sampling to any dataset.	Practical No. 2	Dataset: rain.csv; Download from data.gov.in: Rainfall in India
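As a rough sketch of the techniques listed in Practical No. 2, the snippet below applies each one to a small made-up rainfall table with pandas. The column names (`region`, `year`, `annual_mm`) and the numbers are hypothetical and do not reflect the layout of rain.csv; the three equal-width bins and the median threshold are likewise just example choices.

```python
import numpy as np
import pandas as pd

# Made-up rainfall figures (mm); names and numbers are illustrative only.
rain = pd.DataFrame({
    "region": ["A", "A", "B", "B", "C", "C"],
    "year": [2001, 2002, 2001, 2002, 2001, 2002],
    "annual_mm": [1200.0, 900.0, 300.0, 450.0, 2500.0, 2100.0],
})

x = rain["annual_mm"]

# Standardization: zero mean, unit (population) standard deviation.
rain["zscore"] = (x - x.mean()) / x.std(ddof=0)
# Normalization: rescale to the range [0, 1].
rain["minmax"] = (x - x.min()) / (x.max() - x.min())
# Transformation: log(1 + x) compresses the long right tail.
rain["log_mm"] = np.log1p(x)
# Discretization: three equal-width bins.
rain["level"] = pd.cut(x, bins=3, labels=["low", "mid", "high"])
# Binarization: 1 if above the median, else 0.
rain["is_wet"] = (x > x.median()).astype(int)
# Aggregation: mean rainfall per region.
by_region = rain.groupby("region")["annual_mm"].mean()
# Sampling: simple random sample without replacement.
sample = rain.sample(n=3, random_state=0)

print(rain)
print(by_region)
```

Equal-frequency binning (`pd.qcut`) or stratified sampling per region would be reasonable alternatives to compare in the write-up.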

Lab 3: (week of 16th, 23rd & 30th September 2024)

Q. NO.	Program	Practical No.	Remarks
1	Writing/Review of Chapter 1, Chapter 3, and Chapter 4 of Project Report	Project Work

Lab 4: (week of 7th October 2024)

Q. NO.	Program	Practical No.	Remarks
1	Apply the simple K-means algorithm to cluster any dataset. Compare the quality of the resulting clusters while varying the algorithm's parameters. For a given set of parameters, plot a line graph of the MSE obtained after each iteration.	Practical No. 3	Dataset: Mall_Customers.csv; Download from Kaggle: Mall Customer Segmentation Data
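One possible way to approach Practical No. 3 is sketched below: a plain NumPy version of Lloyd's K-means that records the MSE at every iteration. Here MSE is taken to mean the mean squared distance of each point to its assigned centroid, which is an assumption about the intended definition. The data are synthetic two-dimensional blobs rather than Mall_Customers.csv, and the function names `kmeans` and `plot_mse` are hypothetical helpers for this example.

```python
import numpy as np


def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's K-means; returns labels, centroids, and MSE per iteration."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct data points (one common choice).
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    mse_history = []
    for _ in range(n_iter):
        # Assignment step: squared distance of each point to each centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # MSE: mean squared distance of points to their assigned centroid.
        mse_history.append(float(d2[np.arange(len(X)), labels].mean()))
        # Update step: move each centroid to the mean of its members.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids, mse_history


def plot_mse(mse_history):
    """Line graph of MSE against iteration number (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for the plot
    plt.plot(range(1, len(mse_history) + 1), mse_history, marker="o")
    plt.xlabel("Iteration")
    plt.ylabel("MSE")
    plt.show()


# Synthetic data: two Gaussian blobs, used only to exercise the code.
data_rng = np.random.default_rng(42)
X = np.vstack([data_rng.normal(0, 1, (50, 2)), data_rng.normal(8, 1, (50, 2))])
labels, centroids, mse_history = kmeans(X, k=2)
print(mse_history)
```

Calling `plot_mse(mse_history)` draws the line graph for one parameter setting. To compare settings, the same call can be repeated for several values of `k`, `seed`, or `n_iter` and the final MSE values tabulated.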

Projects

Team No.	Project Title	Team Members	Outcomes/Remarks
1	Understanding the Monsoon Pattern in Eastern Gangetic Plain	Akshary Sharma (25019) Abhay Yadav (25040) Anuj Gupta (25042) Amar Kumar (25065) Kunal Verma (25073)	Dataset: Report: Project Presentation:
2	NIRF Ranking Prediction	Abhishek Prasad (25007) Vishal Kumar (25014) Nitish Kumar (25023) Anshu Kumar Dubey (25036) Sunny Chauhan (25050)	Dataset: Report: Project Presentation:
3	Student Performance Prediction	Himanshu Kumar (25016) Kanan Pal (25072) Khushboo Yadav (25082) Diksha Joshi (25091)	Dataset: Report: Project Presentation:
4	FIFA Prediction	Arihant (25003) Ayush Pundir (25027) Pratyush (25060) Ashish (25066)	Dataset: Report: Project Presentation:
5	Breast Cancer Prediction	Vidhan (25044) Sandeep Kumar Sharma (25047) Ayushman Pandey (25094) Tanishk Panchal (25095)	Dataset: Report: Project Presentation:
6	YouTube spam comments classification	Devesh Chauhan (25011) Shatrughan (25084) Om Ranjan (25085) Aman Sagar (25086)	Dataset: Report: Project Presentation:
7	Olympic Data Analysis and Prediction	Kusum (25002) Aditya Kumar (25012) Divyanshi (25021) Tushar Rana (25064)	Dataset: Report: Project Presentation:
8	Credit Card Fraud Detection	Ritesh Dhawan (25037) Bitthal Varshney (25041) Ansh Raj (25081) Uday Raj Verma (25083) Astitwa Rawat (25088)	Dataset: Report: Project Presentation:
9	CreditMap: Exploring Credit Score Patterns through Data Mining	Himanshu Singh (25017) Garvit Kumar (25018) Mayank (25022) Abhishek Kumar Singh (25032)	Dataset: Report: Project Presentation:
10	Movie Recommendation System	Tanya Agrahari (25030) Prakash Mishra (25035) Adarsh Singh (25074) Shivam Verma (25078)	Dataset: Report: Project Presentation:
11	Wine Quality Prediction	Shivam Soni (250xx) Kashif (250xx) Akash Pathak (250xx) Priyanshu Sachan (250xx)	Dataset: Report: Project Presentation:
