Difference between revisions of "Fall 2024: Data Mining Lab"

Latest revision as of 21:54, 26 November 2024

1 Instructions
2 Guidelines
3 Lab 0: Getting Started ( week of 05^th & 12^th August 2024 )
4 Lab 1: ( week of 19^th & 26^th August 2024 )
5 Lab 2: ( week of 2^nd & 9^th September 2024 )
6 Lab 3: ( week of 16^th, 23^rd & 30^th September 2024 )
7 Lab 4: ( week of 7^th October 2024 )
8 Projects

Instructions

Please be on time to avoid the attendance penalty.
Please sign the attendance register before you take a seat.
Please put your mobile phone on silent mode.
Submit each lab assignment in Google Classroom (GC) for evaluation. Deadlines are announced in GC lab by lab; submit before the deadline.
Shut down your assigned computer and arrange your chair before you leave the lab.

Guidelines

As per the DUCS guidelines for DSE: Data Mining

Lab 0: Getting Started ( week of 05^th & 12^th August 2024 )

Q. NO.	Program	Practical No.	Remarks
1	https://www.cse.msu.edu/~ptan/dmbook/tutorials/tutorial1/tutorial1.html	Practice Set No. 1	Introduction to Python
2	https://www.cse.msu.edu/~ptan/dmbook/tutorials/tutorial2/tutorial2.html	Practice Set No. 2	Introduction to Numpy and Pandas
3	https://www.cse.msu.edu/~ptan/dmbook/tutorials/tutorial3/tutorial3.html	Practice Set No. 3	Data Exploration

Lab 1: ( week of 19^th & 26^th August 2024 )

Q. NO.	Program	Practical No.	Remarks
1	Apply data cleaning techniques on any dataset (e.g. Chronic Kidney Disease dataset from UCI repository). Techniques may include handling missing values, outliers and inconsistent values. Also, a set of validation rules may be specified for the particular dataset and validation checks performed.	Practical No. 1	Dataset: kidneyDisease.csv; Download from Kaggle: Chronic KIdney Disease dataset; Tutorial: Tutorial on Handling Missing Values
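The cleaning steps named in Practical No. 1 can be sketched as follows. This is a minimal illustration using only the Python standard library, not a reference solution: the rows, the column names (`age`, `bp`, `classification`), the valid ranges and the label set are all assumptions made for the example and are not read from kidneyDisease.csv.

```python
import statistics

# Hypothetical mini-sample; names and values are illustrative only.
rows = [
    {"age": "48", "bp": "80", "classification": "ckd"},
    {"age": "", "bp": "70", "classification": "ckd\t"},
    {"age": "62", "bp": "", "classification": " notckd"},
    {"age": "51", "bp": "90", "classification": "CKD"},
    {"age": "430", "bp": "80", "classification": "notckd"},
]

NUMERIC = ("age", "bp")
VALID_RANGES = {"age": (0, 120), "bp": (40, 200)}  # assumed validation rules
VALID_LABELS = {"ckd", "notckd"}                   # assumed label set


def to_number(text):
    """Return float(text), or None when the cell is blank or non-numeric."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def clean(raw_rows):
    # 1. Parse numeric cells and normalise inconsistent labels.
    parsed = []
    for row in raw_rows:
        rec = {col: to_number(row[col]) for col in NUMERIC}
        rec["classification"] = row["classification"].strip().lower()
        parsed.append(rec)

    # 2. Rule-based outlier handling: out-of-range values become missing.
    for rec in parsed:
        for col, (low, high) in VALID_RANGES.items():
            if rec[col] is not None and not low <= rec[col] <= high:
                rec[col] = None

    # 3. Impute missing numeric values with the column median.
    for col in NUMERIC:
        median = statistics.median(r[col] for r in parsed if r[col] is not None)
        for rec in parsed:
            if rec[col] is None:
                rec[col] = median
    return parsed


def validate(records):
    """Return (row_index, column) pairs that break a validation rule."""
    errors = []
    for i, rec in enumerate(records):
        for col, (low, high) in VALID_RANGES.items():
            if not low <= rec[col] <= high:
                errors.append((i, col))
        if rec["classification"] not in VALID_LABELS:
            errors.append((i, "classification"))
    return errors


cleaned = clean(rows)
print(validate(cleaned))  # an empty list means every rule passed
```

Median imputation is only one possible choice; dropping incomplete rows or imputing per class are equally valid options to compare in the practical.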

Lab 2: ( week of 2^nd & 9^th September 2024 )

Q. NO.	Program	Practical No.	Remarks
1	Apply data pre-processing techniques such as standardization/normalization, transformation, aggregation, discretization/binarization, sampling, etc. on any dataset.	Practical No. 2	Dataset: rain.csv; Download from data.gov.in: Rainfall in India
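Each technique listed in Practical No. 2 can be tried on a single numeric column first. The sketch below uses only the Python standard library; the `rainfall` values, the bin count, the binarization threshold and the block size used for aggregation are assumptions chosen for illustration and are not taken from rain.csv.

```python
import math
import random
import statistics

# Hypothetical rainfall readings in mm; not taken from rain.csv.
rainfall = [12.0, 30.0, 45.0, 160.0, 310.0, 280.0, 95.0, 20.0]


def z_score(values):
    """Standardise to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def min_max(values):
    """Rescale linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def equal_width_bins(values, n_bins):
    """Discretise into n_bins equal-width intervals labelled 0..n_bins-1."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - low) / width), n_bins - 1) for v in values]


def binarize(values, threshold):
    """Map each value to 1 if it exceeds the threshold, else 0."""
    return [1 if v > threshold else 0 for v in values]


standardized = z_score(rainfall)
normalized = min_max(rainfall)
log_transformed = [math.log1p(v) for v in rainfall]  # log(1 + x) transformation
bins = equal_width_bins(rainfall, 3)
wet = binarize(rainfall, 100.0)

# Aggregation: total over each block of four consecutive readings.
totals = [sum(rainfall[i:i + 4]) for i in range(0, len(rainfall), 4)]

# Simple random sampling without replacement, seeded for repeatability.
sample = random.Random(0).sample(rainfall, 3)

print(bins, wet, totals)
```

Standardization and min-max normalization both rescale the column; which one is appropriate depends on whether the later algorithm expects bounded values or is sensitive to outliers.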

Lab 3: ( week of 16^th, 23^rd & 30^th September 2024 )

Q. NO.	Program	Practical No.	Remarks
1	Writing/Review of Chapter 1, Chapter 3, and Chapter 4 of Project Report	Project Work

Lab 4: ( week of 7^th October 2024 )

Q. NO.	Program	Practical No.	Remarks
1	Apply the simple K-means algorithm for clustering any dataset. Compare the performance of clusters by varying the algorithm parameters. For a given set of parameters, plot a line graph depicting MSE obtained after each iteration.	Practical No. 3	Dataset: Mall_Customers.csv; Download from Kaggle: Mall Customer Segmentation Data
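One way to obtain the per-iteration MSE asked for in Practical No. 3 is to record it inside the K-means loop. The sketch below is a plain-Python version of Lloyd's algorithm, written under stated assumptions rather than as the expected solution: the function name `kmeans`, its parameters, the random initialisation from data points, and the six sample points are all hypothetical and are not taken from Mall_Customers.csv.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Simple K-means. Returns (centroids, labels, mse_history)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k data points chosen at random
    mse_history = []
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # MSE of this assignment with respect to the current centroids.
        mse = sum(
            squared_distance(p, centroids[label])
            for p, label in zip(points, new_labels)
        ) / len(points)
        mse_history.append(mse)
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels, mse_history


# Hypothetical 2-D points forming two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

centroids, labels, mse_history = kmeans(points, k=2)
for iteration, mse in enumerate(mse_history, start=1):
    print(f"iteration {iteration}: MSE = {mse:.4f}")

# Varying a parameter: final MSE for several values of k.
final_mse = {k: kmeans(points, k)[2][-1] for k in (1, 2, 3)}
print(final_mse)
```

If matplotlib is installed, the line graph can be drawn from `mse_history` with `plt.plot(range(1, len(mse_history) + 1), mse_history)`. The seed and `max_iter` are further parameters worth varying when comparing runs.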

Projects

Team No.	Project Title	Team Members	Outcomes/Remarks
1	Understanding the Monsoon Pattern in Eastern Gangetic Plain	Akshary Sharma (25019) Abhay Yadav (25040) Anuj Gupta (25042) Amar Kumar (25065) Kunal Verma (25073)	Dataset: Report: Project Presentation:
2	NIRF Ranking Prediction	Abhishek Prasad (25007) Vishal Kumar (25014) Nitish Kumar (25023) Anshu Kumar Dubey (25036) Sunny Chauhan (25050)	Dataset: Report: Project Presentation:
3	Student Performance Prediction	Himanshu Kumar (25016) Kanan Pal (25072) Khushboo Yadav (25082) Diksha Joshi (25091)	Dataset: Report: Project Presentation:
4	FIFA Prediction	Arihant (25003) Ayush Pundir (25027) Pratyush (25060) Ashish (25066)	Dataset: Report: Project Presentation:
5	Breast Cancer Prediction	Vidhan (25044) Sandeep Kumar Sharma (25047) Ayushman Pandey (25094) Tanishk Panchal (25095)	Dataset: Report: Project Presentation:
6	YouTube spam comments classification	Devesh Chauhan (25011) Shatrughan (25084) Om Ranjan (25085) Aman Sagar (25086)	Dataset: Report: Project Presentation:
7	Olympic Data Analysis and Prediction	Kusum (25002) Aditya Kumar (25012) Divyanshi (25021) Tushar Rana (25064)	Dataset: Report: Project Presentation:
8	Credit Card Fraud Detection	Ritesh Dhawan (25037) Bitthal Varshney (25041) Ansh Raj (25081) Uday Raj Verma (25083) Astitwa Rawat (25088)	Dataset: Report: Project Presentation:
9	CreditMap: Exploring Credit Score Patterns through Data Mining	Himanshu Singh (25017) Garvit Kumar (25018) Mayank (25022) Abhishek Kumar Singh (25032)	Dataset: Report: Project Presentation:
10	Movie Recommendation System	Tanya Agrahari (25030) Prakash Mishra (25035) Adarsh Singh (25074) Shivam Verma (25078)	Dataset: Report: Project Presentation:
11	Wine Quality Prediction	Shivam Soni (250xx) Kashif (250xx) Akash Pathak (250xx) Priyanshu Sachan (250xx)	Dataset: Report: Project Presentation:

@@ Line 38: / Line 38: @@
 | style="width: 60%" | Apply data cleaning techniques on any dataset (e.g. Chronic Kidney Disease dataset from UCI repository). Techniques may include handling missing values, outliers and inconsistent values. Also, a set of validation rules may be specified for the particular dataset and validation checks performed.
 | style="width: 15%" |  Practical No. 1
-| '''Dataset:''' [http://mkbhandari.com/mkwiki/data/fall2024/dm/datasets/kidneyDisease.csv '''abc.csv'''] <br>
+| '''Dataset:''' [http://mkbhandari.com/mkwiki/data/fall2024/dm/datasets/kidneyDisease.csv '''kidneyDisease.csv'''] <br>
 '''Download from Kaggle:''' [https://www.kaggle.com/datasets/mansoordaku/ckdisease Chronic KIdney Disease dataset] <br>
 '''Tutorial:''' [https://www.kaggle.com/code/alexisbcook/handling-missing-values#How-many-missing-data-points-do-we-have? Tutorial on Handling Missing values]
 |}
-== '''Lab 2:''' ( week of 2<sup>nd</sup>  9<sup>th</sup>  September 2024 ) ==
+== '''Lab 2:''' ( week of 2<sup>nd</sup> &  9<sup>th</sup>  September 2024 ) ==
 {| class="wikitable" style="text-align: justify; width: 100%";
 |-
@@ Line 54: / Line 54: @@
 | style="width: 60%" | Apply data pre-processing techniques such as standardization/normalization, transformation, aggregation, discretization/binarization, sampling etc. on any dataset
 | style="width: 15%" |  Practical No. 2
-| Dataset:
+| '''Dataset:''' [http://mkbhandari.com/mkwiki/data/fall2024/dm/datasets/rain.csv '''rain.csv'''] <br>
-[http://mkbhandari.com/mkwiki/fall2024/dm/datasets/kidney_disease.csv '''abc.csv''']
+'''Download from data.gov.in:''' [https://www.data.gov.in/catalog/rainfall-india Rainfall in India]
+|}
-Download from Kaggle:
+== '''Lab 3:''' ( week of 16<sup>th</sup>, 23<sup>rd</sup> &  30<sup>th</sup>September 2024 ) ==
-[https://www.kaggle.com/datasets/mansoordaku/ckdisease abc.csv]
+{| class="wikitable" style="text-align: justify; width: 100%";
+|-
+! Q. NO.
+! Program
+! Practical No.
+! Remarks
+|-
+| style="width: 8%"  | 1
+| style="width: 60%" | Writing/Review of Chapter 1, Chapter 3, and Chapter 4 of Project Report
+| style="width: 15%" |  Project Work
+|
+|}
-Tutorial:
+== '''Lab 4:''' ( week of 7<sup>th</sup> October 2024 ) ==
-[https://www.kaggle.com/code/alexisbcook/handling-missing-values#How-many-missing-data-points-do-we-have? Tutorial on Preprocessing Techniques]
+{| class="wikitable" style="text-align: justify; width: 100%";
+|-
+! Q. NO.
+! Program
+! Practical No.
+! Remarks
+|-
+| style="width: 8%"  | 1
+| style="width: 60%" | Apply simple K-means algorithm for clustering any dataset. Compare the performance of clusters by varying the algorithm parameters. For a given set of parameters, plot a line graph depicting MSE obtained after each iteration.
+| style="width: 15%" |  Practical No. 3
+| '''Dataset:''' [http://mkbhandari.com/mkwiki/data/fall2024/dm/datasets/Mall_Customers.csv '''Mall_Customers.csv'''] <br>
+'''Download from data from kaggle:''' [https://www.kaggle.com/datasets/vjchoudhary7/customer-segmentation-tutorial-in-python Mall Customer Segmentation Data]
 |}
@@ Line 73: / Line 96: @@
 |-
 | style="width: 8%"  | 1
-| style="width: 45%" | Any Classification/Clustering Problem
+| style="width: 45%" | Understanding the Monsoon Pattern in Eastern Gangatic Plain
 | style="width: 25%" |
-# 1
+# '''Akshary Sharma (25019)'''
-# 2
+# Abhay Yadav (25040)
-# 3
+# Anuj Gupta (25042)
-# 4
+# Amar Kumar (25065)
+# Kunal Verma (25073)
+|
+* Dataset:
+* Report:
+* Project Presentation:
+|-
+|2|| NIRF Ranking Prediction||
+# '''Abhishek Prasad (25007)'''
+# Vishal Kumar (25014)
+# Nitish Kumar (25023)
+# Anshu Kumar Dubey (25036)
+# Sunny Chauhan (25050)
+|
+* Dataset:
+* Report:
+* Project Presentation:
+|-
+|3|| Student Performance Prediction ||
+# '''Himanshu Kumar (25016)'''
+# Kanan Pal (25072)
+# Khushboo Yadav (25082)
+# Diksha Joshi (25091)
+|
+* Dataset:
+* Report:
+* Project Presentation:
+|-
+|4|| FIFA Prediction ||
+# Arihant (25003)
+# '''Ayush Pundir (25027)'''
+# Pratyush (25060)
+# Ashish (25066)
+|
+* Dataset:
+* Report:
+* Project Presentation:
+|-
+|5|| Breast Cancer Prediction ||
+# Vidhan (25044)
+# '''Sandeep Kumar Sharma (25047)'''
+# Ayushman Pandey (25094)
+# Tanishk Panchal (25095)
+|
+* Dataset:
+* Report:
+* Project Presentation:
+|-
+|6|| YouTube spam comments classification ||
+# Devesh Chauhan (25011)
+# Shatrughan  (25084)
+# Om Ranjan (25085)
+# '''Aman Sagar (25086)'''
+|
+* Dataset:
+* Report:
+* Project Presentation:
+|-
+|7|| Olympic Data Analysis and Prediction ||
+# Kusum (25002)
+# '''Aditya Kumar (25012)'''
+# Divyanshi (25021)
+# Tushar Rana (25064)
+|
+* Dataset:
+* Report:
+* Project Presentation:
+|-
+|8|| Credit Card Fraud Detection ||
+# Ritesh Dhawan (25037)
+# Bitthal Varshney (25041)
+# Ansh Raj (25081)
+# '''Uday Raj Verma (25083)'''
+# Astitwa Rawat (25088)
+|
+* Dataset:
+* Report:
+* Project Presentation:
+|-
+|9|| CreditMap: Exploring Credit Score Patterns through Data Mining ||
+# Himanshu Singh (25017)
+# '''Garvit Kumar (25018)'''
+# Mayank  (25022)
+# Abhishek Kumar Singh(25032)
+|
+* Dataset:
+* Report:
+* Project Presentation:
+|-
+|10|| Movie Recommendation System ||
+# Tanya Agrahari (25030)
+# Prakash Mishra (25035)
+# '''Adarsh Singh (25074)'''
+# Shivam Verma (25078)
 |
-* Dataset: [http://mkbhandari.com/mkwiki/fall2024/dm/datasets/kidney_disease.csv '''kidney_disease.csv''']
+* Dataset:
 * Report:
 * Project Presentation:
 |-
-|2|| Any Classification/Clustering Problem||
+|11|| Wine Quality Prediction ||
-# 1
+# '''Shivam Soni (250xx)'''
-# 2
+# Kashif (250xx)
-# 3
+# Akash Pathak (250xx)
-# 4
+# Priyanshu Sachan (250xx)
 |
-* Dataset: [http://mkbhandari.com/mkwiki/fall2024/dm/datasets/kidney_disease.csv '''kidney_disease.csv''']
+* Dataset:
 * Report:
 * Project Presentation:
 |}
