Fall 2024: Data Mining Lab
Latest revision as of 21:54, 26 November 2024
Instructions
- Please be on time to avoid the Attendance Penalty.
- Please sign the Attendance Register before you take a seat.
- Please put your mobile phone on silent mode.
- Each lab assignment must be submitted in Google Classroom (GC) for evaluation. Each lab will be announced in GC; submit before the deadline.
- Shut down your assigned computer and arrange your chair before you leave the lab.
Guidelines
- As per the DUCS guidelines for DSE: Data Mining
Lab 0: Getting Started ( week of 05th & 12th August 2024 )
| Q. NO. | Program | Practical No. | Remarks |
|---|---|---|---|
| 1 | https://www.cse.msu.edu/~ptan/dmbook/tutorials/tutorial1/tutorial1.html | Practice Set No. 1 | Introduction to Python |
| 2 | https://www.cse.msu.edu/~ptan/dmbook/tutorials/tutorial2/tutorial2.html | Practice Set No. 2 | Introduction to Numpy and Pandas |
| 3 | https://www.cse.msu.edu/~ptan/dmbook/tutorials/tutorial3/tutorial3.html | Practice Set No. 3 | Data Exploration |
Lab 1: ( week of 19th & 26th August 2024 )
| Q. NO. | Program | Practical No. | Remarks |
|---|---|---|---|
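| 1 | Apply data cleaning techniques to any dataset (e.g. the Chronic Kidney Disease dataset from the UCI repository). Techniques may include handling missing values, outliers, and inconsistent values. In addition, a set of validation rules may be specified for the chosen dataset and validation checks performed. | Practical No. 1 | Dataset: kidneyDisease.csv (http://mkbhandari.com/mkwiki/data/fall2024/dm/datasets/kidneyDisease.csv) <br> Download from Kaggle: Chronic Kidney Disease dataset (https://www.kaggle.com/datasets/mansoordaku/ckdisease) <br> Tutorial: Tutorial on Handling Missing Values (https://www.kaggle.com/code/alexisbcook/handling-missing-values#How-many-missing-data-points-do-we-have?) |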
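
A minimal sketch of the kind of cleaning Practical No. 1 asks for, assuming pandas and NumPy are installed. The toy DataFrame, its column names (`age`, `bp`, `status`), the helper names `clean` and `validate`, and the rule 0 <= age <= 120 are illustrative assumptions; they are not taken from kidneyDisease.csv or from the course material.

```python
import numpy as np
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Toy cleaning pass: inconsistent labels, missing values, outliers."""
    out = df.copy()
    # Inconsistent values: trim stray whitespace/tabs and lowercase the labels.
    out["status"] = out["status"].str.strip().str.lower()
    # Missing values: median for the numeric column, mode for the categorical one.
    out["age"] = out["age"].fillna(out["age"].median())
    out["status"] = out["status"].fillna(out["status"].mode().iloc[0])
    # Outliers: clip blood pressure to the 1.5 * IQR fences.
    q1, q3 = out["bp"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["bp"] = out["bp"].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out


def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows that break the assumed rule 0 <= age <= 120."""
    return df[~df["age"].between(0, 120)]


# Hypothetical records, made up for illustration only.
raw = pd.DataFrame({
    "age": [48, np.nan, 62, 51, 200],
    "bp": [80, 70, 90, 80, 400],
    "status": ["yes", " Yes", "no\t", None, "yes"],
})

cleaned = clean(raw)
violations = validate(cleaned)  # the age-200 record is flagged, not silently fixed
print(cleaned)
print(violations)
```

Keeping validation separate from cleaning is one possible design: the check reports suspicious rows so that the decision to drop or correct them stays explicit.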
Lab 2: ( week of 2nd & 9th September 2024 )
| Q. NO. | Program | Practical No. | Remarks |
|---|---|---|---|
| 1 | Apply data pre-processing techniques such as standardization/normalization, transformation, aggregation, discretization/binarization, sampling, etc. to any dataset. | Practical No. 2 | Dataset: rain.csv (http://mkbhandari.com/mkwiki/data/fall2024/dm/datasets/rain.csv) <br> Download from data.gov.in: Rainfall in India (https://www.data.gov.in/catalog/rainfall-india) |
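
The pre-processing steps named above can be sketched on a small array, assuming NumPy is installed. The rainfall values, the threshold of 100, the choice of three bins, and all variable names are made up for illustration and do not come from rain.csv.

```python
import numpy as np

# Hypothetical rainfall readings in mm (not taken from rain.csv).
rain = np.array([0.0, 12.5, 48.0, 110.0, 305.5, 20.0])

# Standardization: zero mean, unit (population) standard deviation.
z = (rain - rain.mean()) / rain.std()

# Normalization: rescale to the range [0, 1].
minmax = (rain - rain.min()) / (rain.max() - rain.min())

# Transformation: log(1 + x) compresses the long right tail.
logged = np.log1p(rain)

# Aggregation: total of each block of three consecutive readings.
totals = rain.reshape(2, 3).sum(axis=1)

# Discretization: three equal-width bins labelled 0, 1, 2.
edges = np.linspace(rain.min(), rain.max(), 4)
bins = np.digitize(rain, edges[1:-1])

# Binarization: 1 if the reading exceeds the assumed threshold of 100 mm.
wet = (rain > 100).astype(int)

# Sampling: three readings drawn without replacement, with a fixed seed.
rng = np.random.default_rng(0)
sample = rng.choice(rain, size=3, replace=False)

print(z, minmax, logged, totals, bins, wet, sample, sep="\n")
```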
Lab 3: ( week of 16th, 23rd & 30th September 2024 )
| Q. NO. | Program | Practical No. | Remarks |
|---|---|---|---|
| 1 | Writing/Review of Chapter 1, Chapter 3, and Chapter 4 of Project Report | Project Work | |
Lab 4: ( week of 7th October 2024 )
| Q. NO. | Program | Practical No. | Remarks |
|---|---|---|---|
| 1 | Apply the simple K-means algorithm to cluster any dataset. Compare the quality of the clusters obtained by varying the algorithm parameters. For a given set of parameters, plot a line graph of the MSE obtained after each iteration. | Practical No. 3 | Dataset: Mall_Customers.csv (http://mkbhandari.com/mkwiki/data/fall2024/dm/datasets/Mall_Customers.csv) <br> Download from Kaggle: Mall Customer Segmentation Data (https://www.kaggle.com/datasets/vjchoudhary7/customer-segmentation-tutorial-in-python) |
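
One way to obtain the per-iteration MSE is to write the K-means loop by hand, since the quantity has to be recorded after every iteration. The sketch below assumes NumPy is installed and uses synthetic two-dimensional points rather than Mall_Customers.csv; the function name `kmeans`, its parameters, and the random initialization from data points are illustrative choices, not a prescribed solution.

```python
import numpy as np


def kmeans(X, k, max_iter=100, seed=0):
    """Plain K-means (Lloyd's algorithm).

    Returns the centroids, the cluster label of each point, and a list
    holding the MSE measured after each iteration.
    """
    rng = np.random.default_rng(seed)
    # Initialization: pick k distinct data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    mse_history = []
    for _ in range(max_iter):
        # Assignment step: squared distance from every point to every centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):
                new_centroids[j] = members.mean(axis=0)
        # MSE: mean squared distance of each point to its updated centroid.
        mse = ((X - new_centroids[labels]) ** 2).sum(axis=1).mean()
        mse_history.append(float(mse))
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break
    return centroids, labels, mse_history


# Synthetic data: two Gaussian blobs, generated for illustration only.
data_rng = np.random.default_rng(42)
X = np.vstack([
    data_rng.normal(0.0, 1.0, size=(50, 2)),
    data_rng.normal(6.0, 1.0, size=(50, 2)),
])

centroids, labels, mse_history = kmeans(X, k=2)
print(mse_history)
```

If matplotlib is available, `plt.plot(range(1, len(mse_history) + 1), mse_history)` gives the requested line graph. Re-running with different values of `k` or `seed` is one way to compare parameter settings.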
Projects
| Team No. | Project Title | Team Members | Outcomes/Remarks |
|---|---|---|---|
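| 1 | Understanding the Monsoon Pattern in Eastern Gangetic Plain | 1. **Akshary Sharma (25019)** <br> 2. Abhay Yadav (25040) <br> 3. Anuj Gupta (25042) <br> 4. Amar Kumar (25065) <br> 5. Kunal Verma (25073) | Dataset: <br> Report: <br> Project Presentation: |
| 2 | NIRF Ranking Prediction | 1. **Abhishek Prasad (25007)** <br> 2. Vishal Kumar (25014) <br> 3. Nitish Kumar (25023) <br> 4. Anshu Kumar Dubey (25036) <br> 5. Sunny Chauhan (25050) | Dataset: <br> Report: <br> Project Presentation: |
| 3 | Student Performance Prediction | 1. **Himanshu Kumar (25016)** <br> 2. Kanan Pal (25072) <br> 3. Khushboo Yadav (25082) <br> 4. Diksha Joshi (25091) | Dataset: <br> Report: <br> Project Presentation: |
| 4 | FIFA Prediction | 1. Arihant (25003) <br> 2. **Ayush Pundir (25027)** <br> 3. Pratyush (25060) <br> 4. Ashish (25066) | Dataset: <br> Report: <br> Project Presentation: |
| 5 | Breast Cancer Prediction | 1. Vidhan (25044) <br> 2. **Sandeep Kumar Sharma (25047)** <br> 3. Ayushman Pandey (25094) <br> 4. Tanishk Panchal (25095) | Dataset: <br> Report: <br> Project Presentation: |
| 6 | YouTube spam comments classification | 1. Devesh Chauhan (25011) <br> 2. Shatrughan (25084) <br> 3. Om Ranjan (25085) <br> 4. **Aman Sagar (25086)** | Dataset: <br> Report: <br> Project Presentation: |
| 7 | Olympic Data Analysis and Prediction | 1. Kusum (25002) <br> 2. **Aditya Kumar (25012)** <br> 3. Divyanshi (25021) <br> 4. Tushar Rana (25064) | Dataset: <br> Report: <br> Project Presentation: |
| 8 | Credit Card Fraud Detection | 1. Ritesh Dhawan (25037) <br> 2. Bitthal Varshney (25041) <br> 3. Ansh Raj (25081) <br> 4. **Uday Raj Verma (25083)** <br> 5. Astitwa Rawat (25088) | Dataset: <br> Report: <br> Project Presentation: |
| 9 | CreditMap: Exploring Credit Score Patterns through Data Mining | 1. Himanshu Singh (25017) <br> 2. **Garvit Kumar (25018)** <br> 3. Mayank (25022) <br> 4. Abhishek Kumar Singh (25032) | Dataset: <br> Report: <br> Project Presentation: |
| 10 | Movie Recommendation System | 1. Tanya Agrahari (25030) <br> 2. Prakash Mishra (25035) <br> 3. **Adarsh Singh (25074)** <br> 4. Shivam Verma (25078) | Dataset: <br> Report: <br> Project Presentation: |
| 11 | Wine Quality Prediction | 1. **Shivam Soni (250xx)** <br> 2. Kashif (250xx) <br> 3. Akash Pathak (250xx) <br> 4. Priyanshu Sachan (250xx) | Dataset: <br> Report: <br> Project Presentation: |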