Sample Syllabus
Catalog Description: The course introduces fundamental and practical tools, techniques, and algorithms for Knowledge Discovery and Data Mining (KD&DM). It balances methods and practice. On the methodological side, it covers the key techniques for transforming data into business intelligence, including Data Preprocessing, Data Quality, the k-Nearest Neighbor Algorithm, Machine Learning (ML) and Decision Trees (DT: C4.5 and CART), Artificial Neural Networks (ANN), Clustering, and Algorithm Evaluation Techniques. On the practical side, a number of case studies from real-world applications are analyzed and discussed to illustrate the practical significance of the various techniques and to reinforce learning. A current list of KD&DM software products and algorithms is also introduced. Prerequisite: Knowledge of probability.
Textbook(s)
Required:
1. Discovering Knowledge in Data: An Introduction to Data Mining, Daniel T. Larose, John Wiley, 2005
2. Lecture Notes and Handouts
Recommended:
Week-By-Week
There will be weekly exercises and bi-weekly projects/case studies.
| Week | Topics Covered | Reading | Assignments |
|---|---|---|---|
| 1 | What is Data Mining & Knowledge Discovery?; The Six Phases of Data Mining | Ch. 1 | |
| 2 | Five Business and Operations Applications | Handout | |
| 3 | Data Cleaning; Handling Missing Data; Identifying Misclassifications | Ch. 2 | Case study 1 |
| 4 | Graphical Methods for Outliers; Data Transformation: Min-Max Normalization and Z-Score Standardization | Handout | |
| 5 | Supervised and Unsupervised Learning; Methodology for Supervised Learning; k-Nearest Neighbor Algorithm; Distance Function; Database Considerations | Ch. 5, Handout | Case study 2 |
| 6 | k-Nearest Neighbor Algorithm for Estimation and Prediction; Choosing k; Case Study | Ch. 5 | |
| 7 | C4.5 Algorithm; Classification and Regression Trees (CART) Algorithm | Ch. 6 | Case study 3 |
| 8 | Decision Rules; Comparison of the C4.5 and CART Algorithms Applied to Real Data; Case Studies | Ch. 6 | |
| 9 | Human Brain; Input and Output; Neural Network for Estimation and Prediction; Summation Function; Sigmoid Activation Function | Ch. 7 | Case study 4 |
| 10 | Back-Propagation Algorithm; Terminating Criteria; Learning Rate; Applications of ANN; Case Study | Ch. 7 | |
| 11 | Clustering Task; Hierarchical Clustering Methods; k-Means Clustering | Ch. 8 | Case study 5 |
| 12 | Applications of k-Means Clustering; Applications of k-Means Clustering Using SAS Enterprise Miner; Case Study | Ch. 8 | |
| 13 | Model Evaluation Techniques | Handout | Case study 6 |
| 14 | Project and Paper Presentations: an end-to-end Knowledge Discovery and Data Mining project developed and executed during the semester by each student using a real-world data set. The result is documented as a research project and presented to the class. | | |