Machine Learning

### INTRO TO MACHINE LEARNING ###

An introductory Machine Learning draft based on Andrew Ng’s materials, revamped into a much shorter, more concise version as a time-saver. I also dive deep into the mechanical details of implementing the learning algorithms, so that first-timers can better understand the nuts and bolts.
Everything covered is coded in Python (not included in the tutorial itself); the code is still being updated.

Download Tutorial: Intro_ML

Download Code (updating): https://github.com/suwangcompling/datascience

Content:

– Supervised Learning: Ch1 Linear Regression; Ch2 Logistic Regression; Ch3 Neural Network; Ch5 Support Vector Machine (SVM)

– Unsupervised Learning: Ch6 Clustering – K-means & Principal Component Analysis (PCA); Ch7 Anomaly/Outlier Detection

– Model Evaluation: Ch4 Learning Algorithm Evaluation
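As a taste of Ch1 (Linear Regression), here is a minimal batch gradient descent sketch in NumPy. This is an illustrative sketch of the technique only, not the tutorial's own code; the toy data and learning rate are my own choices:

```python
import numpy as np

# Toy data: y = 2x + 1 plus a little noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.1, size=100)

# Prepend a bias column so theta = [intercept, slope]
Xb = np.hstack([np.ones((X.shape[0], 1)), X])

theta = np.zeros(2)
alpha = 0.01                       # learning rate
for _ in range(5000):              # batch gradient descent
    # Gradient of the mean squared error cost w.r.t. theta
    grad = Xb.T @ (Xb @ theta - y) / len(y)
    theta -= alpha * grad

print(theta)  # close to [1, 2]
```

The same update rule carries over to logistic regression (Ch2) once the linear prediction is passed through a sigmoid.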

### INTRO TO STATISTICAL LEARNING ###

Based on Professor Trevor Hastie’s materials.
These notes go into the details of the most fundamental models in statistical/machine learning: Notes on Statistical Learning

Contents

1 Fundamentals of Statistical Learning
  1.1 Basic Idea
  1.2 Methods and Evaluation
  1.3 Special Topic: Bayes Classifier
  1.4 Lab Code

2 Linear Regression
  2.1 Simple LR: Univariate
  2.2 Multiple LR: Multivariate
  2.3 Interaction
  2.4 Non-linear Fit / Polynomial Regression
  2.5 Issues in Linear Regression
  2.6 K-Nearest Neighbor Regression: a Nonparametric Model
  2.7 Lab Code

3 Classification
  3.1 Logistic Regression
  3.2 Linear Discriminant Analysis
  3.3 Lab Code

4 Resampling
  4.1 Cross-Validation
  4.2 Bootstrapping
  4.3 Lab Code

5 Linear Model Selection & Regularization
  5.1 Subset Selection
  5.2 Shrinkage
  5.3 Dimension Reduction
  5.4 High Dimensional Data
  5.5 Lab Code

6 Preliminaries on Nonlinear Methods
  6.1 Polynomial Regression
  6.2 Step Functions
  6.3 Basis Functions: A Generalization
  6.4 Regression Splines
    6.4.1 Piecewise Polynomials & Basics of Splines
    6.4.2 Tuning Spline
    6.4.3 Spline vs. Polynomial Regression
    6.4.4 Smoothing Splines
  6.5 Local Regression
  6.6 Generalized Additive Model
  6.7 Lab Code
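To illustrate the resampling idea behind §4.1 (Cross-Validation), here is a small hand-rolled k-fold sketch on NumPy. It is a sketch under my own assumptions (least-squares fit, MSE as the error metric, made-up toy data), not the lab code from the notes:

```python
import numpy as np

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and split into k roughly equal folds."""
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def cv_mse(X, y, k=5):
    """k-fold cross-validated MSE of ordinary least squares."""
    folds = kfold_indices(len(y), k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Fit on the k-1 training folds, score on the held-out fold
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        resid = y[test] - X[test] @ beta
        errors.append(np.mean(resid ** 2))
    return float(np.mean(errors))

# Toy demo: y is truly linear in x, so CV error should sit
# near the noise variance (0.1 ** 2 = 0.01)
rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 200)
X = np.column_stack([np.ones_like(x), x])
y = 3 * x + 0.5 + rng.normal(0, 0.1, 200)
print(cv_mse(X, y))  # roughly 0.01
```

The same loop structure extends to model selection (Ch5): compute `cv_mse` for each candidate model and keep the one with the lowest score.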