Lecture 1

TODO

Put readings on todo list (can be found on class webpage)
Put homeworks on todo list
Midterm 1: Monday, March 16, in class
Discussion sections: Monday-Wednesday

Core material

Find patterns in data and use them to make predictions
Models + stats help us understand patterns
Use optimization algorithms to learn the patterns

Classification

Simplest Case: you have two choices, given data, make a prediction
k-NN (k-nearest neighbors) works best when the data has few outliers
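
To make the two-choice case concrete, here is a minimal sketch of a k-nearest-neighbors prediction. The function name `knn_predict` and the toy points are made up for illustration and are not from the lecture; the idea is only that a new point takes the majority label of its k closest training points.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points closest to `query`."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = [(math.dist(p, query), label)
                 for p, label in zip(train_points, train_labels)]
    distances.sort(key=lambda pair: pair[0])
    # Vote among the k nearest neighbors.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy data (assumed for illustration): class 0 near the origin, class 1 near (1, 1).
train_points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
                (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_points, train_labels, (0.15, 0.15)))  # 0
print(knn_predict(train_points, train_labels, (0.95, 1.0)))   # 1
```

With an odd k and two classes the vote can never tie, which is one reason odd values of k are a common choice.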

Classifying Numbers

Turn each image into a grid of 0s and 1s based on the color of each grid cell
Turn the grid into a vector by flattening the grid
Find a hyperplane in the n-dimensional space that separates the classes
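
A rough sketch of these three steps, assuming a tiny 3x3 grid instead of a real digit image. The weights and bias are hand-picked for this toy example (a real classifier would learn them from data), and the helper names `flatten` and `hyperplane_side` are hypothetical.

```python
def flatten(grid):
    """Concatenate the rows of a 2-D grid into one flat vector."""
    return [pixel for row in grid for pixel in row]

def hyperplane_side(weights, bias, x):
    """Return 1 if x lies on the positive side of the hyperplane w.x + b = 0, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0

# A crude 3x3 "7" and a crude 3x3 "1" (assumed toy images).
seven = [[0, 1, 1],
         [0, 0, 1],
         [0, 0, 1]]
one = [[0, 1, 0],
       [0, 1, 0],
       [0, 1, 0]]

# Hand-picked, NOT learned: reward the top row and right column, penalize the rest.
weights = [1, 1, 1, -1, -1, 1, -1, -1, 1]
bias = -3.5

print(flatten(seven))                                   # [0, 1, 1, 0, 0, 1, 0, 0, 1]
print(hyperplane_side(weights, bias, flatten(seven)))   # 1 ("7")
print(hyperplane_side(weights, bias, flatten(one)))     # 0 ("not 7")
```

Here n = 9 because the grid has nine cells; a 28x28 image would give a vector in 784-dimensional space.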

Testing and Validation

Train a classifier - it learns to distinguish 7 from not 7
Test the classifier on NEW images
There are two types of error
- Training set error: Fraction of training images not classified correctly
- Test set error: Fraction of misclassified NEW images, not seen during training
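
The two error types are the same computation applied to different data. The sketch below assumes a fixed, hypothetical threshold rule `classify` and made-up 1-D points, just to show how each fraction is computed.

```python
def error_rate(classifier, points, labels):
    """Fraction of points whose predicted label differs from the true label."""
    wrong = sum(1 for p, y in zip(points, labels) if classifier(p) != y)
    return wrong / len(points)

def classify(x):
    """Hypothetical rule standing in for a trained classifier: predict 1 when x > 0.5."""
    return 1 if x > 0.5 else 0

# Made-up data: the rule fits the training points perfectly...
train_points = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
train_labels = [0, 0, 0, 1, 1, 1]
# ...but misses one of the new points it never saw.
test_points = [0.3, 0.45, 0.55, 0.7]
test_labels = [0, 1, 1, 1]

print(error_rate(classify, train_points, train_labels))  # 0.0
print(error_rate(classify, test_points, test_labels))    # 0.25
```

A training error of zero says nothing by itself about how the classifier behaves on new images, which is why the test error is tracked separately.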
Outliers: Points whose labels are atypical (e.g., solvent borrowers who defaulted anyway)
Overfitting: When the test error deteriorates because the classifier becomes too sensitive to outliers
Hyperparameters: Most ML algorithms have a few hyperparameters that control over/underfitting, e.g., k in k-nearest neighbors

Select classifiers by validation

Validation Set: Hold back a subset of the labeled data
Train the classifier multiple times with different hyperparameter settings
Choose the setting (hyperparameters + learning algorithm) that works best on the validation set
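
As an illustration of this procedure, the sketch below tries several values of k for a 1-D k-nearest-neighbors classifier and keeps the one with the lowest validation error. The data is made up, including a deliberate outlier at x = 2.5, and the helper names are assumptions rather than anything from the lecture.

```python
from collections import Counter

def knn_predict(train, x, k):
    """Majority vote among the k training examples closest to x (1-D inputs)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def error_rate(train, examples, k):
    """Fraction of `examples` misclassified by k-NN built on `train`."""
    wrong = sum(1 for x, y in examples if knn_predict(train, x, k) != y)
    return wrong / len(examples)

# (x, label) pairs; (2.5, 1) is an outlier sitting inside the class-0 region.
train = [(1, 0), (2, 0), (2.5, 1), (3, 0), (4, 0),
         (6, 1), (7, 1), (8, 1), (9, 1)]
# Held-back labeled data, never used to build the classifier.
validation = [(2.4, 0), (2.6, 0), (3.5, 0), (6.5, 1), (8.5, 1)]

# Evaluate each hyperparameter setting on the validation set.
errors = {k: error_rate(train, validation, k) for k in (1, 3, 5)}
best_k = min(errors, key=errors.get)  # on ties, the first (smallest) k wins

print(errors)   # {1: 0.4, 3: 0.0, 5: 0.0}
print(best_k)   # 3
```

On this toy data k = 1 copies the outlier's label onto the two validation points next to it, while k = 3 and k = 5 outvote it.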

Now, we have 3 sets:

Training set: Used to learn model weights
Validation set: Used to tune hyperparameters, choose among different models
Test set: Used for the FINAL evaluation. The test set is kept in a vault and run once, at the very end
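
One simple way to produce the three sets is to shuffle the labeled data once and slice it. The 60/20/20 proportions, the fixed seed, and the function name `three_way_split` are assumptions chosen for this sketch; the lecture does not prescribe them.

```python
import random

def three_way_split(examples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle a copy of `examples` and cut it into (train, validation, test)."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test

examples = list(range(100))  # stand-ins for 100 labeled examples
train, validation, test = three_way_split(examples)

print(len(train), len(validation), len(test))  # 60 20 20
```

The slices never overlap, so no example used for training or tuning can leak into the final evaluation.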

Kaggle.com

Runs ML competitions, including our HWs
We use 2 data sets:
- public set labels available during the competition
- private set labels known only to Kaggle

Techniques of Machine Learning Taught in Class

Supervised learning

Classification: Is this email spam?
Regression: How likely is it that this patient has cancer?

Unsupervised learning

Clustering: which DNA sequences are similar to each other?
Dimensionality Reduction: What are common features of faces?

Questions


Albert Su