Machine Learning 201
Organizer: Doug Chang
Instructors: Dr. Michael Bowles & Dr. Patricia Hoffman
Overview of the Course
Machine Learning 201 and 202 cover topics in greater depth than 101 and 102. Participants in the class should come away able to read the current literature and apply what they read to their own work. Machine Learning 201 and 202 can be taken in any order.
Machine Learning 201 begins with ordinary least squares regression and extends this basic tool in a number of directions. We'll consider various regularization approaches. We'll introduce logistic regression and we'll learn how to code categorical inputs and outputs. We'll look at feature space expansions. These will lead naturally to generalizations of linear regression, known as the "generalized linear model" and the "generalized additive model".
Text: "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
See also Prof. Robert Tibshirani's notes for Stats 315a: http://www-stat.stanford.edu/~tibs/stat315a.html
Prerequisites
Machine Learning 201 and 202 employ beginner-level probability, calculus and linear algebra (e.g. peruse the appendices in "Introduction to Data Mining" by Tan et al., or review Linear Algebra and Probability Theory). If you have taken the Machine Learning 101 and 102 classes, you are well prepared for this course, but they are not required to start 201.
Participants should be familiar with R or be willing to pick R up outside of class. We will hand out R code for most of our examples, but we won't spend time in 201 going through introductory material on R. Come to the first class with R loaded on your computer (http://cran.r-project.org/). For your review, R references are here: References for R, Reference for R Comments, More R references. To integrate R with Eclipse click here.
To get the most out of the class, participants will need to work through the homework assignments.
General Sequence of Classes:
Machine Learning 101: Supervised learning
Text: "Introduction to Data Mining", by Pang-Ning Tan, Michael Steinbach and Vipin Kumar
Machine Learning 102: Unsupervised Learning and Fault Detection
Text: "Introduction to Data Mining", by Pang-Ning Tan, Michael Steinbach and Vipin Kumar
Machine Learning 201: Advanced Regression Techniques, Generalized Linear Models, and Generalized Additive Models
Text: "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
Machine Learning 202: Collaborative Filtering, Bayesian Belief Networks, and Advanced Trees
Text: "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
Future Topics
Data Mining Social Networks
Text Mining
Recommender Methods
Big Data
Machine Learning 201 Syllabus:
Week      | Topics                                 | Homework       | Links
----------+----------------------------------------+----------------+----------------
1st Week  | Advanced Regression Topics             |                | Lecture 1 and 2
1/12/2011 | Ordinary Least Squares - error bounds  |                |
          | Subset Select, fwd & backward stepwise |                |
          | Least Angle Regression - LARS          |                |
          | Attribute basis change                 |                |
1/13/2011 | Coefficient shrinkage methods          | Homework01.pdf |
          | L1, L2 coefficient penalties           |                |
          | Ridge, lasso and elastic net           |                |
2nd Week  | Regression Topics                      |                | Lecture 3 and 4
1/19/2011 | Logistic Regression                    | HW #1 Due      |
1/20/2011 | Attribute Expansion                    | Homework02.pdf |
3rd Week  | Factor Inputs/Outputs                  |                | NotesWeek3
1/26/2011 | Coding for Factor Inputs               | HW #2 Due      |
1/27/2011 | Error-correcting codes                 | Homework03.pdf |
4th Week  | Generalized Linear Models              |                | NotesWeek4
2/2/2011  |                                        | HW #3 Due      |
2/3/2011  |                                        | Homework04.pdf |
5th Week  | Generalized Additive Models            |                |
2/9/2011  |                                        | HW #4 Due      |
2/10/2011 |                                        |                |
General Calendar for the Year:
Fall 2010: Machine Learning 101 & Machine Learning 102
Winter 2011: Machine Learning 101 & Machine Learning 201
Early Spring 2011: Machine Learning 102 & Machine Learning 202
We will be using the following text as a reference for 201 and 202:
"The Elements of Statistical Learning: Data Mining, Inference, and Prediction" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. This is an excellent book. Virtually everyone in the field knows it and uses it as a standard reference. The book is free to read online: http://www-stat.stanford.edu/~tibs/ElemStatLearn
There are more Machine Learning references on Patricia's web site: http://patriciahoffmanphd.com/
Anyone can read this web site; however, only the instructors have permission to edit it.
If you haven't already signed up on the meetup page, please do so now. If you haven't already filled out the Register for Class form on the meetup page, please fill it out now.