Machine Learning 201
Organizer: Doug Chang
Instructors: Dr. Michael Bowles & Dr. Patricia Hoffman
Overview of the Course
Machine Learning 201 and 202 cover topics in greater depth than 101 and 102. Participants in the class should come away able to read the current literature and apply what they read to their own work. Machine Learning 201 and 202 can be taken in any order.
Machine Learning 201 begins with ordinary least squares regression and extends this basic tool in a number of directions. We'll consider various regularization approaches. We'll introduce logistic regression and we'll learn how to code categorical inputs and outputs. We'll look at feature space expansions. These will lead naturally to generalizations of linear regression, known as the "generalized linear model" and the "generalized additive model".
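As a rough illustration of how a regularization penalty changes an ordinary least squares fit, here is a minimal sketch. It is not course material: the class examples are in R, and Python is used here only for illustration. The data values and the function name `fit_line` are made up. The sketch fits a one-feature line with an optional L2 (ridge) penalty on the slope.

```python
# Illustrative sketch only: one-feature least squares with an optional
# L2 (ridge) penalty. The data and the function name are hypothetical.

def fit_line(xs, ys, l2_penalty=0.0):
    """Return (intercept, slope) minimizing
    sum((y - a - b*x)^2) + l2_penalty * b^2.
    The intercept is not penalized, so the data are centered first."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Centered cross-product and sum of squares.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    # The penalty enters the denominator and shrinks the slope toward zero.
    slope = sxy / (sxx + l2_penalty)
    intercept = y_mean - slope * x_mean
    return intercept, slope

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

ols = fit_line(xs, ys)                      # no penalty: ordinary least squares
ridge = fit_line(xs, ys, l2_penalty=10.0)   # penalized fit
print("OLS   intercept, slope:", ols)
print("ridge intercept, slope:", ridge)
```

With `l2_penalty=0.0` the function reduces to ordinary least squares. Any positive penalty gives a slope of smaller magnitude, which is the shrinkage effect the coefficient penalty methods in the syllabus build on.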
Text: "The Elements of Statistical Learning - Data Mining, Inference, and Prediction" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
See also Prof. Robert Tibshirani's notes for Stats 315a: http://www-stat.stanford.edu/~tibs/stat315a.html
Prerequisites
Machine Learning 201 and 202 employ beginner-level probability, calculus, and linear algebra (e.g., peruse the appendices in "Introduction to Data Mining" by Tan et al., or the Linear Algebra and Probability Theory references). If you have taken the Machine Learning 101 and 102 classes, you are well prepared for this course, but they are not required to start 201.
Participants should be familiar with R or be willing to pick R up outside of class. We will hand out R code for most of our examples, but we won't spend time in 201 going through introductory material on R. Come to the first class with R installed on your computer (http://cran.r-project.org/). R references for review: References for R, Reference for R Comments, More R References. A guide to integrating R with Eclipse is also linked from this page.
To get the most out of the class, participants will need to work through the homework assignments.
General Sequence of Classes:
Machine Learning 101: Supervised learning
Text: "Introduction to Data Mining", by Pang-Ning Tan, Michael Steinbach and Vipin Kumar
Machine Learning 102: Unsupervised Learning and Fault Detection
Text: "Introduction to Data Mining", by Pang-Ning Tan, Michael Steinbach and Vipin Kumar
Machine Learning 201: Advanced Regression Techniques, Generalized Linear Models, and Generalized Additive Models
Text: "The Elements of Statistical Learning - Data Mining, Inference, and Prediction" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
Machine Learning 202: Collaborative Filtering, Bayesian Belief Networks, and Advanced Trees
Text: "The Elements of Statistical Learning - Data Mining, Inference, and Prediction" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman
Future Topics
Data Mining Social Networks
Text Mining
Recommender Methods
Big Data
Machine Learning 201 Syllabus:
| Week / Date | Topics | Homework | Links |
|---|---|---|---|
| 1st Week | Advanced Regression Topics | | Lecture 1 and 2 |
| 1/12/2011 | Ordinary Least Squares - error bounds | | |
| | Subset Select, fwd & backward step-wise | | |
| | Least Angle Regression - LARS | | |
| | Attribute basis change | | |
| 1/13/2011 | Coefficient shrinkage methods | Homework01.pdf | |
| | L1, L2 coefficient penalties | | |
| | Ridge, lasso and elastic net | | |
| 2nd Week | Regression Topics | | Lecture 3 and 4 |
| 1/19/2011 | Logistic Regression | HW #1 Due | |
| 1/20/2011 | Attribute Expansion | Homework02.pdf | |
| 3rd Week | Factor Inputs/Outputs | | NotesWeek3 |
| 1/26/2011 | Coding for Factor Inputs | HW #2 Due | |
| 1/27/2011 | Error-correcting codes | | |
| 4th Week | Generalized Linear Models | | NotesWeek4 |
| 2/2/2011 | | HW #3 Due | |
| 2/3/2011 | | | |
| 5th Week | Generalized Additive Models | | |
| 2/9/2011 | | HW #4 Due | |
| 2/10/2011 | | | |
General Calendar for the Year:
Fall 2010: Machine Learning 101 & Machine Learning 102
Winter 2011: Machine Learning 101 & Machine Learning 201
Early Spring 2011: Machine Learning 102 & Machine Learning 202
We will be using the following text as a reference for 201 and 202:
"The Elements of Statistical Learning - Data Mining, Inference, and Prediction" by Trevor Hastie, Robert Tibshirani, and Jerome Friedman. This is an excellent book. Virtually everyone in the field knows it and uses it as a standard reference. The book is free to read online: http://www-stat.stanford.edu/~tibs/ElemStatLearn
There are more Machine Learning References on Patricia's web site http://patriciahoffmanphd.com/
Anyone can read this web site; however, only the instructors have permission to edit it.
If you haven't already filled out the Register for Class form on the meet-up page, please do so now. If you haven't already signed up on the meet-up page, please do that as well.