Homework5



Follow-on to the analysis that Charles showed us last night. 

 

Charles's analysis showed a strong correlation between the current fraction of air conditioner usage and the fraction of usage 24 hours earlier.  It also suggested some ways to answer the question Joe brought us along with the data: "What variables have the most significant effect on the fraction of time that air conditioners are on?"

 

Formulate the problem as follows.  Use fractional air conditioner usage as the target variable that we're trying to predict.  For the attribute set, include the past 24 hours of fractional usage and the past 24 hours of each of the other variables in the data set (temperature, humidity, etc.).  This will result in a long attribute list (24 past usage values + 24 past temperatures + 24 past humidities, etc.).
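
One way to build that attribute list, as a minimal sketch only: it assumes the data has already been loaded as equal-length lists of consecutive hourly readings with no gaps, and the names used here ("usage", "temperature", build_lagged_rows) and the toy numbers are made up for illustration, not taken from Joe's data set.

```python
def build_lagged_rows(series, target, n_lags=24):
    """Turn hourly series into (attributes, target) rows using past values only.

    series : dict mapping a variable name to a list of hourly readings
             (assumed: same length, consecutive hours, no missing values)
    target : name of the variable to predict (hypothetical name, e.g. "usage")
    n_lags : how many past hours of each variable to include
    """
    names = list(series)            # keep the caller's variable order
    n_hours = len(series[target])

    # One column per (variable, lag) pair, e.g. "temperature_lag3".
    feature_names = [
        f"{name}_lag{k}" for name in names for k in range(1, n_lags + 1)
    ]

    X, y = [], []
    # The first n_lags hours have no complete history, so start after them.
    for t in range(n_lags, n_hours):
        row = [series[name][t - k] for name in names for k in range(1, n_lags + 1)]
        X.append(row)
        y.append(series[target][t])
    return X, y, feature_names


# Toy example with made-up numbers and only 2 lags so the rows are easy to read.
toy = {
    "usage": [0.1, 0.2, 0.3, 0.4, 0.5],
    "temperature": [70, 71, 72, 73, 74],
}
X, y, names = build_lagged_rows(toy, target="usage", n_lags=2)
for row, value in zip(X, y):
    print(dict(zip(names, row)), "->", value)
```

With the real data, n_lags stays at its default of 24, and each row ends up with 24 columns per variable, matching the long attribute list described above.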

 

Use glmnet to regress current fractional usage on the past values of these variables.  Survey both the alpha parameter (the balance between the ridge and L1 penalties) and the lambda parameter (the weight on the penalty) to see which values minimise cross-validation test error.  Pick the winning model and print out its weights to see which variables are most significant in predicting air conditioner usage.
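
For reference, here is the elastic-net objective that glmnet minimises for an ordinary least-squares (Gaussian) response, which shows where the two parameters enter. The symbols are notation introduced here rather than part of the assignment: N is the number of rows, x_i is the attribute vector for row i, y_i is the fractional usage for that row, beta_0 is the intercept and beta is the vector of weights.

```latex
\min_{\beta_0,\,\beta}\;
\frac{1}{2N}\sum_{i=1}^{N}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2
\;+\;\lambda\Bigl[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2
\;+\;\alpha\,\lVert\beta\rVert_1\Bigr]
```

Setting alpha = 0 gives a pure ridge penalty, alpha = 1 gives a pure L1 (lasso) penalty, and lambda = 0 removes the penalty altogether. Larger alpha values tend to drive more weights exactly to zero, which is what makes the surviving weights easier to read off.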