PDS 2012 Homework #5

Practical Data Science - Fall 2012

Homework #5 -- Predictive Modeling

Due Wed. Nov. 28, 2012 at 5pm.

In this homework you will build a predictive model for targeting offers to consumers, and conduct some model performance analytics on the result.

The data are from a targeted mailing campaign in which a non-profit charity solicited donations from its database of potential donors. You will find historical data here, in CSV format with a “header” row specifying the features and the target variable “class” (indicating whether or not the donor donated in this campaign).

You will build tree and logistic regression models from these data, and evaluate them based on the principles discussed in class and in the book. Note that you may have to preprocess the data so that they are in the proper format for the modeling techniques.
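As an illustration of the preprocessing note above, here is a minimal sketch that reads CSV text with a header row and turns non-numeric columns into 0/1 indicator features. The sample rows, the column names (income, gender, num_gifts), and the 0/1 coding of "class" are invented for the example; the real file may look quite different, and other encodings may suit your models better.

```python
import csv
import io

# Hypothetical sample; the real file's column names and values may differ.
SAMPLE_CSV = """income,gender,num_gifts,class
high,F,3,1
low,M,0,0
medium,F,1,0
"""

def load_rows(text):
    """Parse CSV text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def is_number(value):
    """True if the string can be read as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False

def encode(rows, target="class"):
    """Return (feature_names, X, y). Numeric columns pass through as floats;
    every other column is expanded into one 0/1 feature per distinct value.
    Assumes the target column is already coded as 0/1."""
    columns = [c for c in rows[0] if c != target]
    feature_names = []
    encoders = []  # one function per output feature, mapping a row to a float
    for col in columns:
        values = [r[col] for r in rows]
        if all(is_number(v) for v in values):
            feature_names.append(col)
            encoders.append(lambda r, c=col: float(r[c]))
        else:
            for level in sorted(set(values)):
                feature_names.append(f"{col}={level}")
                encoders.append(
                    lambda r, c=col, l=level: 1.0 if r[c] == l else 0.0
                )
    X = [[f(r) for f in encoders] for r in rows]
    y = [int(r[target]) for r in rows]
    return feature_names, X, y

names, X, y = encode(load_rows(SAMPLE_CSV))
print(names)
print(X[0], y[0])
```

The resulting X and y are plain lists of numbers, which is the general shape that numeric modeling libraries expect as input.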

http://scikit-learn.github.com/scikit-learn-tutorial/general_concepts.html

Your analysis will address the following general questions.

1) How do the generalization performances of tree models and logistic regression models compare on this problem?  Compute error rate, lift at 5% targeted, and lift at 10% targeted.

2) Compute the same statistics, except build your models using training sets that have equal numbers of positives and negatives.  What (if any) major differences do you see in your results?

3) For the models learned in part 2, use matplotlib to plot cumulative response curves for your two models.  Ideally, these are plotted on the same graph for comparison.

4) Perform the previous steps with one additional supervised model of your choice. How does this model compare to decision trees and logistic regression?
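The quantities named in the questions above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not a reference solution: labels are assumed to be coded 0/1, `scores` stands for any model's estimated probability of donating, lift is taken as the response rate among the top-scored fraction divided by the overall response rate (one common definition; check it against the one used in class), and the balanced training set is produced by randomly downsampling the larger class. All function names are hypothetical.

```python
import random

def error_rate(y_true, y_pred):
    """Fraction of examples whose predicted label differs from the true one."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)

def rank_by_score(y_true, scores):
    """True labels reordered from highest to lowest model score."""
    pairs = sorted(zip(scores, y_true), key=lambda st: -st[0])
    return [t for _, t in pairs]

def lift_at(y_true, scores, fraction):
    """Response rate in the top `fraction` of scored examples, divided by
    the overall response rate."""
    ranked = rank_by_score(y_true, scores)
    k = max(1, int(round(fraction * len(ranked))))
    return (sum(ranked[:k]) / k) / (sum(ranked) / len(ranked))

def cumulative_response(y_true, scores):
    """Points (fraction targeted, fraction of all positives reached),
    one per additional example targeted, starting from (0, 0)."""
    ranked = rank_by_score(y_true, scores)
    total_pos = sum(ranked)
    points, found = [(0.0, 0.0)], 0
    for i, t in enumerate(ranked, start=1):
        found += t
        points.append((i / len(ranked), found / total_pos))
    return points

def balanced_sample(X, y, seed=0):
    """Randomly downsample the larger class so both classes are equal in size."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    n = min(len(pos), len(neg))
    keep = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]

def plot_curves(curves):
    """Plot {model name: points} on one graph. Requires matplotlib."""
    import matplotlib.pyplot as plt
    for name, points in curves.items():
        xs, ys = zip(*points)
        plt.plot(xs, ys, label=name)
    plt.plot([0, 1], [0, 1], linestyle="--", label="random")
    plt.xlabel("Fraction of consumers targeted")
    plt.ylabel("Fraction of donors reached")
    plt.legend()
    plt.show()
```

For example, with per-model score lists on a held-out set, `plot_curves({"tree": cumulative_response(y_test, tree_scores), "logistic": cumulative_response(y_test, lr_scores)})` would draw both curves on the same axes; `y_test`, `tree_scores`, and `lr_scores` are placeholder names for values you compute yourself.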