THE MACHINE LEARNING LANDSCAPE
What Is Machine Learning?
—Arthur Samuel, 1959
—Tom Mitchell, 1997
Why Use Machine Learning?
1. First you would look at what spam typically looks like. You might notice that some words or phrases (such as “4U,” “credit card,” “free,” and “amazing”) tend to come up a lot in the subject. Perhaps you would also notice a few other patterns in the sender’s name, the email’s body, and so on.
2. You would write a detection algorithm for each of the patterns that you noticed, and your program would flag emails as spam if a number of these patterns are detected.
3. You would test your program, and repeat steps 1 and 2 until it is good enough.
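The three steps above can be sketched as a small hand-written filter. This is only an illustration of the traditional, rule-based approach: the pattern list reuses the example words from step 1, while the names `SPAM_PATTERNS`, `count_matches`, `is_spam`, and the `threshold` value are assumptions made for this sketch.

```python
import re

# Step 1: patterns noticed by hand (the example words from the text).
SPAM_PATTERNS = [
    re.compile(r"\b4u\b", re.IGNORECASE),
    re.compile(r"\bcredit card\b", re.IGNORECASE),
    re.compile(r"\bfree\b", re.IGNORECASE),
    re.compile(r"\bamazing\b", re.IGNORECASE),
]


def count_matches(subject: str) -> int:
    """Step 2a: count how many hand-written patterns occur in the subject."""
    return sum(1 for pattern in SPAM_PATTERNS if pattern.search(subject))


def is_spam(subject: str, threshold: int = 2) -> bool:
    """Step 2b: flag the email if at least `threshold` patterns are detected.

    The threshold is a hypothetical choice; step 3 would tune it by testing.
    """
    return count_matches(subject) >= threshold
```

Every new spam pattern means another hand-written rule, so a list like this tends to grow long and hard to maintain.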
Types of Machine Learning Systems
Supervised/Unsupervised Learning
Unsupervised learning
Semisupervised learning
Reinforcement learning
Batch and Online Learning
One important parameter of online learning systems is how fast they should adapt to changing data: this is called the learning rate.
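The effect of the learning rate can be illustrated with a deliberately tiny online learner that tracks a single value from a data stream. This is a minimal sketch, not a full online learning algorithm: the class `OnlineMeanEstimator`, the helper `run_stream`, and the synthetic stream are all assumptions made for this example.

```python
class OnlineMeanEstimator:
    """Keeps one running estimate and updates it one observation at a time."""

    def __init__(self, learning_rate: float, initial_estimate: float = 0.0):
        self.learning_rate = learning_rate
        self.estimate = initial_estimate

    def learn_one(self, observation: float) -> float:
        # Move the estimate toward the new observation by a fraction
        # of the error; that fraction is the learning rate.
        error = observation - self.estimate
        self.estimate += self.learning_rate * error
        return self.estimate


def run_stream(learning_rate: float, stream: list) -> float:
    """Feed the whole stream to a fresh estimator and return its final estimate."""
    model = OnlineMeanEstimator(learning_rate)
    for observation in stream:
        model.learn_one(observation)
    return model.estimate


# Synthetic stream: the underlying value jumps from 0.0 to 1.0.
stream = [0.0] * 50 + [1.0] * 10

fast = run_stream(0.5, stream)   # high learning rate: adapts quickly
slow = run_stream(0.05, stream)  # low learning rate: adapts slowly
```

After ten observations of the new value, the high-rate learner has almost fully adapted while the low-rate learner is still less than halfway there. The trade-off is that a high learning rate also forgets old data quickly and reacts more strongly to noisy observations.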
Instance-Based Versus Model-Based Learning
Model-based learning
• You studied the data.
• You selected a model.
• You trained it on the training data (i.e., the learning algorithm searched for the model parameter values that minimize a cost function).
• Finally, you applied the model to make predictions on new cases (this is called inference), hoping that this model will generalize well.
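The workflow above can be sketched with a one-feature linear model. This is a minimal illustration under stated assumptions: the data is synthetic, the cost function is taken to be mean squared error, and the names `fit_linear`, `predict`, and `mse_cost` are chosen for this example. The closed-form least-squares solution is used here as one possible way to find the parameters that minimize the cost.

```python
# Study the data: a tiny synthetic training set (not real measurements).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]


def predict(theta0: float, theta1: float, x: float) -> float:
    """Select a model: a linear function y = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def mse_cost(theta0: float, theta1: float, xs: list, ys: list) -> float:
    """Cost function: mean squared error on the training data."""
    errors = [(predict(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def fit_linear(xs: list, ys: list) -> tuple:
    """Train: return the (theta0, theta1) that minimize the MSE cost."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    theta1 = covariance / variance
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


theta0, theta1 = fit_linear(xs, ys)

# Inference: apply the trained model to a new case.
new_prediction = predict(theta0, theta1, 5.0)
```

Because the synthetic data lies exactly on a line, the training cost here is zero; real data would leave some residual error.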
Main Challenges of Machine Learning
Insufficient Quantity of Training Data
Nonrepresentative Training Data
• First, to obtain the addresses to send the polls to, the Literary Digest used telephone directories, lists of magazine subscribers, club membership lists, and the like. All of these lists tend to favor wealthier people, who are more likely to vote Republican (hence Landon).
• Second, fewer than 25% of the people who received the poll answered. Again, this introduces a sampling bias, by ruling out people who don’t care much about politics, people who don’t like the Literary Digest, and other key groups. This is a special type of sampling bias called nonresponse bias.
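The first effect described above can be reproduced with a toy simulation. All numbers below are invented for illustration and are not the historical poll figures: the population size, the share of wealthy voters, and their voting preferences are assumptions, as are the names `build_population` and `landon_share`.

```python
import random


def build_population() -> list:
    """Build a synthetic electorate of (is_wealthy, votes_landon) pairs.

    Hypothetical numbers: 300 wealthy voters, 70% of whom vote Landon,
    and 700 other voters, 30% of whom vote Landon.
    """
    wealthy = [(True, True)] * 210 + [(True, False)] * 90
    others = [(False, True)] * 210 + [(False, False)] * 490
    return wealthy + others


def landon_share(people: list) -> float:
    """Fraction of the given people who vote Landon."""
    return sum(1 for _, votes_landon in people if votes_landon) / len(people)


population = build_population()
true_share = landon_share(population)

# Representative sample: every voter is equally likely to be polled.
rng = random.Random(0)
random_sample = rng.sample(population, 200)

# Biased sample: only people found on lists that favor wealthier voters.
biased_sample = [person for person in population if person[0]]

random_estimate = landon_share(random_sample)
biased_estimate = landon_share(biased_sample)
```

In this synthetic setup the biased sample predicts a comfortable Landon win even though he loses in the full population, while the uniformly random sample lands close to the true share.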
Poor-Quality Data
Irrelevant Features
Overfitting the Training Data
Underfitting the Training Data
• Selecting a more powerful model, with more parameters
• Feeding better features to the learning algorithm (feature engineering)
• Reducing the constraints on the model (e.g., reducing the regularization hyperparameter)
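The last option above can be illustrated with a one-parameter ridge regression, where the regularization hyperparameter (called `alpha` here) shrinks the slope toward zero. This is a minimal sketch on synthetic data; the functions `fit_ridge_slope` and `training_mse` and the model without an intercept are assumptions made to keep the example short.

```python
def fit_ridge_slope(xs: list, ys: list, alpha: float) -> float:
    """Return the slope w minimizing sum((w*x - y)^2) + alpha * w^2.

    Setting the derivative to zero gives w = sum(x*y) / (sum(x*x) + alpha).
    """
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + alpha)


def training_mse(w: float, xs: list, ys: list) -> float:
    """Mean squared error of the model y = w * x on the training data."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

constrained_w = fit_ridge_slope(xs, ys, alpha=14.0)  # heavily regularized
relaxed_w = fit_ridge_slope(xs, ys, alpha=0.0)       # no regularization

constrained_error = training_mse(constrained_w, xs, ys)
relaxed_error = training_mse(relaxed_w, xs, ys)
```

With the heavy constraint the slope is held at 1.0 and the model underfits; removing the constraint lets it reach the slope of 2.0 that fits the training data.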
Stepping Back
• There are many different types of ML systems: supervised or not, batch or online, instance-based or model-based, and so on.
• In an ML project you gather data in a training set, and you feed the training set to a learning algorithm. If the algorithm is model-based it tunes some parameters to fit the model to the training set (i.e., to make good predictions on the training set itself), and then hopefully it will be able to make good predictions on new cases as well. If the algorithm is instance-based, it just learns the examples by heart and generalizes to new instances by comparing them to the learned instances using a similarity measure.
• The system will not perform well if your training set is too small, or if the data is nonrepresentative, noisy, or polluted with irrelevant features (garbage in, garbage out). Lastly, your model needs to be neither too simple (in which case it will underfit) nor too complex (in which case it will overfit).
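The instance-based approach summarized above (learn the examples by heart, then compare new instances to them with a similarity measure) can be sketched as a small k-nearest-neighbors classifier. This is one possible instance-based method, chosen here for illustration; the function `knn_predict`, the use of Euclidean distance as the similarity measure, and the toy training set are assumptions.

```python
import math
from collections import Counter


def knn_predict(training_set: list, query: tuple, k: int = 3) -> str:
    """Predict a label by majority vote among the k closest stored examples.

    `training_set` is a list of (features, label) pairs that are simply
    memorized; no parameters are fitted. Requires Python 3.8+ for math.dist.
    """
    # Similarity measure: smaller Euclidean distance means more similar.
    neighbors = sorted(
        training_set, key=lambda example: math.dist(example[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy training set with two hypothetical numeric features per email.
training_set = [
    ((1.0, 1.0), "ham"),
    ((1.0, 2.0), "ham"),
    ((2.0, 1.0), "ham"),
    ((8.0, 8.0), "spam"),
    ((8.0, 9.0), "spam"),
    ((9.0, 8.0), "spam"),
]

near_ham = knn_predict(training_set, (2.0, 2.0))
near_spam = knn_predict(training_set, (7.0, 9.0))
```

A model-based learner would instead compress the training set into a few parameters and discard the examples; here the examples themselves are the model.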
Testing and Validating
Data Mismatch