Classification
Asrul Abdullah
Adapted from Hands on Machine Learning with Scikit-learn, Keras and Tensorflow – Aurélien Géron
Universitas Muhammadiyah Pontianak
www.asrulabdullah.my.id
innovation, collaboration & integrity
MNIST
Code MNIST
Code
>>> from sklearn.datasets import fetch_openml
>>> mnist = fetch_openml('mnist_784', version=1, as_frame=False)
>>> X, y = mnist["data"], mnist["target"]
>>> y = y.astype("uint8")  # labels come back as strings
>>> X.shape
(70000, 784)
>>> y.shape
(70000,)
There are 70,000 images, and each image has 784 features, because each image is 28×28 = 784 pixels
Code
import matplotlib as mpl
import matplotlib.pyplot as plt

some_digit = X[0]
some_digit_image = some_digit.reshape(28, 28)

plt.imshow(some_digit_image, cmap=mpl.cm.binary, interpolation="nearest")
plt.axis("off")
plt.show()
Display MNIST
Code
X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
The MNIST dataset is actually already split into a training set (the first 60,000 images) and a test set (the last 10,000 images)
Binary Classifier
y_train_5 = (y_train == 5)  # True for all 5s, False for all other digits
y_test_5 = (y_test == 5)
SGD
A good model to start with is a Stochastic Gradient Descent (SGD) classifier, using Scikit-Learn's SGDClassifier class.
This classifier has the advantage of being capable of handling very large datasets efficiently
from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_5)
Predict
>>> sgd_clf.predict([some_digit])
array([ True])
Performance Measures
Evaluating a classifier is often significantly trickier than evaluating a regressor
Measuring Accuracy Using Cross-Validation
Cross-validation is a robust technique for evaluating model performance by partitioning data into subsets (folds), training on some, and testing on others iteratively. It gives a more reliable estimate of generalization to unseen data than a single train-test split.
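The fold mechanics described above can be sketched on toy data with scikit-learn's KFold; the data and labels here are made up purely for illustration:

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data: 6 samples, so each of 3 folds holds out 2 samples
X = np.arange(12).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])

kf = KFold(n_splits=3)
for train_idx, test_idx in kf.split(X):
    # Each iteration trains on 4 samples and evaluates on the held-out 2;
    # test folds are [0 1], [2 3], [4 5]
    print("train:", train_idx, "test:", test_idx)
```

Every sample ends up in the test fold exactly once, so all of the data contributes to the evaluation.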
Implementing Cross Validation
from sklearn.model_selection import StratifiedKFold
from sklearn.base import clone

# shuffle=True is required when random_state is set (scikit-learn >= 0.22)
skfolds = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)

for train_index, test_index in skfolds.split(X_train, y_train_5):
    clone_clf = clone(sgd_clf)
    X_train_folds = X_train[train_index]
    y_train_folds = y_train_5[train_index]
    X_test_fold = X_train[test_index]
    y_test_fold = y_train_5[test_index]

    clone_clf.fit(X_train_folds, y_train_folds)
    y_pred = clone_clf.predict(X_test_fold)
    n_correct = sum(y_pred == y_test_fold)
    print(n_correct / len(y_pred))  # e.g. 0.9502, 0.96565 and 0.96495
Cross Validation
>>> from sklearn.model_selection import cross_val_score
>>> cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring="accuracy")
array([0.96355, 0.93795, 0.95615])
BaseEstimator
import numpy as np
from sklearn.base import BaseEstimator

class Never5Classifier(BaseEstimator):
    def fit(self, X, y=None):
        pass
    def predict(self, X):
        return np.zeros((len(X), 1), dtype=bool)
Testing Accuracy
>>> never_5_clf = Never5Classifier()
>>> cross_val_score(never_5_clf, X_train, y_train_5, cv=3, scoring="accuracy")
array([0.91125, 0.90855, 0.90915])
This demonstrates why accuracy is generally not the preferred performance measure for classifiers, especially when dealing with skewed datasets (i.e., when some classes are much more frequent than others)
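The point can be reproduced on invented labels: a classifier that always predicts the majority class scores 90% accuracy on data that is 90% negative, while never detecting a single positive.

```python
import numpy as np

# Hypothetical skewed labels: 90 negatives, 10 positives
y_true = np.array([False] * 90 + [True] * 10)

# A "classifier" that always predicts the majority (negative) class
y_pred = np.zeros(100, dtype=bool)

accuracy = (y_pred == y_true).mean()
print(accuracy)  # 0.9 -- high accuracy despite finding zero positives
```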
Confusion Matrix
A much better way to evaluate the performance of a classifier is to look at the confusion matrix
from sklearn.model_selection import cross_val_predict

y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)
>>> from sklearn.metrics import confusion_matrix
>>> confusion_matrix(y_train_5, y_train_pred)
array([[53057,  1522],
       [ 1325,  4096]])
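From the counts in the matrix above (rows are the actual class, columns the predicted class), precision and recall can be computed by hand; a quick sketch:

```python
# Counts read off the confusion matrix: rows = actual (non-5, 5)
tn, fp, fn, tp = 53057, 1522, 1325, 4096

precision = tp / (tp + fp)  # of all "5" predictions, how many were right
recall = tp / (tp + fn)     # of all actual 5s, how many were found

print(round(precision, 4))  # 0.7291
print(round(recall, 4))     # 0.7556
```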
Confusion Matrix
A confusion matrix is a table that is often used to describe the performance of a classification model (or "classifier") on a set of test data
Confusion Matrix
|                     | Actual: Positive | Actual: Negative |
| Predicted: Positive | tp               | fp               |
| Predicted: Negative | fn               | tn               |
Precision and Recall in Text Retrieval
|               | Relevant | Nonrelevant |
| Retrieved     | tp       | fp          |
| Not Retrieved | fn       | tn          |
Accuracy
Overall, how often is the classifier correct?
Accuracy = (tp + tn) / (tp + tn + fp + fn)
|                    | Positive | Negative |
| Predicted Positive | 1        | 1        |
| Predicted Negative | 8        | 90       |
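Plugging the counts from this table into the formulas shows why accuracy alone can mislead; a quick worked sketch:

```python
# Counts from the table: tp, fp, fn, tn
tp, fp, fn, tn = 1, 1, 8, 90

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)

print(accuracy)          # 0.91 -- looks good...
print(precision)         # 0.5
print(round(recall, 3))  # 0.111 -- ...but only 1 of 9 positives was found
```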
F Measure (F1/Harmonic Mean)
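The measure this slide refers to is the harmonic mean of precision and recall: F1 = 2 × precision × recall / (precision + recall). A small sketch with made-up labels, checked against scikit-learn's f1_score:

```python
from sklearn.metrics import f1_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

# By hand: tp=3, fp=1, fn=1, so precision = recall = 3/4
precision = 3 / 4
recall = 3 / 4
f1_manual = 2 * precision * recall / (precision + recall)

print(f1_manual)                 # 0.75
print(f1_score(y_true, y_pred))  # 0.75 -- matches scikit-learn
```

Because it is a harmonic mean, F1 is high only when precision and recall are both high; one very low value drags it down.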
ROC Curve
ROC Curve
The ideal situation: when AUC = 1.0, the model has a perfect measure of separability and can fully distinguish between the positive and negative classes.
The worst situation: when AUC is approximately 0.5, the model has no capacity to distinguish between the positive and negative classes; its predictions are no better than random.
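Both situations can be reproduced with scikit-learn's roc_auc_score on made-up scores:

```python
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 0, 1, 1, 1]

# Perfect separation: every positive scores above every negative
auc_perfect = roc_auc_score(y_true, [0.1, 0.2, 0.3, 0.7, 0.8, 0.9])

# No separation: identical scores for everything, same as random guessing
auc_random = roc_auc_score(y_true, [0.5] * 6)

print(auc_perfect)  # 1.0
print(auc_random)   # 0.5
```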
Multiple ROC Curves
Comparing multiple classifiers is usually straightforward, especially when no curves cross each other: curves closer to the perfect ROC curve perform better than those closer to the baseline.
Precision/Recall Tradeoff
A dilemma in machine learning: increasing the accuracy of positive predictions (precision) tends to decrease the model's ability to find all positive cases (recall), and vice versa.
This tradeoff arises when setting the classification threshold:
Increased Precision (Decreased Recall): As the threshold is raised, the model becomes more stringent. The model only predicts positive if it is very confident (reducing false positives), but it risks missing less obvious positives (increasing false negatives).
Precision/Recall Tradeoff
Increased Recall (Decreased Precision): As the threshold is lowered, the model becomes more lenient. The model detects almost all positives (decreasing false negatives), but it also flags more negatives as positives (increasing false positives).
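Both directions of the tradeoff can be seen on a handful of made-up decision scores (the numbers below are purely illustrative):

```python
import numpy as np

# Hypothetical decision scores from some classifier, with the true labels
scores = np.array([0.1, 0.3, 0.45, 0.55, 0.6, 0.8, 0.9])
y_true = np.array([0, 0, 1, 0, 1, 1, 1])

def precision_recall(threshold):
    """Precision and recall when predicting positive at score >= threshold."""
    y_pred = scores >= threshold
    tp = np.sum(y_pred & (y_true == 1))
    return float(tp / y_pred.sum()), float(tp / (y_true == 1).sum())

# Raising the threshold trades recall for precision
print(precision_recall(0.4))  # (0.8, 1.0) -- lenient: finds all positives
print(precision_recall(0.7))  # (1.0, 0.5) -- strict: no false positives left
```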
Case Example
Spam Detection (Precision Priority):
We want all emails that land in the spam folder to be spam. We don't want important emails to end up there.
Tradeoff: Some spam emails might make it through to the inbox (low recall).
Cancer Detection (Recall Priority)
We want all cancer patients detected. We don't want to miss any sick patients.
Tradeoff: Some healthy patients might be misdiagnosed (low precision), but they will be re-examined by a doctor.
How to Handle it ?
Since it is difficult to maximize both, a metric is used to find a balance, namely the F1-Score, which is the harmonic mean of precision and recall.