Loss functions and metrics
Valerii Babushkin
Mar 2023
Agenda
1. Losses
2. Metrics
3. Design Doc Example
1. Losses
Losses
The loss function, also known as the objective or cost function, effectively defines how a model learns about the world and connections between dependent and independent variables, what it pays most attention to, what it tries to avoid and what it considers acceptable. Thus, the choice of a loss function can drastically affect your model's overall performance, even if everything else—features, target, model architecture, dataset size—remains unchanged.
Losses
Consider a simplified situation where the set of loss functions for regression problems is narrowed down to the two most widely used ones: MSE (Mean Squared Error) and MAE (Mean Absolute Error).
Imagine we have a vector of target values Y = [100, 100, 100, 100, 100, 100, 100, 100, 100, 1000], with the independent variable X being identical for all samples.
If we train a model using MSE as the loss function, it will output a vector of predictions Y_hat = [190, 190, 190, 190, 190, 190, 190, 190, 190, 190].
If we train a model using MAE as the loss function, it will output a vector of predictions Y_hat = [100, 100, 100, 100, 100, 100, 100, 100, 100, 100].
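A minimal sketch to verify this behavior: since X is identical for all samples, the best any model can do is output a single constant prediction, so we can simply search for the constant that minimizes each loss. It turns out to be the mean for MSE and the median for MAE.

```python
import numpy as np

# Target vector from the example above; X is identical for all samples,
# so the best any model can do is output one constant prediction c.
y = np.array([100] * 9 + [1000], dtype=float)
candidates = np.arange(0, 1001, dtype=float)

mse = [np.mean((y - c) ** 2) for c in candidates]
mae = [np.mean(np.abs(y - c)) for c in candidates]

print("MSE-optimal constant:", candidates[np.argmin(mse)])  # 190.0, the mean of y
print("MAE-optimal constant:", candidates[np.argmin(mae)])  # 100.0, the median of y
```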
Losses
When we calculate MSE and MAE for the model trained with the MSE loss function, we get the following numbers: MSE = 72,900 and MAE = 162, with the mean of the residuals (predictions minus targets) equal to 0 and the median of the residuals equal to 90.
When we calculate MSE and MAE for the model trained with the MAE loss function, the result is: MSE = 81,000 and MAE = 90, with the mean of the residuals equal to -90 and the median of the residuals equal to 0.
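A short sanity-check sketch that reproduces these numbers, with residuals computed as predictions minus targets:

```python
import numpy as np

y = np.array([100] * 9 + [1000], dtype=float)
models = {
    "MSE-trained model": np.full(10, 190.0),
    "MAE-trained model": np.full(10, 100.0),
}

for name, y_hat in models.items():
    residuals = y_hat - y  # predictions minus targets
    print(name,
          "| MSE =", np.mean(residuals ** 2),           # 72,900 vs 81,000
          "| MAE =", np.mean(np.abs(residuals)),        # 162 vs 90
          "| mean residual =", residuals.mean(),        # 0 vs -90
          "| median residual =", np.median(residuals))  # 90 vs 0
```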
2. Metrics
Metrics
The loss function we optimize and the metric we use to assess our model’s performance can be very different from each other. Recall that the end goal of the demand forecast system for Supermegaretail in the Design Document chapter was to reduce the gap between delivered and sold items, making it as narrow as possible while avoiding an out-of-stock situation.
If we try to visualize the pipeline, it might look as follows:
[Figure: visualization of the pipeline]
Metrics
I recently had a conversation with a friend of mine about the evaluation of fraud models. Fraud models usually solve a binary classification task where 0 is non-fraud and 1 is fraud.
No metric is ideal, and the right choice always depends on the final goal. However, when we speak about fraud models, we usually want to keep the ratio of fraud to legitimate transactions at a certain level. If we had 10 times more transactions, it would be OK to have 10 times more fraud, but not 20 or 30 times more. In other words, we want a probabilistic model.
Another thing is that fraud is usually a class imbalance problem, and that balance is not stable over time. One day the ratio can be 1:100 (an outburst of fraudulent transactions), the next day 1:1000 (an ordinary day), and the day after 1:10,000 (the fraudsters took a vacation).
The most popular set of metrics for this family of models is Precision and Recall, which may not be the best choice.
The problem with Precision is that its calculations take both classes into account:
Precision = TP/(TP + FP)
Metrics
Imagine we have a model that predicts fraud as fraud with a probability of 95% (true positive, TP) and predicts non-fraud as fraud with a probability of 5% (false positive, FP).
Let’s review three scenarios, where P is the number of positive samples and N is the number of negative samples:
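A minimal sketch of such scenarios, assuming the class balances mentioned earlier (1:100, 1:1000, and 1:10,000) and keeping the model fixed at TPR = 0.95 and FPR = 0.05:

```python
tpr, fpr = 0.95, 0.05  # the model itself stays fixed

# Hypothetical class balances: 1,000 fraud cases against 100x, 1,000x,
# and 10,000x as many legitimate transactions.
for p, n in [(1_000, 100_000), (1_000, 1_000_000), (1_000, 10_000_000)]:
    tp = tpr * p  # expected true positives
    fp = fpr * n  # expected false positives
    precision = tp / (tp + fp)
    print(f"P:N = 1:{n // p:<6} precision = {precision:.3f}")
# 1:100   -> ~0.160
# 1:1000  -> ~0.019
# 1:10000 -> ~0.002
```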
As you can see, the class balance affected the metric significantly even when nothing else changed.
Metrics
Now let’s take a look at Recall (Recall = TP/(TP+FN) = TP/P = True Positive Rate (TPR)) and examine the same three scenarios:
In this case, the class balance didn’t affect the metric at all.
There is also a metric called Specificity that can replace Precision:
Specificity = TN/N = True Negative Rate (TNR) = 1 - False Positive Rate (FPR)
FPR = FP/N = FP/(FP + TN)
The same three examples will show the following picture:
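Continuing the same sketch under the same assumptions, Recall and Specificity stay constant across all three class balances, because each of them is normalized by a single class:

```python
tpr, fpr = 0.95, 0.05

for p, n in [(1_000, 100_000), (1_000, 1_000_000), (1_000, 10_000_000)]:
    tp, fp = tpr * p, fpr * n
    fn, tn = p - tp, n - fp
    recall = tp / (tp + fn)       # TP / P = TPR
    specificity = tn / (tn + fp)  # TN / N = 1 - FPR
    print(f"P:N = 1:{n // p:<6} recall = {recall:.2f}, specificity = {specificity:.2f}")
# recall = 0.95 and specificity = 0.95 in all three scenarios
```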
3. DD Example
DD Example
https://www.manning.com/meap-program
Coming Soon