1 of 16

Loss functions and metrics

Valerii Babushkin

Mar 2023

2 of 16

Agenda

1. Losses

2. Metrics

3. Design Doc Example

3 of 16

1. Losses

4 of 16

Losses

The loss function, also known as the objective or cost function, effectively defines how a model learns about the world and connections between dependent and independent variables, what it pays most attention to, what it tries to avoid and what it considers acceptable. Thus, the choice of a loss function can drastically affect your model's overall performance, even if everything else—features, target, model architecture, dataset size—remains unchanged.

For a model to be trained with gradient-based methods, the loss function should:

  1. Be globally continuous (small changes in predictions lead to small changes in the loss).
  2. Be differentiable (its gradient can be calculated).
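
As a quick illustration of both requirements, here is a minimal NumPy sketch (illustrative only) of MSE and MAE together with their gradients with respect to the predictions; MAE is not differentiable at zero residual, so the sign-based expression below is a subgradient:

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean Squared Error: continuous and differentiable everywhere.
    return np.mean((y_pred - y_true) ** 2)

def mse_grad(y_true, y_pred):
    # d(MSE)/d(y_pred) = 2 * (y_pred - y_true) / n
    return 2 * (y_pred - y_true) / len(y_true)

def mae(y_true, y_pred):
    # Mean Absolute Error: continuous, but not differentiable at residual = 0.
    return np.mean(np.abs(y_pred - y_true))

def mae_subgrad(y_true, y_pred):
    # Subgradient: sign(y_pred - y_true) / n (any value in [-1/n, 1/n] is valid at 0).
    return np.sign(y_pred - y_true) / len(y_true)
```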

5 of 16

Losses

Consider a simplified situation where the set of loss functions for regression problems is narrowed down to the two most widely used: MSE (Mean Squared Error) and MAE (Mean Absolute Error).

Imagine we have a vector of target values Y = [100, 100, 100, 100, 100, 100, 100, 100, 100, 1000] and an independent variable X that is identical for all samples.

If we train a model using MSE as the loss function, it will output a vector of predictions Y_hat = [190, 190, 190, 190, 190, 190, 190, 190, 190, 190], because MSE is minimized by the mean of the targets.

If we train a model using MAE as the loss function, it will output a vector of predictions Y_hat = [100, 100, 100, 100, 100, 100, 100, 100, 100, 100], because MAE is minimized by the median of the targets.
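
One way to check this numerically (a sketch with a simple grid search, not an actual training run) is to try every constant prediction and see which value minimizes each loss:

```python
import numpy as np

y = np.array([100] * 9 + [1000])

# Try every integer constant prediction c in [0, 1000].
candidates = np.arange(0, 1001)
mse_per_c = [np.mean((y - c) ** 2) for c in candidates]
mae_per_c = [np.mean(np.abs(y - c)) for c in candidates]

print(candidates[np.argmin(mse_per_c)])  # 190 -- the mean of y
print(candidates[np.argmin(mae_per_c)])  # 100 -- the median of y
```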

6 of 16

Losses

When we calculate MSE and MAE for the model trained with the MSE loss function, we get the following numbers:

MSE = 72,900 and MAE = 162, with the mean of the residuals equal to 0 and the median of the residuals equal to 90 (taking residual = prediction - target).

When we calculate MSE and MAE for the model trained with the MAE loss function, the result is: MSE = 81,000 and MAE = 90, with the mean of the residuals equal to -90 and the median of the residuals equal to 0.
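
These figures are easy to reproduce; a minimal sketch, taking residual = prediction - target to match the numbers above:

```python
import numpy as np

y = np.array([100] * 9 + [1000])
preds = {
    "MSE-trained (predicts the mean)":   np.full(10, 190),
    "MAE-trained (predicts the median)": np.full(10, 100),
}

for name, y_hat in preds.items():
    residuals = y_hat - y  # residual = prediction - target
    print(name,
          "MSE =", np.mean(residuals ** 2),     # 72,900 vs 81,000
          "MAE =", np.mean(np.abs(residuals)),  # 162 vs 90
          "mean =", residuals.mean(),           # 0 vs -90
          "median =", np.median(residuals))     # 90 vs 0
```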

7 of 16

2. Metrics

8 of 16

Metrics

9 of 16

Metrics

The loss function we optimize and the metric we use to assess our model’s performance can be very different from each other. Recall that the end goal of the demand forecast system for Supermegaretail in the Design Document chapter was to reduce the gap between delivered and sold items, making it as narrow as possible while avoiding an out-of-stock situation.

If we try to visualize the pipeline, it might look as follows:

10 of 16

Metrics

I recently had a conversation with a friend of mine about evaluating fraud models. Fraud models usually solve a binary classification task where 0 is non-fraud and 1 is fraud.

No metric is ideal, and the choice always depends on the final goal. However, when we speak about fraud models, we usually want to keep the ratio of fraud to legitimate transactions at a certain level. If we had 10 times more transactions, it would be OK to have 10 times more fraud, but not 20 or 30 times more. In other words, we want a probabilistic model.

Another issue is that fraud detection is a class imbalance problem, and that balance is not stable through time. One day the ratio can be 1:100 (an outburst of fraudulent transactions), the next day 1:1000 (an ordinary day), and the day after, 1:10,000 (the fraudsters took a vacation).

The most popular set of metrics for this family of models is Precision and Recall, which may not be the best choice.

The problem with Precision is that its calculations take both classes into account: 

Precision = TP/(TP + FP)

11 of 16

Metrics

Imagine we have a model that predicts fraud as fraud with a probability of 95% (a true positive, TP) and predicts non-fraud as fraud with a probability of 5% (a false positive, FP).

Let’s review three scenarios, where P is the number of positive samples and N is the number of negative samples:

  1. P = 10,000, N = 10,000, Precision = 0.95 * 10,000 / (0.95 * 10,000 + 0.05 * 10,000) = 0.95
  2. P = 100,000, N = 10,000, Precision = 0.95 * 100,000 / (0.95 * 100,000 + 0.05 * 10,000) ≈ 0.995
  3. P = 1,000, N = 10,000, Precision = 0.95 * 1,000 / (0.95 * 1,000 + 0.05 * 10,000) ≈ 0.655

As you can see, the class balance affected the metric significantly even when nothing else changed.
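
A minimal sketch that reproduces the three scenarios, assuming the per-class rates stay fixed at TPR = 0.95 and FPR = 0.05 while only the class sizes change:

```python
def precision(p, n, tpr=0.95, fpr=0.05):
    # Expected precision given class sizes and fixed per-class rates.
    tp = tpr * p  # frauds correctly flagged as fraud
    fp = fpr * n  # legitimate transactions incorrectly flagged as fraud
    return tp / (tp + fp)

for p, n in [(10_000, 10_000), (100_000, 10_000), (1_000, 10_000)]:
    print(f"P={p:>7,} N={n:,} precision={precision(p, n):.3f}")
# P= 10,000 N=10,000 precision=0.950
# P=100,000 N=10,000 precision=0.995
# P=  1,000 N=10,000 precision=0.655
```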

12 of 16

Metrics

Now let’s take a look at Recall (Recall = TP/(TP + FN) = TP/P = True Positive Rate (TPR)) and examine the same three scenarios:

  1. P = 10,000, N = 10,000, Recall = 0.95*10,000/(10,000) = 0.95
  2. P = 100,000, N = 10,000, Recall = 0.95*100,000/(100,000) = 0.95
  3. P = 1000, N = 10,000, Recall = 0.95*1000/(1000) = 0.95

In this case, the class balance didn’t affect the metric at all.

There is also a metric called Specificity that can replace Precision:

Specificity = TN/N = True Negative Rate (TNR) = 1 - False Positive Rate (FPR)

FPR = FP/N = FP/(FP + TN)

The same three examples will show the following picture:

  1. P = 10,000, N = 10,000, Specificity = 0.95*10,000/(10,000) = 0.95
  2. P = 100,000, N = 10,000, Specificity = 0.95*10,000/(10,000) = 0.95
  3. P = 1000, N = 10,000, Specificity = 0.95*10,000/(10,000) = 0.95
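
The same sketch, extended to Recall and Specificity, shows why both stay at 0.95 regardless of the class balance: each metric is computed within a single class.

```python
def recall(p, tpr=0.95):
    # Recall = TP / P: uses only the positive (fraud) class.
    return (tpr * p) / p

def specificity(n, fpr=0.05):
    # Specificity = TN / N = 1 - FPR: uses only the negative (legit) class.
    return ((1 - fpr) * n) / n

for p, n in [(10_000, 10_000), (100_000, 10_000), (1_000, 10_000)]:
    print(round(recall(p), 2), round(specificity(n), 2))  # 0.95 0.95 in every scenario
```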

13 of 16

Metrics

14 of 16

3. DD Example

15 of 16

DD Example

16 of 16