The Inventory Day

or how not to win the MELI Data Challenge 2021

Virtual seminar given by Lic. Pedro Pury

Organized by the research group on Analysis and Processing of Large Social and Semantic Networks

November 30, 2022

pedro.pury@unc.edu.ar 22/11/30

Summary

  1. The Challenge
  2. The Data
  3. The Metric
  4. The Baseline
  5. The Benchmark
  6. Gaussian Model (ML)
  7. Stochastic Model
  8. Evaluation
  9. Concluding Remarks

0. Disclaimer

  • I do not work, and have never worked, at Mercado Libre.
  • I did not participate in the MeLi Data Challenge 2021.
  • All the data used in this talk was made public on the official website of the event
  • This is not a talk about ML but I will make some comments on ML.
  • Opinions are mine alone and do not necessarily represent my institution.
  • I'm presenting a work in progress.

  • This talk is just another episode of the human-machine war...

1. The Challenge

  • Inventory problem

Papers since the 1950s, and even a book in 2021

  • Inventory day (iday)

The task is to predict how long it will take for the inventory of a certain item to be sold completely. In inventory management theory this concept is known as inventory days.

  • SKU: Stock Keeping Unit

2. The Data

Ml-challenge.mercadolibre.com: HTTP ERROR 503 :(

3 sets

Training set for 660916 SKUs

2 months (February and March 2021): 59 days

> 38.9 × 10^6 entries

To make predictions for April 2021: 30 days

3. The Metric

Point estimation

RPS = |u0 - u|
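
The identity above can be checked numerically. The sketch below assumes the usual discrete ranked probability score: the sum, over the days of the horizon, of the squared difference between the predicted and the observed cumulative distributions. The helper names `rps` and `point_forecast` and the 30-day horizon are illustrative choices, not taken from the talk. For a forecast that puts all its probability on a single day u0, the score reduces to the absolute error |u0 - u|.

```python
from itertools import accumulate

def rps(probs, true_day):
    """Ranked probability score (assumed definition) over days 1..len(probs).

    probs[i] is the predicted probability that the stock runs out on day i + 1;
    true_day is the observed day, counted from 1.
    """
    cdf_pred = accumulate(probs)
    return sum(
        (f - (1.0 if day >= true_day else 0.0)) ** 2
        for day, f in enumerate(cdf_pred, start=1)
    )

def point_forecast(day, horizon):
    """Forecast with all the probability mass on a single day (hypothetical helper)."""
    probs = [0.0] * horizon
    probs[day - 1] = 1.0
    return probs

horizon = 30
score = rps(point_forecast(12, horizon), true_day=17)
print(score, abs(12 - 17))
```

The two cumulative step functions differ by exactly one on every day between the predicted and the observed day, which is why the sum of squares counts the days of error.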

4. The Baseline: the uniform distribution

<RPS> = d/6
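
A quick way to see where d/6 comes from is to average the score of the uniform forecast over all possible true days. The sketch below makes two assumptions that are not stated on the slide: the ranked probability score is the sum of squared differences between predicted and observed cumulative distributions, and the true day is equally likely to be any day of the horizon. Under those assumptions the exact discrete average is (d² - 1)/(6d), which approaches d/6 as d grows. The function names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

def rps(probs, true_day):
    """Ranked probability score (assumed definition) over days 1..len(probs)."""
    cdf_pred = accumulate(probs)
    return sum(
        (f - (1 if day >= true_day else 0)) ** 2
        for day, f in enumerate(cdf_pred, start=1)
    )

def mean_rps_uniform(d):
    """Average RPS of the uniform forecast, with the true day uniform on 1..d."""
    uniform = [Fraction(1, d)] * d  # exact arithmetic, no rounding error
    return sum(rps(uniform, u) for u in range(1, d + 1)) / d

d = 30
mean_score = mean_rps_uniform(d)
print(float(mean_score), d / 6)
```

For the 30-day horizon of the challenge the two numbers differ by 1/(6d), less than 0.01.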

5. The Benchmark

6. Gaussian model

7. Stochastic model

8. Evaluation

Example: SKU 538100

Sales in the 28 days of February 2021

0 0 2 1 2 0 0 0 0 1 0 2 1 0 0 0 0 0 0 1 0 0 2 1 0 0 1 1

Sales in the 31 days of March 2021

0 1 2 0 0 0 1 0 1 0 3 1 0 0 1 0 1 1 0 0 0 0 0 2 0 1 0 1 2 3 4

m (distinct values of the cumulative March sales): 1 3 4 5 8 9 10 11 12 14 15 16 18 21 25
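
The m values in the last row coincide with the distinct values reached by the cumulative March sales. That reading is an inference from the numbers, not something the slide states. The sketch below recomputes them from the March series and records the first day on which each level is reached, which would be the inventory day for an initial stock of exactly that size. Variable names are illustrative.

```python
from itertools import accumulate

# Daily sales of SKU 538100 in March 2021, copied from the slide.
march_sales = [0, 1, 2, 0, 0, 0, 1, 0, 1, 0, 3, 1, 0, 0, 1, 0,
               1, 1, 0, 0, 0, 0, 0, 2, 0, 1, 0, 1, 2, 3, 4]

cumulative = list(accumulate(march_sales))

# First day (counted from 1) on which each cumulative level is reached.
first_day = {}
for day, total in enumerate(cumulative, start=1):
    if total > 0 and total not in first_day:
        first_day[total] = day

levels = sorted(first_day)
print(levels)
print([first_day[m] for m in levels])
```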

Rank 12

495353 SKUs

3254422 evaluations

9. Concluding remarks

  • The model retrieves the most valuable information from the time series, when there are data...
  • It does not require "learning" (training in the ML sense).
  • The model not only predicts the inventory day, but also gives the daily probability of a given stock. For ML, this would be a separate training problem.
  • It contributes to the understanding of iday prediction.
  • Given that it does not use contextual information, it cannot predict when there is no data. It is not good enough to win the challenge :(
  • But it could be a tool to improve an ML model: ideas are welcome!

That’s all, folks!

THANK YOU