Brazil Ecommerce Sales Prediction using Prophet
Mazi Prima Reza
2017-2018 sales show a sudden spike on Black Friday.
This is the time series dataset that is analyzed and used to forecast future sales. It shows a sudden spike, about three times the level of a normal day, on 24 Nov 2017 (Black Friday).
24 Nov 2017
Prophet predicts sales with an RMSE of 20.42 on the training data
Train
Test
Yet Prophet overfits the training dataset and cannot predict the trend in the test dataset
From May to July 2018, sales dropped continuously, which did not happen in the previous year. A simple Google search did not explain why this drop happened. This unusual trend pushes the prediction error on the test dataset up to 70.42 RMSE.
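As a reminder of what the two error figures measure, here is a minimal sketch of RMSE computed separately on a training and a test split. The sales numbers are made up for illustration and are not taken from the dataset; the function name `rmse` is also just an assumption.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical daily sales, split chronologically into train and test.
train_actual, train_pred = [100, 120, 110], [98, 123, 111]
test_actual, test_pred = [80, 70, 60], [105, 108, 112]

train_rmse = rmse(train_actual, train_pred)  # small: the model tracks the fit period
test_rmse = rmse(test_actual, test_pred)     # large: the forecast misses the drop
```

A low training RMSE next to a much higher test RMSE is the usual signature of overfitting, which is the pattern described on this slide.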
But is this a good model?
Variant G performs better than the others
Nine experiments were conducted to find the best model for predicting future sales. Variant G turns out to be the best of them given the data we have, with the lowest test RMSE (74.06).
Variant Name | MinMax Scaler | Correcting Outliers | Hyperparameter Tuning | Holiday Context | RMSE (Train) | RMSE (Test) |
Control Group | FALSE | FALSE | FALSE | FALSE | 58.99 | 75.14 |
A | TRUE | FALSE | FALSE | FALSE | 58.99 | 75.41 |
B | TRUE | TRUE | FALSE | FALSE | 37.44 | 82.92 |
C | FALSE | FALSE | TRUE | FALSE | 59.98 | 76.20 |
D | TRUE | FALSE | TRUE | FALSE | 59.89 | 77.21 |
E | TRUE | TRUE | TRUE | FALSE | 36.56 | 81.88 |
F | FALSE | FALSE | TRUE | TRUE | 20.48 | 77.61 |
G | TRUE | FALSE | TRUE | TRUE | 20.43 | 74.06 |
H | TRUE | TRUE | TRUE | TRUE | 20.02 | 92.01 |
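The slides do not say how the hyperparameter tuning step was run, so the following is only a sketch of one common approach: a small grid search over Prophet constructor arguments. `changepoint_prior_scale` and `seasonality_prior_scale` are real Prophet parameters, but the candidate values, the helper names, and the `train_df` / `valid_df` frames are assumptions made for illustration.

```python
from itertools import product

# Hypothetical search space; the names are real Prophet constructor
# arguments, but the candidate values are assumptions for illustration.
PARAM_GRID = {
    "changepoint_prior_scale": [0.01, 0.1, 0.5],
    "seasonality_prior_scale": [0.1, 1.0, 10.0],
}

def expand_grid(param_grid):
    """Return every combination of the grid as a list of dicts."""
    names = list(param_grid)
    return [dict(zip(names, combo)) for combo in product(*param_grid.values())]

def grid_search(param_grid, score_fn):
    """Return (best_params, best_score), where a lower score is better."""
    scored = [(score_fn(params), params) for params in expand_grid(param_grid)]
    best_score, best_params = min(scored, key=lambda item: item[0])
    return best_params, best_score

def prophet_validation_rmse(params, train_df, valid_df):
    """Fit Prophet on train_df and return the RMSE on valid_df.

    Both frames are assumed to use Prophet's 'ds' (date) and 'y' (value)
    columns. Prophet is imported lazily so the helpers above work without it.
    """
    from prophet import Prophet

    model = Prophet(**params)
    model.fit(train_df)
    forecast = model.predict(valid_df[["ds"]])
    errors = valid_df["y"].to_numpy() - forecast["yhat"].to_numpy()
    return float((errors ** 2).mean() ** 0.5)
```

With real data this could be called as `grid_search(PARAM_GRID, lambda p: prophet_validation_rmse(p, train_df, valid_df))`. Scoring on a validation slice carved out of the training period, rather than on the test set, keeps the reported test RMSE honest.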
Future Improvement
Gathering information about events would help Prophet understand the data trend better.
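One way this could look in practice is to pass known events to Prophet through its `holidays` argument. The sketch below is an assumption about how that might be set up, not the configuration used in the experiments: the event names and window sizes are hypothetical, and only Black Friday is taken from the slides.

```python
from datetime import date

# Hypothetical event list. Black Friday fell on 24 Nov 2017 and 23 Nov 2018;
# any other event would have to be researched and added by hand.
EVENTS = [
    {"holiday": "black_friday", "ds": date(2017, 11, 24),
     "lower_window": 0, "upper_window": 1},
    # Future occurrences must be listed too, or the forecast ignores them.
    {"holiday": "black_friday", "ds": date(2018, 11, 23),
     "lower_window": 0, "upper_window": 1},
]

def build_model(events=EVENTS):
    """Return an unfitted Prophet model that knows about the events.

    pandas and prophet are imported lazily so the event list above can be
    inspected without either package installed.
    """
    import pandas as pd
    from prophet import Prophet

    holidays = pd.DataFrame(events)
    holidays["ds"] = pd.to_datetime(holidays["ds"])
    model = Prophet(holidays=holidays)
    # Optionally add Brazil's public holidays from Prophet's built-in calendar.
    model.add_country_holidays(country_name="BR")
    return model
```

The `lower_window` and `upper_window` values extend each event to neighbouring days; here the assumption is that the sales effect spills over into the day after Black Friday.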
Winsorizing is used to correct the outliers
Since Black Friday sales are about three times those of a normal day, that day is an obvious outlier that has to be handled. Winsorizing is used for this: rather than dropping the outliers, it caps them at the maximum value observed on normal days.
Before
After
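The capping step can be sketched as follows. The slides do not say how a "normal day" was defined, so this example assumes the common IQR rule (anything above Q3 + 1.5 × IQR is an outlier); the function name and the sample figures are hypothetical.

```python
from statistics import quantiles

def winsorize_upper(values, k=1.5):
    """Cap high outliers at the largest value that is not an outlier.

    A value counts as an outlier when it exceeds Q3 + k * IQR
    (an assumption; the original threshold is not stated).
    """
    q1, _, q3 = quantiles(values, n=4)
    fence = q3 + k * (q3 - q1)
    normal = [v for v in values if v <= fence]
    cap = max(normal)  # maximum value on a "normal" day
    return [min(v, cap) for v in values]

# Made-up daily sales with one Black Friday style spike (330).
daily_sales = [100, 110, 95, 105, 120, 98, 102, 330, 108, 101]
capped = winsorize_upper(daily_sales)  # 330 is capped to 120
```

Only the spike changes; every other day keeps its original value, so the series length and dates stay aligned for Prophet.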