1 of 18

XI INTERNATIONAL CONFERENCE

“INFORMATION TECHNOLOGY AND IMPLEMENTATION” (IT&I-2024)

The Machine Learning Model Development Lifecycle for Prediction of Electrical Energy Market Volumes

Anatoliy Doroshenko, Dmitry Zhora, Oleksii Zhyrenkov

Institute of Software Systems, Kyiv, Ukraine

2 of 18

Introduction

Modern Python-based libraries for machine learning and data analysis:

  • scikit-learn
  • TensorFlow
  • Keras
  • PyTorch

Modern MLOps platforms for model deployment and version tracking:

  • BentoML
  • MLflow
  • Kubeflow
  • SageMaker
  • TFX

The usage of machine learning techniques has de facto become a standard for modern systems that need to provide predictions. The regression algorithms available in the scikit-learn library are used in the current research to build forecasting models for electrical energy markets.

3 of 18

Input Dataset

On July 1st, 2019, Ukraine adopted the European model, which assumes the following four markets for electrical energy: bilateral, day-ahead, intraday, and balancing. The bilateral market can also be referred to as a futures or forward market. In Ukraine, the total amount of deals is recorded every hour. The markets are organized in a way that balances the energy price and volatility. The dataset used for this research covers the time range from July 1st, 2020, to December 31st, 2021.

4 of 18

Outside Temperature

5 of 18

Dataset Augmentation

It is common for real-world processes that the dynamics of monitored parameters are affected by factors that are not available in the original dataset. In particular, electricity production is influenced by the outside temperature and by consumer activity cycles.

One of the challenges is to represent time in such a way that close moments in time are interpreted as close by the machine learning algorithm. A solution that is convenient from the computational perspective is to calculate the sine and cosine functions of an argument that represents the phase of the cycle. These periodic data series were calculated with the help of an algorithm written in Python.

6 of 18

Dataset Resampling

As the original data record contains figures representing the current hour, it makes sense to add two types of columns to the dataset: parameters that represent the past history and parameters that represent the future to be forecasted. A sliding time window of 24 hours was considered in both directions. Overall, 188 parameters were added for every record.

The obtained dataset had 13,129 records. For hyperparameter tuning, it was split into training and testing parts in the proportion of 80% to 20%. The random split functionality is provided by a method from the scikit-learn library. The obtained datasets were saved into .csv files, so that the different machine learning algorithms considered later are evaluated under equal conditions.

Past History                                  Future History

BilateralM1, BilateralM2, …, BilateralM23     BilateralP1, BilateralP2, …, BilateralP24
DayAheadM1, DayAheadM2, …, DayAheadM23        DayAheadP1, DayAheadP2, …, DayAheadP24
IntradayM1, IntradayM2, …, IntradayM23        IntradayP1, IntradayP2, …, IntradayP24
BalancingM1, BalancingM2, …, BalancingM23     BalancingP1, BalancingP2, …, BalancingP24
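The resampling step above can be sketched with shifted column copies and scikit-learn's random split. This is a hypothetical reconstruction on synthetic data: column naming follows the table, but the DataFrame layout and seeds are assumptions.

```python
# Sketch of dataset resampling: for each market column, shifted copies give
# 23 past ("M") and 24 future ("P") columns, then a random 80/20 split.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
markets = ["Bilateral", "DayAhead", "Intraday", "Balancing"]
df = pd.DataFrame({m: rng.random(200) for m in markets})  # toy hourly volumes

for m in markets:
    for k in range(1, 24):                       # past history: M1..M23
        df[f"{m}M{k}"] = df[m].shift(k)
    for k in range(1, 25):                       # future values: P1..P24
        df[f"{m}P{k}"] = df[m].shift(-k)

df = df.dropna().reset_index(drop=True)          # drop incomplete edge records
train, test = train_test_split(df, test_size=0.2, random_state=42)
print(len(df.columns), len(train), len(test))
```

Per market this adds 23 + 24 = 47 columns, i.e. 4 × 47 = 188 added parameters, matching the figure stated above; `train_test_split` is the scikit-learn method referred to in the text.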

7 of 18

Model Metrics

Three metrics, shown in the table below, were used to compare the input parameter sets and the different regression algorithms. Each metric measures the discrepancy between the test set and the forecasted data for the selected output column representing one of the electrical energy trading volumes. The R2 score was used for decision making. The nearest neighbors regression algorithm was used to evaluate the performance of input parameters, as it provides quite competitive results and has a limited number of hyperparameters to tune.

R2 Score    Coefficient of determination
MAPE        Mean absolute percentage error
MAE         Mean absolute error

In the formulas below, y_i are the observed values, ŷ_i the predicted values, and ȳ the observed average over the n test records:

R2   = 1 − Σ(y_i − ŷ_i)² / Σ(y_i − ȳ)²
MAPE = (1/n) Σ |(y_i − ŷ_i) / y_i|
MAE  = (1/n) Σ |y_i − ŷ_i|
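The three metrics can be computed directly or taken from scikit-learn; a small illustrative check on toy data (the values below are not from the paper's dataset):

```python
# Hand-written metric formulas checked against their scikit-learn counterparts.
import numpy as np
from sklearn.metrics import (mean_absolute_error,
                             mean_absolute_percentage_error, r2_score)

y_true = np.array([100.0, 120.0, 80.0, 90.0])   # observed values (toy data)
y_pred = np.array([110.0, 115.0, 85.0, 95.0])   # predicted values (toy data)

r2 = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
mape = np.mean(np.abs((y_true - y_pred) / y_true))
mae = np.mean(np.abs(y_true - y_pred))

assert np.isclose(r2, r2_score(y_true, y_pred))
assert np.isclose(mape, mean_absolute_percentage_error(y_true, y_pred))
assert np.isclose(mae, mean_absolute_error(y_true, y_pred))
print(round(r2, 4), round(mape, 4), round(mae, 4))
```

Note that MAPE divides by the observed value, which explains the very large Balancing-column figures later in the comparison: balancing volumes pass close to zero, inflating the relative error.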

8 of 18

Feature Selection

The scikit-learn library provides automation facilities to select the most informative input parameters. The following classes are worth mentioning: GridSearchCV, LassoCV, and SelectFromModel. The latter option was used in the current research. The alternative would be to manually evaluate up to 2^n combinations, where n represents the total number of possible inputs. The periodic parameters were not added to the history, as such parameters precisely indicate the moment in time. The final set of input features obtained with the SelectFromModel class contained 60 entries out of 106.

selected_columns = \
    ['Bilateral', 'DayAhead', 'Intraday', 'Balancing', 'BilateralM1',
     'DayAheadM1', 'IntradayM1', 'BalancingM1', 'DayAheadM2',
     'IntradayM2', 'BalancingM2', 'IntradayM6', 'BilateralM7',
     'DayAheadM7', 'IntradayM7', 'BalancingM7', 'BilateralM8',
     'DayAheadM8', 'IntradayM8', 'BilateralM9', 'DayAheadM9',
     'IntradayM9', 'BilateralM10', 'DayAheadM10', 'IntradayM10',
     'BilateralM13', 'BilateralM14', 'IntradayM14', 'BalancingM14',
     'BilateralM15', 'DayAheadM15', 'IntradayM15', 'BalancingM15',
     'BilateralM16', 'DayAheadM16', 'IntradayM16', 'BilateralM17',
     'DayAheadM17', 'IntradayM17', 'BilateralM18', 'DayAheadM18',
     'IntradayM18', 'IntradayM19', 'BilateralM21', 'BilateralM22',
     'DayAheadM22', 'IntradayM22', 'BilateralM23', 'DayAheadM23',
     'IntradayM23', 'BalancingM23', 'TempUkr', 'TempKiev', 'CosDay',
     'SinWeek', 'CosWeek', 'SinMonth', 'CosMonth', 'SinYear', 'CosYear']
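The SelectFromModel workflow can be sketched as below. This is a hedged illustration on synthetic data: the paper's actual estimator, threshold, and dataset are not shown here, though LassoCV is one of the classes mentioned above.

```python
# Sketch of automated feature selection with SelectFromModel wrapped around
# LassoCV; only the two truly informative synthetic features carry signal.
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LassoCV

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = 3 * X[:, 0] - 2 * X[:, 3] + 0.1 * rng.normal(size=500)

selector = SelectFromModel(LassoCV(cv=5)).fit(X, y)
mask = selector.get_support()        # boolean mask over the input columns
print(np.flatnonzero(mask))
```

For an L1-regularized estimator SelectFromModel keeps features whose coefficients exceed a small default threshold, so the informative columns 0 and 3 are always retained; on the real dataset the retained mask yields the 60-entry `selected_columns` list above.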

9 of 18

Prediction Error (MWh)

10 of 18

Error Histogram

11 of 18

R2 Score Comparison

Regression Algorithm            Bilateral   DayAhead    Intraday    Balancing

Histogram Gradient Boosting     0.987344    0.972738    0.878364    0.919632
Ada Boost Regressor             0.980086    0.961343    0.851729    0.903254
Gradient Boosting Regressor     0.978789    0.963179    0.846663    0.901125
Extra Trees Regressor           0.974619    0.959632    0.864845    0.898156
Nearest Neighbors Regressor     0.967512    0.948956    0.860665    0.875551
Random Forest Regressor         0.966803    0.947184    0.831671    0.873048
Support Vector Machine          0.938416    0.907901    0.782819    0.785732
Multi-Layer Perceptron (QNO)    0.935896    0.904092    0.754444    0.791107
Multi-Layer Perceptron (SGD)    0.934140    0.908779    0.773580    0.815628
Elastic Net Regressor           0.929248    0.903003    0.755470    0.779082
Linear Regression               0.929214    0.902979    0.755526    0.779067
Bayes Ridge Regressor           0.925025    0.892584    0.741958    0.778845

12 of 18

MAPE Comparison

Regression Algorithm            Bilateral   DayAhead    Intraday    Balancing

Histogram Gradient Boosting     0.009708    0.035550    0.306800    3.414739
Ada Boost Regressor             0.010436    0.039889    0.299648    3.703527
Gradient Boosting Regressor     0.011671    0.041963    0.331089    4.306912
Extra Trees Regressor           0.013403    0.044706    0.397157    3.687793
Nearest Neighbors Regressor     0.014842    0.047414    0.312221    4.160123
Random Forest Regressor         0.015383    0.050163    0.444903    4.214244
Support Vector Machine          0.020497    0.065063    0.446288    5.010112
Multi-Layer Perceptron (QNO)    0.022011    0.068955    0.484654    4.251736
Multi-Layer Perceptron (SGD)    0.023281    0.067661    0.497681    4.585881
Elastic Net Regressor           0.021644    0.067856    0.460139    5.917427
Linear Regression               0.021679    0.067995    0.460715    5.929356
Bayes Ridge Regressor           0.022225    0.069814    0.501726    5.903614

13 of 18

MAE Comparison (MWh)

Regression Algorithm            Bilateral   DayAhead    Intraday    Balancing

Histogram Gradient Boosting     114.528     136.017     107.623     287.495
Ada Boost Regressor             122.856     151.671     107.042     308.036
Gradient Boosting Regressor     137.181     161.254     119.274     324.798
Extra Trees Regressor           156.724     165.609     118.335     320.904
Nearest Neighbors Regressor     175.124     183.691     112.622     344.727
Random Forest Regressor         180.816     187.724     131.586     360.659
Support Vector Machine          239.541     247.547     150.333     475.954
Multi-Layer Perceptron (QNO)    257.264     260.785     159.517     481.078
Multi-Layer Perceptron (SGD)    270.583     256.261     157.399     445.710
Elastic Net Regressor           253.327     259.857     155.518     487.916
Linear Regression               253.691     260.242     155.743     488.161
Bayes Ridge Regressor           260.956     269.083     160.748     487.498

14 of 18

MLOps Constituents

The MLOps paradigm includes best practices, sets of concepts, and a development culture for the end-to-end conceptualization, implementation, monitoring, deployment, and scaling of machine learning products.

MLOps is aimed at productionizing machine learning systems by bridging the gap between development (Dev) and operations (Ops).

Principles: CI/CD automation, workflow orchestration, reproducibility; versioning of data, model and code; collaboration; continuous ML training and evaluation; ML metadata tracking and logging; continuous monitoring; and feedback loops.

15 of 18

Existing Instruments

16 of 18

BentoML and FastAPI

  1. Register the model in the model registry
  2. Create the Service and workers
  3. Add an additional logic layer
  4. Package the model as a Bento
  5. Deploy as a Docker image
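The steps above can be sketched as a BentoML service definition. This is a hypothetical fragment, not the paper's actual deployment code: the model tag `energy_forecaster`, the service name, and the NumPy-array API shape are all assumptions, and the file presumes a model was previously saved to the local BentoML store.

```python
# service.py -- hypothetical BentoML 1.x service wrapping a registered model.
import bentoml
from bentoml.io import NumpyNdarray

# step 1-2: fetch the registered model and create the Service with a runner
runner = bentoml.sklearn.get("energy_forecaster:latest").to_runner()
svc = bentoml.Service("energy_forecast_service", runners=[runner])

# step 3: the API function is the place for an additional logic layer
# (input validation, scaling, logging) around the raw prediction call
@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(features):
    return runner.predict.run(features)
```

Steps 4 and 5 are then performed with the BentoML CLI: `bentoml build` packages the service and model as a Bento, and `bentoml containerize` produces the deployable Docker image.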

17 of 18

Model Deployment

18 of 18

Conclusion

A production-ready forecasting solution requires multiple stages:

  • data preprocessing and augmentation
  • selection of input parameters
  • selection of machine learning algorithm
  • hyperparameter optimization
  • model training and serialization
  • creation of REST API layer
  • creation of Docker image
  • networking and autoscaling configuration
  • deployment into Kubernetes cluster

The ONNX standard for model serialization allows for interoperability between different languages and platforms. The usage of popular machine learning libraries like scikit-learn greatly simplifies the research phase. Deployment platforms like BentoML automatically provide features for service resilience and scalability.
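The training and serialization stage can be sketched with scikit-learn's native joblib format; exporting to ONNX (e.g. via the skl2onnx converter) would replace the dump call while keeping the rest of the pipeline the same. The model choice and file name below are illustrative, not from the paper.

```python
# Sketch of the model training and serialization stage using joblib;
# a deserialized copy must reproduce the original model's predictions.
import joblib
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X, y = rng.random((100, 5)), rng.random(100)     # toy training data

model = KNeighborsRegressor(n_neighbors=3).fit(X, y)
joblib.dump(model, "model.joblib")               # serialize trained model

restored = joblib.load("model.joblib")           # deserialize for serving
assert np.allclose(model.predict(X), restored.predict(X))
```

The serialized artifact is what the REST API layer and Docker image wrap; with ONNX instead of joblib, the same artifact becomes loadable from runtimes outside Python.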