1 of 28

FINANCIAL TIME SERIES ANALYSIS

21AIE461

Machine Learning Approaches for Financial Time Series Forecasting

Team 2

2 of 28

TEAM MEMBERS

ADITHYAN SUKUMAR (CB.EN.U4AIE19004)

ANIRUDH VADAKKEDATH (CB.EN.U4AIE19008)

ARJUN ANIL (CB.EN.U4AIE19012)

RAJATH RAJESH (CB.EN.U4AIE19051)

3 of 28

CONTENTS

INTRODUCTION

LITERATURE REVIEW

DATASET

METHODOLOGY

RESULTS

4 of 28

Introduction

5 of 28

Introduction

Forecasting time series data is an important task for brokers, financial analysts and traders, who make daily decisions about buying financial assets. Anticipating market movements helps them minimize losses.

Time series data are made up of components such as level, trend, seasonality, cyclicity and noise.

The models commonly used for this task are ARIMA and GARCH. Advances in machine learning have also benefited time series forecasting, and certain machine learning models can outperform classical time series models in forecasting tasks.

6 of 28

Literature Review

7 of 28

TITLE	AUTHORS	INFERENCE
Financial time-series analysis of Brazilian stock market using machine learning	Amir H. Gandomi	Compared the performance of single classifiers and ensemble methods in predicting the future direction of movement of financial assets.
Financial time series forecasting with machine learning techniques: a survey	Krollner	Used ANNs and found that they performed better than traditional ML algorithms.
Financial time series forecasting-a machine learning approach	Alexiei Dingli	Used regression models to achieve an RMSE of 0.0117 for the next-day price.

8 of 28

TITLE	AUTHORS	INFERENCE
High frequency financial time series prediction: machine learning approach	Ekaterina Zankova	Used four regressors of different nature: decision tree, multilayer perceptron, k-nearest neighbors and support vector regressor.
Financial series prediction: Comparison between precision of time series models and machine learning methods	Xin-Yao Qian	Compared the performance of SVM with ARIMA and found that SVM performed better in the forecasting task.
Machine learning techniques for stock prediction	Vatsal H	Combined SVM and boosting techniques for forecasting.

10 of 28

Dataset

The experiments were conducted on the daily closing prices of five series: two stock indices (Nasdaq and S&P500), the two most capitalized cryptocurrencies (Bitcoin (BTC) and Ethereum (ETH)), and the EUR-USD exchange rate.

The data for all series cover the period from 01/01/2015 to 30/06/2020 and were obtained from Yahoo Finance.

The series contain 1384 observations for Nasdaq, 1383 for S&P500, 1434 for EUR-USD, 2008 for BTC and 1278 for ETH.

We use 80% of each dataset for training and the remaining 20% for testing.
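A minimal sketch of such an 80/20 split is given below. It assumes the split is chronological (no shuffling), which is the usual choice for time series but is not stated on the slide; the function name and the stand-in series are hypothetical.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a sequence into train/test parts without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]

# Hypothetical stand-in for a series of Nasdaq length (1384 observations).
prices = list(range(1384))
train, test = chronological_split(prices)
print(len(train), len(test))  # 1107 277
```

Keeping the test block strictly after the training block means the models are evaluated only on observations later than anything they were fitted on.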

12 of 28

Methodology

13 of 28

In this project we used several machine learning algorithms for time series forecasting. All of them are supervised learning algorithms, in which the model is fed both the features and the target variable.
We conducted experiments on the five datasets mentioned above. The three conventional financial series (Nasdaq, S&P500 and EUR-USD) show a seasonal lag that is a multiple of 5 on daily observations, consistent with a five-day trading week, while the two cryptocurrencies (BTC and ETH), which trade every day, show a seasonal lag that is a multiple of 7.
The data were first scaled using a MinMax scaler. For the financial datasets, five consecutive values are used as input to predict the sixth; for the cryptocurrency datasets, seven consecutive values are used to predict the eighth.
80% of each dataset is used for training and the remaining 20% for testing. The machine learning models used are Support Vector Machine, Random Forest Regressor, Gradient Boosting Regressor and MLP Regressor.
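The scaling and windowing steps described above could be sketched as follows. This is an illustration under assumptions, not the project's code: the helper names and the toy price array are hypothetical, and min-max scaling is written out by hand with NumPy rather than with a library scaler.

```python
import numpy as np

def min_max_scale(values):
    """Rescale a 1-D array to the [0, 1] range."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo)

def make_windows(series, n_lags):
    """Turn a 1-D series into (X, y): n_lags past values -> next value."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

closes = np.arange(10.0, 20.0)          # hypothetical closing prices 10, 11, ..., 19
scaled = min_max_scale(closes)
X, y = make_windows(scaled, n_lags=5)   # 5 lags for index / FX series, 7 for crypto
print(X.shape, y.shape)                 # (5, 5) (5,)
```

Each row of `X` holds five consecutive scaled values and the matching entry of `y` is the value that follows them. In practice the scaling bounds are often computed on the training portion only, so that no information from the test period leaks into the inputs.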

14 of 28

SVM

Support Vector Machine (SVM) is an extension of the Support Vector Classifier. It arises from extending the function space in a particular way using kernel functions.
The main idea of the SVM method is to map the original vectors to a higher-dimensional space and search for the maximum-margin separating hyperplane in that space.
Support Vector Regression (SVR) applies the same idea to regression: instead of separating two classes, it fits a function that keeps as many training points as possible within a margin (an ε-tube) around its predictions while minimizing the total error of the points that fall outside it.
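The error term that SVR penalizes is commonly written as the ε-insensitive loss, which can be sketched in a few lines. The value `epsilon=0.1` and the sample numbers are illustrative assumptions, not settings from this project.

```python
import numpy as np

def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Per-sample SVR loss: zero inside the epsilon-tube, linear outside it."""
    residual = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return np.maximum(0.0, residual - epsilon)

y_true = np.array([1.00, 1.00, 1.00])
y_pred = np.array([1.05, 1.30, 0.50])   # hypothetical predictions
loss = epsilon_insensitive_loss(y_true, y_pred)
print(loss)   # roughly [0.0, 0.2, 0.4]
```

The first prediction lies inside the tube and costs nothing; the other two are penalized only by the amount they exceed ε.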

15 of 28

MLP Regressor

In this project we have used a Multi-Layer Perceptron (MLP) with three layers (input layer, one hidden layer and output layer with one neuron) to predict the target variable.
Because of its non-linear activations, an MLP can learn complex patterns from time series data.
The network learns by finding weights that minimize the difference between the predicted and target values. This loss is minimized using optimization algorithms such as gradient descent.
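To make the training loop concrete, here is a small NumPy sketch of a one-hidden-layer network trained by plain gradient descent on mean squared error. The toy data, hidden-layer size, tanh activation and learning rate are all assumptions chosen for illustration; the project itself refers to an MLP Regressor, whose internals may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 5 lagged inputs, target is their mean (illustrative only).
X = rng.random((200, 5))
y = X.mean(axis=1, keepdims=True)

n_hidden = 8                                    # assumed hidden-layer size
W1 = rng.normal(scale=0.5, size=(5, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(inputs):
    h = np.tanh(inputs @ W1 + b1)   # hidden layer with non-linear activation
    return h, h @ W2 + b2           # single linear output neuron

learning_rate = 0.05
losses = []
for _ in range(500):
    h, pred = forward(X)
    err = pred - y
    losses.append(float(np.mean(err ** 2)))
    # Backpropagate the mean squared error through both layers.
    grad_pred = 2 * err / len(X)
    grad_W2 = h.T @ grad_pred
    grad_b2 = grad_pred.sum(axis=0)
    grad_z = (grad_pred @ W2.T) * (1 - h ** 2)   # derivative of tanh
    grad_W1 = X.T @ grad_z
    grad_b1 = grad_z.sum(axis=0)
    # Gradient descent update of every weight and bias.
    W1 -= learning_rate * grad_W1
    b1 -= learning_rate * grad_b1
    W2 -= learning_rate * grad_W2
    b2 -= learning_rate * grad_b2

print(losses[0], losses[-1])   # the loss should fall over training
```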

16 of 28

Gradient Boosting Machine

Gradient boosting is a machine learning technique used in regression and classification tasks, among others.

It gives a prediction model in the form of an ensemble of weak prediction models, which are typically decision trees. When a decision tree is the weak learner, the resulting algorithm is called gradient-boosted trees; it often outperforms random forest.
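The idea of building an ensemble of weak learners can be illustrated with a from-scratch sketch in which each weak learner is a one-split regression stump fitted to the current residuals. This is a simplified teaching example under assumptions (single feature, squared-error loss, hypothetical function names and toy data), not the Gradient Boosting Regressor used in the project.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-split regression stump on a 1-D feature (squared error)."""
    best = None
    for threshold in np.unique(x)[:-1]:
        left, right = residual[x <= threshold], residual[x > threshold]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, threshold, left.mean(), right.mean())
    return best[1:]

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return np.where(x <= threshold, left_value, right_value)

def gradient_boost(x, y, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the residuals, which for squared error
    are the negative gradient of the loss (up to a constant factor)."""
    base = y.mean()
    pred = np.full_like(y, base)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)
        pred = pred + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps, pred

x = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x)            # hypothetical toy target
base, stumps, pred = gradient_boost(x, y)
mse_start = np.mean((y - base) ** 2)
mse_end = np.mean((y - pred) ** 2)
print(mse_start, mse_end)            # error shrinks as stumps are added
```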

17 of 28

Random Forest

Random forest is a supervised machine learning algorithm that is widely used in classification and regression problems. It builds decision trees on different samples of the data and combines them by majority vote for classification or by averaging for regression.
An important feature of the random forest algorithm is that it can handle both continuous target variables, as in regression, and categorical ones, as in classification.
It is an ensemble method: an ensemble simply combines multiple models, so that a collection of models is used to make predictions rather than an individual model.
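The two ways of combining the individual trees' outputs can be shown with a short sketch. The per-tree predictions below are made-up values for illustration, and the function names are hypothetical.

```python
from collections import Counter
from statistics import mean

def aggregate_regression(tree_predictions):
    """Average the numeric predictions of the individual trees."""
    return mean(tree_predictions)

def aggregate_classification(tree_predictions):
    """Return the class predicted by the most trees (majority vote)."""
    return Counter(tree_predictions).most_common(1)[0][0]

print(aggregate_regression([101.0, 99.0, 103.0]))        # 101.0
print(aggregate_classification(["up", "down", "up"]))    # up
```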

19 of 28

NASDAQ index.

20 of 28

S&P 500 index.

21 of 28

EUR-USD exchange rate.

22 of 28

Bitcoin (BTC).

23 of 28

Ethereum (ETH).

24 of 28

NASDAQ index.

25 of 28

S&P 500 index.

EUR-USD exchange rate.

26 of 28

Bitcoin (BTC).

Ethereum (ETH).

27 of 28

CONCLUSION

This project shows that ML models can predict time series data with an efficiency on par with DL models. The best-performing models achieved MAPEs in the range of 3-12% in out-of-sample forecasting. These results were obtained using only lagged values of the closing price, and the accuracy could be improved further if additional features such as the open, maximum, minimum and average prices were used to fit the models. Future research could focus on this approach as well as on the use of DL for feature selection.
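For reference, the MAPE (mean absolute percentage error) quoted above can be computed as in the sketch below. The function name and the sample values are illustrative assumptions and are not results from this project.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)

actual = [100.0, 200.0, 400.0, 800.0]       # hypothetical true prices
predicted = [125.0, 150.0, 400.0, 800.0]    # hypothetical forecasts
print(mape(actual, predicted))  # 12.5
```

Because each error is divided by the actual value, MAPE is undefined when an actual value is zero, which is not a concern for strictly positive prices.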
