1 of 50

Crypto Coins Prediction Using Neural Networks

Group 4: The Deadpool Squad

By Ying Yu, Jeffrey Barton, and Anurag Aiyer

2 of 50

Brief Dataset Recap

Dataset Overview

  • 234 crypto coins/altcoins with open, close, low, high, and volume data
  • 7 datasets at different timescales, covering weekly, daily, hourly, and minute-level changes. As a group we chose D1 (daily) because the daily scale offers more data and good predictive capability

Goal of the Dataset

  • We want to predict Bitcoin prices using the deep learning models covered in the course

3 of 50

Data Preprocessing

Data cleaning

Data Cleaning was not needed as the dataset was populated with the correct data types and no missing values were found

Data Normalization

Data (all fields numerical) is normalized prior to training. Data normalization is important when training recurrent neural networks for model stability.
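The slide does not say which normalization was used. As a minimal sketch, assuming min-max scaling fitted on the training portion only (the function names and sample prices are hypothetical), it could look like this:

```python
def fit_min_max(train_values):
    """Return (min, max) computed on the training portion only."""
    return min(train_values), max(train_values)


def min_max_scale(values, lo, hi):
    """Scale values relative to the fitted range; training values land in [0, 1]."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def min_max_invert(scaled, lo, hi):
    """Map scaled values back to the original price units."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical closing prices; the first four play the role of the training set.
closes = [100.0, 120.0, 110.0, 150.0, 140.0]
lo, hi = fit_min_max(closes[:4])
scaled = min_max_scale(closes, lo, hi)
restored = min_max_invert(scaled, lo, hi)
```

Fitting the range on the training portion alone avoids leaking test-set statistics into training, at the cost that later values can fall outside [0, 1].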

Figure: Sample data information for the Bitcoin dataset from the D1 folder

4 of 50

Sequence Separation

  • The dataset of length N was split into sequences of length K+1: K input samples followed by one output sample.
  • The output sample is saved as the “target” to be learned.
  • Number of sequences is

Figure: sequences K_n, K_(n+1), and K_(n+2) taken from the dataset of length N, each split into a training sequence and an output.
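The windowing described on this slide can be sketched as follows. The stride of one sample between consecutive windows is an assumption (the slide does not state it), and `make_sequences` is a hypothetical helper name. Under that assumption a series of length N yields N - K windows.

```python
def make_sequences(series, k):
    """Split a series into (inputs, target) pairs using windows of length k + 1.

    Assumes a stride of 1 between consecutive windows.
    """
    pairs = []
    for start in range(len(series) - k):
        window = series[start:start + k + 1]
        # The first k samples are the training sequence, the last is the target.
        pairs.append((window[:-1], window[-1]))
    return pairs


series = list(range(10))  # stand-in for N = 10 normalized samples
pairs = make_sequences(series, k=3)
```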

5 of 50

Train/Test Split

  • Dataset was split into three regions: 75% Train, 15% Test, 10% Validation
  • The regions were non-overlapping.
  • Sequence lengths of 10, 20 and 40 were chosen.
  • Table is for sequence length of 20.

| Dataset | Train # | Test # | Val # |
|---|---|---|---|
| 1 Day | 1,753 | 335 | 216 |
| 4 Hour | 10,614 | 2,107 | 1,397 |
| 30 Min | 84,954 | 16,975 | 11,309 |
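A minimal sketch of a non-overlapping 75/15/10 split is shown below. It assumes the regions are taken in time order as train, then test, then validation; the slides do not state the order, and `chronological_split` is a hypothetical name.

```python
def chronological_split(samples, train_frac=0.75, test_frac=0.15):
    """Split samples into contiguous, non-overlapping train/test/validation regions."""
    n = len(samples)
    train_end = int(n * train_frac)
    test_end = train_end + int(n * test_frac)
    train = samples[:train_end]
    test = samples[train_end:test_end]
    val = samples[test_end:]  # whatever remains, roughly 10%
    return train, test, val


samples = list(range(100))
train, test, val = chronological_split(samples)
```

Keeping each region contiguous, rather than shuffling, prevents future prices from appearing in the training region.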

6 of 50

Model Evaluation

  • The primary evaluation metric used is the Root Mean Squared Error (RMSE).

  • Two primary loss functions used to compute training loss: MSE and L1 (MAE)
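For reference, the three quantities named above can be computed as in this small sketch (the sample values are made up for illustration):

```python
import math


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mae(y_true, y_pred):
    """Mean absolute error (L1 loss)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(y_true, y_pred))


y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]
errors = (mse(y_true, y_pred), mae(y_true, y_pred), rmse(y_true, y_pred))
```

Because the data is normalized, the RMSE values reported in these slides are in normalized units, not in price units.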

7 of 50

Model Regularization

Two regularization methods were used:

  1. L1 (Lasso) Regularization

  2. Dropout Layer
    1. Randomly sets input units to 0 with probability p to prevent overfitting.
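Both methods can be sketched in a few lines. This is an illustration, not the project's code: the L1 term is assumed to be added to the training loss as lambda times the sum of absolute weights, and the dropout shown is the "inverted" variant that rescales surviving units by 1/(1-p), which is how PyTorch's `nn.Dropout` behaves during training.

```python
import random


def l1_penalty(weights, lam):
    """L1 (Lasso) term added to the loss: lam * sum(|w|)."""
    return lam * sum(abs(w) for w in weights)


def dropout(inputs, p, rng):
    """Inverted dropout: zero each unit with probability p, scale the rest by 1/(1-p)."""
    if p <= 0.0:
        return list(inputs)
    keep = 1.0 - p
    return [0.0 if rng.random() < p else x / keep for x in inputs]


penalty = l1_penalty([1.0, -2.0, 0.5], lam=0.001)
rng = random.Random(0)  # fixed seed so the example is repeatable
dropped = dropout([1.0] * 1000, p=0.5, rng=rng)
```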

8 of 50

Model Optimizer

Two optimizers were evaluated for learning:

  1. Stochastic Gradient Descent (SGD)
    1. Updates the weights (w) based on a preselected learning rate (eta).
    2. A learning rate schedule is often used to improve learning.

  2. Adaptive Moment Estimation (Adam) [4]
    1. “An algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.”
      1. 1st Moment = Mean, 2nd Moment = Uncentered Variance
    2. Computes individual adaptive learning rates for different parameters.
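The two update rules can be compared on a single scalar weight. This is a sketch under stated assumptions: the Adam defaults (beta1 = 0.9, beta2 = 0.999, eps = 1e-8) follow the values suggested in [4], and the quadratic objective and starting point are made up for illustration.

```python
import math


def sgd_step(w, grad, lr):
    """Plain SGD: move against the gradient by a fixed learning rate."""
    return w - lr * grad


def adam_step(w, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update using bias-corrected first and second moment estimates."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad          # 1st moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad   # 2nd raw moment
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


def grad(w):
    """Gradient of the toy objective f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


w_sgd = sgd_step(0.0, grad(0.0), lr=0.1)
adam_state = {"t": 0, "m": 0.0, "v": 0.0}
w_adam = adam_step(0.0, grad(0.0), adam_state, lr=0.1)
```

On the first step SGD moves in proportion to the gradient magnitude, while Adam moves by roughly the learning rate regardless of that magnitude, which is one way to see its per-parameter scaling.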

9 of 50

Learning Rate Schedule

Three learning rate schedules were selected for SGD optimization [5]:

  1. Stepped Learning Rate (StepLR)
    • Reduces the learning rate by a fixed factor every step_size epochs.
  2. Cosine Annealing (CosineAnnealingLR) [6]
    • Uses a cosine function to decrease/increase the learning rate
  3. Plateau (ReduceLROnPlateau) [7]
    • Keeps the learning rate constant while learning, but once learning plateaus, it decreases the learning rate.
    • Test loss used as the learning metric.
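The three schedules can be written as small standalone functions. These are simplified sketches of the formulas, not the PyTorch classes themselves: the plateau version ignores the improvement threshold and cooldown options, and the default `gamma`, `factor`, and `patience` values are illustrative assumptions.

```python
import math


def step_lr(base_lr, epoch, step_size, gamma=0.1):
    """Multiply the learning rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def cosine_annealing_lr(base_lr, epoch, t_max, eta_min=0.0):
    """Cosine curve from base_lr down to eta_min at t_max, rising again afterwards."""
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max))


def reduce_on_plateau(losses, base_lr, factor=0.5, patience=5):
    """Cut the learning rate by factor once the loss has not improved for more than patience epochs."""
    lr, best, bad_epochs = base_lr, float("inf"), 0
    history = []
    for loss in losses:
        if loss < best:
            best, bad_epochs = loss, 0
        else:
            bad_epochs += 1
            if bad_epochs > patience:
                lr *= factor
                bad_epochs = 0
        history.append(lr)
    return history


plateau_history = reduce_on_plateau([1.0] * 7, base_lr=0.01)
```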

10 of 50

Model Training

  • Recurrent Neural Network (RNN) [1]: single hidden linear layer
  • Long Short Term Memory (LSTM) [2]: gated hidden linear layers and cell state vectors
  • Gated Recurrent Unit (GRU) [3]: gated hidden linear layer
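To make the simplest of the three concrete, here is a sketch of one step of a plain (Elman-style) RNN. The ReLU activation matches the one listed in the RNN hyperparameter table; the weights, shapes, and function names are made up for illustration. The LSTM and GRU add gates (and, for the LSTM, a cell state) on top of this recurrence.

```python
def relu(value):
    return max(0.0, value)


def rnn_step(x, h, w_xh, w_hh, bias):
    """One recurrent step: h_new = relu(x @ w_xh + h @ w_hh + bias)."""
    new_h = []
    for j in range(len(bias)):
        total = bias[j]
        total += sum(x[i] * w_xh[i][j] for i in range(len(x)))
        total += sum(h[k] * w_hh[k][j] for k in range(len(h)))
        new_h.append(relu(total))
    return new_h


def run_rnn(sequence, w_xh, w_hh, bias):
    """Feed a sequence through the cell, starting from a zero hidden state."""
    h = [0.0] * len(bias)
    for x in sequence:
        h = rnn_step(x, h, w_xh, w_hh, bias)
    return h


# Hypothetical weights: 2 input features, 1 hidden unit.
w_xh = [[0.5], [0.25]]
w_hh = [[1.0]]
bias = [-0.5]
first_h = rnn_step([1.0, 2.0], [0.0], w_xh, w_hh, bias)
final_h = run_rnn([[1.0, 2.0], [0.0, 0.0]], w_xh, w_hh, bias)
```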

11 of 50

Hyper Parameter Tuning (RNN)

  • The RNN was selected as the network to help define the hyperparameters for the other models.
  • Used the Daily BTC dataset
  • Evaluated using
    • Validation RMSE for accuracy
    • Epochs till convergence
      • How long do I need to train for?
    • Train vs. Test Loss
      • Does it converge at all?

| Parameter | Value |
|---|---|
| # of Layers | 3 |
| Hidden Size | 1028 |
| # of Epochs | 250 |
| Initial LR | 0.01 |
| L1 Reg Lambda | 0.001 |
| Batch Size | # of Samples |
| Activation Function | ReLU |
| Sequence Length | 20 |
| Dropout Prob. | 0.0 |

12 of 50

SGD + L1 Regularization

  • The SGD optimizer had trouble learning with L1 regularization on.
  • No change in LR, LR scheduler, or number of epochs appeared to help.

| Parameter | Value |
|---|---|
| Optimizer | SGD |
| Loss Function | MSE |
| LR Schedule | Cosine Annealing |
| T Max | 10 |
| Eta Min | 0 |

| L1 Alpha | RMSE |
|---|---|
| 0.0 (Off) | 0.03139 |
| 0.001 (On) | Too large |

13 of 50

SGD + L1 Regularization

Figure panels: (OFF) Alpha = 0.0; (ON) Alpha = 0.001

L1 Regularization introduced instabilities that made it difficult to train with SGD.

14 of 50

SGD + Dropout

  • A Dropout value of 0.10 had the best performance.
  • However, the improvement over the other dropout values was only slight.

| Parameter | Value | RMSE |
|---|---|---|
| Dropout | 0.00 | 0.0295 |
| Dropout | 0.10 | 0.0266 |
| Dropout | 0.25 | 0.0281 |

15 of 50

Adam Optimizer

Avg RMSE was 0.0238, an improvement over SGD.

Lessons Learned:

  • Adam optimizer had trouble converging with a larger model (Hidden Size and # Layers).
  • LR Schedule didn’t appear to matter for Adam.
    • Adam adapts the LR for each parameter, so it handles its own schedule.
  • Dropout caused convergence issues, so it was turned off.
  • The largest improvement was the stability. Adam consistently converged to a solution.

| Parameter | Value |
|---|---|
| Loss Function | L1 Loss |
| Optimizer | Adam |
| # of Layers | 3 |
| Hidden Size | 256 |
| Dropout | 0.0 |
| LR Scheduler | Cosine Annealing (10, 0) |

16 of 50

Best RNN

Best Achieved RMSE: 0.02099

| Parameter | Value |
|---|---|
| # of Layers | 2 |
| Hidden Size | 64 |
| # of Epochs | 100 |
| Initial LR | 0.01 |
| L1 Reg Lambda | 0.001 |
| Batch Size | # of Samples |
| Activation Function | ReLU |
| Sequence Length | 20 |
| Dropout Prob. | 0.0 |
| LR Schedule | ReduceLROnPlateau(‘min’, 0.5, 5) |
| Loss | L1 |
| Optimizer | Adam |

17 of 50

Data Output

  • The model was trained on Open, Close, Low, High and Volume.
  • Open, Close, Low, and High trained well.
  • Volume could never be predicted.
  • We did not explore removing Volume from the data.

18 of 50

Timescale

The error decreased as smaller time increments were used. Some possible reasons are below:

  1. More time samples are available to learn from. The 4-hour and 30-minute datasets have roughly 6x and 48x as many samples as the daily dataset, respectively.
  2. The price’s stochastic nature may be less chaotic in smaller time increments.

| Timescale | RMSE |
|---|---|
| Daily | 0.02099 |
| 4 Hour | 0.0104 |
| 30 Min | 0.005 |

19 of 50

Hyperparameter Tuning - LSTM

  • The LSTM was selected as the network to help define the hyperparameters.
  • Used the Daily BTC dataset
  • Evaluated using
    • Test RMSE for accuracy
    • Train vs. Predicted Plot

| Parameter | Value |
|---|---|
| # of Layers | 1 |
| Hidden Size | 50 |
| # of Epochs | 100 |
| Initial LR | 0.01 |
| L1 Reg Lambda | 0.0001 |
| Batch Size | # of Samples |
| Sequence Length | 30 |
| Dropout Prob. | 0.0 |

20 of 50

Hyperparameter Tuning - LSTM


Reason of Choice

The parameter values are deliberately simple because our data is neither complex nor large. Adding more layers or units might cause overfitting and would consume more computational power.

21 of 50

Hyperparameter Tuning - LSTM

Sequence Length Comparison

Since our data is neither large nor complex, I tried three short sequence lengths: 10, 20, and 30. Because we are predicting coin prices from historical data, a longer sequence length might help. However, with limited data a longer sequence might cause overfitting, so I stopped at 30. Also, since we are using daily data, a shorter sequence length should be adequate. (For the hourly dataset, a longer sequence might be needed to capture daily patterns.)

| Sequence Length | Test RMSE |
|---|---|
| 10 | 0.022302017 |
| 20 | 0.021161662 |
| 30 | 0.021013899 |

The training speed slowed down noticeably at 30

22 of 50

Model Generalization - LSTM

The data size is smaller than for the other coins, and the model was underfitting a bit using the same parameters, so the number of layers and epochs was increased.

23 of 50

Model Output - LSTM

For Aptos Coin:

| Parameter | Value |
|---|---|
| # of Layers | 2 |
| Hidden Size | 50 |
| # of Epochs | 500 |
| Initial LR | 0.01 |
| L1 Reg Lambda | 0.0001 |
| Batch Size | # of Samples |
| Sequence Length | 30 |
| Dropout Prob. | 0.0 |

Test RMSE improved from 0.047582 to 0.037924.

24 of 50

Hyper Parameter Tuning (GRU)

  • The GRU was selected as an alternative network to help define the hyperparameters.
  • Used the Daily BTC dataset
  • Evaluated using
    • Test RMSE for accuracy
    • Actual vs Predicted Plots
      • How well does it converge?

| Parameter | Value |
|---|---|
| # of Layers | 2 |
| Hidden Size | 50 |
| # of Epochs | 100 |
| Initial LR | 0.01 |
| L1 Reg Lambda | 0.0001 |
| Batch Size | # of Samples |
| Sequence Length | 10-40 |
| Dropout Prob. | 0.0 |

25 of 50

GRU - SGD + L1 Regularization

  • The SGD optimizer had trouble learning both with and without L1 regularization.
  • The number of epochs was kept the same.

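| Parameter | Value |
|---|---|
| Optimizer | SGD |
| Loss Function | MSE |

RMSE by sequence length:

| L1 Alpha | 10 | 20 | 30 | 40 |
|---|---|---|---|---|
| 0.0 (Off) | 0.1983 | 0.172 | 0.1942 | 0.184 |
| 0.0001 (On) | 0.1888 | 0.1924 | 0.1869 | 0.1712 |

With L1 regularization off, a sequence length of 20 (RMSE 0.172) matched the performance of a sequence length of 40 with L1 regularization on (RMSE 0.1712).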

26 of 50

GRU - SGD + Dropout

  • A Dropout value of 0.2 had the best performance.
  • Each RMSE range covers sequence lengths of 10-40.
  • Even so, the RMSE remained too large with dropout.

| Parameter | Value | RMSE |
|---|---|---|
| Dropout | 0 | 0.19-0.2 |
| Dropout | 0.1 | 0.176-0.205 |
| Dropout | 0.2 | 0.133-0.198 |

27 of 50

GRU - Adam + L1 Regularization + MSE Loss

  • All the coins performed much better with Adam.
  • Each coin was trained separately; Ethereum performed the best.

| Coin | 10 Seq | 20 Seq | 30 Seq | 40 Seq |
|---|---|---|---|---|
| Bitcoin | 0.02731 | 0.0302 | 0.0246 | 0.0291 |
| Ethereum | 0.0203 | 0.01923 | 0.0247 | 0.0186 |
| BNB | 0.0247 | 0.02348 | 0.0267 | 0.02949 |
| TIA | 0.0381 | 0.0471 | 0.0437 | 0.0375 |
| APT | 0.0415 | 0.0408 | 0.0432 | 0.0421 |

28 of 50

Best GRU

Best Achieved RMSE: 0.01793

| Parameter | Value |
|---|---|
| # of Layers | 2 |
| Hidden Size | 20 |
| # of Epochs | 100 |
| Initial LR | 0.01 |
| L1 Reg Lambda | 0.0001 |
| Batch Size | # of Samples |
| Sequence Length | 30 |
| Dropout Prob. | 0.0 |
| Loss | L1 |
| Optimizer | Adam |

29 of 50

Data Comparison

  • One thing we could do for future predictions is remove Volume and look only at the Open, Close, Low, and High variables.
  • Volume had a large number of outliers, which made it difficult to predict with our models.
    • Preprocessing the dataset to transform the volume data into the same working range as the other 4 features may help.

30 of 50

SGD vs. Adam

  • Adam may work better than SGD on our dataset because the data is noisy. Its adaptive learning rates make it more robust to noise in the gradient computations.

  • SGD with L1 regularization might not work well for our dataset because the data size is small.
    • The variance introduced by SGD, together with the added L1 penalty, can drive coefficients to zero and potentially lead to underfitting.

31 of 50

Model Application and Challenges in Practice

  • Crypto coin price prediction analysis can be used by many people for different purposes, such as risk management and investment analysis.

  • In our analysis, the models achieved low prediction error. This is probably because we are using daily data, which is less stochastic (the coin price can only change so much within a day).

32 of 50

Model Summary

  • GRU achieved the best performance, beating the next-best RMSE by 0.003 with the fewest parameters.
  • All networks trained better with the Adam optimizer.
  • Dropout severely degraded performance in all networks.
  • The learning rate schedule is not as important with the Adam optimizer.

| Model | Best RMSE | # Param | # Epochs | Seq Len |
|---|---|---|---|---|
| RNN | 0.02099 | 1280 | 100 | 20 |
| LSTM | 0.02116 | 750 | 100 | 30 |
| GRU | 0.01793 | 300 | 100 | 30 |

33 of 50

References

34 of 50

Initial EDA Presentation

(BACKUP)

35 of 50

Outline

  • Background
  • Exploratory Data Analysis (EDA)
  • Assumptions
  • Project Goals
  • Predictive Power
  • Summary

36 of 50

Background Info

This dataset is found on Kaggle and it contains 234 Crypto Coins/Altcoins with historical Open, High, Low, Close, and Volume (OHLCV) prices traded in the Binance Exchange. This dataset is around 7GB and is the direct market data from the past 8 years (2024 included).

The data contains 7 different datasets that track daily, hourly (hours and minutes), and weekly price changes. For our analysis, the D1 folder will be our main resource; it contains the daily price and trading volume changes. However, if needed, we will also refer to data from the other timeframe folders for more precise predictions.

37 of 50

Background Basic EDA

The D1 dataset (all datasets in the D1 folder concatenated) has 263,800 entries with no missing values. The data covers 2017-07-14 to 2024-02-13.

When we look at the max open value for each coin, the ten coins with the highest max open values all peaked in 2021, with the following distribution (YFI and BTC being the largest):

And if we remove the top two values, here is the distribution:

38 of 50

Background Basic EDA

We can see that most coins had their max opening price in 2021, with two peaks occurring in early summer and winter.

39 of 50

Background Basic EDA

We can see the same trend in the coins’ max closing prices.

40 of 50

Background Basic EDA

Below are the scatter plots of the top ten max opening prices vs. their closing values on the same day:

Figure panels: Top 10; Removed top 2

41 of 50

We can see that Bitcoin prices were soaring from early 2021 to early 2022. There was a steep decline between early 2022 and mid-2022. Afterward, prices steadily increased again until 2024. The highest closing price Bitcoin reached is 67,525.83 per share.

42 of 50

We can see that Ethereum prices were soaring from early 2021 to early 2022. There was a steep decline between early 2022 and mid-2022. Afterward, prices steadily increased again until 2024. The highest closing price Ethereum reached is 4,807.98 per share.

43 of 50

Assumptions

  • Crypto price values are exactly like stocks - think of closing price as a share owned of crypto
  • The most important coins are the most expensive ones
  • The future price of a coin is based on past performance of a coin

44 of 50

Goal of the project

  • Predict the daily closing price of bitcoin
  • These predictions will leverage three sources of information:
    1. The trading price and volume of bitcoin.
    2. The trading price and volume of altcoins.
    3. The various timestep granularities (daily, weekly, hourly, etc.), with a focus on the daily closing price.
  • Only the altcoins which show a strong correlation to bitcoin will be used.
    • However, there could be relationships with weakly correlated altcoins which yield better prediction accuracy.
  • Given that this is time-sequence data, we will use RNNs, LSTMs, or GRUs as the network model.

45 of 50

Altcoin Correlation (1/2)

Many altcoins are closely tied to the price of bitcoin (BTC).

The figure on the right shows the normalized closing price of the top 10 cryptocurrencies (according to average price).

Many exhibit similar trends to bitcoin.

46 of 50

Altcoin Correlation (2/2)

  • Of the 234 altcoins (excluding Bitcoin), several exhibited a strong correlation between their closing price and Bitcoin's.
  • To keep the input vector smaller, we will try training the algorithm with weakly correlated (>0.3) and strongly correlated (>0.6) altcoins.

Strong Negative Correlation

| Token | Pearson Score |
|---|---|
| EDU | -0.74 |
| LQTY | -0.68 |
| SSV | -0.65 |
| AGIX | -0.63 |

Strong Positive Correlation

| Token | Pearson Score |
|---|---|
| ETH | 0.93 |
| BNB | 0.87 |
| TIA | 0.79 |
| APT | 0.66 |
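The Pearson scores and the threshold-based selection can be sketched as below. Comparing the absolute value of the score against the threshold is an assumption, made because the slide lists strong negative correlations alongside positive ones; the price series and the names `pearson` and `select_altcoins` are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def select_altcoins(btc, altcoins, threshold):
    """Keep altcoins whose |Pearson score| against BTC exceeds the threshold."""
    selected = {}
    for name, series in altcoins.items():
        score = pearson(btc, series)
        if abs(score) > threshold:
            selected[name] = score
    return selected


# Made-up closing prices, for illustration only.
btc = [1.0, 2.0, 3.0, 4.0, 5.0]
altcoins = {
    "UP": [2.0, 4.0, 6.0, 8.0, 10.0],
    "DOWN": [5.0, 4.0, 3.0, 2.0, 1.0],
    "FLAT": [1.0, 3.0, 2.0, 3.0, 1.0],
}
strong = select_altcoins(btc, altcoins, threshold=0.6)
```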

47 of 50

The importance of cryptocurrency forecasting (1/2)

  • Investors and Trading
    • The SEC has approved several bitcoin exchange traded funds (ETFs) which give institutional stability to cryptocurrencies while also enabling a large new pool of people to start trading cryptocurrencies without the concern of exchanges going bankrupt (FTX for example).
  • Market Analysis
    • With hundreds of billions of dollars flowing through cryptocurrencies, it is prudent that regulatory bodies such as the SEC have methods for understanding volatility and risk of the underlying security (BTC) in order to enact measures to ensure market stability.
    • If BTC were to drop to 0, hundreds of billions would be wiped out. Predicting moments of extreme volatility would allow the SEC to enact countermeasures, such as pausing trading of bitcoin ETFs, to allow the volatility to pass.

48 of 50

The importance of cryptocurrency forecasting (2/2)

  • Policy and Regulation
    • On Sept 7th 2021, El Salvador became the first country to adopt bitcoin as legal tender, tying an entire country's currency stability to bitcoin.
    • Predicting the price of bitcoin would allow policies and regulations either to stabilize BTC or to implement other methods of protecting the assets of the nation’s population.
      • These policies and regulations could be similar to a variety of regulations enacted by the U.S. Federal Government after the Great Depression, such as FDIC-insured bank accounts.
    • However, given the volatility of cryptocurrencies, careful monitoring and forecasting would be required to prevent the consequences of large movements in price.
      • Imagine how the population of El Salvador would feel if the value of their legal tender dropped by 50% in a single day.

49 of 50

Summary

  • The dataset contains 234 Crypto Coins/Altcoins with historical Open, High, Low, Close, and Volume (OHLCV) prices traded in the Binance Exchange.
  • We will utilize the daily (D1) price data, which contains 263,800 entries across all coins with no missing values.
  • The goal of this project will be to predict the price of Bitcoin.
  • Predictions will take place by exploring a variety of inputs given the large amount of data contained within the dataset.

50 of 50

Dataset Background Info

  • Where is the dataset from?
  • Who is this impacting?
  • How is this dataset created?
  • When was this dataset created and timeline?
  • We have 236800 rows of D1 (one day) data for all 235 coins/altcoins
  • There are no missing values in the dataset, in particular for Bitcoin and Ethereum
  • We will only be using D1 data - this is the most efficient way to analyze the data
  • The highest closing prices for the coins fall between early 2021 and early 2022
  • Most coins started gaining value in 2018
