Crypto Coins Prediction Using Neural Networks
Group 4: The Deadpool Squad
By Ying Yu, Jeffrey Barton, and Anurag Aiyer
Brief Dataset Recap
Dataset Overview
Goal of the Dataset
Data Preprocessing
Data cleaning
No data cleaning was needed: the dataset already had the correct data types, and no missing values were found.
Data Normalization
All fields are numerical and are normalized prior to training; normalization is important for numerical stability when training recurrent neural networks.
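A minimal sketch of the kind of scaling this implies (the slides do not specify the exact normalization, so the [0, 1] min-max scaling below is an assumption):

```python
import numpy as np

def minmax_normalize(x: np.ndarray):
    """Scale each column of x into [0, 1]; return the per-column min/max
    so predictions can be mapped back to price units later."""
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    return (x - x_min) / (x_max - x_min), x_min, x_max

def minmax_denormalize(x_norm, x_min, x_max):
    """Invert minmax_normalize."""
    return x_norm * (x_max - x_min) + x_min
```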
Sample data from the Bitcoin dataset in the D1 folder
Sequence Separation
[Figure: sliding-window sequence separation. Each training sequence of consecutive samples K_n, K_(n+1), K_(n+2), … is paired with the sample that follows it as the output, sliding across the full dataset of length N.]
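A sketch of that sliding-window construction, assuming the data is a NumPy array of normalized samples (seq_len is the Sequence Length hyperparameter tuned later):

```python
import numpy as np

def make_sequences(data: np.ndarray, seq_len: int):
    """Slide a window of seq_len consecutive samples over the series;
    the sample immediately after each window becomes its target output."""
    xs, ys = [], []
    for i in range(len(data) - seq_len):
        xs.append(data[i : i + seq_len])  # K_n ... K_(n+seq_len-1)
        ys.append(data[i + seq_len])      # the next sample (output)
    return np.stack(xs), np.stack(ys)
```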
Train/Test/Validation Split
Dataset | Train # | Test # | Val # |
1 Day | 1,753 | 335 | 216 |
4 Hour | 10,614 | 2,107 | 1,397 |
30 Min | 84,954 | 16,975 | 11,309 |
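The counts above imply roughly a 75/15/10 train/test/validation split taken in time order. A sketch under that assumption (the exact fractions are not stated in the slides):

```python
def chronological_split(xs, ys, train_frac=0.75, test_frac=0.15):
    """Split sequences in time order (no shuffling) so that no
    future prices leak into the training partition."""
    n = len(xs)
    n_train, n_test = int(n * train_frac), int(n * test_frac)
    train = (xs[:n_train], ys[:n_train])
    test = (xs[n_train:n_train + n_test], ys[n_train:n_train + n_test])
    val = (xs[n_train + n_test:], ys[n_train + n_test:])
    return train, test, val
```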
Model Evaluation
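All results below are scored with root-mean-square error (RMSE), computed on the normalized prices. For reference, with predictions $\hat{y}_i$ and targets $y_i$ over $N$ test points:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2}$$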
Model Regularization
Two regularization methods were evaluated: L1 weight regularization and dropout.
Model Optimizer
Two optimizers were evaluated for learning: stochastic gradient descent (SGD) and Adam [4].
Learning Rate Schedule
Three initial learning rates were evaluated for SGD optimization [5].
Model Training
Recurrent Neural Network (RNN) [1]: single hidden linear layer
Long Short-Term Memory (LSTM) [2]: gated hidden linear layers and cell-state vectors
Gated Recurrent Unit (GRU) [3]: gated hidden linear layer
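The PyTorch references in the deck suggest the models were built with torch.nn, where the three cells are drop-in replacements. A minimal sketch of a shared wrapper (input_size=5 assumes OHLCV features and is illustrative):

```python
import torch.nn as nn

class SequenceModel(nn.Module):
    """Recurrent backbone (vanilla RNN, LSTM, or GRU) plus a linear head
    that maps the final hidden state to the predicted next value."""
    def __init__(self, cell="gru", input_size=5, hidden_size=50,
                 num_layers=2, dropout=0.0):
        super().__init__()
        rnn_cls = {"rnn": nn.RNN, "lstm": nn.LSTM, "gru": nn.GRU}[cell]
        self.rnn = rnn_cls(input_size, hidden_size, num_layers,
                           batch_first=True, dropout=dropout)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, x):                # x: (batch, seq_len, features)
        out, _ = self.rnn(x)             # hidden states for every time step
        return self.head(out[:, -1, :])  # predict from the last step
```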
Hyperparameter Tuning (RNN)
Parameter | Value |
# of Layers | 3 |
Hidden Size | 1028 |
# of Epochs | 250 |
Initial LR | 0.01 |
L1 Reg Lambda | 0.001 |
Batch Size | # of Samples |
Activation Function | ReLU |
Sequence Length | 20 |
Dropout Prob. | 0.0 |
SGD + L1 Regularization
Parameter | Value |
Optimizer | SGD |
Loss Function | MSE |
LR Schedule | Cosine Annealing [6] |
T Max | 10 |
Eta Min | 0 |
Alpha = 0.0 (off): RMSE 0.03139
Alpha = 0.001 (on): RMSE too large to report (training unstable)
SGD + L1 Regularization
[Figure: predicted vs. actual prices with L1 regularization off (alpha = 0.0) and on (alpha = 0.001)]
L1 Regularization introduced instabilities that made it difficult to train with SGD.
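A sketch of how an L1 penalty is typically added to the loss (lam corresponds to the L1 Reg Lambda in the tables; the project's exact implementation is not shown):

```python
def l1_penalty(model, lam):
    """lam times the sum of absolute parameter values."""
    return lam * sum(p.abs().sum() for p in model.parameters())

# inside the training loop, e.g. with lam = 0.001 per the table:
# loss = mse_loss(pred, target) + l1_penalty(model, lam=0.001)
```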
SGD + Dropout
Parameter | Value | RMSE |
Dropout | 0.00 | 0.0295 |
Dropout | 0.10 | 0.0266 |
Dropout | 0.25 | 0.0281 |
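In PyTorch, dropout between stacked recurrent layers is a constructor argument; a usage sketch reusing the SequenceModel wrapper above with the RNN tuning configuration from earlier (the table's ReLU activation would additionally require nn.RNN's nonlinearity="relu" argument, which the wrapper does not expose):

```python
# best dropout value from this sweep was 0.10
model = SequenceModel(cell="rnn", input_size=5, hidden_size=1028,
                      num_layers=3, dropout=0.10)
```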
Adam Optimizer [4]
Average RMSE was 0.0238, an improvement over SGD.
Lessons learned:
Parameter | Value |
Loss Function | L1 Loss |
Optimizer | Adam |
# of Layers | 3 |
Hidden Size | 256 |
Dropout | 0.0 |
LR Scheduler | Cosine Annealing [6] (T_max = 10, eta_min = 0) |
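A sketch of this configuration in PyTorch (the initial learning rate of 0.01 is carried over from the RNN tuning table and is an assumption here):

```python
import torch

model = SequenceModel(cell="rnn", input_size=5, hidden_size=256, num_layers=3)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=10, eta_min=0)  # "Cosine Annealing (10, 0)"
criterion = torch.nn.L1Loss()

# per epoch: compute criterion(model(x), y), backpropagate,
# optimizer.step(), optimizer.zero_grad(), then scheduler.step()
```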
Best RNN
Best Achieved RMSE: 0.02099
Parameter | Value |
# of Layers | 2 |
Hidden Size | 64 |
# of Epochs | 100 |
Initial LR | 0.01 |
L1 Reg Lambda | 0.001 |
Batch Size | # of Samples |
Activation Function | ReLU |
Sequence Length | 20 |
Dropout Prob. | 0.0 |
LR Schedule | ReduceLROnPlateau('min', 0.5, 5) [7] |
Loss | L1 |
Optimizer | Adam |
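The ReduceLROnPlateau('min', 0.5, 5) entry maps onto PyTorch's scheduler arguments as mode, factor, and patience; a usage sketch:

```python
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=5)

# step once per epoch on the monitored validation loss; the learning
# rate is halved after 5 epochs without improvement:
# scheduler.step(val_loss)
```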
Data Output
Timescale
The error decreased as smaller time increments were used. Possible reasons: smaller increments provide far more training sequences (84,954 for 30-minute data vs. 1,753 for daily), and prices move less between consecutive steps, so the next value is easier to predict.
Daily RMSE: 0.02099
4 Hour RMSE: 0.0104
30 Min RMSE: 0.005
Hyperparameter Tuning - LSTM
Parameter | Value |
# of Layers | 1 |
Hidden Size | 50 |
# of Epochs | 100 |
Initial LR | 0.01 |
L1 Reg Lambda | 0.0001 |
Batch Size | # of Samples |
Sequence Length | 30 |
Dropout Prob. | 0.0 |
Reason for Choice
The parameter values are deliberately basic because our data is small and not very complex; adding more layers or units might cause overfitting and consume more computational power.
Hyperparameter Tuning - LSTM
Sequence Length Comparison
Since our data is neither large nor complex, we tried three short sequence lengths: 10, 20, and 30. Because we are predicting coin prices from historical data, a longer sequence length might help; with our limited data, however, it risks overfitting, so we stopped at 30. Since we are using daily data, a shorter sequence length should be adequate (an hourly dataset might need longer sequences to capture daily patterns).
Sequence Length | Test RMSE |
10 | 0.022302017 |
20 | 0.021161662 |
30 | 0.021013899 |
Training slowed noticeably at a sequence length of 30.
Model Generalization - LSTM
For some coins the dataset is smaller than for the others, and the model underfit slightly with the same parameters, so we increased the number of layers and epochs.
Model Output - LSTM
For Aptos Coin:
Parameter | Value |
# of Layers | 2 |
Hidden Size | 50 |
# of Epochs | 500 |
Initial LR | 0.01 |
L1 Reg Lambda | 0.0001 |
Batch Size | # of Samples |
Sequence Length | 30 |
Dropout Prob. | 0.0 |
Test RMSE improved from 0.047582 to 0.037924.
Hyperparameter Tuning (GRU)
Parameter | Value |
# of Layers | 2 |
Hidden Size | 50 |
# of Epochs | 100 |
Initial LR | 0.01 |
L1 Reg Lambda | 0.0001 |
Batch Size | # of Samples |
Sequence Length | 10-40 |
Dropout Prob. | 0.0 |
GRU - SGD + L1 Regularization
Parameter | Value |
Optimizer | SGD |
Loss Function | MSE |
Alpha = 0.0 (off): RMSE at sequence lengths 10/20/30/40 = 0.1983, 0.172, 0.1942, 0.184
Alpha = 0.0001 (on): RMSE at sequence lengths 10/20/30/40 = 0.1888, 0.1924, 0.1869, 0.1712
With L1 regularization off, a sequence length of 20 (RMSE 0.172) matched the best result with L1 on, which required a length of 40 (RMSE 0.1712).
GRU - SGD + Dropout
Parameter | Value | RMSE |
Dropout | 0.0 | 0.19–0.20 |
Dropout | 0.1 | 0.176–0.205 |
Dropout | 0.2 | 0.133–0.198 |
GRU - Adam + L1 Regularization + MSE Loss
Coin | Seq 10 | Seq 20 | Seq 30 | Seq 40 |
Bitcoin | 0.02731 | 0.0302 | 0.0246 | 0.0291 |
Ethereum | 0.0203 | 0.01923 | 0.0247 | 0.0186 |
BNB | 0.0247 | 0.02348 | 0.0267 | 0.02949 |
TIA | 0.0381 | 0.0471 | 0.0437 | 0.0375 |
APT | 0.0415 | 0.0408 | 0.0432 | 0.0421 |
Best GRU
Best Achieved RMSE: 0.01793
Parameter | Value |
# of Layers | 2 |
Hidden Size | 20 |
# of Epochs | 100 |
Initial LR | 0.01 |
L1 Reg Lambda | 0.0001 |
Batch Size | # of Samples |
Sequence Length | 30 |
Dropout Prob. | 0.0 |
Loss | L1 |
Optimizer | Adam |
Data Comparison
SGD vs. Adam
Model Application and Challenges in Practice
Model Summary
Model | Best RMSE | # Param | # Epochs | Seq Len |
RNN | 0.02099 | 1280 | 100 | 20 |
LSTM | 0.02116 | 750 | 100 | 30 |
GRU | 0.01793 | 300 | 100 | 30 |
References
[1] https://en.wikipedia.org/wiki/Recurrent_neural_network
[2] https://en.wikipedia.org/wiki/Long_short-term_memory
[3] https://en.wikipedia.org/wiki/Gated_recurrent_unit
[4] https://arxiv.org/abs/1412.6980
[5] https://pytorch.org/docs/stable/optim.html#module-torch.optim.lr_scheduler
[6] https://paperswithcode.com/method/cosine-annealing
[7] https://wiki.cloudfactory.com/docs/mp-wiki/scheduler/reducelronplateau
Initial EDA Presentation
(BACKUP)
Outline
Background Info
This dataset, found on Kaggle, contains historical Open, High, Low, Close, and Volume (OHLCV) prices for 234 crypto coins/altcoins traded on the Binance exchange. It is around 7 GB of direct market data from the past 8 years (2024 included).
The data comprises seven different sets tracking daily, hourly (hours and minutes), and weekly rate changes. For our analysis, the D1 folder, which records price and trading-volume changes on a daily basis, will be our main resource. However, if needed, we will also refer to data from the other timescale folders for more precise predictions.
Background Basic EDA
The D1 dataset (all datasets in the D1 folder concatenated) has 263,800 entries with no missing values, covering 2017-07-14 to 2024-02-13.
Looking at the maximum open value of each coin, the ten highest openings all occurred in 2021, with the following distribution (YFI and BTC being the largest). Removing the top two values gives the second distribution shown.
Background Basic EDA
Most coins had their maximum opening price in 2021, with two peaks occurring in early summer and in winter.
Background Basic EDA
The same trend appears in the coins' maximum closing prices.
Background Basic EDA
Below are scatter plots of the top ten maximum openings vs. their same-day closing values:
[Figure panels: Top 10 | Top 2 removed]
Bitcoin prices soared between early 2021 and early 2022, then declined steeply between early 2022 and mid-2022. Afterwards, prices steadily increased again through 2024. The highest closing price Bitcoin reached was 67,525.83.

Ethereum prices followed the same pattern: soaring between early 2021 and early 2022, declining steeply between early 2022 and mid-2022, and then steadily increasing again through 2024. The highest closing price Ethereum reached was 4,807.98.
Assumptions
Goal of the Project
Altcoin Correlation (1/2)
Many altcoins are closely tied to the price of bitcoin (BTC).
The figure on the right shows the normalized closing prices of the top 10 cryptocurrencies (by average price).
Many exhibit similar trends to bitcoin.
Altcoin Correlation (2/2)
Strong Negative Correlation
Token | Pearson Score |
EDU | -0.74 |
LQTY | -0.68 |
SSV | -0.65 |
AGIX | -0.63 |

Strong Positive Correlation
Token | Pearson Score |
ETH | 0.93 |
BNB | 0.87 |
TIA | 0.79 |
APT | 0.66 |
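A sketch of how such Pearson scores against BTC could be computed with pandas (the DataFrame layout and column names are assumptions):

```python
import pandas as pd

def btc_correlations(closes: pd.DataFrame) -> pd.Series:
    """closes holds one column of daily closing prices per token,
    including a 'BTC' column; returns each token's Pearson
    correlation with BTC, sorted descending."""
    corr = closes.corr(method="pearson")["BTC"].drop("BTC")
    return corr.sort_values(ascending=False)
```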
The importance of cryptocurrency forecasting (1/2)
The importance of cryptocurrency forecasting (2/2)
Summary
Dataset Background Info