09.27.2022
Course No. : CE 6290
Course Name : Hydroinformatics
Binata Roy
PhD Student, Civil Engineering
Department of Engineering Systems and Environment
br3xk@virginia.edu
Application of Deep Learning (LSTM) in Flood Forecasting
Organization of the Lecture
Deep Learning
Artificial Intelligence > Machine Learning > Deep Learning
1. Artificial Intelligence: Machines perform tasks that typically require human intelligence
2. Machine Learning: Machines learn from their own experience
3. Deep Learning: Machines learn using ‘Artificial Neural Networks’, inspired by the structure and function of the human brain
Machine Learning vs. Deep Learning
Deep Learning Models
Artificial Neural Networks (ANN)
Forward Propagation
Backward Propagation
Recurrent Neural Networks (RNN)
RNN Limitation
Where the gap between the relevant information and the place where it is needed is small, RNNs can learn to use the past information
Consider what happens if we unroll the loop through time (T1, T2, T3, ..., Tn)
As that gap grows, RNNs become unable to learn to connect information from the past because of vanishing and exploding gradients
As a rule of thumb, a standard RNN retains information from only about the previous 10 timesteps
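The vanishing and exploding gradient behaviour can be illustrated with a deliberately simplified case. The sketch below assumes a scalar, linear recurrence h_t = w * h_(t-1) (no activation function, no input), which is not the full RNN from the slides; the function name and the values of w are illustrative only. In this case the gradient of the last state with respect to the first is simply w multiplied by itself once per timestep.

```python
def gradient_through_time(w, n_steps):
    """Gradient d h_n / d h_0 for the linear recurrence h_t = w * h_(t-1).

    Backpropagating through n_steps timesteps multiplies the gradient
    by the recurrent weight w once per step.
    """
    grad = 1.0
    for _ in range(n_steps):
        grad *= w
    return grad


# |w| < 1 shrinks the gradient (vanishing); |w| > 1 grows it (exploding).
for w in (0.5, 1.5):
    for n_steps in (1, 10, 50):
        print(f"w={w}, steps={n_steps}, gradient={gradient_through_time(w, n_steps):.3e}")
```

With more timesteps between the relevant input and the point where it is needed, the gradient signal either fades or blows up, which is why learning long gaps is difficult.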
LSTM Architecture
Long Short Term Memory (LSTM)
RNN: In standard RNNs, the repeating module has a very simple structure: a single tanh layer.
LSTM: In LSTMs, the repeating module has a different structure. Instead of a single neural network layer, there are four, interacting in a very special way.
Long Short Term Memory networks, usually just called “LSTMs”, are a special kind of RNN capable of learning long-term temporal dependencies by mitigating the vanishing and exploding gradient problems of traditional RNNs [Hochreiter & Schmidhuber, 1997]
LSTM Architecture
Forget Gate
Input Gate
Output Gate
Cell State from timestep t-1
Hidden State from timestep t-1
Cell State to timestep t+1
Hidden State output to timestep t+1
Input Data at timestep t
Forget Gate:
It takes the hidden state from the previous timestep and the input from the current timestep, decides what to “keep” or “forget” from the previous cell state, and updates the cell state accordingly
Input Gate:
It decides which values to update and adds the new candidate values to what the forget gate retained
Output Gate:
It decides which parts of the cell state to output to the next timestep
Forget Gate:
It takes h(t-1) and x(t), and outputs a number between 0 and 1 for each number in the cell state: 1 = “keep it” and 0 = “forget it”. The cell state is updated accordingly.
Input Gate:
The “input gate layer” decides which values we’ll update. The new candidate values are then added to the part of the cell state retained by the forget gate.
Output Gate:
The “output gate layer” decides which parts of the cell state we’ll output. We put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.
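The three gates can be written out as a single timestep in NumPy. This is a minimal sketch of the standard LSTM update, not the code used in the study: the function name `lstm_step`, the weight dictionaries `W` and `b`, and the sizes (2 inputs, 4 hidden units, random weights) are assumptions chosen only for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM timestep.

    W[k] has shape (n_hidden, n_hidden + n_inputs) and b[k] has shape
    (n_hidden,) for k in 'f' (forget), 'i' (input), 'c' (candidate), 'o' (output).
    """
    z = np.concatenate([h_prev, x_t])          # previous hidden state + current input
    f_t = sigmoid(W["f"] @ z + b["f"])         # forget gate: 0 = forget, 1 = keep
    i_t = sigmoid(W["i"] @ z + b["i"])         # input gate: which values to update
    c_tilde = np.tanh(W["c"] @ z + b["c"])     # candidate values for the cell state
    c_t = f_t * c_prev + i_t * c_tilde         # new cell state
    o_t = sigmoid(W["o"] @ z + b["o"])         # output gate: which parts to output
    h_t = o_t * np.tanh(c_t)                   # new hidden state, between -1 and 1
    return h_t, c_t


# Illustrative sizes and random weights (assumptions, not trained values).
rng = np.random.default_rng(0)
n_inputs, n_hidden = 2, 4
W = {k: rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_inputs)) for k in "fico"}
b = {k: np.zeros(n_hidden) for k in "fico"}

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_inputs)):     # run 5 timesteps
    h, c = lstm_step(x_t, h, c, W, b)
print(h.shape, c.shape)
```

The cell state `c` is only scaled by the forget gate and added to, which is what lets information pass across many timesteps.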
LSTM at timestep t
RNN vs. LSTM with Example
Dave eats pasta every day, hence his favorite cuisine is (………..)
Traditional RNN
Input words: Dave, eats, Pasta, ……, is
Remembered context: Dave | Dave, eats | eats, Pasta | …… | Cuisine, is
Prediction: ???
LSTM
Input words: Dave, eats, Pasta, ……, is
Remembered context: Pasta (retained through to the final word)
Prediction: Italian
LSTM in Flood Forecasting
Flood forecasting at flood-prone streets using LSTM
LSTM Forecasting Model
Inputs: P at [(t-n)…(t+m)] and WL at [(t-n)…(t)]; Forecast label: WL at [(t+1)…(t+m)] (P = rainfall, WL = water level)
Input [P(t-n…t+m), WL(t-n…t)] -> LSTM Model -> Forecast [WL(t+1…t+m)]
Time (hrs) | Rainfall (P) | WL |
1/1/2010 0:00 | Known | Known |
1/1/2010 1:00 | Known | Known |
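1/1/2010 2:00 | Known | Known |
1/1/2010 3:00 | Known | ? |
1/1/2010 4:00 | Known | ? |
1/1/2010 5:00 | Known | ? |
1/1/2010 6:00 | Known | ? |
Here T = t is 2:00 (the last hour with a known WL); 0:00 and 1:00 are T = t-2 and T = t-1, and 3:00 and 4:00 are T = t+1 and T = t+2.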
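The input and label windows described above can be cut from hourly series with a sliding window. The sketch below is an assumption about how this could be done, not the study's preprocessing code: `make_samples`, the placeholder arrays, and the choice n = 2, m = 2 are illustrative. Rainfall is taken over [t-n, t+m] (past plus forecast), water level over [t-n, t], and the label is water level over [t+1, t+m].

```python
import numpy as np


def make_samples(P, WL, n, m):
    """Build sliding-window samples from two equally long hourly series.

    Returns
      P_in   : rainfall at t-n ... t+m      -> shape (samples, n + m + 1)
      WL_in  : water level at t-n ... t     -> shape (samples, n + 1)
      WL_out : water level at t+1 ... t+m   -> shape (samples, m)
    """
    P_in, WL_in, WL_out = [], [], []
    for t in range(n, len(P) - m):
        P_in.append(P[t - n : t + m + 1])
        WL_in.append(WL[t - n : t + 1])
        WL_out.append(WL[t + 1 : t + m + 1])
    return np.array(P_in), np.array(WL_in), np.array(WL_out)


# Placeholder values for the seven hours 0:00 to 6:00 (not real data).
P = np.arange(7, dtype=float)
WL = 10.0 + np.arange(7, dtype=float)

P_in, WL_in, WL_out = make_samples(P, WL, n=2, m=2)
print(P_in.shape, WL_in.shape, WL_out.shape)
```

For the first sample, t is index 2 (2:00), so the label holds the water level at 3:00 and 4:00.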
Study Area: City of Norfolk, Virginia
Taken from Faria et al. (2021).
LSTM Model: Input Features and Storm Events
Input Features
Environmental input features
Topographic input features
Output Features
Storm Events (20)
Train Events (16): 6/5/2016, 7/31/2016, 8/9/2016, 9/3/2016, 9/19/2016, 9/20/2016, 9/21/2016, 10/8/2016, 10/9/2016, 1/2/2017, 7/15/2017, 8/7/2017, 8/29/2017, 6/22/2018, 7/30/2018, 8/20/2018
Test Events (4): 10/29/2017, 5/6/2018, 5/28/2018, 8/11/2018
LSTM Model: 3D Input Data Processing
LSTM 3D Data Preparation
LSTM needs a 3D tensor with shape [batch, timesteps, features]
Features: number of input features considered, e.g., rainfall, tide level, etc.
Timesteps: how many timesteps are considered for each prediction
Batch size: how many samples are in each batch
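As a sketch of the reshaping step, the example below stacks overlapping windows from a 2D table (time by features) into the 3D shape an LSTM layer expects. The helper `to_3d`, the random placeholder data, and the sizes (100 hours, 3 features, 4 timesteps) are assumptions for illustration, not the study's data.

```python
import numpy as np


def to_3d(data, timesteps):
    """Turn a 2D array (time, features) into a 3D array
    (samples, timesteps, features) using overlapping windows."""
    windows = [data[i : i + timesteps] for i in range(len(data) - timesteps + 1)]
    return np.stack(windows)


# Placeholder data: 100 hourly records of 3 features.
data = np.random.default_rng(0).random((100, 3))

X = to_3d(data, timesteps=4)
print(X.shape)  # (97, 4, 3)
```

The first axis holds all samples; during training, Keras splits it into batches of `batch_size` samples each.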
LSTM Model : Model Initialization
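# load Python libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.regularizers import L1L2
from tensorflow.keras.callbacks import EarlyStopping
# RMSE loss (assumed definition; the original `rmse` is not shown on the slide)
def rmse(y_true, y_pred):
    return tf.sqrt(tf.reduce_mean(tf.square(y_pred - y_true)))
# early stopping on training loss (assumed settings; the original `earlystop` is not shown on the slide)
earlystop = EarlyStopping(monitor='loss', patience=5)
# define model
model = Sequential()
# add hidden layer(s); input_shape is (timesteps, features), with None allowing any number of timesteps
model.add(LSTM(units=50, activation='tanh', input_shape=(None, train_X.shape[2]), use_bias=True,
               bias_regularizer=L1L2(l1=0.01, l2=0.01)))
# add dropout between hidden and output layer to improve generalization / reduce overfitting
model.add(Dropout(0.355))
# add output layer with linear activation
model.add(Dense(units=n_ahead-1, activation='linear', use_bias=True))
# set optimizer (`learning_rate` replaces the deprecated `lr` argument)
adam = keras.optimizers.Adam(learning_rate=0.001)
# compile model
model.compile(loss=rmse, optimizer=adam)
# fit model with train data
model.fit(train_X, train_y, batch_size=n_batch, epochs=n_epochs, verbose=2, shuffle=False, callbacks=[earlystop])
# predictions with test data
Test_yhat = model.predict(Test_X)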
LSTM Parameters
Finally, compare the observed ‘Test_y’ with the modeled ‘Test_yhat’
Performance Metrics
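The slides do not list which metrics were used. As a sketch, assuming three metrics that are common in hydrologic model evaluation (root mean square error, mean absolute error, and Nash-Sutcliffe efficiency), the comparison between observed and simulated values could look like this. The arrays `obs` and `sim` are made-up placeholder values, not results from the study.

```python
import numpy as np


def rmse(obs, sim):
    """Root mean square error: 0 is a perfect fit."""
    return float(np.sqrt(np.mean((sim - obs) ** 2)))


def mae(obs, sim):
    """Mean absolute error: 0 is a perfect fit."""
    return float(np.mean(np.abs(sim - obs)))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; 0 means the model
    is no better than predicting the observed mean."""
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - np.mean(obs)) ** 2))


# Placeholder observed and simulated water levels (illustrative only).
obs = np.array([0.10, 0.25, 0.40, 0.30])
sim = np.array([0.12, 0.20, 0.38, 0.33])

print("RMSE:", rmse(obs, sim))
print("MAE :", mae(obs, sim))
print("NSE :", nse(obs, sim))
```

In practice `obs` and `sim` would be the flattened ‘Test_y’ and ‘Test_yhat’ arrays.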
LSTM Model : Accuracy Assessment
Figure legend: Obs. (observed), Sim. (simulated)
Preliminary LSTM Results
Preliminary LSTM Model Results
n_lags = 3 hr
n_ahead = 2 hr
Input Features
Output Features
Training: 8/7/2017, 8/29/2017
Testing: 5/6/2018
LSTM Model Improvement: Hyperparameter Tuning
Hyperparameter | Options Explored |
Number of Neurons | 10, 15, 20, 40, 50, 75 |
Activation Functions | relu, tanh, sigmoid |
Optimization Function | Adam, Stochastic gradient descent (SGD), RMSProp |
Learning Rate | 1×10^-3, 1×10^-2, 1×10^-1 |
Dropout Rate | 0.1-0.5 |
Hyperparameter tuning means choosing the set of hyperparameters that gives the best model performance
Hyperparameter Tuning Methods
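The slides do not say which tuning method was used. As one possible illustration, the sketch below runs a random search over the options in the table above. Everything here is an assumption: `sample_config`, `random_search`, and especially `toy_score`, which is a stand-in for "train the LSTM with this configuration and return its validation error" and has no connection to real model performance.

```python
import random

# Options taken from the hyperparameter table.
search_space = {
    "units": [10, 15, 20, 40, 50, 75],
    "activation": ["relu", "tanh", "sigmoid"],
    "optimizer": ["adam", "sgd", "rmsprop"],
    "learning_rate": [1e-3, 1e-2, 1e-1],
}
dropout_range = (0.1, 0.5)


def sample_config(rng):
    """Draw one random configuration from the search space."""
    config = {name: rng.choice(options) for name, options in search_space.items()}
    config["dropout"] = rng.uniform(*dropout_range)
    return config


def random_search(score_fn, n_trials, seed=0):
    """Try n_trials random configurations and keep the lowest score."""
    rng = random.Random(seed)
    best_config, best_score = None, float("inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        score = score_fn(config)
        if score < best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_score(config):
    """Hypothetical placeholder for a real train-and-validate step."""
    return abs(config["units"] - 50) + config["dropout"]


best_config, best_score = random_search(toy_score, n_trials=20)
print(best_config, best_score)
```

Replacing `toy_score` with a function that builds, trains, and validates the Keras model would turn this into an actual tuning loop; a grid search would instead loop over every combination of the listed options.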
References
Thank you
Any Questions?!