1 of 27

09.27.2022

Course No. : CE 6290

Course Name : Hydroinformatics

Binata Roy
PhD Student, Civil Engineering

Department of Engineering Systems and Environment
br3xk@virginia.edu

Application of Deep Learning (LSTM) in Flood Forecasting

2 of 27

Organization of the Lecture

Deep Learning
Deep Learning Models
LSTM Architecture
LSTM in Flood Forecasting
Preliminary LSTM Results

3 of 27

Deep Learning

4 of 27

Artificial Intelligence > Machine Learning > Deep Learning

1. Artificial Intelligence: Machines perform tasks typically requiring human intelligence

2. Machine Learning: Machines learn from their own experience

3. Deep Learning: Machines learn using ‘Artificial Neural Networks’, inspired by the structure and function of the human brain

5 of 27

Machine Learning vs. Deep Learning

Feature Extraction: Extract all the features required for the problem statement
Classification: Categorize data into different classes based on the extracted features

In ML, features in the data are defined manually by users
In DL, features in the data are learned by the deep learning algorithms themselves

6 of 27

Deep Learning Models

7 of 27

Artificial Neural Networks (ANN)

An ANN has a group of multiple neurons at each layer

An ANN consists of 3 types of layers: Input, Hidden and Output

Input layer accepts inputs
Hidden layer processes inputs
Output layer produces result

Advantages:

Activation functions introduce nonlinear properties to the network

Limitations:

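ANN loses the spatial features of an image => solved by CNN (not covered here)
ANN cannot capture sequential information in the input data => solved by RNN
Vanishing and exploding gradients as errors propagate backward through many layers => not solved by RNN; mitigated by LSTM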

Forward Propagation

Backward Propagation
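
Forward propagation through the three layer types can be sketched in a few lines. This is a minimal illustration, assuming one hidden layer with a tanh activation and a linear output; the function name `forward` and all weights are made up for the example and are not from the lecture.

```python
import math

def forward(x, W_hidden, b_hidden, W_out, b_out):
    """Forward propagation: input -> hidden (tanh) -> output (linear)."""
    # Hidden layer: weighted sum of the inputs, then a nonlinear activation
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W_hidden, b_hidden)]
    # Output layer: weighted sum of the hidden activations
    output = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(W_out, b_out)]
    return hidden, output

# Hypothetical network: 2 inputs, 2 hidden neurons, 1 output
x = [0.5, -0.5]
W_hidden = [[1.0, 1.0], [1.0, -1.0]]
b_hidden = [0.0, 0.0]
W_out = [[1.0, 2.0]]
b_out = [0.5]

hidden, output = forward(x, W_hidden, b_hidden, W_out, b_out)
print(hidden, output)
```

Without the tanh in the hidden layer, the two layers would collapse into a single linear map, which is why the activation function is what gives the network its nonlinear properties.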

8 of 27

Recurrent Neural Networks (RNN)

Adding a looping constraint to the hidden layer of an ANN turns it into an RNN

The looping constraint ensures that sequential information in the input data is captured
Especially useful for sequential time-series datasets

Advantages

RNN captures sequential information present in input data i.e. dependency between the data while making predictions

Challenges

RNNs with a large number of time steps suffer from the vanishing and exploding gradient problem

9 of 27

RNN Limitation

Where the gap between the relevant information and the place that it’s needed is small, RNNs can learn to use the past information

Consider what happens if we unroll the loop with time

As that gap grows, RNNs become unable to learn to connect the information from the past due to vanishing and exploding gradients

In practice, an RNN typically remembers only about the prior 10 time-steps

[Figure: the RNN loop unrolled over time steps T1, T2, T3, ..., Tn]
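
A rough numerical sketch of why long gaps are hard for a plain RNN, assuming a scalar RNN of the form h_t = tanh(w * h_(t-1) + x_t). The function name `gradient_through_time` and the weights 0.5 and 1.5 are illustrative assumptions, not values from the lecture. The gradient of the last hidden state with respect to the first is a product of one factor per time step, so it shrinks or grows geometrically with the number of steps.

```python
import math

def gradient_through_time(w, inputs, h0=0.0):
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        # Chain rule: d h_t / d h_{t-1} = w * (1 - h_t^2)
        grad *= w * (1.0 - h * h)
    return grad

# Hypothetical sequence: 50 time steps with zero input, so h stays at 0
# (where tanh has slope 1) and every per-step factor equals w.
inputs = [0.0] * 50
vanishing = gradient_through_time(0.5, inputs)  # |w| < 1: gradient shrinks
exploding = gradient_through_time(1.5, inputs)  # |w| > 1: gradient grows

print(vanishing, exploding)
```

With these assumed weights the gradient after 50 steps is either negligibly small or very large, so the earliest inputs have almost no usable training signal.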

10 of 27

LSTM Architecture

11 of 27

Long Short Term Memory (LSTM)

RNN

LSTM

In standard RNNs, the repeating module has a very simple structure: a single tanh layer.

In LSTMs, the repeating module has a different structure. Instead of a single neural network layer, there are four, interacting in a very special way.

Long Short Term Memory networks, usually just called “LSTMs”, are a special kind of RNN, capable of learning long-term temporal dependencies by overcoming the vanishing and exploding gradient problem of traditional RNNs [Hochreiter & Schmidhuber, 1997]

https://colah.github.io/posts/2015-08-Understanding-LSTMs/

12 of 27

LSTM Architecture

Forget Gate

Input Gate

Output Gate

Cell State from timestep t-1

Hidden State from timestep t-1

Cell State to timestep t+1

Hidden State output to timestep t+1

Input Data at timestep t

The key to LSTMs is the CELL STATE, the horizontal line running through the top of the diagram
It runs straight down the entire chain with only some minor linear interactions, so information can easily flow along it unchanged

Forget Gate:

It takes the hidden state from the previous timestep (h_t−1) and the input from the current timestep (x_t), and outputs a number between 0 and 1 for each number in the cell state: 1 = “keep it”, 0 = “forget it”. The cell state is updated accordingly.

Input Gate:

The “input gate layer” decides which values to update. The new candidate values, scaled by the input gate, are then added to the cell state that the forget gate has already scaled.

Output Gate:

The “output gate layer” decides what to output to the next timestep. The cell state is passed through tanh (to push the values between −1 and 1) and multiplied by the output of the sigmoid gate, so that only the selected parts are output.

https://colah.github.io/posts/2015-08-Understanding-LSTMs/
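
The three gates can be written out as a single LSTM timestep in NumPy. This is a minimal sketch of the standard LSTM formulation, not the Keras implementation; the names `lstm_step`, `W` and `b`, the layer sizes and the random weights are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM timestep.

    W maps the concatenated [h_prev, x_t] to four stacked layers
    (forget gate, input gate, candidate values, output gate).
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:n])            # forget gate: 1 = keep it, 0 = forget it
    i = sigmoid(z[n:2 * n])        # input gate: which values to update
    g = np.tanh(z[2 * n:3 * n])    # candidate values for the cell state
    o = sigmoid(z[3 * n:4 * n])    # output gate: which parts to output
    c_t = f * c_prev + i * g       # updated cell state
    h_t = o * np.tanh(c_t)         # updated hidden state
    return h_t, c_t

# Hypothetical sizes and random weights, only to run the step
rng = np.random.default_rng(0)
n_hidden, n_features = 4, 2
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden + n_features))
b = np.zeros(4 * n_hidden)

h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_features)):  # 5 timesteps of input
    h, c = lstm_step(x_t, h, c, W, b)

print(h.shape, c.shape)
```

The cell state update `c_t = f * c_prev + i * g` is the "minor linear interaction" along the top line of the diagram: when the forget gate is close to 1 and the input gate close to 0, the cell state passes through almost unchanged.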

LSTM at timestep t

13 of 27

RNN vs. LSTM with Example

Dave eats Pasta every day, hence his favorite cuisine is (………..)

14 of 27

RNN vs. LSTM with Example

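Dave eats Pasta every day, hence his favorite cuisine is (………..)

[Figure: word-by-word comparison of the two models on this sentence]

Traditional RNN: carries only the most recent words (Dave; Dave, eats; eats, Pasta; ……; cuisine, is), so the prediction is ???

LSTM: keeps “Pasta” in memory through the rest of the sentence (Dave; eats; Pasta; ……; Pasta, is), so the prediction is Italian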

RNN: the looping constraint captures sequential information in the input data when making predictions, but plain RNNs are not used much in practice
RNNs with a large number of time steps suffer from the vanishing and exploding gradient problem
An RNN typically remembers only about the prior 10 time-steps
LSTM: a special RNN capable of learning long-term temporal information by overcoming the vanishing and exploding gradient problem
LSTM retains more information in memory than RNN
LSTM is a state-of-the-art deep learning algorithm for sequential data

15 of 27

LSTM in Flood Forecasting

16 of 27

Flood forecasting at flood-prone streets using LSTM

Flooding is a major concern, triggered by projected future increases in rainfall frequency/volume and relative sea level rise (SLR)
Street-level flood forecasting is a major non-structural measure to mitigate disruptions caused by urban flooding
Flood forecasting provides flood information with enough lead time to prepare countermeasures against impending floods
High-fidelity physics-based models (TUFLOW, MIKE, HEC-RAS) require high computation time, power and cost, which makes them impractical for real-time street-level flood forecasting
A data-driven LSTM model can come to the rescue

17 of 27

LSTM Forecasting Model

Inputs: P at [(t-n+1)…(t+m)] and WL at [(t-n+1)…(t)]; Forecast label: WL at [(t+1)…(t+m)], where n = n_lagged and m = n_ahead

Let’s say, n_lagged = 3 and n_ahead = 2

A time lag of 3 hours is used to forecast the WL up to 2 hours ahead
At timestep t, P from the previous 3 timesteps (t-2, t-1, t) and the future 2 timesteps (t+1, t+2), together with WL from the previous 3 timesteps (t-2, t-1, t), are used to forecast WL at (t+1) and (t+2)

WL_t+1…t+m

P_t-n+1…t+m

WL_t-n+1…t

LSTM

Model

Input

Rainfall (P)

Forecast

Water Depth (WL)

Time (hrs)	Timestep	Rainfall (P)	WL
1/1/2010 0:00	t-2	Known	Known
1/1/2010 1:00	t-1	Known	Known
1/1/2010 2:00	t (forecast time)	Known	Known
1/1/2010 3:00	t+1	Known	?
1/1/2010 4:00	t+2	Known	?
1/1/2010 5:00		Known	?
1/1/2010 6:00		Known	?
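
The windowing described on this slide can be sketched as a small helper that slides the forecast time t across the series. This is an illustrative sketch, not the code used in the study; the function name `make_samples` and the rainfall and water-level values are made up.

```python
def make_samples(P, WL, n_lagged, n_ahead):
    """Build (P window, WL window, label) tuples for every valid forecast time t.

    P window:  P at t-n_lagged+1 ... t+n_ahead
    WL window: WL at t-n_lagged+1 ... t
    label:     WL at t+1 ... t+n_ahead
    """
    samples = []
    for t in range(n_lagged - 1, len(P) - n_ahead):
        p_window = P[t - n_lagged + 1 : t + n_ahead + 1]
        wl_window = WL[t - n_lagged + 1 : t + 1]
        label = WL[t + 1 : t + n_ahead + 1]
        samples.append((p_window, wl_window, label))
    return samples

# Hypothetical hourly values for 0:00 to 6:00
P = [0.0, 1.2, 3.4, 2.0, 0.5, 0.0, 0.0]
WL = [0.10, 0.15, 0.30, 0.45, 0.40, 0.25, 0.12]

samples = make_samples(P, WL, n_lagged=3, n_ahead=2)
first_p, first_wl, first_label = samples[0]
print(len(samples), first_p, first_wl, first_label)
```

With n_lagged = 3 and n_ahead = 2, the first sample has t at 2:00: it uses P from 0:00 to 4:00 and WL from 0:00 to 2:00 to forecast WL at 3:00 and 4:00.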

18 of 27

Study Area: City of Norfolk, Virginia

Taken from Faria et al 2021.

The City of Norfolk, Virginia, is located along the U.S. East Coast
It is the second most vulnerable coastal city to SLR in the U.S., after New Orleans
It serves important economic and national security roles, with one of the largest commercial ports in the U.S. and the largest naval base in the world
Considering its vulnerability to flooding and its vital role in the national economy and security, Norfolk is selected as the study area

19 of 27

LSTM Model: Input Features and Storm Events

Storm Events (20)

Train Events (16)

6/5/2016
7/31/2016
8/9/2016
9/3/2016
9/19/2016
9/20/2016
9/21/2016
10/8/2016
10/9/2016
1/2/2017
7/15/2017
8/7/2017
8/29/2017
6/22/2018
7/30/2018
8/20/2018

Test Events (4)

10/29/2017
5/6/2018
5/28/2018
8/11/2018

Input Features

Environmental input features

Total hourly rainfall
Maximum 15 min rainfall in an hour
Cumulative rainfall in previous 2 hr
Cumulative rainfall in previous 72 hr
Hourly tide level

Topographic input features

Elevation
Topographic wetness index
Depth to water index

Output Features

Water depth
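
The rainfall-derived input features can be computed from a 15 min rainfall record roughly as follows. This is a sketch under stated assumptions: the helper names, the sample values and the choice to include the current hour in the "previous 2 hr" and "previous 72 hr" windows are illustrative, not taken from the study.

```python
def hourly_total(rain_15min):
    """Total hourly rainfall from 15 min readings (4 readings per hour)."""
    return [sum(rain_15min[i:i + 4]) for i in range(0, len(rain_15min), 4)]

def hourly_max_15min(rain_15min):
    """Maximum 15 min rainfall within each hour."""
    return [max(rain_15min[i:i + 4]) for i in range(0, len(rain_15min), 4)]

def rolling_sum(values, window):
    """Cumulative value over the last `window` entries, current one included.

    Early entries use however many values are available.
    """
    out = []
    for i in range(len(values)):
        start = max(0, i - window + 1)
        out.append(sum(values[start:i + 1]))
    return out

# Hypothetical 15 min rainfall for 3 hours (integer units to keep sums exact)
rain_15min = [0, 1, 2, 1,  3, 0, 0, 1,  0, 0, 0, 0]

rh = hourly_total(rain_15min)          # total hourly rainfall
rmax15 = hourly_max_15min(rain_15min)  # maximum 15 min rainfall in an hour
cum_2hr = rolling_sum(rh, 2)           # cumulative rainfall, 2 hr window
cum_72hr = rolling_sum(rh, 72)         # cumulative rainfall, 72 hr window

print(rh, rmax15, cum_2hr, cum_72hr)
```

The short and long accumulation windows carry different information: the 2 hr sum reflects the current burst of rain, while the 72 hr sum stands in for how wet the ground already is.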

20 of 27

LSTM Model: 3D Input Data Processing

LSTM 3D Data Preparation

LSTM needs a 3D tensor with shape [batch, timesteps, features]

Features: number of input features considered, e.g., rainfall, tide level, etc.

Timesteps: number of timesteps considered for each prediction

Batch size: number of samples in each batch
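
A small NumPy sketch of the 3D layout, assuming the data starts as a flat 2D table with one row per (sample, timestep) pair and one column per feature. The sizes are hypothetical. In Keras, the first dimension of the array passed to `fit` is the total number of samples, and `batch_size` controls how many of them are processed together.

```python
import numpy as np

n_samples, n_timesteps, n_features = 6, 3, 2  # hypothetical sizes

# Flat 2D table: one row per (sample, timestep), one column per feature
flat = np.arange(n_samples * n_timesteps * n_features, dtype=float)
flat = flat.reshape(n_samples * n_timesteps, n_features)

# 3D tensor: [samples, timesteps, features]
X = flat.reshape(n_samples, n_timesteps, n_features)

print(flat.shape, X.shape)
print(X[1])  # the 3 timesteps x 2 features belonging to the second sample
```

The reshape only works this way if the rows of the flat table are ordered sample by sample and, within each sample, in time order.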

21 of 27

LSTM Model : Model Initialization

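# load Python libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.regularizers import L1L2
from tensorflow.keras.callbacks import EarlyStopping

# train_X, train_y, Test_X, n_ahead, n_batch and n_epochs are assumed to be defined beforehand

# RMSE loss function (Keras has no built-in loss with this name)
def rmse(y_true, y_pred):
    return tf.sqrt(tf.reduce_mean(tf.square(y_pred - y_true)))

# stop training when the training loss stops improving (patience value is an assumption)
earlystop = EarlyStopping(monitor='loss', patience=5)

# define model
model = Sequential()

# add hidden layer(s)
model.add(LSTM(units=50, activation='tanh', input_shape=(None, train_X.shape[2]), use_bias=True,
               bias_regularizer=L1L2(l1=0.01, l2=0.01)))

# add dropout between hidden and output layer to improve generalization / reduce overfitting
model.add(Dropout(.355))

# add output layer with linear activation: one unit per forecast step (t+1 ... t+n_ahead)
model.add(Dense(activation='linear', units=n_ahead, use_bias=True))

# set optimizer
adam = keras.optimizers.Adam(learning_rate=0.001)

# compile model
model.compile(loss=rmse, optimizer=adam)

# fit model with train data
model.fit(train_X, train_y, batch_size=n_batch, epochs=n_epochs, verbose=2, shuffle=False, callbacks=[earlystop])

# predictions with test data
Test_yhat = model.predict(Test_X)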

LSTM Parameters

Number of Neurons
Activation function
Optimization function
Learning Rate
Dropout

Finally, compare the observed ‘Test_y’ with the modeled ‘Test_yhat’

https://keras.io/api/layers/recurrent_layers/lstm/

22 of 27

Performance Metrics

Root Mean Squared Error (RMSE)

Measure of Error
0 to infinity (the lower, the better)

Coefficient of Determination (R²)

Proportion of variance that is captured by model predictions
0 to 1 (the higher, the better)
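
Both metrics can be computed directly from the observed and simulated series. This is a minimal sketch, assuming R² is defined as 1 minus the ratio of the residual sum of squares to the total sum of squares; under that definition a model that performs worse than predicting the mean gives a negative value. The sample values are made up.

```python
import math

def rmse(obs, sim):
    """Root mean squared error between observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))

def r_squared(obs, sim):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and simulated water depths
obs = [1.0, 2.0, 3.0, 4.0]
sim = [1.0, 2.0, 3.0, 6.0]

error = rmse(obs, sim)
r2 = r_squared(obs, sim)
print(error, r2)
```

RMSE is in the same units as the water depth, which makes it easy to interpret, while R² is unitless and easier to compare across locations.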

LSTM Model : Accuracy Assessment

Obs.

Sim.

23 of 27

Preliminary LSTM Results

24 of 27

Preliminary LSTM Model Results

n_lags = 3 hr

n_ahead = 2 hr

Input Features

RH
Rmax15

Output Features

WL

8/7/2017
8/29/2017

5/6/2018

Testing

Training

25 of 27

LSTM Model Improvement: Hyperparameter Tuning

Hyperparameter	Options Explored
Number of Neurons	10, 15, 20, 40, 50, 75
Activation Functions	relu, tanh, sigmoid
Optimization Function	Adam, Stochastic gradient descent (SGD), RMSProp
Learning Rate	1×10⁻³, 1×10⁻², 1×10⁻¹
Dropout Rate	0.1-0.5

Hyperparameter tuning means choosing a set of optimal hyperparameters for the model

Hyperparameter Tuning Methods

Grid search: evaluates all combinations, typically with cross-validation
Random search: chooses combinations at random
Bayesian optimization: instead of searching blindly, learns from previous trials to select the next combination to try
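
Grid search and random search over the options in the table can be sketched with the standard library alone. The `score` function here is a hypothetical stand-in: in practice it would train the LSTM with the given hyperparameters and return a validation error such as RMSE. The dropout rate is left out of the grid because the table gives it as a range rather than a list of values.

```python
import itertools
import random

search_space = {
    "neurons": [10, 15, 20, 40, 50, 75],
    "activation": ["relu", "tanh", "sigmoid"],
    "optimizer": ["adam", "sgd", "rmsprop"],
    "learning_rate": [1e-3, 1e-2, 1e-1],
}

def score(params):
    """Hypothetical stand-in for 'train the model, return validation RMSE'."""
    return abs(params["neurons"] - 40) / 100 + params["learning_rate"]

def grid_search(space):
    """Evaluate every combination and return the best one."""
    keys = list(space)
    combos = [dict(zip(keys, values))
              for values in itertools.product(*space.values())]
    return min(combos, key=score), len(combos)

def random_search(space, n_trials, seed=0):
    """Evaluate n_trials randomly drawn combinations and return the best one."""
    rng = random.Random(seed)
    trials = [{k: rng.choice(v) for k, v in space.items()}
              for _ in range(n_trials)]
    return min(trials, key=score)

best_grid, n_combos = grid_search(search_space)
best_random = random_search(search_space, n_trials=20)
print(n_combos, best_grid, best_random)
```

Grid search needs one training run per combination, which grows quickly as options are added, while random search fixes the budget up front at the cost of possibly missing the best combination.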

26 of 27

References

https://www.smlease.com/entries/technology/machine-learning-vs-deep-learning-what-is-the-difference-between-ml-and-dl/
Colah’s Blog. Understanding LSTM Networks. https://colah.github.io/posts/2015-08-Understanding-LSTMs/
https://keras.io/api/layers/recurrent_layers/lstm/
Bowes, B. D., Sadler, J. M., Morsy, M. M., Behl, M., & Goodall, J. L. (2019). Forecasting groundwater table in a flood prone coastal city with long short-term memory and recurrent neural networks. Water, 11(5), 1098.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
Zahura, F. T., Goodall, J. L., Sadler, J. M., Shen, Y., Morsy, M. M., & Behl, M. (2020). Training machine learning surrogate models from a high‐fidelity physics‐based model: Application for real‐time street‐scale flood prediction in an urban coastal community. Water Resources Research, 56(10).

27 of 27

Thank you


Any Questions?!