1 of 27

09.27.2022

Course No. : CE 6290

Course Name : Hydroinformatics

Binata Roy, PhD Student, Civil Engineering

Department of Engineering Systems and Environment, br3xk@virginia.edu

Application of Deep Learning (LSTM) in Flood Forecasting

2 of 27

Organization of the Lecture

  • Deep Learning
  • Deep Learning Models
  • LSTM Architecture
  • LSTM in Flood Forecasting
  • Preliminary LSTM Results

3 of 27

Deep Learning

4 of 27

Artificial Intelligence > Machine Learning > Deep Learning

  1. Artificial Intelligence: Machines perform tasks that typically require human intelligence

  2. Machine Learning: Machines learn from their own experience

  3. Deep Learning: Machines learn using ‘Artificial Neural Networks’, inspired by the structure and function of the human brain

5 of 27

Machine Learning vs. Deep Learning

  • Feature Extraction: Extract the features required for the problem statement
  • Classification: Categorize data into classes based on the extracted features

  • In ML, features in the data are defined manually by users
  • In DL, features in the data are learned by the deep learning algorithms themselves

6 of 27

Deep Learning Models

7 of 27

Artificial Neural Networks (ANN)

  • An ANN is made up of layers, each containing a group of neurons

  • An ANN consists of 3 types of layers: Input, Hidden and Output

      • Input layer accepts inputs
      • Hidden layer processes inputs
      • Output layer produces result

  • Advantages:
    • Activation functions introduce nonlinear properties to the network

  • Limitations:
    • ANN loses the spatial features of an image => solved by CNN (not covered here)
    • Vanishing and exploding gradients as the error propagates backward, and no memory of earlier inputs in a sequence => sequence handling is addressed by RNN

Forward Propagation

Backward Propagation

8 of 27

Recurrent Neural Networks (RNN)

  • RNN: adding a looping constraint on the hidden layer of an ANN turns it into an RNN
    • The looping constraint ensures that sequential information in the input data is captured
    • Especially useful for sequential time-series datasets

  • Advantages
    • RNN captures the sequential information present in the input data, i.e. the dependency between data points, when making predictions
  • Challenges
    • RNNs with a large number of time steps suffer from the vanishing and exploding gradient problem

9 of 27

RNN Limitation

Where the gap between the relevant information and the place that it’s needed is small, RNNs can learn to use the past information

Consider what happens if we unroll the loop with time

As that gap grows, RNNs become unable to learn to connect the information from the past because of vanishing and exploding gradients

An RNN typically remembers only about the prior 10 time-steps

[Figure: RNN loop unrolled over time steps T1, T2, T3, …, Tn]
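The effect of a growing gap can be illustrated numerically. The sketch below is not from the lecture: it assumes a scalar recurrence h_t = tanh(w * h_(t-1) + x) with hypothetical weights, and tracks the size of the gradient of h_t with respect to h_0 as the number of time steps grows. A linear variant is included because tanh saturation hides the exploding case.

```python
import numpy as np

def gradient_through_time(w, n_steps, x=0.1, linear=False):
    """Return |d h_t / d h_0| for t = 1..n_steps in a scalar recurrent unit.

    w, x and the scalar setup are illustrative assumptions, not lecture values.
    """
    h, grad, history = 0.0, 1.0, []
    for _ in range(n_steps):
        pre = w * h + x
        h = pre if linear else np.tanh(pre)
        # chain rule factor d h_t / d h_(t-1)
        local = w if linear else w * (1.0 - h ** 2)
        grad *= local
        history.append(abs(grad))
    return history

# recurrent weight below 1: the gradient shrinks at every step (vanishing)
vanishing = gradient_through_time(w=0.5, n_steps=50)
# linear unit with weight above 1: the gradient grows at every step (exploding)
exploding = gradient_through_time(w=1.5, n_steps=50, linear=True)

print("after 10 steps:", vanishing[9], exploding[9])
print("after 50 steps:", vanishing[49], exploding[49])
```

With these assumed weights, the signal from 10 steps back is already scaled by less than one thousandth, which is in line with the short memory noted on the slide.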

10 of 27

LSTM Architecture

11 of 27

Long Short Term Memory (LSTM)

RNN

LSTM

In standard RNNs, the repeating module has a very simple structure: a single tanh layer.

In LSTMs, the repeating module has a different structure. Instead of having a single neural network layer, there are four, interacting in a very special way.

Long Short Term Memory networks, usually just called “LSTMs”, are a special kind of RNN capable of learning long-term temporal dependencies by overcoming the vanishing and exploding gradient problem of traditional RNNs [Hochreiter & Schmidhuber, 1997]

12 of 27

LSTM Architecture

[Figure: LSTM cell at time step t. Labels: Forget Gate, Input Gate, Output Gate; cell state and hidden state from time step t-1; input data at time step t; cell state and hidden state output to time step t+1]

  • The key to LSTMs is the CELL STATE, the horizontal line running through the top of the diagram
  • It runs straight down the entire chain, with only some minor linear interactions, so information can easily flow along it unchanged

Forget Gate:

It takes the hidden state from the previous time step, h(t-1), and the input from the current time step, x(t), and outputs a number between 0 and 1 for each number in the previous cell state: 1 = “keep it”, 0 = “forget it”. The cell state is updated accordingly.

Input Gate:

The “input gate layer” decides which values we’ll update. The new candidate values are then added to the cell state that the forget gate has already scaled.

Output Gate:

The “output gate layer” decides what we’ll output to the next time step. We put the cell state through tanh (to push the values to be between −1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.

LSTM at t timestep
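The three gates can be written out as one time step of a standard LSTM cell. This is a minimal NumPy sketch, not the lecture's implementation: the weight layout (four gate blocks stacked in one matrix `W`, in the order forget, input, candidate, output), the sizes and the random values are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W has shape (4 * n_hidden, n_hidden + n_input) and maps the concatenated
    [h_prev, x_t] to the stacked pre-activations of the four layers.
    """
    n_hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0 * n_hidden:1 * n_hidden])   # forget gate: 0 = forget, 1 = keep
    i = sigmoid(z[1 * n_hidden:2 * n_hidden])   # input gate: which values to update
    g = np.tanh(z[2 * n_hidden:3 * n_hidden])   # candidate values for the cell state
    o = sigmoid(z[3 * n_hidden:4 * n_hidden])   # output gate: which parts to output
    c_t = f * c_prev + i * g                    # updated cell state
    h_t = o * np.tanh(c_t)                      # hidden state passed to step t+1
    return h_t, c_t

# hypothetical sizes: 2 input features, 4 hidden units, 5 time steps
rng = np.random.default_rng(0)
n_input, n_hidden = 2, 4
W = rng.normal(scale=0.1, size=(4 * n_hidden, n_hidden + n_input))
b = np.zeros(4 * n_hidden)

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.normal(size=(5, n_input)):
    h, c = lstm_step(x_t, h, c, W, b)

print(h.shape, c.shape)
```

The cell state update is only an elementwise multiply and add, which is the "minor linear interaction" that lets information pass along the chain largely unchanged.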

13 of 27

RNN vs. LSTM with Example

Dave eats Pasta every day, hence his favorite cuisine is (………..)

14 of 27

RNN vs. LSTM with Example

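Dave eats Pasta every day, hence his favorite cuisine is (………..)

Traditional RNN: the memory keeps only the most recent words (Dave → Dave, eats → eats, Pasta → …… → cuisine, is), so the prediction is ???

LSTM: the memory drops the irrelevant words and keeps “Pasta” until it is needed (…… → Pasta → …… → Pasta), so the prediction is Italian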

  • RNNs: the looping constraint captures sequential information in the input data when making predictions; plain RNNs are not used much
  • RNNs with a large number of time steps suffer from the vanishing and exploding gradient problem
  • An RNN typically remembers only about the prior 10 time-steps
  • LSTM: a special RNN capable of learning long-term temporal information by overcoming the vanishing and exploding gradient problem
  • LSTM retains more information in memory than RNN
  • LSTM is a state-of-the-art ML algorithm for sequential data

15 of 27

LSTM in Flood Forecasting

16 of 27

Flood forecasting at flood-prone streets using LSTM

  • Flooding is a major concern, driven by projected future increases in rainfall frequency/volume and relative sea level rise (SLR)
  • Street-level flood forecasting is a major non-structural measure to mitigate disruptions caused by urban flooding
  • Flood forecasting provides flood information with lead time to prepare countermeasures against impending floods
  • High-fidelity physics-based models (TUFLOW, MIKE, HEC-RAS) require high computation time, power and cost, which makes them impractical for instantaneous street-level flood forecasting
  • A data-driven LSTM model can come to the rescue

17 of 27

LSTM Forecasting Model

Using inputs P at [(t-n+1)…(t+m)] and WL at [(t-n+1)…(t)], forecast the label WL at [(t+1)…(t+m)], where n = n_lagged and m = n_ahead

  • Let’s say n_lagged = 3 and n_ahead = 2
    • A time lag of 3 hours is used to forecast the WL up to 2 hours ahead
    • At time step t, P from the previous 3 time steps (t-2, t-1, t) and the next 2 time steps (t+1, t+2), together with WL from the previous 3 time steps (t-2, t-1, t), are used to forecast WL at (t+1) and (t+2)

LSTM Model
  • Input: Rainfall (P) at t-n+1…t+m and Water Depth (WL) at t-n+1…t
  • Forecast: Water Depth (WL) at t+1…t+m

Time (hrs)       Rainfall (P)   WL       Time step
1/1/2010 0:00    Known          Known    t-2
1/1/2010 1:00    Known          Known    t-1
1/1/2010 2:00    Known          Known    t (current)
1/1/2010 3:00    Known          ?        t+1
1/1/2010 4:00    Known          ?        t+2
1/1/2010 5:00    Known          ?
1/1/2010 6:00    Known          ?
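The windowing described on this slide can be sketched as follows. This is an illustration under assumptions, not the lecture's preprocessing code: the function name `make_samples` and the toy series are hypothetical, and the windows follow the worked example (previous 3 time steps including t, next 2 time steps).

```python
import numpy as np

def make_samples(P, WL, n_lagged, n_ahead):
    """Build forecasting samples from hourly rainfall P and water level WL.

    For each current index t:
      rainfall input : P[t-n_lagged+1 .. t+n_ahead]  (past and forecast rainfall)
      WL input       : WL[t-n_lagged+1 .. t]         (observed water level)
      label          : WL[t+1 .. t+n_ahead]          (water level to forecast)
    """
    P_in, WL_in, y = [], [], []
    for t in range(n_lagged - 1, len(P) - n_ahead):
        P_in.append(P[t - n_lagged + 1 : t + n_ahead + 1])
        WL_in.append(WL[t - n_lagged + 1 : t + 1])
        y.append(WL[t + 1 : t + n_ahead + 1])
    return np.array(P_in), np.array(WL_in), np.array(y)

# hypothetical 7-hour record, matching the 0:00 to 6:00 rows of the table
P = np.arange(7, dtype=float)           # rainfall at hours 0..6
WL = 10.0 + np.arange(7, dtype=float)   # water level at hours 0..6

P_in, WL_in, y = make_samples(P, WL, n_lagged=3, n_ahead=2)
print(P_in.shape, WL_in.shape, y.shape)
print(P_in[0], WL_in[0], y[0])  # first sample has t = 2:00
```

The first sample uses rainfall from hours 0 to 4 and water level from hours 0 to 2 to predict water level at hours 3 and 4, which is the t-2 … t+2 layout on the slide.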

 

 

18 of 27

Study Area: City of Norfolk, Virginia

Taken from Faria et al. 2021.

  • The City of Norfolk, Virginia, is located along the US east coast
  • It is the 2nd most vulnerable coastal city to SLR in the U.S., after New Orleans
  • It serves important economic and national security roles, with one of the largest commercial ports in the USA and the largest naval base in the world
  • Considering its vulnerability to flooding and its vital role in the national economy and security, Norfolk is selected as the study area

19 of 27

LSTM Model: Input Features and Storm Events

Storm Events (20)

Train Events (16)
  • 6/5/2016, 7/31/2016, 8/9/2016, 9/3/2016, 9/19/2016, 9/20/2016, 9/21/2016, 10/8/2016
  • 10/9/2016, 1/2/2017, 7/15/2017, 8/7/2017, 8/29/2017, 6/22/2018, 7/30/2018, 8/20/2018

Test Events (4)
  • 10/29/2017, 5/6/2018, 5/28/2018, 8/11/2018

Input Features

Environmental input features

  • Total hourly rainfall
  • Maximum 15 min rainfall in an hour
  • Cumulative rainfall in previous 2 hr
  • Cumulative rainfall in previous 72 hr
  • Hourly tide level

Topographic input features

  • Elevation
  • Topographic wetness index
  • Depth to water index

Output Features

  • Water depth
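The rainfall-derived input features on this slide could be computed along the following lines. This is a sketch under assumptions: it supposes the raw rainfall record is a 15-minute series, that "previous n hr" excludes the current hour, and the function names and sample values are hypothetical.

```python
import numpy as np

def hourly_features(rain_15min):
    """Aggregate a 15-minute rainfall series (length a multiple of 4) to hourly features."""
    blocks = np.asarray(rain_15min, dtype=float).reshape(-1, 4)
    total_hourly = blocks.sum(axis=1)   # total hourly rainfall
    max_15min = blocks.max(axis=1)      # maximum 15 min rainfall in each hour
    return total_hourly, max_15min

def cumulative_previous(hourly, n_hours):
    """Rainfall summed over the n_hours before each hour (current hour excluded)."""
    hourly = np.asarray(hourly, dtype=float)
    csum = np.concatenate([[0.0], np.cumsum(hourly)])  # csum[k] = sum of hourly[:k]
    idx = np.arange(len(hourly))
    start = np.maximum(idx - n_hours, 0)               # shorter window at the record start
    return csum[idx] - csum[start]

# hypothetical 3 hours of 15-minute rainfall depths
rain_15min = [0, 1, 2, 1,  0, 0, 3, 0,  1, 1, 1, 1]
total_hourly, max_15min = hourly_features(rain_15min)
cum_2hr = cumulative_previous(total_hourly, n_hours=2)
cum_72hr = cumulative_previous(total_hourly, n_hours=72)

print(total_hourly, max_15min, cum_2hr, cum_72hr)
```

The 72-hour sum works as a rough proxy for antecedent wetness, while the 15-minute maximum captures rainfall intensity within the hour.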

20 of 27

LSTM Model: 3D Input Data Processing

LSTM 3D Data Preparation

LSTM needs a 3D input tensor with shape [batch, timesteps, features]

Features: number of input features considered, e.g., rainfall, tide level, etc.

Timesteps: how many time steps are considered for each prediction

Batch size: how many samples are in each batch
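As an illustration of the 3D layout, the sketch below turns a 2D table (rows are hours, columns are features) into a [samples, timesteps, features] array by sliding a window over the rows. The sizes and values are hypothetical; during training, the batch size then controls how many of these samples are processed per gradient update.

```python
import numpy as np

n_timesteps, n_features = 3, 2

# hypothetical 2D table: 8 hourly rows, one column per feature (e.g. rainfall, tide level)
table = np.arange(8 * n_features, dtype=float).reshape(8, n_features)

# each sample is a block of n_timesteps consecutive rows
n_samples = len(table) - n_timesteps + 1
X = np.stack([table[i:i + n_timesteps] for i in range(n_samples)])

print(X.shape)   # (samples, timesteps, features)
print(X[0])      # first sample: rows 0, 1 and 2 of the table
```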

21 of 27

LSTM Model : Model Initialization

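# load Python libraries
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
from tensorflow.keras.regularizers import L1L2
from tensorflow.keras.callbacks import EarlyStopping

# train_X, train_y, Test_X, n_ahead, n_batch and n_epochs are assumed to be prepared beforehand

# custom RMSE loss used when compiling the model
def rmse(y_true, y_pred):
    return tf.sqrt(tf.reduce_mean(tf.square(y_pred - y_true)))

# define model
model = Sequential()

# add hidden layer(s)
model.add(LSTM(units=50, activation='tanh', input_shape=(None, train_X.shape[2]), use_bias=True,
               bias_regularizer=L1L2(l1=0.01, l2=0.01)))

# add dropout between hidden and output layer to improve generalization / reduce overfitting
model.add(Dropout(0.355))

# add output layer with linear activation: one unit per forecast step (must match train_y.shape[1])
model.add(Dense(units=n_ahead, activation='linear', use_bias=True))

# set optimizer ('learning_rate' replaces the deprecated 'lr' argument)
adam = keras.optimizers.Adam(learning_rate=0.001)

# compile model
model.compile(loss=rmse, optimizer=adam)

# stop early when the training loss stops improving (the patience value is an assumed example)
earlystop = EarlyStopping(monitor='loss', patience=5)

# fit model with train data
model.fit(train_X, train_y, batch_size=n_batch, epochs=n_epochs, verbose=2, shuffle=False, callbacks=[earlystop])

# predictions with test data
Test_yhat = model.predict(Test_X)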

LSTM Parameters

  • Number of Neurons
  • Activation function
  • Optimization function
  • Learning Rate
  • Dropout

Finally, compare between observed ‘Test_y’ and model ‘Test_yhat’

22 of 27

Performance Metrics

  • Root Mean Squared Error (RMSE)
    • Measure of error
    • 0 to infinity (the lower, the better)

  • Coefficient of Determination (R²)
    • Proportion of the observed variance that is captured by the model predictions
    • Typically 0 to 1 (the higher, the better); it can be negative when the model performs worse than predicting the mean
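Both metrics can be computed directly from the observed and simulated series. The sketch below assumes R² is defined as 1 − SS_res / SS_tot (some studies use the squared correlation coefficient instead); the sample values are hypothetical.

```python
import numpy as np

def rmse(obs, sim):
    """Root mean squared error between observed and simulated values."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return float(np.sqrt(np.mean((sim - obs) ** 2)))

def r2(obs, sim):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    ss_res = np.sum((obs - sim) ** 2)          # unexplained variation
    ss_tot = np.sum((obs - obs.mean()) ** 2)   # total variation of the observations
    return float(1.0 - ss_res / ss_tot)

# hypothetical observed and simulated water depths
obs = [0.0, 0.1, 0.4, 0.2]
sim = [0.0, 0.2, 0.3, 0.2]

print("RMSE:", rmse(obs, sim))
print("R2:", r2(obs, sim))
```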

LSTM Model : Accuracy Assessment

[Figure legend: Obs. (observed), Sim. (simulated)]

23 of 27

Preliminary LSTM Results

24 of 27

Preliminary LSTM Model Results

n_lags = 3 hr

n_ahead = 2 hr

Input Features

  • RH
  • Rmax15

Output Features

  • WL

[Figure panels: Training events 8/7/2017 and 8/29/2017; Testing event 5/6/2018]

25 of 27

LSTM Model Improvement: Hyperparameter Tuning

Hyperparameter           Options Explored
Number of Neurons        10, 15, 20, 40, 50, 75
Activation Functions     relu, tanh, sigmoid
Optimization Function    Adam, Stochastic gradient descent (SGD), RMSProp
Learning Rate            1 x 10^-3, 1 x 10^-2, 1 x 10^-1
Dropout Rate             0.1-0.5

Hyperparameter tuning means choosing a set of optimal hyperparameters for the model

Hyperparameter Tuning Methods

  • Grid search: evaluates all combinations, typically with cross-validation
  • Random search: chooses combinations randomly
  • Bayesian optimization: instead of searching blindly, it learns from prior trials to select the next combination to try
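The difference between grid search and random search can be sketched with the search space from the table above. This is an illustration only: the dropout grid in steps of 0.1 and the dictionary keys are assumptions, and in practice each configuration would be scored by training and validating a model, which is omitted here.

```python
import itertools
import random

# options from the hyperparameter table; dropout step of 0.1 is an assumption
search_space = {
    "units": [10, 15, 20, 40, 50, 75],
    "activation": ["relu", "tanh", "sigmoid"],
    "optimizer": ["adam", "sgd", "rmsprop"],
    "learning_rate": [1e-3, 1e-2, 1e-1],
    "dropout": [0.1, 0.2, 0.3, 0.4, 0.5],
}

def grid(space):
    """Grid search: enumerate every combination of options."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in itertools.product(*space.values())]

def random_sample(space, n_trials, seed=0):
    """Random search: draw n_trials configurations independently at random."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in space.items()} for _ in range(n_trials)]

all_combos = grid(search_space)
trials = random_sample(search_space, n_trials=20)

print(len(all_combos), "grid configurations;", len(trials), "random trials")
```

Even this small table yields several hundred grid configurations, which is why random search or Bayesian optimization is often preferred when each trial requires training an LSTM.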

26 of 27

References

27 of 27

Thank you


Any Questions?!