1 of 30

Transformer Encoder for Meteorological Applications

SparkMET: Transformer-based Meteorological and Environmental Tool

Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

2 of 30

Outline:

Part 1:

    • Meteorological Data Structure for AI Modeling
    • Spatial correlation modeling
      • What’s wrong with CNNs?
      • On the relationship between self-attention and convolution layers I
    • Spectral and/or temporal correlation modeling
      • What’s wrong with CNNs and RNNs?
      • On the relationship between self-attention and convolution layers II
    • Scalability and flexibility of transformers for multivariate data modeling

Part 2:

    • SparkMET tool
    • Transformer implementation for fog forecasting

3 of 30

  • Meteorological and Atmospheric data mostly has 4D structure (x, y, z, t):

    • x, y : spatial domain of the map
    • z : vertical levels (altitudes) and/or different variables
    • t : time
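To make this structure concrete, here is a toy NumPy cube and one illustrative (not prescribed by any particular tool) way to flatten it into transformer tokens:

```python
import numpy as np

# Toy 4D meteorological cube, stored as (t, z, y, x):
# time steps, vertical levels/variables, latitude, longitude.
cube = np.random.default_rng(0).standard_normal((6, 10, 32, 32))
t, z, y, x = cube.shape

# One illustrative tokenization: flatten each (y, x) map into a single
# token, giving one token per variable/level and time step.
tokens = cube.reshape(t * z, y * x)
print(tokens.shape)  # (60, 1024)
```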

Yang, Z., Zhang, P., Gu, S., Hu, X., Tang, S., Yang, L., Xu, N., Zhen, Z., Wang, L., Wu, Q. and Dou, F., 2019. Capability of Fengyun-3D satellite in earth system observation. Journal of Meteorological Research, 33(6), pp. 1113-1130.

Part 1

Meteorological Data Structure for AI Modeling

4 of 30

Data Structure                              | Pros/Cons                                                      | Modeling Approach
--------------------------------------------|----------------------------------------------------------------|---------------------------------------------------
1D: tabular data                            | Does NOT model the spatial intercorrelation between map pixels | classical ML (RF, GB, MLP, SVM, etc.), RNNs, LSTMs
2D: multispectral image data                | Does NOT model the intercorrelation between input variables    | 2D CNNs, ConvLSTMs
3D: spatio-spectral or spatio-temporal data | Restricted modeling in depth                                   | 3D CNNs, ConvLSTMs

Part 1

Meteorological Data Structure for AI Modeling

5 of 30

Part 1

Spatial correlation modeling

In 2D map-based modeling, the relative orientation of the physical patterns of meteorological variables and their locations are important for some applications.

6 of 30

There are two main drawbacks of CNNs:

    • CNNs do not encode the position and orientation of objects
    • Lack of location sensitivity

Geoffrey Hinton gave a talk titled:

“What is wrong with CNNs?”

Sabour, S., Frosst, N. and Hinton, G.E., 2017. Dynamic routing between capsules. Advances in Neural Information Processing Systems, 30.

Hinton, G.E., Sabour, S. and Frosst, N., 2018, February. Matrix capsules with EM routing. In International conference on learning representations.

Part 1

What is wrong with CNNs?

CNNs

7 of 30

  • CNNs do not encode the position and orientation of objects

https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b

Part 1

What is wrong with CNNs?

We need to know how the physical patterns and features in the meteorological variables are oriented relative to each other.

CNNs

8 of 30

    • Lack of location sensitivity
  • CNNs replicate the same kernel weights across the entire input volume (weight sharing). Each 2D output matrix of a kernel’s convolution is the response of one replicated feature detector over a portion of the input volume.
  • Max pooling (e.g., 2×2) then scans regions of the stacked 2D matrices and selects the largest value in each region (invariance of activities!)
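Max pooling's location insensitivity is easy to demonstrate: two inputs whose feature sits at different positions inside a pooling region produce identical pooled outputs (toy NumPy sketch):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling of a square feature map."""
    h, w = feature_map.shape
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.zeros((4, 4)); a[0, 0] = 1.0  # activation in one corner of a 2x2 region
b = np.zeros((4, 4)); b[1, 1] = 1.0  # same activation, shifted inside the region

# Pooling reports THAT the feature fired, not WHERE it fired:
assert np.array_equal(max_pool_2x2(a), max_pool_2x2(b))
```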

“The fact that max pooling works so well is a big mistake and a disaster.” (Hinton)

Part 1

What is wrong with CNNs?

Jaderberg, M., Simonyan, K., Zisserman, A. and Kavukcuoglu, K., 2016. Spatial transformer networks. arXiv:1506.02025v3 [cs.CV].

Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H. and Wei, Y., 2017. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 764-773).

CNNs

9 of 30

Filters (the weight matrices: query, key, and value) in self-attention are calculated dynamically, instead of the static filters used in CNNs.

Part 1

On the relationship between self-attention and convolution layers I

  • This means self-attention dynamically generates sample-specific filter parameters conditioned on the network’s input. Note that these are not fixed after training, like regular model parameters.

  • This is one of the main reasons transformers have been strongly suggested for self-supervised modeling (pre-training on large (un)labeled data, then training on small labeled data)
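A toy NumPy check of this point: the learned matrices (here W_q, W_k) are fixed after training, yet the resulting attention (mixing) weights differ for every input sample, unlike a convolution kernel's static weights:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
# W_q, W_k play the role of learned parameters: fixed after training.
W_q, W_k = rng.standard_normal((d, d)), rng.standard_normal((d, d))

def attention_weights(X):
    """Row-wise softmax of the scaled query-key scores for input X."""
    scores = (X @ W_q) @ (X @ W_k).T / np.sqrt(d)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

X1 = rng.standard_normal((3, d))
X2 = rng.standard_normal((3, d))
A1, A2 = attention_weights(X1), attention_weights(X2)

# Same learned parameters, but the effective mixing weights are
# sample-specific (dynamically generated "filters").
assert not np.allclose(A1, A2)
```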

Transformer

10 of 30

Spectral/temporal correlations in meteorological datasets provide a rich source of information for short-/long-term forecasting.

Part 1

Spectral and/or temporal correlation modeling

11 of 30

Part 1

What is wrong with CNNs and RNNs?

RNNs

  • Sequential (step-by-step) processing; no parallelism over the sequence
  • Difficulty modeling long-range dependencies
  • Order matters

2D CNNs

  • Feature maps are summed across channels using exactly the same kernel weights
  • No modeling of interactions between channels (entities)

3D CNNs

  • Restricted in depth to the kernel size
  • Order matters

CNNs/RNNs

12 of 30

The self-attention layer explicitly models the interactions between all entities of a sequence.

Part 1

On the relationship between self-attention and convolution layers II

Transformer

13 of 30

Self-attention is invariant to permutations and to changes in the number of input tokens

(order does NOT matter)
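This can be verified numerically: without positional encoding, permuting the input tokens simply permutes the self-attention outputs the same way (toy NumPy sketch):

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention (no positional encoding)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    s = Q @ K.T / np.sqrt(K.shape[-1])
    A = np.exp(s - s.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)
    return A @ V

rng = np.random.default_rng(2)
d = 4
X = rng.standard_normal((6, d))
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
perm = rng.permutation(6)

# Shuffling the input tokens shuffles the outputs the same way:
# the layer itself carries no notion of order.
assert np.allclose(self_attention(X[perm], W_q, W_k, W_v),
                   self_attention(X, W_q, W_k, W_v)[perm])
```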

Part 1

On the relationship between self-attention and convolution layers II

14 of 30

Cordonnier, J.B., Loukas, A. and Jaggi, M., 2019. On the relationship between self-attention and convolutional layers. arXiv preprint arXiv:1911.03584.

[Figure: centers of attention of each attention head, Layers 1-6 (Cordonnier et al., 2019)]

Centers of attention of each attention head at layer 4 during training, and at different layers.

  • The central black square is the query pixel.
  • The heads attend to specific pixels on the image, forming a grid around the query pixel.

Part 1

On the relationship between self-attention and convolution layers II

15 of 30

Part 1

Scalability and flexibility of transformers for multivariate data modeling

  • The straightforward design of transformers allows processing multiple modalities
  • Flexibility in handling data with different spatial resolutions
  • Attention can also be calculated on categorical data along with the meteorological data (mainly for generalization cases)

16 of 30

Video Vision Transformer (ViViT)

Part 1

Scalability of Transformers

Arnab, A., Dehghani, M., Heigold, G., Sun, C., Lučić, M. and Schmid, C., 2021. ViViT: A video vision transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 6836-6846).

19 of 30

Q&A

Part 2

20 of 30

SparkMET: An end-to-end Transformer-based (Spark) Meteorological and Environmental Tool (MET)

Part 2

SparkMET

21 of 30

Part 2

SparkMET

Pipeline: NetCDF files → DATA GENERATION QUERY → Train Data / Test Data → Transformer Model Selection → Train → Predict → XAI Tool → Report

The DATA GENERATION QUERY is configured with a dict:
{input_data_path: str, target_data_path: str, start_time: str, finish_time: str, data_structure: str (default: ‘2D’), lead_time_pred: int (default: 24), list_input_variable: list, data_split_dict: dict}
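A hypothetical example of such a query dict (key names follow the slide; paths, variable names, and split ratios are illustrative placeholders, not SparkMET defaults):

```python
# Hypothetical DATA GENERATION QUERY: key names from the slide,
# all values here are illustrative placeholders.
query = {
    "input_data_path": "/path/to/input_netcdf/",
    "target_data_path": "/path/to/targets/",
    "start_time": "2019-01-01",
    "finish_time": "2020-12-31",
    "data_structure": "2D",        # default per the slide: '2D'
    "lead_time_pred": 24,          # default per the slide: 24
    "list_input_variable": ["var_a", "var_b"],  # placeholder variable names
    "data_split_dict": {"train": 0.8, "valid": 0.1, "test": 0.1},
}
```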

22 of 30

Part 2

SparkMET

How does it work?

23 of 30

Part 2

SparkMET

Requirements

  • DataLoader
    • Already set up if the data sources are NetCDF grid files
    • For a custom dataset, an appropriate DataLoader needs to be created
  • Transformer-based model
    • Four transformer encoder models, corresponding to 1D to 4D data structures, are already set up for classification tasks
    • Transformer-based models for semantic segmentation and object detection tasks still need to be developed
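For the custom-data case, a minimal PyTorch-style dataset sketch (the class name and fields are hypothetical, not part of SparkMET; any object exposing `__len__` and `__getitem__` like this can be wrapped by a standard DataLoader):

```python
import numpy as np

class CustomFogDataset:
    """Hypothetical PyTorch-style dataset for a custom (non-NetCDF) source."""

    def __init__(self, maps, labels):
        assert len(maps) == len(labels)
        self.maps = maps        # e.g. (N, C, H, W) input maps
        self.labels = labels    # e.g. (N,) fog / no-fog labels

    def __len__(self):
        return len(self.maps)

    def __getitem__(self, idx):
        return self.maps[idx], self.labels[idx]

rng = np.random.default_rng(3)
ds = CustomFogDataset(rng.standard_normal((10, 4, 32, 32)),
                      rng.integers(0, 2, size=10))
sample_x, sample_y = ds[0]
assert len(ds) == 10 and sample_x.shape == (4, 32, 32)
```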

24 of 30

Part 2

SparkMET

Implementation for coastal fog forecasting

25 of 30

    • Self-attention layer

Part 1

Self-attention layers processing manner

From the input feature maps X, three learned weight matrices produce queries, keys, and values:

Q = X·W_Q,   K = X·W_K,   V = X·W_V

Attention Score = softmax(Q·Kᵀ / √d_k)   (transpose, scale, softmax over keys)

Self-attention feature maps = Attention Score · V
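The self-attention computation on this slide can be sketched in NumPy (a minimal single-head version, not the SparkMET implementation):

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over N input feature vectors."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v           # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # transpose + scale
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)            # softmax -> attention scores
    return A @ V, A                               # self-attention feature maps

rng = np.random.default_rng(4)
N, d = 5, 8
X = rng.standard_normal((N, d))
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
out, A = self_attention(X, W_q, W_k, W_v)
# Each row of A is a probability distribution over the N input tokens.
assert out.shape == (N, d) and np.allclose(A.sum(axis=1), 1.0)
```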

26 of 30

    • Self-attention layer

Part 1

Self-attention layers processing manner
27 of 30

    • The multi-head self-attention layer

Multiple heads focus on different parts of the input by using different query, key, and value matrices.

    • Positional encoding (P)

Because self-attention is order-invariant, a positional encoding P is added to the input embeddings; a standard choice is the sinusoidal encoding P(pos, 2i) = sin(pos / 10000^(2i/d)) and P(pos, 2i+1) = cos(pos / 10000^(2i/d)).
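A minimal NumPy sketch of multi-head self-attention with the standard sinusoidal positional encoding (an illustrative single-layer version; head count and dimensions are arbitrary):

```python
import numpy as np

def sinusoidal_positional_encoding(n_pos, d_model):
    """Standard sinusoidal positional encoding P (Vaswani et al., 2017)."""
    pos = np.arange(n_pos)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d_model))
    P = np.zeros((n_pos, d_model))
    P[:, 0::2] = np.sin(angles)
    P[:, 1::2] = np.cos(angles)
    return P

def multi_head_attention(X, heads, W_o):
    """Each head uses its own (W_q, W_k, W_v); outputs are concatenated."""
    outs = []
    for W_q, W_k, W_v in heads:
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        s = Q @ K.T / np.sqrt(K.shape[-1])
        A = np.exp(s - s.max(axis=-1, keepdims=True))
        A /= A.sum(axis=-1, keepdims=True)
        outs.append(A @ V)
    return np.concatenate(outs, axis=-1) @ W_o

rng = np.random.default_rng(5)
n, d_model, n_heads = 5, 8, 2
d_head = d_model // n_heads
# Add P to the token embeddings so order information survives attention.
X = rng.standard_normal((n, d_model)) + sinusoidal_positional_encoding(n, d_model)
heads = [tuple(rng.standard_normal((d_model, d_head)) for _ in range(3))
         for _ in range(n_heads)]
W_o = rng.standard_normal((n_heads * d_head, d_model))
out = multi_head_attention(X, heads, W_o)
assert out.shape == (n, d_model)
```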

Part 1

Self-attention layers processing manner

28 of 30

Part 2

Transformer implementation for fog forecasting

Model           | Hit | Miss | FA  | CR   | POD  | F     | FAR  | CSI  | PSS  | HSS  | CSS  | T/epoch (s)
----------------|-----|------|-----|------|------|-------|------|------|------|------|------|------------
T_2D_Patch      | 7   | 60   | 2   | 2159 | 0.10 | 0.009 | 0.22 | 0.11 | 0.10 | 0.18 | 0.75 | ~17
T_2D_Channel    | 29  | 38   | 133 | 2028 | 0.43 | 0.06  | 0.82 | 0.14 | 0.37 | 0.22 | 0.16 | ~17
T_2D_Ch_Shuffle | 18  | 59   | 85  | 2066 | 0.23 | 0.04  | 0.81 | 0.11 | 0.19 | 0.17 | 0.15 | ~17

Test Dataset: 2228 samples, 67 fog cases
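The skill scores in the table follow from the contingency counts (Hit, Miss, FA, CR); a quick sanity check using the standard forecast-verification formulas (HSS and CSS omitted here for brevity):

```python
def verification_scores(hit, miss, fa, cr):
    """Standard 2x2 contingency-table scores used in the results table."""
    pod = hit / (hit + miss)       # probability of detection
    f = fa / (fa + cr)             # probability of false detection (F)
    far = fa / (hit + fa)          # false alarm ratio
    csi = hit / (hit + miss + fa)  # critical success index
    pss = pod - f                  # Peirce skill score
    return pod, f, far, csi, pss

# T_2D_Channel row: Hit=29, Miss=38, FA=133, CR=2028
pod, f, far, csi, pss = verification_scores(29, 38, 133, 2028)
assert abs(pod - 0.43) < 0.005   # matches the table
assert abs(far - 0.82) < 0.005
assert abs(csi - 0.145) < 0.001
assert abs(pss - 0.37) < 0.005
```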

29 of 30

Part 2

Transformer implementation for fog forecasting

Pros:

    • Transformers are more flexible and straightforward for modeling data with higher dimensionality
    • Transformers theoretically account better for spectral/temporal correlations
    • Transformers are more straightforward for post-processing and XAI analysis

Cons:

    • Computationally expensive
    • How to tokenize the input data is still an open question

30 of 30

Invitation for collaboration

  • Collaboration on expanding SparkMET to test on different tasks and different datasets
  • Publications

Thanks!