1 of 30

Transformer Encoder for Meteorological Applications

SparkMET: Transformer-based Meteorological and Environmental Tool

Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

2 of 30

Outline:

Part 1:

    • Meteorological Data Structure for AI Modeling
    • Spatial correlation modeling
      • What’s wrong with CNNs?
      • On the relationship between self-attention and convolution layers I
    • Spectral and/or temporal correlation modeling
      • What’s wrong with CNNs and RNNs?
      • On the relationship between self-attention and convolution layers II
    • Scalability and flexibility of transformers for multivariate data modeling

Part 2:

    • SparkMET tool
    • Transformer implementation for fog forecasting

3 of 30

  • Meteorological and Atmospheric data mostly has 4D structure (x, y, z, t):

    • x, y : spatial domain of the map
    • z : vertical levels (altitudes) and/or different variables
    • t : time
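To make this structure concrete, here is a toy NumPy cube and one illustrative (not prescribed by any particular tool) way to flatten it into transformer tokens:

```python
import numpy as np

# Toy 4D meteorological cube, stored as (t, z, y, x):
# time steps, vertical levels/variables, latitude, longitude.
cube = np.random.default_rng(0).standard_normal((6, 10, 32, 32))
t, z, y, x = cube.shape

# One illustrative tokenization: flatten each (y, x) map into a single
# token, giving one token per variable/level and time step.
tokens = cube.reshape(t * z, y * x)
print(tokens.shape)  # (60, 1024)
```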

Yang, Z., Zhang, P., Gu, S., Hu, X., Tang, S., Yang, L., Xu, N., Zhen, Z., Wang, L., Wu, Q. and Dou, F., 2019. Capability of Fengyun-3D satellite in earth system observation. Journal of Meteorological Research, 33(6), pp. 1113-1130.

Part 1

Meteorological Data Structure for AI Modeling

4 of 30

Data Structure                              | Pros/Cons                                                      | Modeling Approach
--------------------------------------------|----------------------------------------------------------------|---------------------------------------------------
1D: tabular data                            | Does NOT model the spatial intercorrelation between map pixels | classical ML (RF, GB, MLP, SVM, etc.), RNNs, LSTMs
2D: multispectral image data                | Does NOT model the intercorrelation between input variables    | 2D CNNs, ConvLSTMs
3D: spatio-spectral or spatio-temporal data | Restricted modeling in depth                                   | 3D CNNs, ConvLSTMs

Part 1

Meteorological Data Structure for AI Modeling

5 of 30

Part 1

Spatial correlation modeling

In 2D map-based modeling, the relative orientation of the physical patterns of meteorological variables and their locations are important for some applications.

6 of 30

There are two main drawbacks of CNNs:

    • CNNs do not encode the position and orientation of objects
    • Lack of location sensitivity

Geoffrey Hinton gave a talk titled:

“What is wrong with CNNs?”

Sabour, S., Frosst, N. and Hinton, G.E., 2017. Dynamic routing between capsules. Advances in Neural Information Processing Systems, 30.

Hinton, G.E., Sabour, S. and Frosst, N., 2018, February. Matrix capsules with EM routing. In International conference on learning representations.

Part 1

What is wrong with CNNs?

CNNs

7 of 30

  • CNNs do not encode the position and orientation of objects

https://medium.com/ai%C2%B3-theory-practice-business/understanding-hintons-capsule-networks-part-i-intuition-b4b559d1159b

Part 1

What is wrong with CNNs?

We need to know how the physical patterns and features in the meteorological variables are oriented relative to each other.

CNNs

8 of 30

    • Lack of location sensitivity
  • CNNs replicate the same kernel weights across the entire input volume (weight sharing). Each 2D output matrix of a kernel’s convolution is the response of one replicated feature detector over a portion of the input volume.
  • Max pooling (e.g., 2×2) then scans regions of the stacked 2D matrices and selects the largest value in each region (invariance of activities!)
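Max pooling's location insensitivity is easy to demonstrate: two inputs whose feature sits at different positions inside a pooling region produce identical pooled outputs (toy NumPy sketch):

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Non-overlapping 2x2 max pooling of a square feature map."""
    h, w = feature_map.shape
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.zeros((4, 4)); a[0, 0] = 1.0  # activation in one corner of a 2x2 region
b = np.zeros((4, 4)); b[1, 1] = 1.0  # same activation, shifted inside the region

# Pooling reports THAT the feature fired, not WHERE it fired:
assert np.array_equal(max_pool_2x2(a), max_pool_2x2(b))
```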

“The fact that max pooling works so well is a big mistake and a disaster.” (Hinton)

Part 1

What is wrong with CNNs?

Jaderberg, M., Simonyan, K., Zisserman, A. and Kavukcuoglu, K., 2016. Spatial transformer networks. arXiv:1506.02025v3 [cs.CV].

Dai, J., Qi, H., Xiong, Y., Li, Y., Zhang, G., Hu, H. and Wei, Y., 2017. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 764-773).

CNNs

9 of 30

Filters (the weight matrices: query, key, and value) in self-attention are calculated dynamically, instead of the static filters used in CNNs.

Part 1

On the relationship between self-attention and convolution layers I

  • This means self-attention dynamically generates sample-specific filter parameters conditioned on the network’s input. Note that these are not fixed after training, like regular model parameters.

  • This is one of the main reasons transformers have been strongly suggested for self-supervised modeling (pre-training on large (un)labeled data, then training on small labeled data)
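A toy NumPy check of this point: the learned matrices (here W_q, W_k) are fixed after training, yet the resulting attention (mixing) weights differ for every input sample, unlike a convolution kernel's static weights:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
# W_q, W_k play the role of learned parameters: fixed after training.
W_q, W_k = rng.standard_normal((d, d)), rng.standard_normal((d, d))

def attention_weights(X):
    """Row-wise softmax of the scaled query-key scores for input X."""
    scores = (X @ W_q) @ (X @ W_k).T / np.sqrt(d)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

X1 = rng.standard_normal((3, d))
X2 = rng.standard_normal((3, d))
A1, A2 = attention_weights(X1), attention_weights(X2)

# Same learned parameters, but the effective mixing weights are
# sample-specific (dynamically generated "filters").
assert not np.allclose(A1, A2)
```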

Transformer

10 of 30

Spectral/temporal correlations in meteorological datasets provide a rich source of information for short-/long-term forecasting.

Part 1

Spectral and/or temporal correlation modeling

11 of 30

Part 1

What is wrong with CNNs and RNNs?

RNNs

  • Sequential (step-by-step) processing; no parallelism over the sequence
  • Difficulty modeling long-range dependencies
  • Order matters

2D CNNs

  • Feature maps are summed across channels using exactly the same kernel weights
  • No modeling of interactions between channels (entities)

3D CNNs

  • Restricted in depth to the kernel size
  • Order matters

CNNs/RNNs

12 of 30

The self-attention layer explicitly models the interactions between all entities of a sequence.

Part 1

On the relationship between self-attention and convolution layers II

Transformer

13 of 30

Self-attention is invariant to permutations and to changes in the number of input tokens

(order does NOT matter)
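This can be verified numerically: without positional encoding, permuting the input tokens simply permutes the self-attention outputs the same way (toy NumPy sketch):

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention (no positional encoding)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    s = Q @ K.T / np.sqrt(K.shape[-1])
    A = np.exp(s - s.max(axis=-1, keepdims=True))
    A /= A.sum(axis=-1, keepdims=True)
    return A @ V

rng = np.random.default_rng(2)
d = 4
X = rng.standard_normal((6, d))
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
perm = rng.permutation(6)

# Shuffling the input tokens shuffles the outputs the same way:
# the layer itself carries no notion of order.
assert np.allclose(self_attention(X[perm], W_q, W_k, W_v),
                   self_attention(X, W_q, W_k, W_v)[perm])
```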

Part 1

On the relationship between self-attention and convolution layers II

14 of 30

Cordonnier, J.B., Loukas, A. and Jaggi, M., 2019. On the relationship between self-attention and convolutional layers. arXiv preprint arXiv:1911.03584.

[Figure: centers of attention of each attention head, Layers 1-6 (Cordonnier et al., 2019)]

Centers of attention of each attention head at layer 4 during training, and at different layers.

  • The central black square is the query pixel.
  • The heads attend to specific pixels on the image, forming a grid around the query pixel.

Part 1

On the relationship between self-attention and convolution layers II

15 of 30

Part 1

Scalability and flexibility of transformers for multivariate data modeling

  • The straightforward design of transformers allows processing multiple modalities
  • Flexibility in handling data with different spatial resolutions
  • Attention can also be calculated on categorical data along with the meteorological data (mainly for generalization cases)

16 of 30

Video Vision Transformer (ViViT)

Part 1

Scalability of Transformers

Arnab, A., Dehghani, M., Heigold, G., Sun, C., Lučić, M. and Schmid, C., 2021. ViViT: A video vision transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 6836-6846).

19 of 30

Q&A

Part 2

20 of 30

SparkMET: An end-to-end Transformer-based (Spark) Meteorological and Environmental Tool (MET)

Part 2

SparkMET

21 of 30

Part 2

SparkMET

Pipeline: NetCDF files → DATA GENERATION QUERY → Train Data / Test Data → Transformer Model Selection → Train → Predict → XAI Tool → Report

The DATA GENERATION QUERY is configured with a dict:
{input_data_path: str, target_data_path: str, start_time: str, finish_time: str, data_structure: str (default: ‘2D’), lead_time_pred: int (default: 24), list_input_variable: list, data_split_dict: dict}
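A hypothetical example of such a query dict (key names follow the slide; paths, variable names, and split ratios are illustrative placeholders, not SparkMET defaults):

```python
# Hypothetical DATA GENERATION QUERY: key names from the slide,
# all values here are illustrative placeholders.
query = {
    "input_data_path": "/path/to/input_netcdf/",
    "target_data_path": "/path/to/targets/",
    "start_time": "2019-01-01",
    "finish_time": "2020-12-31",
    "data_structure": "2D",        # default per the slide: '2D'
    "lead_time_pred": 24,          # default per the slide: 24
    "list_input_variable": ["var_a", "var_b"],  # placeholder variable names
    "data_split_dict": {"train": 0.8, "valid": 0.1, "test": 0.1},
}
```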

22 of 30

Part 2

SparkMET

How does it work?

23 of 30

Part 2

SparkMET

Requirements

  • DataLoader
    • Already set up if the data sources are NetCDF grid files
    • For a custom dataset, an appropriate DataLoader needs to be created
  • Transformer-based model
    • Four transformer encoder models, corresponding to 1D to 4D data structures, are already set up for classification tasks
    • Transformer-based models for semantic segmentation and object detection tasks still need to be developed
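For the custom-data case, a minimal PyTorch-style dataset sketch (the class name and fields are hypothetical, not part of SparkMET; any object exposing `__len__` and `__getitem__` like this can be wrapped by a standard DataLoader):

```python
import numpy as np

class CustomFogDataset:
    """Hypothetical PyTorch-style dataset for a custom (non-NetCDF) source."""

    def __init__(self, maps, labels):
        assert len(maps) == len(labels)
        self.maps = maps        # e.g. (N, C, H, W) input maps
        self.labels = labels    # e.g. (N,) fog / no-fog labels

    def __len__(self):
        return len(self.maps)

    def __getitem__(self, idx):
        return self.maps[idx], self.labels[idx]

rng = np.random.default_rng(3)
ds = CustomFogDataset(rng.standard_normal((10, 4, 32, 32)),
                      rng.integers(0, 2, size=10))
sample_x, sample_y = ds[0]
assert len(ds) == 10 and sample_x.shape == (4, 32, 32)
```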

24 of 30

Part 2

SparkMET

Implementation for coastal fog forecasting

25 of 30

    • Self-attention layer

Part 1

Self-attention layers processing manner

From the input feature maps X, three learned weight matrices produce queries, keys, and values:

Q = X·W_Q,   K = X·W_K,   V = X·W_V

Attention Score = softmax(Q·Kᵀ / √d_k)   (transpose, scale, softmax over keys)

Self-attention feature maps = Attention Score · V
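The self-attention computation on this slide can be sketched in NumPy (a minimal single-head version, not the SparkMET implementation):

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over N input feature vectors."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v           # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # transpose + scale
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    A = np.exp(scores)
    A /= A.sum(axis=-1, keepdims=True)            # softmax -> attention scores
    return A @ V, A                               # self-attention feature maps

rng = np.random.default_rng(4)
N, d = 5, 8
X = rng.standard_normal((N, d))
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
out, A = self_attention(X, W_q, W_k, W_v)
# Each row of A is a probability distribution over the N input tokens.
assert out.shape == (N, d) and np.allclose(A.sum(axis=1), 1.0)
```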

26 of 30

    • Self-attention layer

Part 1

Self-attention layers processing manner
27 of 30

    • The multi-head self-attention layer

Multiple heads focus on different parts of the input by using different query, key, and value matrices.

    • Positional encoding (P)

Because self-attention is order-invariant, a positional encoding P is added to the input embeddings; a standard choice is the sinusoidal encoding P(pos, 2i) = sin(pos / 10000^(2i/d)) and P(pos, 2i+1) = cos(pos / 10000^(2i/d)).
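A minimal NumPy sketch of multi-head self-attention with the standard sinusoidal positional encoding (an illustrative single-layer version; head count and dimensions are arbitrary):

```python
import numpy as np

def sinusoidal_positional_encoding(n_pos, d_model):
    """Standard sinusoidal positional encoding P (Vaswani et al., 2017)."""
    pos = np.arange(n_pos)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d_model))
    P = np.zeros((n_pos, d_model))
    P[:, 0::2] = np.sin(angles)
    P[:, 1::2] = np.cos(angles)
    return P

def multi_head_attention(X, heads, W_o):
    """Each head uses its own (W_q, W_k, W_v); outputs are concatenated."""
    outs = []
    for W_q, W_k, W_v in heads:
        Q, K, V = X @ W_q, X @ W_k, X @ W_v
        s = Q @ K.T / np.sqrt(K.shape[-1])
        A = np.exp(s - s.max(axis=-1, keepdims=True))
        A /= A.sum(axis=-1, keepdims=True)
        outs.append(A @ V)
    return np.concatenate(outs, axis=-1) @ W_o

rng = np.random.default_rng(5)
n, d_model, n_heads = 5, 8, 2
d_head = d_model // n_heads
# Add P to the token embeddings so order information survives attention.
X = rng.standard_normal((n, d_model)) + sinusoidal_positional_encoding(n, d_model)
heads = [tuple(rng.standard_normal((d_model, d_head)) for _ in range(3))
         for _ in range(n_heads)]
W_o = rng.standard_normal((n_heads * d_head, d_model))
out = multi_head_attention(X, heads, W_o)
assert out.shape == (n, d_model)
```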

Part 1

Self-attention layers processing manner

28 of 30

Part 2

Transformer implementation for fog forecasting

Model           | Hit | Miss | FA  | CR   | POD  | F     | FAR  | CSI  | PSS  | HSS  | CSS  | T/epoch (s)
----------------|-----|------|-----|------|------|-------|------|------|------|------|------|------------
T_2D_Patch      | 7   | 60   | 2   | 2159 | 0.10 | 0.009 | 0.22 | 0.11 | 0.10 | 0.18 | 0.75 | ~17
T_2D_Channel    | 29  | 38   | 133 | 2028 | 0.43 | 0.06  | 0.82 | 0.14 | 0.37 | 0.22 | 0.16 | ~17
T_2D_Ch_Shuffle | 18  | 59   | 85  | 2066 | 0.23 | 0.04  | 0.81 | 0.11 | 0.19 | 0.17 | 0.15 | ~17

Test Dataset: 2228 samples, 67 fog cases
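The skill scores in the table follow from the contingency counts (Hit, Miss, FA, CR); a quick sanity check using the standard forecast-verification formulas (HSS and CSS omitted here for brevity):

```python
def verification_scores(hit, miss, fa, cr):
    """Standard 2x2 contingency-table scores used in the results table."""
    pod = hit / (hit + miss)       # probability of detection
    f = fa / (fa + cr)             # probability of false detection (F)
    far = fa / (hit + fa)          # false alarm ratio
    csi = hit / (hit + miss + fa)  # critical success index
    pss = pod - f                  # Peirce skill score
    return pod, f, far, csi, pss

# T_2D_Channel row: Hit=29, Miss=38, FA=133, CR=2028
pod, f, far, csi, pss = verification_scores(29, 38, 133, 2028)
assert abs(pod - 0.43) < 0.005   # matches the table
assert abs(far - 0.82) < 0.005
assert abs(csi - 0.145) < 0.001
assert abs(pss - 0.37) < 0.005
```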

29 of 30

Part 2

Transformer implementation for fog forecasting

Pros:

    • Transformers are more flexible and straightforward for modeling data with higher dimensionality
    • Transformers theoretically account better for spectral/temporal correlations
    • Transformers are more straightforward for post-processing and XAI analysis

Cons:

    • Computationally expensive
    • How to tokenize the input data is still an open question

30 of 30

Invitation for collaboration

  • Collaboration on expanding SparkMET to test on different tasks and different datasets
  • Publications

Thanks!