2 of 27

Flow-based model

Good/bad news? CaloFlow seems to make it on the complex datasets

Now it looks like the flagship model
Though DS2 and DS3 are not perfect (the student will be even worse)

But we can learn very valuable things from them!

Fast-converging training method (large LR + annealing)
Inductive training

Very similar to our layer + shape split, but they are even more aggressive: layer1 + layer2 + ... + shape1 + shape2 + ...

Speed is not a problem because of the teacher-student techniques

Can always train the fast model with the guidance of the slow one

Their missing parts:

The student model for DS2/3 is not finished – will they finally complete it?

AUC: 0.5 is perfect (the classifier is fully confused)

JSD: 0 is perfect (the classifier’s BCE loss reaches its maximum, ln 2)
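
A minimal sketch of how these two numbers can be read off a classifier's outputs. It assumes balanced truth/generated samples and the convention JSD = 1 - BCE / ln 2 (BCE in nats), which only lower-bounds the true JSD unless the classifier is optimal. The function names are hypothetical, and the AUC is computed by brute-force pair counting rather than by any official evaluation script.

```python
import numpy as np

def auc_score(p_truth, p_gen):
    """Probability that a truth sample scores higher than a generated one (ties count 0.5)."""
    p_truth = np.asarray(p_truth, dtype=float)[:, None]
    p_gen = np.asarray(p_gen, dtype=float)[None, :]
    return float((p_truth > p_gen).mean() + 0.5 * (p_truth == p_gen).mean())

def jsd_from_bce(p_truth, p_gen, eps=1e-12):
    """Assumed convention: JSD = 1 - BCE / ln 2 (0 = indistinguishable, 1 = fully separated)."""
    p_truth = np.clip(np.asarray(p_truth, dtype=float), eps, 1 - eps)
    p_gen = np.clip(np.asarray(p_gen, dtype=float), eps, 1 - eps)
    # Balanced binary cross-entropy: truth has label 1, generated has label 0
    bce = -0.5 * (np.log(p_truth).mean() + np.log(1 - p_gen).mean())
    return float(1.0 - bce / np.log(2.0))

# A fully confused classifier outputs 0.5 for every sample
confused = np.full(1000, 0.5)
print(auc_score(confused, confused), jsd_from_bce(confused, confused))
```

With this convention a confused classifier sits at the "perfect" corner (AUC 0.5, JSD 0), while a classifier that separates every sample approaches AUC 1 and JSD 1.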

3 of 27

Diffusion model

New player in this field

Advantage: successful example from industry
Difficulty: high computing power needed for training and tuning

No new results shown at ML4Jets: everything is from the previous CaloScore paper
The AUC metric is not good (0.98)

This model needs further optimization
Or the reason for the bad AUC needs to be understood

How will they improve this model?

4 of 27

GAN model

ATLAS FastCaloGAN: first application to a real detector

WGAN-GP model: lightweight and traditional design
Voxelized dataset (optimized by the ATLAS FastSim team)
Fine-tuned model: 300 GANs, one for each particle in each eta slice
Not applicable to other datasets

IEA-GAN: well-crafted GAN with physics-motivated modules/losses

Deep Metric Learning
Knowledge Distillation
Permutation Equivariant Relational Inductive Bias
(Still learning the details; more in the linked slides)
Mostly on pixel images; they are trying it on the CaloChallenge, but no results were shown

5 of 27

Mixed model

Bib-AE and extensions: they test several different models on the high-granularity ILD dataset

Bib-AE (VAE+GAN): behaves mostly like a GAN model
VAE+KDE: like what we do, but they only use it to extract the physics information
CaloFlow on the ILD dataset
Bib-AE has a poor classifier metric! So they are probably moving to a flow model
No plan to join the challenge

6 of 27

Is there still a chance for the VAE model?

7 of 27

VQVAE: Nice presentation from Chase!

Highlights:

First VQVAE model → new possibility other than flow or diffusion
Equivariant convolution and special latent space → physics inside
Two-step model: flexibility in training / tuning

Open questions:

DS1 still needs optimization: the AUC is not good yet...
DS2 and DS3 need further development to be completed if we wish to join the competition

8 of 27

Recap of datasets

DS1: easy dataset, 368 dimensions, irregular geometry, similar to the CaloGAN dataset

DS2: medium dataset, 6480 dim., cubic grid (but in cylindrical [r, phi, z] coordinates)

DS3: hard dataset, 40500 dim., cubic grid (but in cylindrical [r, phi, z] coordinates)

→ 3 sets of metrics are used to evaluate how “real” the generated images are

Average image
Physics variable similarity (chi2)
Classifier metric (max. AUC with fixed architecture and number of epochs)

9 of 27

More details about VQVAE on DS1

3-step model

(MLP-based) VAE generates the layer energies (5 numbers)
RNN generates the prior (16 dim * 512 codes)
VQVAE generates the normalized pixel energies (0~1 in each layer), which are then multiplied by the layer energies to give the final result
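
The last step above can be sketched as follows. This is only an illustration under stated assumptions: the DS1 layer layout (8 + 160 + 190 + 5 + 5 = 368 voxels) and the reading of "0~1 in each layer" as fractions that sum to 1 within a layer are assumptions, and `combine` is a hypothetical helper, not the actual model code.

```python
import numpy as np

# Assumed DS1 layout: 5 layers with 8 + 160 + 190 + 5 + 5 = 368 voxels
LAYER_SIZES = (8, 160, 190, 5, 5)

def combine(layer_energy, voxel_fraction, layer_sizes=LAYER_SIZES):
    """Scale per-layer normalized voxels (step 3) by the layer energies (step 1)."""
    layer_energy = np.asarray(layer_energy, dtype=float)      # shape (batch, 5)
    voxel_fraction = np.asarray(voxel_fraction, dtype=float)  # shape (batch, 368)
    # Repeat each layer energy once per voxel of that layer -> shape (batch, 368)
    scale = np.repeat(layer_energy, layer_sizes, axis=1)
    return voxel_fraction * scale

rng = np.random.default_rng(0)
layer_energy = rng.uniform(0.0, 10.0, size=(4, 5))
raw = rng.uniform(size=(4, 368))

# Normalize so that the voxels of each layer sum to 1 (assumed normalization)
edges = np.cumsum((0,) + LAYER_SIZES)
fractions = np.concatenate(
    [raw[:, a:b] / raw[:, a:b].sum(axis=1, keepdims=True)
     for a, b in zip(edges[:-1], edges[1:])],
    axis=1,
)
showers = combine(layer_energy, fractions)  # final voxel energies, shape (4, 368)
```

With this normalization the layer sums of the output reproduce the generated layer energies exactly, so the shape model and the energy model can be tuned separately.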

Average and individual images and the physics metrics are all great
Classifier metric is not good: an MLP tagger trained for 50 epochs can easily discriminate true from fake (AUC = 0.94~0.99, depending on which inputs are used)
Next problem is to improve the fidelity → how to generate more realistic images that confuse the MLP classifier?

10 of 27

More details about VQVAE on DS1

The images look very “real”

[Figure: avg. and individual (normed) images]

11 of 27

Metrics :

Two sets of metrics to measure “how good the generation is”:

Chi2 difference of the distributions of high-level features

High-level features are the average, std, and so on of the pixel energies

Classifier metric (AUC and JSD)

Train an MLP classifier to separate truth vs. generated
The AUC and JSD of that classifier → metric

12 of 27

Chi2 Metric

Physics metric looks good (chi2 of two high-level variables)

sum of layer N

sum of all / condition (particle incident energy)

0 means perfect
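
A minimal sketch of a chi2-style histogram distance for one high-level variable. The exact formula is an assumption (a "separation power" of the form 0.5 * sum((h1 - h2)^2 / (h1 + h2)) on normalized histograms with shared binning); the function name and the choice of 50 bins are hypothetical.

```python
import numpy as np

def chi2_separation(truth, generated, bins=50):
    """0.5 * sum((h1 - h2)^2 / (h1 + h2)) over normalized histograms with shared bins."""
    truth = np.asarray(truth, dtype=float)
    generated = np.asarray(generated, dtype=float)
    lo = min(truth.min(), generated.min())
    hi = max(truth.max(), generated.max())
    h1, edges = np.histogram(truth, bins=bins, range=(lo, hi))
    h2, _ = np.histogram(generated, bins=edges)
    h1 = h1 / h1.sum()
    h2 = h2 / h2.sum()
    denom = h1 + h2
    mask = denom > 0  # skip bins that are empty in both histograms
    return float(0.5 * np.sum((h1[mask] - h2[mask]) ** 2 / denom[mask]))

rng = np.random.default_rng(0)
truth = rng.normal(0.0, 1.0, size=5000)
generated = rng.normal(0.1, 1.0, size=5000)
print(chi2_separation(truth, generated))
```

In this form the value is 0 for identical histograms and 1 for histograms with no overlap, which matches the "0 means perfect" reading.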

13 of 27

More details about VQVAE on DS1

Some physics variable metrics are not very good, such as

energy-weighted avg(X)

energy-weighted avg(Y)

energy-weighted std(X)

energy-weighted std(Y)
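
For reference, a minimal sketch of how energy-weighted averages and widths like these can be computed per shower. The voxel coordinate array and the function name are hypothetical; the actual variable definitions used in the evaluation may differ in detail.

```python
import numpy as np

def weighted_mean_std(energy, coord, eps=1e-12):
    """Energy-weighted mean and standard deviation of a voxel coordinate, per shower."""
    energy = np.asarray(energy, dtype=float)  # shape (batch, n_voxels)
    coord = np.asarray(coord, dtype=float)    # shape (n_voxels,), e.g. voxel x positions
    total = energy.sum(axis=1) + eps          # eps guards against empty showers
    mean = (energy * coord).sum(axis=1) / total
    second = (energy * coord ** 2).sum(axis=1) / total
    # Clip tiny negative values caused by floating-point round-off
    std = np.sqrt(np.clip(second - mean ** 2, 0.0, None))
    return mean, std

# Two voxels at x = 0 and x = 2 with equal energy: mean 1, std 1
mean_x, std_x = weighted_mean_std([[1.0, 1.0]], [0.0, 2.0])
```

The same function applied to the y positions gives avg(Y) and std(Y).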

14 of 27

Classifier Metric

DS1:

Classifier	Flow model AUC/JSD	Our model AUC/JSD
low level	0.739/0.131	0.995/0.873
high level	0.556/0.015	0.947/0.579

DS2, DS3 (flow model only):

Classifier	DS2 AUC/JSD	DS3 AUC/JSD
low level	0.823/0.263	0.889/0.411
high level	0.860/0.329	0.931/0.524

15 of 27

More details about VQVAE on DS1

The classifier uses a fixed MLP architecture and is trained for 50 epochs to discriminate generated from truth images

→ 3 models with different inputs:

Low level: all pixel energies are fed in

Low level normed: normalized pixel energies

High level: only the previously mentioned physics variables

16 of 27

More details about VQVAE on DS2&3

Two-step model:

VQVAE with “cylindrical convolution”, 200-dim latent and 512 codes, to learn log1p(E/cond)
RNN to generate the prior: flattened sequence of length 400
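
To make the two ingredients of the first step concrete, here is a minimal sketch of the log1p(E/cond) transform and of the vector-quantization lookup. It is not the model code: the code dimension of 8 and the use of 200 latent positions are placeholders, and the function names are hypothetical.

```python
import numpy as np

def preprocess(energy, incident_energy):
    """log1p(E / cond), the target quantity named on the slide."""
    return np.log1p(energy / incident_energy)

def postprocess(x, incident_energy):
    """Inverse of preprocess: expm1, then rescale by the incident energy."""
    return np.expm1(x) * incident_energy

def quantize(latents, codebook):
    """Replace each latent vector by its nearest codebook entry (Euclidean distance)."""
    # latents: (n_positions, code_dim), codebook: (n_codes, code_dim)
    d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = d2.argmin(axis=1)  # discrete codes: the sequence the prior model has to learn
    return indices, codebook[indices]

rng = np.random.default_rng(0)
codebook = rng.normal(size=(512, 8))  # 512 codes; code_dim = 8 is a placeholder
latents = rng.normal(size=(200, 8))   # 200 latent positions (assumed layout)
indices, quantized = quantize(latents, codebook)
```

The second step then only has to model the integer sequence `indices`; the decoder maps the looked-up code vectors back to voxel space.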

Preliminary results:

We first trained an AE, then added VQ to turn it into a VQVAE → results are not good → training is very slow and hard to tune
The RNN can learn the prior, but needs several hours to 1 day

Difficulty:

With the larger datasets, training becomes slow and tuning the parameters becomes difficult (for DS1 a full training takes tens of minutes, but for DS2 it takes up to 1 day)
Strange patterns appear with the cylindrical conv
Besides, the RNN has now become a slow bottleneck: generation time is proportional to the latent length! (and for a long latent the batch size has to be reduced → 10-1000 times slower!)
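
As a reference point for the "cylindrical conv" item above, this is a sketch of the padding such a convolution usually implies: periodic wrap-around in phi and zero padding in r and z. Both the (r, phi, z) axis order with a 9 x 16 x 45 DS2 grid and the zero padding in r and z are assumptions, and this is not a diagnosis of the strange patterns.

```python
import numpy as np

def cylindrical_pad(x, pad=1):
    """Pad an (r, phi, z) shower: wrap around in phi, zero-pad in r and z."""
    x = np.pad(x, ((0, 0), (pad, pad), (0, 0)), mode="wrap")          # phi is periodic
    x = np.pad(x, ((pad, pad), (0, 0), (pad, pad)), mode="constant")  # r and z are not
    return x

# Assumed DS2 grid: 9 radial x 16 angular x 45 layers = 6480 voxels
shower = np.arange(9 * 16 * 45, dtype=float).reshape(9, 16, 45)
padded = cylindrical_pad(shower)
```

A valid (unpadded) 3x3x3 convolution applied to `padded` then sees the first and last phi bins as neighbours, which is what makes the operation consistent with the cylindrical geometry.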

17 of 27

More details about VQVAE on DS2&3

AE performance

[Figure: avg. and individual images]

18 of 27

More details about VQVAE on DS2&3

VQVAE performance

[Figure: avg. and individual images]

19 of 27

More details about VQVAE on DS2&3

VQVAE performance

[Figure panels: losses/metrics, encoded & quantized, code distribution]

20 of 27

Another (similar) VAE model at ML4Jets

21 of 27

Manifold learning VAE

From the company “Layer 6 AI” (author with a physics background)
Very similar idea to our VQVAE:

Two-step model: VAE to compress and NF to learn the latent
“Manifold” ↔ “Quantized latent space”??
They use a VAE to learn the manifold, and we use a VQVAE to learn the quantized latent space

Similar encoder/decoder architecture

They use an NF to learn the latent manifold, and we use an RNN to learn the discrete codes in the quantized latent space

Similar latent dimension

They got a better result on DS1 than us:

AUC is 0.78 – which is very good!

Can we learn something from them, or even collaborate to deal with DS2&3?

22 of 27

Manifold? ~ latent?

23 of 27

Encoder/decoder arch

We chose 3 layers with 500 units

24 of 27

Dimension of latent

We chose 16 (DS1), from experiments

We chose 200 (DS2&3), but it seems this can be reduced

26 of 27

Is there still a chance for the VAE model?

The VAE is a classical model

Pro: classical enough, with many known tricks and a good understanding of its behavior
Con: not very competitive with the newer models on its own

The VAE is a simple model

Simple enough to generalize to complex datasets (but the same performance is not guaranteed)
Good playground to test new ideas

Like manifolds, equivariance, vector quantization / two-step models, and more

Where do we go?

Performance? (see red in p9 and p14)
Adding more “physics”?
Lightweight and fastest model?
Or just end here and look for another direction?

(My timeline)

<=50% of my time on this project due to other ATLAS analyses, until March next year.
In the long term I will have more time, but for now let’s focus on the competition timeline
