1 of 27

Wrap-up of caloSim project -- and next plan?

2 of 27

  • Good/bad news? CaloFlow seems to work for the complex datasets
    • It now looks like the flagship model
    • Though DS2 and DS3 are not perfect (the student will be even worse)
  • But we can learn very valuable things from them!
    • Fast-converging training method (large LR + annealing)
    • Inductive training
      • Very similar to what we use for layer+shape; they are even more aggressive: layer1+layer2+...+shape1+shape2+...
    • Speed is not a problem thanks to the teacher-student technique
      • Can always train the fast model with the guidance of the slow one
  • Their missing parts:
    • The student model for DS2/3 is not finished – will they eventually complete it?

AUC: 0.5 is perfect (fully confuses the classifier)

JSD: 0 is perfect (BCE loss reaches its maximum, ln 2)
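A minimal sketch of how these two numbers can be obtained from classifier scores. It assumes the JSD is estimated from the binary cross-entropy as JSD = 1 - BCE/ln 2 (so it is normalized to [0, 1]); the function names `auc_score` and `jsd_from_bce` are illustrative and not taken from any evaluation script.

```python
import numpy as np

def auc_score(labels, scores):
    """Rank-based AUC (Mann-Whitney U statistic), using average ranks for ties."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    # Tied scores share the mean of their ranks.
    for value in np.unique(scores):
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    u = ranks[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

def jsd_from_bce(labels, scores, eps=1e-12):
    """JSD estimate from the classifier BCE (assumed convention: 1 - BCE / ln 2)."""
    labels = np.asarray(labels, dtype=float)
    p = np.clip(np.asarray(scores, dtype=float), eps, 1.0 - eps)
    bce = -np.mean(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))
    return 1.0 - bce / np.log(2.0)
```

A classifier that outputs 0.5 for every sample gives AUC = 0.5 and JSD = 0, which is the "perfect generator" case.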

3 of 27

  • New player in this field
    • Advantage: successful example from industry
    • Difficulty: high computing power needed for training and tuning
  • No new results shown at ML4Jets: everything is from the previous caloScore paper
  • The AUC metric is not good (0.98)
    • This model needs further optimization
    • Or the reason for the bad AUC needs to be understood
  • How will they improve this model?

4 of 27

GAN model

  • ATLAS FastCaloGAN: first application to a real detector
    • WGAN-GP model: lightweight and traditional design
    • Voxelized dataset (optimized by the ATLAS FastSim team)
    • Fine-tuned model: 300 GANs, one for each particle in each eta
    • Not for other datasets
  • IEA-GAN: well-crafted GAN with physics-motivated modules/losses
    • Deep Metric Learning
    • Knowledge Distillation
    • Permutation Equivariant Relational Inductive Bias
    • (Still learning the details, more on the linked slides)
    • Mostly on pixel images; trying on CaloChallenge but no results shown

5 of 27

Mixed model

  • Bib-AE and extensions: they test multiple models on the high-granularity ILD dataset
    • Bib-AE (VAE+)GAN: mostly like a GAN model
    • VAE+KDE: like what we do, but they only use it to extract the physics information
    • CaloFlow on the ILD dataset
    • Bib-AE has a poor classifier metric! So they are probably moving to flow models
    • No plan for the challenge

6 of 27

Is there still a chance for the VAE model?

7 of 27

VQVAE: Nice presentation from Chase!

  • Highlights:
    • First VQVAE model → new possibility other than flow or diffusion
    • Equivariant convolution and special latent space → physics inside
    • Two-step model: flexibility of training / tuning
  • Open questions:
    • DS1 still needs optimization: the AUC is not good yet...
    • DS2 and DS3 need further development to be completed if we wish to join the competition

8 of 27

Recap of datasets

DS1: easy dataset, 368 dimensions, irregular geometry, similar to the caloGAN dataset

DS2: medium dataset, 6480 dim., cubic (but in [r,phi,z] cylindrical coordinates)

DS3: hard dataset, 40500 dim., cubic (but in [r,phi,z] cylindrical coordinates)

→ 3 sets of metrics are used to evaluate how “real” the generated images are

  • Average image
  • Physics variable similarity (chi2)
  • Classifier metric (max. AUC with fixed architecture and number of epochs)
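For reference, a sketch of how the flat DS2/DS3 vectors can be viewed as a cylindrical grid. The binning used here (45 layers in z, with 16 x 9 and 50 x 18 angular x radial bins) and the axis ordering are my assumptions and are not stated on these slides, although they do reproduce the quoted sizes (45*16*9 = 6480, 45*50*18 = 40500). `GEOMETRY` and `to_cylindrical_grid` are hypothetical names.

```python
import numpy as np

# Assumed (n_z, n_phi, n_r) binning per dataset; not given on the slides.
GEOMETRY = {"DS2": (45, 16, 9), "DS3": (45, 50, 18)}

def to_cylindrical_grid(flat_showers, dataset):
    """Reshape (n_showers, n_voxels) into (n_showers, n_z, n_phi, n_r)."""
    n_z, n_phi, n_r = GEOMETRY[dataset]
    flat_showers = np.asarray(flat_showers)
    expected = n_z * n_phi * n_r
    if flat_showers.shape[-1] != expected:
        raise ValueError(
            f"{dataset} expects {expected} voxels, got {flat_showers.shape[-1]}"
        )
    # Assumes the radial index varies fastest in the flat vector.
    return flat_showers.reshape(-1, n_z, n_phi, n_r)
```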

9 of 27

More details about VQVAE on DS1

  • 3-step model
    • (MLP-based) VAE generates the layer energies (5 numbers)
    • RNN generates the prior (16 dim * 512 codes)
    • VQVAE generates normalized pixel energies (0~1 in each layer), which are then multiplied by the layer energy to give the final result
  • Average and individual images and physics metrics are all great
  • Classifier metric is not good: an MLP tagger trained for 50 epochs can easily discriminate true from fake (AUC=0.94~0.99, depending on which inputs are used)
  • Next problem is to improve the fidelity → how to generate more realistic images that confuse the MLP classifier?
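Two pieces of the steps above can be sketched in a few lines: the vector-quantization lookup (each latent vector is replaced by its nearest codebook entry) and the final rescaling of normalized pixels by the layer energy. This is an illustration only, not the actual model code. The nearest-neighbour rule with L2 distance is the standard VQ choice and is assumed here, and `quantize` and `rescale_to_layer_energy` are hypothetical names.

```python
import numpy as np

def quantize(latents, codebook):
    """Map each latent vector (n, d) to its nearest code in the codebook (k, d)."""
    latents = np.asarray(latents, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared L2 distance between every latent and every code: shape (n, k).
    dist2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dist2.argmin(axis=1)
    return indices, codebook[indices]

def rescale_to_layer_energy(normalized_layers, layer_energies):
    """Multiply each layer's normalized pixels by that layer's energy.

    Layers are kept as a list because the DS1 layers have different pixel counts.
    """
    return [
        np.asarray(pixels, dtype=float) * energy
        for pixels, energy in zip(normalized_layers, layer_energies)
    ]
```

The RNN prior then only has to model the integer `indices`, which is what makes the two stages trainable separately.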

10 of 27

More details about VQVAE on DS1

Images look very “real”

[Figure: avg. and individual (normed) images]

11 of 27

Metrics:

Two sets of metrics to measure how good the generation is:

  • Chi2 difference between distributions of high-level features
    • High-level features are the average, std, and so on of the pixel energies
  • Classifier metric (AUC and JSD)
    • Train an MLP classifier to separate truth vs. generated
    • The AUC and JSD of that classifier → metric

12 of 27

Chi2 Metric

Physics metric looks good (chi2 of two high-level variables):

  • sum of layer N
  • sum of all / condition (particle incident energy)

0 means perfect
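A sketch of one possible chi2 between the truth and generated histograms of a high-level variable. The exact definition is not given on the slides; the symmetric "separation power" form used here, 0.5 * sum (p - q)^2 / (p + q) on normalized histograms, is an assumption. It is 0 for identical distributions and 1 for fully disjoint ones. `chi2_separation` is a hypothetical name.

```python
import numpy as np

def chi2_separation(truth_values, generated_values, bins=50):
    """Symmetric chi2 between two normalized histograms with shared binning."""
    truth_values = np.asarray(truth_values, dtype=float)
    generated_values = np.asarray(generated_values, dtype=float)
    lo = min(truth_values.min(), generated_values.min())
    hi = max(truth_values.max(), generated_values.max())
    h_truth, edges = np.histogram(truth_values, bins=bins, range=(lo, hi))
    h_gen, _ = np.histogram(generated_values, bins=edges)
    p = h_truth / h_truth.sum()
    q = h_gen / h_gen.sum()
    denom = p + q
    filled = denom > 0  # skip bins that are empty in both histograms
    return 0.5 * np.sum((p[filled] - q[filled]) ** 2 / denom[filled])
```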

13 of 27

More details about VQVAE on DS1

Some physics variable metrics are not very good, such as:

energy-weighted avg(X)

energy-weighted avg(Y)

energy-weighted std(X)

energy-weighted std(Y)
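For concreteness, a sketch of how such energy-weighted moments can be computed per shower, using the pixel energies as weights on the pixel coordinates. The pixel coordinate array and the name `energy_weighted_moments` are illustrative; the actual coordinates depend on the detector geometry.

```python
import numpy as np

def energy_weighted_moments(energies, coords):
    """Energy-weighted mean and std of a coordinate, one value per shower.

    energies: (n_showers, n_pixels) pixel energies
    coords:   (n_pixels,) coordinate of each pixel (e.g. X or Y)
    """
    energies = np.asarray(energies, dtype=float)
    coords = np.asarray(coords, dtype=float)
    total = energies.sum(axis=1)
    safe_total = np.where(total > 0, total, 1.0)  # avoid 0/0 for empty showers
    mean = (energies * coords).sum(axis=1) / safe_total
    second = (energies * coords ** 2).sum(axis=1) / safe_total
    # Clip tiny negative values caused by floating-point rounding.
    std = np.sqrt(np.clip(second - mean ** 2, 0.0, None))
    return mean, std
```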

14 of 27

Classifier Metric

DS1:

Model        Classifier    AUC/JSD
Flow model   low level     0.739/0.131
Flow model   high level    0.556/0.015
Our model    low level     0.995/0.873
Our model    high level    0.947/0.579

DS2, DS3 (flow model):

Classifier    DS2 AUC/JSD    DS3 AUC/JSD
low level     0.823/0.263    0.889/0.411
high level    0.860/0.329    0.931/0.524

15 of 27

More details about VQVAE on DS1

A classifier with a fixed MLP architecture is trained for 50 epochs to discriminate generated from truth images

→ 3 classifiers using different inputs:

Low level: all pixel energies are fed in

Low level normed: normalized pixel energies

High level: only the previously mentioned physics variables
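A sketch of how the three input sets could be built from the same showers. It assumes "normed" means each pixel divided by its layer sum, and it uses only the layer sums and total/condition as high-level variables; both choices are assumptions, and `classifier_inputs` and `layer_slices` are hypothetical names.

```python
import numpy as np

def classifier_inputs(showers, incident_energy, layer_slices):
    """Build low-level, low-level-normed and high-level classifier inputs.

    showers:         (n_showers, n_pixels) pixel energies
    incident_energy: (n_showers,) conditioning energy
    layer_slices:    list of slices selecting the pixels of each layer
    """
    showers = np.asarray(showers, dtype=float)
    incident_energy = np.asarray(incident_energy, dtype=float)
    layer_sums = np.stack(
        [showers[:, s].sum(axis=1) for s in layer_slices], axis=1
    )
    normed = np.zeros_like(showers)
    for i, s in enumerate(layer_slices):
        layer_sum = layer_sums[:, i : i + 1]
        safe = np.where(layer_sum > 0, layer_sum, 1.0)  # empty layers stay 0
        normed[:, s] = showers[:, s] / safe
    total_over_cond = showers.sum(axis=1) / incident_energy
    high = np.concatenate([layer_sums, total_over_cond[:, None]], axis=1)
    return {"low": showers, "low_normed": normed, "high": high}
```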

16 of 27

More details about VQVAE on DS2&3

  • Two-step model:
    • VQVAE with “cylindrical convolution”, 200-dim latent and 512 codes, to learn log1p(E/cond)
    • RNN to generate the prior: flattened sequence of length 400
  • Preliminary results:
    • We first trained an AE, then added VQ to turn it into a VQVAE → results are not good → training is very slow and hard to tune
    • The RNN can learn the prior, but needs several hours to 1 day
  • Difficulties:
    • With the larger datasets, training becomes slow and tuning the parameters becomes difficult (for DS1 a full training takes tens of minutes, but for DS2 a full training takes up to 1 day)
    • Strange patterns show up in the cylindrical conv
    • Besides: the RNN now becomes a slow bottleneck: generation time is proportional to the latent length!!! (and for a long latent, the batch size has to be reduced → 10-1000 times slower!)
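A sketch of two ingredients mentioned above: the log1p(E/cond) target transform with its inverse, and periodic padding along phi. The padding is my reading of what "cylindrical convolution" needs (phi = 0 and phi = 2π are neighbours) and is an assumption about the implementation. All function names are hypothetical, and the grid layout (n_showers, n_z, n_phi, n_r) is also assumed.

```python
import numpy as np

def _broadcast_energy(incident_energy, ndim):
    """Reshape (n_showers,) energies so they broadcast against the shower grid."""
    incident_energy = np.asarray(incident_energy, dtype=float)
    return incident_energy.reshape(-1, *([1] * (ndim - 1)))

def preprocess(showers, incident_energy):
    """Training target: log1p(E / cond)."""
    showers = np.asarray(showers, dtype=float)
    return np.log1p(showers / _broadcast_energy(incident_energy, showers.ndim))

def postprocess(targets, incident_energy):
    """Inverse transform: E = expm1(target) * cond."""
    targets = np.asarray(targets, dtype=float)
    return np.expm1(targets) * _broadcast_energy(incident_energy, targets.ndim)

def pad_phi_circular(grid, pad, phi_axis=2):
    """Wrap-around padding along phi, so a convolution sees a closed cylinder."""
    pad_width = [(0, 0)] * grid.ndim
    pad_width[phi_axis] = (pad, pad)
    return np.pad(grid, pad_width, mode="wrap")
```

An ordinary convolution applied after `pad_phi_circular` (with no extra padding in phi) then treats the first and last phi bins as adjacent.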

17 of 27

More details about VQVAE on DS2&3

AE performance

[Figure: avg. and individual images]

18 of 27

More details about VQVAE on DS2&3

VQVAE performance

[Figure: avg. and individual images]

19 of 27

More details about VQVAE on DS2&3

VQVAE performance

[Figure: losses/metrics, encoded & quantized latents, code distribution]

20 of 27

Another (similar) VAE model at ML4Jets

21 of 27

  • From the company “Layer 6 AI” (author with a physics background)
  • Very similar idea to our VQVAE:
    • Two-step model: VAE to compress and NF to learn the latent
    • “Manifold” ↔ “Quantized latent space” ??
    • They use a VAE to learn the manifold, and we use a VQVAE to learn the quantized latent space
      • Similar encoder/decoder architecture
    • They use an NF to learn the latent manifold, and we use an RNN to learn the discrete codes in the quantized latent
      • Similar dimension in latent
  • They got a better result on DS1 than us:
    • AUC is 0.78 – which is very good!
  • Can we learn something from them or even collaborate to deal with DS2&3?

22 of 27

Manifold? ~ latent?

23 of 27

Encoder/decoder arch

We choose 3 layers with 500 units

24 of 27

Dimension of latent

We choose 16 for DS1, from experiments

We choose 200 for DS2&3, but it seems this can be reduced

25 of 27

Discussion

26 of 27

Is there still a chance for the VAE model?

  • VAE is a classical model
    • Pro: well established, with many known tricks and well-understood behavior
    • Con: not very competitive with newer models on its own
  • VAE is a simple model
    • Simple enough to generalize to complex datasets (but the same performance is not guaranteed)
    • Good playground to test new ideas
      • Like manifold, equivariance, vector quantization/two-step model, and more
  • Where do we go?
    • Performance? (see red in p9 and p14)
    • Adding more “physics”?
    • Lightweight and fastest model?
    • Or just end here and look for another direction?
  • (My timeline)
    • <=50% of my time on this project due to other ATLAS analyses, until March next year.
    • Long term I will have more time, but for now let’s focus on the competition timeline

27 of 27

More interesting things