1 of 17

November 4, 2024

2 of 17

Deep Chatterjee, MIT

What do you work on?

  • Real-time detection of gravitational waves with the Laser Interferometer Gravitational-wave Observatory (LIGO)

What do you want to collaborate on with people outside your area of expertise?

  • Algorithms; Theory; Software & computing infrastructure
  • Running real-time ML inference: tools for our use case

What would you like to learn more about?

  • Other Likelihood-free Inference techniques.
  • Representation learning. Building good data summaries.
  • What others are doing…

3 of 17

[Timeline figure: 2015 (first detection), 2017, and the 2020s. Today, we detect a merger every 1-2 days and share information with other observatories to follow up if interesting.]

4 of 17

Rapid inference → faster coordination → more science

  • GW signals from binaries involve 15 parameters.
    • Well modeled
    • Bayesian parameter estimation via MCMC/nested sampling.
  • Repeated likelihood evaluations form the expensive step
    • Today, updates from Bayesian PE arrive several hours after event detection
    • Scaling to more events in the future is difficult.
  • Likelihood-free inference (LFI)
    • Learn an approximator for the posterior from simulations, instead of sampling the posterior.
    • During inference, draw samples from the approximator.

Bayes' theorem, with the role of each term:

$$\underbrace{p(\theta \mid d)}_{\text{posterior}} \;\propto\; \underbrace{p(d \mid \theta)}_{\text{likelihood}} \;\underbrace{p(\theta)}_{\text{prior}}$$

In MCMC: sample the posterior. In LFI: learn the distribution from simulations (objective sketched below).
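Concretely, the standard neural posterior estimation objective, stated here as a hedged reference since the slide gives it only in words: fit an approximator $q_\phi$ by maximizing the likelihood of simulated parameter/data pairs,

$$\phi^{*} = \arg\min_{\phi}\; \mathbb{E}_{\theta \sim p(\theta),\; d \sim p(d \mid \theta)}\big[-\log q_{\phi}(\theta \mid d)\big],$$

so that $q_{\phi}(\theta \mid d) \approx p(\theta \mid d)$ and no likelihood evaluations are needed at inference time.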

5 of 17

Overall idea of posterior estimation

  • Learn an approximator for the posterior from simulations.
  • Use simulated parameter/data pairs to learn the approximator.
  • In our case the approximator is a normalizing flow.
    • Flexible neural network transforms that are learned during training.
    • Transforms the parameters, conditioned on the data(-summary), into a simple base distribution.

[Figure: simulation pipeline. Waveform from GR + detector noise → simulate data. Toy example: linear regression; a sketch of generating such pairs follows.]
6 of 17

Normalizing flows


  • Have two components:
    • A simple base distribution, like a normal distribution: easy to sample from and to evaluate densities on.
    • A transform: a neural network that learns to map the complex variable to the base distribution.
    • We use affine autoregressive transforms [Germain+ (2015), Kingma+ (2017)]
  • Trained by maximizing likelihood
    • This is easy since the base distribution is cheap to evaluate
    • The transform is fast to evaluate; a minimal sketch in pyro follows.
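A minimal sketch of this construction in pyro (the library named later in the deck); the toy 2-D target and hyperparameters are illustrative assumptions, not the talk's configuration:

```python
import torch
import pyro.distributions as dist
import pyro.distributions.transforms as T

dim = 2  # toy dimensionality; the real model works with the 15 CBC parameters

# Base distribution: a standard normal, easy to sample and density-estimate.
base = dist.Normal(torch.zeros(dim), torch.ones(dim))

# Transform: one affine autoregressive layer (a learnable torch module).
transform = T.affine_autoregressive(dim, hidden_dims=[64, 64])
flow = dist.TransformedDistribution(base, [transform])

# Train by maximizing the likelihood of samples from a toy "complex" target.
optimizer = torch.optim.Adam(transform.parameters(), lr=1e-3)
mix = torch.tensor([[1.0, 0.0], [0.8, 0.6]])
for step in range(2000):
    x = torch.randn(256, dim) @ mix    # correlated-Gaussian stand-in data
    loss = -flow.log_prob(x).mean()    # negative log-likelihood
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    flow.clear_cache()                 # clear cached intermediates between steps
```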

7 of 17

Details about Normalizing Flow

  • Practical details involve how the transform is implemented
    • Affine transforms
    • Splines
    • Integral based
  • Autoregressive property ensures efficient determinant computation
    • Jacobians of the transforms are lower triangular; the determinant equals the product of the diagonal elements.

  • We use a masked autoregressive flow: affine transforms between layers (see the formula below)
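For reference, the change of variables that this determinant enters (standard flow algebra, consistent with the training-by-maximum-likelihood description on the previous slide): if $f(\,\cdot\,; d)$ maps parameters to the base variable $z = f(\theta; d)$ with base density $p_Z$, then

$$\log q(\theta \mid d) \;=\; \log p_Z\big(f(\theta; d)\big) \;+\; \log \left|\det \frac{\partial f(\theta; d)}{\partial \theta}\right|,$$

and the autoregressive structure makes $\partial f / \partial \theta$ lower triangular, so the log-determinant reduces to $\sum_i \log \left|\partial f_i / \partial \theta_i\right|$.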

8 of 17

Waveform generation on accelerated hardware

  • Re-implement some CBC waveforms as part of ml4gw.
  • Current work uses IMRPhenomD.
    • On-the-fly waveform generation.
    • A batch of 1000 waveforms takes ~0.15 s on an A40 GPU (a timing sketch follows this list).
    • IMRPhenomPv2 recently added, to be used for subsequent analyses.
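A hedged sketch of how such a benchmark might be timed; `generate_batch` is a hypothetical stand-in for the batched waveform callable (the ml4gw API itself is not shown in the talk):

```python
import time
import torch

def time_waveform_batch(generate_batch, params):
    """Time one on-the-fly batched waveform generation on the GPU.

    `generate_batch` is a hypothetical stand-in for a batched waveform
    callable (e.g. an ml4gw IMRPhenomD module); `params` is a
    (batch, n_params) tensor of source parameters already on the GPU.
    """
    generate_batch(params)        # warm-up call (kernel compilation, caching)
    torch.cuda.synchronize()      # GPU execution is asynchronous; sync first
    start = time.perf_counter()
    waveforms = generate_batch(params)
    torch.cuda.synchronize()      # wait for the batch to actually finish
    return waveforms, time.perf_counter() - start

# e.g. a batch of 1000 parameter draws should reproduce the ~0.15 s figure
```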

9 of 17

Data generation

  • Use real noise from the HL detectors
    • Stretches of ANALYSIS_READY segments from O3
  • Background transferred to GPU
    • This study uses 20K s @ 2048 Hz
  • Data loader
    • Lazily loads batches of 4 s segments
    • Samples points from the prior; generates, injects, and whitens the data.
  • Whitened data is summarized in a low-dimensional representation and passed to the normalizing flow (a sketch of the inject-and-whiten step follows)
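A minimal sketch of the inject-and-whiten step under simple assumptions (frequency-domain whitening with a precomputed ASD; the real implementation differs in details):

```python
import torch

def inject_and_whiten(background, signal, asd):
    """Inject a simulated signal into real detector noise, then whiten.

    background, signal: (batch, channels, time) tensors, e.g. H and L channels
    asd: amplitude spectral density evaluated on the rfft frequency grid
    """
    data = background + signal        # the injection step
    data_f = torch.fft.rfft(data, dim=-1)
    white_f = data_f / asd            # flatten the noise spectrum
    white = torch.fft.irfft(white_f, n=data.shape[-1], dim=-1)
    # real implementations also rescale (and window/taper); conventions vary
    return white

# each training batch: theta ~ prior -> waveform -> inject into a random
# 4 s background slice -> whiten -> summarize -> condition the flow
```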

10 of 17

Embedding net pretraining

  • Make the data summary insensitive to time-of-arrival differences.
  • Jointly embed two "views" of the data.
    • Use a ResNet with 2-channel HL time-domain data as input
    • Minimize the VICReg loss (sketched below).
  • Obtain an 8-dim summary vector after hyperparameter tuning.
  • Condition our parameters on this data summary
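A compact sketch of the VICReg objective [Bardes+ (2022)] as usually written; the weights below are the paper's defaults, not necessarily the tuned values used here:

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z1, z2, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """VICReg: invariance + variance + covariance terms on two embedded views."""
    n, d = z1.shape

    # invariance: the two views of the same event should embed close together
    invariance = F.mse_loss(z1, z2)

    # variance: keep each embedding dimension's std above 1 (avoids collapse)
    std1 = torch.sqrt(z1.var(dim=0) + eps)
    std2 = torch.sqrt(z2.var(dim=0) + eps)
    variance = F.relu(1.0 - std1).mean() + F.relu(1.0 - std2).mean()

    # covariance: decorrelate embedding dimensions (off-diagonal terms -> 0)
    def cov_penalty(z):
        z = z - z.mean(dim=0)
        cov = (z.T @ z) / (n - 1)
        return (cov - torch.diag(torch.diag(cov))).pow(2).sum() / d

    covariance = cov_penalty(z1) + cov_penalty(z2)
    return sim_w * invariance + var_w * variance + cov_w * covariance
```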

11 of 17

Model config and training

  • Implement the normalizing flow using pyro (a configuration sketch follows this slide).
    • The embedding net contains 2.6 million trainable parameters
    • We use affine autoregressive transforms totaling 3.2 million parameters; ~6 million trainable parameters in total.
  • Model and datasets implemented in pytorch-lightning.
    • Training converges (around 250 epochs) with early stopping in 20-24 hrs on an A100/A40 GPU
    • During a training run, the model sees 250 epochs x 200 batches per epoch x 800 batch size ~ 40M unique parameter/data combinations
    • Drawing 20K samples on new data takes 0.05 s on an A40 GPU.
  • Hyperparameter tuning done via ray.tune.
    • ~100 model configurations tuned in ~15 hours with 4 A40 GPUs.

[Figure: example HPO runs, with the best configuration highlighted]
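A hedged sketch of what this configuration might look like in pyro: a conditional affine autoregressive flow over the 15 parameters with the 8-dim summary as context (layer count and hidden sizes below are illustrative, not the tuned values):

```python
import torch
import pyro.distributions as dist
import pyro.distributions.transforms as T

param_dim, context_dim = 15, 8  # 15 CBC parameters, 8-dim data summary

# A stack of conditional affine autoregressive layers
# (depth and hidden sizes here are illustrative, not the tuned values).
transforms = [
    T.conditional_affine_autoregressive(param_dim, context_dim, hidden_dims=[256, 256])
    for _ in range(4)
]
base = dist.Normal(torch.zeros(param_dim), torch.ones(param_dim))
flow = dist.ConditionalTransformedDistribution(base, transforms)

modules = torch.nn.ModuleList(transforms)  # registers all trainable parameters
optimizer = torch.optim.Adam(modules.parameters(), lr=1e-3)

def train_step(theta, summary):
    """One maximum-likelihood step on a batch of (parameters, data summaries)."""
    optimizer.zero_grad()
    loss = -flow.condition(summary).log_prob(theta).mean()
    loss.backward()
    optimizer.step()
    flow.clear_cache()
    return loss.item()

# Inference: condition on an event's 8-dim summary and draw 20K samples.
# with torch.no_grad():
#     samples = flow.condition(event_summary).sample(torch.Size([20_000]))
```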

12 of 17

Testing

  • Parameter recovery is consistent with injections
  • Posterior widths are wider than those from stochastic sampling
    • Especially for sky localization, though broad features are learned
    • See a representative Mc = 45 solar-mass injection at 1 Gpc, done on different background segments (optimal SNR ~ 20)
    • Blue: AMPLFI; orange: bilby/dynesty

[Figure: P-P plot; a sketch of how such a plot is computed follows.]
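A minimal sketch of how a P-P (probability-probability) test like this is typically computed; the names and shapes below are assumptions for illustration:

```python
import numpy as np

def pp_curve(posterior_samples, true_values):
    """Empirical coverage for one parameter across many injections.

    posterior_samples: (n_injections, n_samples) array of posterior draws
    true_values:       (n_injections,) array of injected values
    Returns credible levels and the fraction of injections whose truth falls
    within each level; a well-calibrated posterior lies on the diagonal.
    """
    # percentile rank of the truth within each injection's posterior
    ranks = (posterior_samples < true_values[:, None]).mean(axis=1)
    levels = np.linspace(0.0, 1.0, 101)
    coverage = np.array([(ranks <= p).mean() for p in levels])
    return levels, coverage
```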

13 of 17

Example skymaps

[Figure: AMPLFI vs. BAYESTAR skymaps]

14 of 17

Example skymaps

[Figure: AMPLFI vs. BAYESTAR skymaps]

15 of 17

Example skymaps

[Figure: AMPLFI vs. BAYESTAR skymaps]

16 of 17

GW150914

Inference after re-training from a previous checkpoint.

Jitters in arrival time leave the result unaffected.

17 of 17

Current developments

  • Improvements in posterior widths
    • Especially in extrinsic parameters like sky location, distance, and inclination
    • Use joint embeddings of both time- and frequency-domain data for the simulations
    • Provide an embedding of the noise power spectra as context
  • Online testing
    • Both Aframe + AMPLFI can be run on an NVIDIA A30.
      • Aframe runs as a service and snapshots data in memory
      • Upon a trigger, the data chunk is passed to AMPLFI for inference.
    • Preliminary latency benchmarks on the playground show that search + alert data products can be computed within ~6 seconds of data acquisition.