1 of 17

November 4, 2024

2 of 17

Deep Chatterjee, MIT

What do you work on?

  • Real-time detection of gravitational waves with the Laser Interferometer Gravitational-wave Observatory (LIGO)

What do you want to collaborate on with people outside your area of expertise?

  • Algorithms; Theory; Software & computing infrastructure
  • Running real-time ML inference: tools for our use case

What would you like to learn more about?

  • Other Likelihood-free Inference techniques.
  • Representation learning. Building good data summaries.
  • What others are doing…

3 of 17

[Timeline figure: 2015 (first detection), 2017, and the 2020s. Today, we detect a merger every 1-2 days and share information with other observatories to follow up if interesting.]

4 of 17

Rapid inference → faster coordination → more science

  • GW signals from binaries involve 15 parameters.
    • Well modeled
    • Bayesian parameter estimation via MCMC/nested sampling.
  • Repeated likelihood evaluations form the expensive step
    • Today, updates from Bayesian PE arrive several hours after event detection
    • Scaling to more events in the future is difficult.
  • Likelihood-free inference (LFI)
    • Learn an approximator for the posterior from simulations, instead of sampling the posterior.
    • During inference, draw samples from the approximator.

Bayes' theorem, with the role of each term:

$$\underbrace{p(\theta \mid d)}_{\text{posterior}} \;\propto\; \underbrace{p(d \mid \theta)}_{\text{likelihood}} \;\underbrace{p(\theta)}_{\text{prior}}$$

In MCMC: sample the posterior. In LFI: learn the distribution from simulations (objective sketched below).
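Concretely, the standard neural posterior estimation objective, stated here as a hedged reference since the slide gives it only in words: fit an approximator $q_\phi$ by maximizing the likelihood of simulated parameter/data pairs,

$$\phi^{*} = \arg\min_{\phi}\; \mathbb{E}_{\theta \sim p(\theta),\; d \sim p(d \mid \theta)}\big[-\log q_{\phi}(\theta \mid d)\big],$$

so that $q_{\phi}(\theta \mid d) \approx p(\theta \mid d)$ and no likelihood evaluations are needed at inference time.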

5 of 17

Overall idea of posterior estimation

  • Learn an approximator for the posterior from simulations.
  • Use simulated parameter/data pairs to learn the approximator.
  • In our case the approximator is a normalizing flow.
    • Flexible neural network transforms that are learned during training.
    • Transforms the parameters, conditioned on the data(-summary), into a simple base distribution.

[Figure: simulation pipeline. Waveform from GR + detector noise → simulate data. Toy example: linear regression; a sketch of generating such pairs follows.]
6 of 17

Normalizing flows


  • Have two components:
    • A simple base distribution, like a normal distribution: easy to sample from and to evaluate densities on.
    • A transform: a neural network that learns to map the complex variable to the base distribution.
    • We use affine autoregressive transforms [Germain+ (2015), Kingma+ (2017)]
  • Trained by maximizing likelihood
    • This is easy since the base distribution is cheap to evaluate
    • The transform is fast to evaluate; a minimal sketch in pyro follows.
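A minimal sketch of this construction in pyro (the library named later in the deck); the toy 2-D target and hyperparameters are illustrative assumptions, not the talk's configuration:

```python
import torch
import pyro.distributions as dist
import pyro.distributions.transforms as T

dim = 2  # toy dimensionality; the real model works with the 15 CBC parameters

# Base distribution: a standard normal, easy to sample and density-estimate.
base = dist.Normal(torch.zeros(dim), torch.ones(dim))

# Transform: one affine autoregressive layer (a learnable torch module).
transform = T.affine_autoregressive(dim, hidden_dims=[64, 64])
flow = dist.TransformedDistribution(base, [transform])

# Train by maximizing the likelihood of samples from a toy "complex" target.
optimizer = torch.optim.Adam(transform.parameters(), lr=1e-3)
mix = torch.tensor([[1.0, 0.0], [0.8, 0.6]])
for step in range(2000):
    x = torch.randn(256, dim) @ mix    # correlated-Gaussian stand-in data
    loss = -flow.log_prob(x).mean()    # negative log-likelihood
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    flow.clear_cache()                 # clear cached intermediates between steps
```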

7 of 17

Details about Normalizing Flow

  • Practical details involve how the transform is implemented
    • Affine transforms
    • Splines
    • Integral based
  • Autoregressive property ensures efficient determinant computation
    • Jacobians of the transforms are lower triangular; the determinant equals the product of the diagonal elements.

  • We use a masked autoregressive flow: affine transforms between layers (see the formula below)
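For reference, the change of variables that this determinant enters (standard flow algebra, consistent with the training-by-maximum-likelihood description on the previous slide): if $f(\,\cdot\,; d)$ maps parameters to the base variable $z = f(\theta; d)$ with base density $p_Z$, then

$$\log q(\theta \mid d) \;=\; \log p_Z\big(f(\theta; d)\big) \;+\; \log \left|\det \frac{\partial f(\theta; d)}{\partial \theta}\right|,$$

and the autoregressive structure makes $\partial f / \partial \theta$ lower triangular, so the log-determinant reduces to $\sum_i \log \left|\partial f_i / \partial \theta_i\right|$.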

8 of 17

Waveform generation on accelerated hardware

  • Re-implement some CBC waveforms as part of ml4gw.
  • Current work uses IMRPhenomD.
    • On-the-fly waveform generation.
    • A batch of 1000 waveforms takes ~0.15 s on an A40 GPU (a timing sketch follows this list).
    • IMRPhenomPv2 recently added, to be used for subsequent analyses.
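A hedged sketch of how such a benchmark might be timed; `generate_batch` is a hypothetical stand-in for the batched waveform callable (the ml4gw API itself is not shown in the talk):

```python
import time
import torch

def time_waveform_batch(generate_batch, params):
    """Time one on-the-fly batched waveform generation on the GPU.

    `generate_batch` is a hypothetical stand-in for a batched waveform
    callable (e.g. an ml4gw IMRPhenomD module); `params` is a
    (batch, n_params) tensor of source parameters already on the GPU.
    """
    generate_batch(params)        # warm-up call (kernel compilation, caching)
    torch.cuda.synchronize()      # GPU execution is asynchronous; sync first
    start = time.perf_counter()
    waveforms = generate_batch(params)
    torch.cuda.synchronize()      # wait for the batch to actually finish
    return waveforms, time.perf_counter() - start

# e.g. a batch of 1000 parameter draws should reproduce the ~0.15 s figure
```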

9 of 17

Data generation

  • Use real noise from the HL detectors
    • Stretches of ANALYSIS_READY segments from O3
  • Background transferred to GPU
    • This study uses 20K s @ 2048 Hz
  • Data loader
    • Lazily loads batches of 4 s segments
    • Samples points from the prior; generates, injects, and whitens the data.
  • Whitened data is summarized in a low-dimensional representation and passed to the normalizing flow (a sketch of the inject-and-whiten step follows)
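A minimal sketch of the inject-and-whiten step under simple assumptions (frequency-domain whitening with a precomputed ASD; the real implementation differs in details):

```python
import torch

def inject_and_whiten(background, signal, asd):
    """Inject a simulated signal into real detector noise, then whiten.

    background, signal: (batch, channels, time) tensors, e.g. H and L channels
    asd: amplitude spectral density evaluated on the rfft frequency grid
    """
    data = background + signal        # the injection step
    data_f = torch.fft.rfft(data, dim=-1)
    white_f = data_f / asd            # flatten the noise spectrum
    white = torch.fft.irfft(white_f, n=data.shape[-1], dim=-1)
    # real implementations also rescale (and window/taper); conventions vary
    return white

# each training batch: theta ~ prior -> waveform -> inject into a random
# 4 s background slice -> whiten -> summarize -> condition the flow
```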

10 of 17

Embedding net pretraining

  • Make the data summary insensitive to time-of-arrival differences.
  • Jointly embed two "views" of the data.
    • Use a ResNet with 2-channel HL time-domain data as input
    • Minimize the VICReg loss (sketched below).
  • Obtain an 8-dim summary vector after hyperparameter tuning.
  • Condition our parameters on this data summary
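A compact sketch of the VICReg objective [Bardes+ (2022)] as usually written; the weights below are the paper's defaults, not necessarily the tuned values used here:

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z1, z2, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """VICReg: invariance + variance + covariance terms on two embedded views."""
    n, d = z1.shape

    # invariance: the two views of the same event should embed close together
    invariance = F.mse_loss(z1, z2)

    # variance: keep each embedding dimension's std above 1 (avoids collapse)
    std1 = torch.sqrt(z1.var(dim=0) + eps)
    std2 = torch.sqrt(z2.var(dim=0) + eps)
    variance = F.relu(1.0 - std1).mean() + F.relu(1.0 - std2).mean()

    # covariance: decorrelate embedding dimensions (off-diagonal terms -> 0)
    def cov_penalty(z):
        z = z - z.mean(dim=0)
        cov = (z.T @ z) / (n - 1)
        return (cov - torch.diag(torch.diag(cov))).pow(2).sum() / d

    covariance = cov_penalty(z1) + cov_penalty(z2)
    return sim_w * invariance + var_w * variance + cov_w * covariance
```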

11 of 17

Model config and training

  • Implement the normalizing flow using pyro (a configuration sketch follows this slide).
    • The embedding net contains 2.6 million trainable parameters
    • We use affine autoregressive transforms totaling 3.2 million parameters; ~6 million trainable parameters in total.
  • Model and datasets implemented in pytorch-lightning.
    • Training converges (around 250 epochs) with early stopping in 20-24 hrs on an A100/A40 GPU
    • During a training run, the model sees 250 epochs x 200 batches per epoch x 800 batch size ~ 40M unique parameter/data combinations
    • Drawing 20K samples on new data takes 0.05 s on an A40 GPU.
  • Hyperparameter tuning done via ray.tune.
    • ~100 model configurations tuned in ~15 hours with 4 A40 GPUs.

[Figure: example HPO runs, with the best configuration highlighted]
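A hedged sketch of what this configuration might look like in pyro: a conditional affine autoregressive flow over the 15 parameters with the 8-dim summary as context (layer count and hidden sizes below are illustrative, not the tuned values):

```python
import torch
import pyro.distributions as dist
import pyro.distributions.transforms as T

param_dim, context_dim = 15, 8  # 15 CBC parameters, 8-dim data summary

# A stack of conditional affine autoregressive layers
# (depth and hidden sizes here are illustrative, not the tuned values).
transforms = [
    T.conditional_affine_autoregressive(param_dim, context_dim, hidden_dims=[256, 256])
    for _ in range(4)
]
base = dist.Normal(torch.zeros(param_dim), torch.ones(param_dim))
flow = dist.ConditionalTransformedDistribution(base, transforms)

modules = torch.nn.ModuleList(transforms)  # registers all trainable parameters
optimizer = torch.optim.Adam(modules.parameters(), lr=1e-3)

def train_step(theta, summary):
    """One maximum-likelihood step on a batch of (parameters, data summaries)."""
    optimizer.zero_grad()
    loss = -flow.condition(summary).log_prob(theta).mean()
    loss.backward()
    optimizer.step()
    flow.clear_cache()
    return loss.item()

# Inference: condition on an event's 8-dim summary and draw 20K samples.
# with torch.no_grad():
#     samples = flow.condition(event_summary).sample(torch.Size([20_000]))
```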

12 of 17

Testing

  • Parameter recovery is consistent with injections
  • Posterior widths are wider than those from stochastic sampling
    • Especially for sky localization, though broad features are learned
    • See a representative Mc = 45 solar-mass injection at 1 Gpc, done on different background segments (optimal SNR ~ 20)
    • Blue: AMPLFI; orange: bilby/dynesty

[Figure: P-P plot; a sketch of how such a plot is computed follows.]
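A minimal sketch of how a P-P (probability-probability) test like this is typically computed; the names and shapes below are assumptions for illustration:

```python
import numpy as np

def pp_curve(posterior_samples, true_values):
    """Empirical coverage for one parameter across many injections.

    posterior_samples: (n_injections, n_samples) array of posterior draws
    true_values:       (n_injections,) array of injected values
    Returns credible levels and the fraction of injections whose truth falls
    within each level; a well-calibrated posterior lies on the diagonal.
    """
    # percentile rank of the truth within each injection's posterior
    ranks = (posterior_samples < true_values[:, None]).mean(axis=1)
    levels = np.linspace(0.0, 1.0, 101)
    coverage = np.array([(ranks <= p).mean() for p in levels])
    return levels, coverage
```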

13 of 17

Example skymaps

[Figure: AMPLFI vs. BAYESTAR skymaps]

14 of 17

Example skymaps

[Figure: AMPLFI vs. BAYESTAR skymaps]

15 of 17

Example skymaps

[Figure: AMPLFI vs. BAYESTAR skymaps]

16 of 17

GW150914

Inference after re-training from a previous checkpoint.

Jitters in arrival time leave the result unaffected.

17 of 17

Current developments

  • Improvements in posterior widths
    • Especially in extrinsic parameters like sky location, distance, and inclination
    • Use joint embeddings of both time- and frequency-domain data for the simulations
    • Provide an embedding of the noise power spectra as context
  • Online testing
    • Both Aframe + AMPLFI can be run on an NVIDIA A30.
      • Aframe runs as a service and snapshots data in memory
      • Upon a trigger, the data chunk is passed to AMPLFI for inference.
    • Preliminary latency benchmarks on the playground show that search + alert data products can be computed within ~6 seconds of data acquisition.