1 of 42

ESR12: Accelerated Anomaly Detection

Pratik Jawahar*

Supervisors: Caterina Doglioni, Jiri Masik, Alex Oh, Maurizio Pierini

Overview:

  • PMI: AD for DQM
  • BEAD: AD for new physics
  • PHAZE: Fast ML Inference
  • [Side Quest] TARP: Twikis for ATLAS via RAG-based Protocols

SMARTHEP is funded by the European Union’s Horizon 2020 research and innovation programme, call H2020-MSCA-ITN-2020, under Grant Agreement n. 956086

ESR12: Pratik Jawahar

2 of 42

Pointwise Mutual Information Profiles as Anomaly Detectors for DQM

3 of 42

DQM in LAr Calorimeters

SMARTHEP Milan ’24

4 of 42

ANNOTATOR: LSTM-AE

5 of 42

xLSTM-AE

  • Motivation: addresses catastrophic forgetting
    • Specialized memory handling
  • Still memory-intensive and non-parallelizable
    • Train a student network to predict the xLSTM loss
    • The accuracy gained over the baseline LSTM is traded off for speed in the student network
  • xLSTM is a better one-to-one comparison to attention-based models such as Transformers
    • (Along the lines of Laura Boggia’s work)
  • However, xLSTM inference is much more compute-intensive than LSTM inference

6 of 42

Pointwise Mutual Information

  • Definition: PMI(x, y) = log[ p(x, y) / (p(x) p(y)) ]

  • Intuition: measures deviation from independence; high PMI ⇒ the pair (x, y) co-occurs more often than expected under independence, and low (negative) PMI ⇒ less often.
  • Why it helps: pairwise shifts are interpretable and often spike early.
  • How the pipeline gets probabilities:
    • Quantize features (KBinsDiscretizer, n_bins=10).
    • Slide a window of length WIN.
    • Count joint bin hits in the window → estimate probabilities.
    • Note: we use a non-negative score downstream (clipping, or using −log p) so that it aggregates cleanly
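The probability-estimation steps above can be sketched as follows. This is a minimal NumPy sketch under stated assumptions, not the pipeline's code: `quantile_bins` is a hypothetical stand-in for scikit-learn's `KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="quantile")`, the default `win=200` is an arbitrary choice for WIN, and the per-window score (mean of PMI clipped at zero over the pairs observed in the window) is one assumed way of making the score non-negative.

```python
import numpy as np


def quantile_bins(x, n_bins=10):
    """Map a 1-D feature to ordinal bin indices 0..n_bins-1 using quantile edges.

    Hypothetical stand-in for KBinsDiscretizer(encode="ordinal", strategy="quantile").
    """
    interior_edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))[1:-1]
    return np.searchsorted(interior_edges, x, side="right")


def window_pmi(bx, by, n_bins=10):
    """PMI of every (bx[i], by[i]) pair in one window, from joint bin counts."""
    joint = np.zeros((n_bins, n_bins))
    np.add.at(joint, (bx, by), 1.0)                 # count joint bin hits
    p_xy = joint / joint.sum()                      # joint probability estimate
    p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)   # marginal estimates
    # Observed pairs always have non-zero counts, so the log is finite.
    return np.log(p_xy[bx, by] / (p_x[bx] * p_y[by]))


def sliding_pmi_score(x, y, win=200, n_bins=10):
    """Non-negative PMI score for each window of length `win` (stride 1)."""
    bx, by = quantile_bins(x, n_bins), quantile_bins(y, n_bins)
    scores = np.empty(len(x) - win + 1)
    for start in range(len(scores)):
        sl = slice(start, start + win)
        # Clip at zero so negative PMI does not cancel positive spikes when averaging.
        scores[start] = np.clip(window_pmi(bx[sl], by[sl], n_bins), 0.0, None).mean()
    return scores
```

On a toy series where two independent features become dependent halfway through, windows in the dependent half score clearly higher than windows in the independent half.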

7 of 42

Results (Nikhil Jangid, HSF summer intern)

  • Replicated the plots from the pub note
  • PMI matches the anomalies flagged by the LSTM-AE
  • Granularity is lower due to the discretized windows
  • The algorithm itself is qualitatively much faster
    • Far fewer computations than the LSTM-AE
    • Quantitative benchmarking TBD
  • Future: on larger datasets, identify the specific feature pairs whose PMI is most sensitive to specific anomalies, for even faster inference

Figure panels: LSTM-AE | PMI

8 of 42

BEAD: Background Enrichment for Anomaly Detection

9 of 42

Unsupervised AD

  • ML-based AD for new-physics searches has been under development in HEP for many years
    • Started with CWoLA-like methods
      • Moved on to VAEs
        • Deeper studies of VAE architectures (deep sets, graphs, normalizing flows, etc.)
          • Tested on large benchmarks (DarkMachines challenge, LHC Olympics, etc.)
            • Multi-background representation learning (use of a broader range of train_bkg)
  • One potentially missing piece:
    • Representation learning on background from different MC generators
      • Question: Do popular generators like Pythia have specific algorithmic quirks that the model converges on during training?
        • If yes, can we disentangle these? Does that improve performance?
          • Presenting: BEAD!

10 of 42

BEAD

  • We enrich the learned background representations:
    • By training on a set of background processes from:
      • Pythia, Herwig, Sherpa
    • We compare:
      • {Vanilla VAE} - {VAE+NF} - {Dirichlet VAE} - {Supervised Contrastive VAE}
  • BEAD [Framework]: Python package (FAIR design)
    • Findable
      • Available on Zenodo, GitHub, PyPI (soon)
    • Accessible
      • Detailed README (GitDiagram, installation instructions, usage examples); docs published to Read the Docs (RTD) from GitHub
    • Interoperable
      • The codebase is a general VAE training and testing toolkit; it is not confined to VAE-AD in HEP
    • Reusable
      • Modular code; Comprehensive test suite; Usage example scripts
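For reference, the anomaly score of the vanilla Gaussian VAE in the comparison above is commonly built from the reconstruction error plus the KL term of the ELBO. The NumPy sketch below shows that standard construction; it is an assumption, not BEAD's actual loss (the NF, Dirichlet and SupCon variants add or replace terms), and the function names and the `beta` weight are hypothetical. The encoder and decoder outputs (`x_hat`, `mu`, `logvar`) are taken as given.

```python
import numpy as np


def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims, per event."""
    return -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar), axis=-1)


def vae_anomaly_score(x, x_hat, mu, logvar, beta=1.0):
    """Per-event score: reconstruction MSE + beta * KL (higher = more anomalous)."""
    recon = np.mean((x - x_hat) ** 2, axis=-1)
    return recon + beta * gaussian_kl(mu, logvar)
```

Events the model has learned to reconstruct (the training backgrounds) get low scores; a threshold on the score then defines the anomaly selection.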

11 of 42

BEAD - Latent Representations

Figure panels: VAE Gaussian | VAE + Planar Flows

12 of 42

BEAD - Latent Representations

Figure panels: VAE + Householder Flows | Dirichlet VAE

13 of 42

BEAD - Latent Representations

Figure panel: Contrastive VAE (SupCon)

14 of 42

PHAZE: Fast ML Inference

15 of 42

Fast ML Inference

  • Use specialized hardware
    • GPUs
    • FPGAs
      • Requires model optimization driven by the hardware specs:
        • Pruning
        • Quantization
  • Knowledge Distillation
    • A large network is trained on the required task
    • A much smaller network is trained to predict the loss of the larger one
    • Only the smaller network is deployed on the Edge device
  • PHAZE: Probabilistic Hashing and Zero-Knowledge-ML based Early-Exit
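The knowledge-distillation variant described above (the student learns to predict the teacher's loss) can be sketched as a toy under loud assumptions: the "large network" is replaced by a hypothetical linear autoencoder (PCA with k components) standing in for e.g. the LSTM-AE, and the student is a small least-squares regressor on quadratic features of a 16-dimensional input. None of these function names come from the talk.

```python
import numpy as np


def fit_teacher(X, k=4):
    """Stand-in 'large' model: a linear autoencoder (PCA with k components)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]  # encoder and decoder share the top-k principal directions


def teacher_loss(X, teacher):
    """Per-event reconstruction MSE of the teacher: the target the student learns."""
    mean, comps = teacher
    Z = (X - mean) @ comps.T      # encode
    X_hat = Z @ comps + mean      # decode
    return ((X - X_hat) ** 2).mean(axis=1)


def quad_features(X):
    """Intercept, linear terms, and all pairwise products x_i * x_j (i <= j)."""
    i, j = np.triu_indices(X.shape[1])
    return np.hstack([np.ones((len(X), 1)), X, X[:, i] * X[:, j]])


def fit_student(X, target):
    """Small student: least-squares regressor from inputs to the teacher's loss."""
    w, *_ = np.linalg.lstsq(quad_features(X), target, rcond=None)
    return w


def student_predict(X, w):
    """Only this cheap function would be deployed on the edge device."""
    return quad_features(X) @ w
```

Because a linear autoencoder's loss is exactly quadratic in the input, this student recovers it almost perfectly; with a real nonlinear teacher the student only approximates the loss, which is where the accuracy-for-speed trade-off appears.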

16 of 42

PHAZE (Offline)

  • Probabilistic Hashing and Zero-Knowledge-ML based Early Exit
  • Split into Offline and Online phases
    • Focus on moving most of the computation offline
      • Online computation becomes low latency
  • Given: a large, performant classifier, `M_full` (e.g., a foundation model), trained on a representative dataset across multiple trigger classes
  • Train a small adapter, `M_early`, prepended to `M_full`
    • Classifier performance drops, but only to an acceptable level
  • Quantize; Interpolate; Hash
  • Generate a zk-proof
  • Map the hash to the `M_full` decision and populate the Verifiable Decision Map (VDM)
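One plausible reading of the "Quantize; Interpolate; Hash" steps is sketched below, under assumptions that are not from the talk: the quantized `M_early` output is treated as the values of a polynomial over a prime field GF(P) at the points 0..n-1, that polynomial is interpolated offline, and the hash is its value at a fixed public constant R. P, R, the quantization scale, and all function names are hypothetical, and the zk-proof generation step is not shown.

```python
P = 2**61 - 1      # hypothetical prime field modulus (a Mersenne prime)
R = 123456789      # hypothetical fixed public evaluation constant


def quantize(embedding, scale=16):
    """Round each M_early output to a fixed-point integer, mapped into GF(P)."""
    return [int(round(v * scale)) % P for v in embedding]


def interpolate_coeffs(ys):
    """Coefficients (lowest degree first) of the unique f with f(i) = ys[i] over GF(P)."""
    n = len(ys)
    coeffs = [0] * n
    for i, yi in enumerate(ys):
        basis, denom = [1], 1  # Lagrange basis polynomial L_i(x), built factor by factor
        for j in range(n):
            if j == i:
                continue
            # Multiply the basis polynomial by (x - j).
            basis = [(a - j * b) % P for a, b in zip([0] + basis, basis + [0])]
            denom = denom * (i - j) % P
        factor = yi * pow(denom, -1, P) % P  # y_i / prod_{j != i} (i - j)
        coeffs = [(c + factor * b) % P for c, b in zip(coeffs, basis)]
    return coeffs


def horner(coeffs, x):
    """Evaluate a polynomial (lowest degree first) at x over GF(P)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def build_vdm(embeddings, full_decisions, scale=16):
    """Offline: map hash(quantized M_early output) -> M_full decision."""
    vdm = {}
    for emb, decision in zip(embeddings, full_decisions):
        h = horner(interpolate_coeffs(quantize(emb, scale)), R)
        vdm[h] = decision  # a real system would need a policy for colliding entries
    return vdm
```

Quantization is what makes nearby `M_early` outputs share a hash; the interpolation is the expensive step that is kept offline.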

Offline phase

17 of 42

PHAZE (Online)

  • Only `M_early` is deployed online
  • On-the-fly (OTF) hashing directly generates the hash evaluated at a specific constant (polynomial interpolation is skipped)
  • The hash is then used to perform a VDM lookup
  • On a map-hit, the VDM returns the stored trigger decision
  • Map-misses can be handled further using AD strategies
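The online path can skip interpolation because, for a fixed constant R, the interpolated polynomial's value is a fixed linear combination of the quantized inputs. The sketch below assumes the same hypothetical GF(P) polynomial hash as in the offline reading (the points 0..n-1, P, R, the scale, and all names are assumptions, not the PHAZE implementation): the Lagrange weights L_i(R) are computed once, after which each hash is a single O(n) dot product, followed by a dictionary lookup in which a missing key signals a map-miss.

```python
P = 2**61 - 1      # hypothetical prime field modulus (must match the offline phase)
R = 123456789      # hypothetical fixed evaluation constant (must match the offline phase)


def quantize(embedding, scale=16):
    """Same fixed-point quantization as the offline phase."""
    return [int(round(v * scale)) % P for v in embedding]


def lagrange_weights(n, r=R):
    """Precompute L_i(r) for nodes 0..n-1 once, so no interpolation is needed online."""
    weights = []
    for i in range(n):
        num, den = 1, 1
        for j in range(n):
            if j != i:
                num = num * (r - j) % P
                den = den * (i - j) % P
        weights.append(num * pow(den, -1, P) % P)
    return weights


def otf_hash(q, weights):
    """f(r) = sum_i q_i * L_i(r) mod P: the interpolated polynomial's value, in O(n)."""
    return sum(a * b for a, b in zip(q, weights)) % P


def vdm_lookup(embedding, vdm, weights, scale=16):
    """Return the stored trigger decision on a map-hit, or None on a map-miss."""
    return vdm.get(otf_hash(quantize(embedding, scale), weights))
```

A `None` result is where the AD fallback for map-misses would take over.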

Online phase

18 of 42

What the ZK Proof Certifies:

  • The proof is the main guarantee of the VDM's integrity
    • Without it, a single bit flip could corrupt decisions
  • Verifying the proof is also much faster than full inference for DQM, offline reconstruction, etc.
  • The proofs also allow the online VDM to become dynamic down the line

19 of 42

PHAZE: Initial Feasibility

  • A working toolkit currently includes:
    • 5 probabilistic hashing schemes [Rust]
    • 1 full ZKML system (Ezkl) [Python]
    • 1 full ZKVM system (RISC Zero) [Rust-Python]
    • 3 mock ZKML systems (Groth16, Plonky, Halo) [Rust]
  • `M_early` strategies are currently left to future work
    • ParticleNet feasibility test as a proof of concept (POC)
      • Training vanilla KD as a simple test

20 of 42

PHAZE: Ultimate Vision

  • A dynamic trigger map that updates itself based on downstream decisions on repeatedly encountered map-misses
    • These could be related to:
      • Data drift over time
      • Detector noise/effects
      • Data-MC discrepancies
      • New physics anomalies
  • The ZKP is pivotal for these updates because the existing VDM cannot be trusted without it (unless it is static)

21 of 42

Side Quest: TARP

22 of 42

TARP: Twikis for ATLAS via RAG Protocols

  • Along the lines of
    • ChATLAS
    • AccGPT
  • Both tools above rely on RAG alone
  • The goal was to show that RAG alone merely throws a `tarp` over the underlying mess (Twikis), covering it up rather than making it useful
    • Twikis: lower quality, less reliable
    • Papers, pub notes, etc.: higher quality and reliability
  • TARP ended up becoming an introduction to fine-tuning LLMs, which will be the next integration step for ChATLAS, AccGPT, etc.

23 of 42

RAG vs LLM Fine-tuning

24 of 42

Thank you! Gonna miss you all :(

But hopefully our paths cross again! :)

25 of 42

Input Data

  • Source of Input Data:
    • The data comes from topocluster moments, which are aggregated features of clusters of calorimeter cells.
    • The two primary topocluster properties used are:
      • Q-factor: Indicates how well the signal pulse shape matches the expected ideal shape.
      • Timing (𝜏): Refers to the timing of the signal relative to the event, helping detect out-of-time signals or anomalies.
    • For each of these properties, we use the mean and standard deviation as the AE inputs
  • Two regions each are considered for the Barrel and the Endcap:
    • Barrel C: −1.5 ≤ η ≤ 0
    • Barrel A: 0 < η ≤ 1.5
    • Endcap C: −3.2 ≤ η < −1.5
    • Endcap A: 1.5 < η ≤ 3.2
  • As a result, each AE input point is 16-dimensional (2 properties × 2 statistics × 4 regions) for p-p collisions
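The 16 input dimensions can be assembled as in the sketch below: for each of the 4 η regions, the mean and standard deviation of the Q-factor and of τ over the clusters in one input unit. The feature ordering, the function name, and the choice of aggregation unit (e.g. per event) are assumptions for illustration; an empty region yields NaN here and would need explicit handling.

```python
import numpy as np

# Eta regions as listed above; the boundary conventions follow the slide.
REGIONS = {
    "EndcapC": lambda eta: (eta >= -3.2) & (eta < -1.5),
    "BarrelC": lambda eta: (eta >= -1.5) & (eta <= 0.0),
    "BarrelA": lambda eta: (eta > 0.0) & (eta <= 1.5),
    "EndcapA": lambda eta: (eta > 1.5) & (eta <= 3.2),
}


def ae_input(eta, qfactor, tau):
    """16-dim AE input: for each region, [mean(Q), std(Q), mean(tau), std(tau)]."""
    feats = []
    for select in REGIONS.values():
        mask = select(eta)
        for prop in (qfactor, tau):
            vals = prop[mask]
            feats += [vals.mean(), vals.std()]
    return np.array(feats)
```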

27 of 42

Joint-PMI

28 of 42

Joint-PMI

29 of 42

Note: fitting the discretizers on the combined data from run 361862 (Main) and run 462542 (UPC/HP) is critical for stability across streams.
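Fitting one set of bin edges on the pooled streams and applying the same edges everywhere keeps a given raw value in the same bin regardless of which stream it came from. A minimal sketch, assuming quantile binning with 10 bins as in the PMI pipeline; the helper names are hypothetical, and `np.quantile`/`np.searchsorted` stand in for `KBinsDiscretizer.fit`/`transform`:

```python
import numpy as np


def fit_shared_edges(streams, n_bins=10):
    """Fit interior quantile edges once, on the concatenation of all streams."""
    pooled = np.concatenate(streams)
    return np.quantile(pooled, np.linspace(0.0, 1.0, n_bins + 1))[1:-1]


def apply_edges(x, edges):
    """Ordinal bin index 0..len(edges) for every value, using the shared edges."""
    return np.searchsorted(edges, x, side="right")
```

With per-stream edges, a shifted stream would be re-centred into the same bins and its shift would disappear from the counts; shared edges preserve it.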

Preprocessing (exact recipe)

30 of 42

Fig 11 (run 361862, LB 802, CosmicCalo)

What we plot (paper style):

  • Top 16 panels: raw input channels (blue)
  • Bottom: Joint-PMI score, PMI anomaly (thresholded), reference algorithm

What we see:

  • A long anomalous window across most of the LB shows up as sustained high JPMI; the PMI anomaly row stays high and aligns with the reference algorithm.
  • This confirms that our method does capture the Fig 11 anomaly (updated finding). (Note: event count ≈ 2k; ΔT ≈ 60 s.)

31 of 42

Fig 11

32 of 42

Fig 12 (run 430897, LB 504, HVNONNOMINAL/Main)

Expectation from paper:

  • Spike around event ~11k.
  • Our result: the JPMI peak and the PMI anomaly bar align with that window.

(Event count ≈ 45k; ΔT ≈ 60 s.)

33 of 42

Fig 12

34 of 42

Fig 13: partial alignment, not exact

What we see:

  • A JPMI spike in a region similar to the one flagged by the reference algorithm.
  • However, the alignment is not exact, which positions JPMI as a quick flagger rather than a precise per-event detector.

Possible reasons: pairwise binning granularity, window length W, threshold quantile, and pair-averaging can all blur the exact timing.
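To make the last two knobs concrete, the sketch below averages per-pair PMI scores into a single Joint-PMI series and thresholds it at a quantile. It assumes the non-negative per-pair, per-window PMI scores are already computed; the function names, the default `q=0.95`, and computing the threshold from the same series (rather than, e.g., from reference runs) are all assumptions.

```python
import numpy as np


def joint_pmi(pair_scores):
    """Average per-pair PMI scores (shape: n_pairs x n_windows) into one JPMI series."""
    return np.asarray(pair_scores, dtype=float).mean(axis=0)


def flag_anomalies(jpmi, q=0.95):
    """Flag windows whose JPMI exceeds the q-th quantile of the series."""
    threshold = np.quantile(jpmi, q)
    return jpmi > threshold, threshold
```

Averaging over pairs dilutes a spike that appears in only a few pairs, and the quantile sets how many windows can be flagged at all; both effects can shift or blur the flagged region relative to a per-event detector.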

35 of 42

BEAD Samples - Bkg

36 of 42

BEAD Samples - Signal

37 of 42

BEAD: 2-Class Training VAE

Figure panels: Herwig-Pythia | Herwig-Sherpa | Pythia-Sherpa

38 of 42

BEAD: 2-Class Training SC-VAE

Figure panels: Herwig-Pythia | Herwig-Sherpa | Pythia-Sherpa

39 of 42

BEAD: 1-Class Training VAE

Figure panels: Pythia | Sherpa | Herwig

40 of 42

PHAZE Latency Estimates

41 of 42

How do you fine-tune?

  • Unsupervised:
    • Masked Language Models
  • Supervised:
    • Generate a labeled dataset
  • Reinforcement:
    • Set up a feedback loop with reward chains
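As an example of the unsupervised route, masked-language-model training corrupts a fraction of the input tokens and trains the model to recover them. The sketch below follows the widely used BERT-style recipe (select 15% of tokens; replace 80% of those with a mask token, 10% with a random token, keep 10% unchanged); this recipe, the `MASK_ID` value, and the function name are assumptions, not TARP's configuration. Labels use -100 for unselected positions, which is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
import random

MASK_ID = 103   # hypothetical [MASK] token id; depends on the tokenizer
IGNORE = -100   # label for positions the loss should skip


def mask_tokens(token_ids, vocab_size, mask_prob=0.15, seed=None):
    """Return (inputs, labels) for one masked-LM training example."""
    rng = random.Random(seed)
    inputs, labels = list(token_ids), [IGNORE] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:          # select this position for prediction
            labels[i] = tok                   # the model must recover the original
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID           # 80%: replace with the mask token
            elif r < 0.9:
                inputs[i] = rng.randrange(vocab_size)  # 10%: random token
            # remaining 10%: keep the original token
    return inputs, labels
```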

42 of 42

TARP Tool-Suite

  • Data scraping:
    • CDS text:
      • Crawl4AI
      • Firecrawl
    • CDS plots:
      • ColiVara
  • Orchestration:
    • LlamaIndex
  • LLM serving:
    • Ollama
  • Hyper-fast fine-tuning:
    • Unsloth
  • Tracing and Evals:
    • Comet Opik
  • Potential TARP serving:
    • LoRAX
  • Rough UI:
    • Streamlit
