1 of 43

Jefferson Lab

Cullan Bedwell, Abhijeet Chawhan, Julie Crowe, Diana McSpadden

Spring 2023

AIEC Capstone

2 of 43

Jefferson Lab (JLab)

Newport News, Virginia: Continuous Electron Beam Accelerator Facility (CEBAF)

  • Experimental halls known as A, B, C, D (and the EIC):
  • We are focused on Hall D
  • Calibration of the Forward Calorimeter detector (FCAL)
    • 2800 lead glass blocks (4 cm x 4 cm x 45 cm each), each read out by a photomultiplier tube

3 of 43

Fig. 1: The GlueX Spectrometer in Hall D at Jefferson Lab, viewed from the downstream side, in October 2017.

The GlueX Detector

4 of 43

  • 2800 individual lead glass modules == 2800 photomultiplier tubes (PMTs).
  • PMTs vary in their measurement of the light emitted by electromagnetic showers
    • (when the FCAL was installed, no reference PMT was installed to measure a known light source)

  • A gain calibration, a unitless scaling factor (multiplier), can be applied to data to minimize variance
    • The gain calibration may be unique per tube (2800), per experimental time period
      • (usually an ~2 hour chunk of time known as a "run").

  • TARGETS: the gain calibration for each PMT, for each run.
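To make the idea of a multiplicative gain calibration concrete, here is a minimal sketch. The amplitudes, the common target value of 100, and the array sizes are all made up for illustration (the real FCAL has 2800 PMTs, and real gains are not derived this way); the only point is that applying a gain is an elementwise multiplication per run and per PMT.

```python
import numpy as np

# Hypothetical raw amplitudes: 3 runs x 4 PMTs (illustrative values only).
raw = np.array([
    [100.0, 110.0,  90.0, 105.0],
    [ 98.0, 112.0,  91.0, 103.0],
    [101.0, 109.0,  89.0, 106.0],
])

# Hypothetical unitless gains, one per run and per PMT.
# Assumption for this toy example: every PMT should read a common target.
target = 100.0
gains = target / raw

# Applying a gain calibration is an elementwise multiplication.
calibrated = gains * raw

# PMT-to-PMT spread within each run, before and after calibration.
spread_before = raw.std(axis=1)
spread_after = calibrated.std(axis=1)
```

In this toy case the spread after calibration collapses to numerical noise because the gains were constructed from the same data; a predictive model would only approximate this.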

FCAL Problem Statement

Model Input:

  • experimental conditions
    • beam current (luminosity/radiation damage to blocks)
    • temperature - possibly
    • atmospheric pressure - possibly
  • block characteristics
    • only obtainable through their relative measurements of 5 different colored LED pulses
  • controls
    • PMT high voltage control - (vmon)

Model Output:

5 of 43

Traditional FCAL calibration method:

  • Uses reconstructed from LED
    • π0 = a type of subatomic particle (the neutral pion).
  • π0s are not evenly distributed over the FCAL.
  • Reconstruction of particles takes time and more data

Machine learning method:

Can we predict gain calibrations from experimental conditions and equipment measurements (LED pulses), using previously collected, well-calibrated data?

Why Use Machine Learning?

6 of 43

Timeline

7 of 43

Timeline

Feb 2023

Data Mining, Data Cleaning, Data Exploration, Data Visualization commenced

March 2023

Data Mining, EDA complete.

Model creation complete.

Jan 2023

Gathering Domain knowledge in progress.

Feb 2023

EDA and Data Visualization in progress.

Feature Engineering + Model creation in Progress.

April 27, 2023

Final Model

Conclusions

Iterative approach for solution design.

Gathering domain knowledge is a continuous process.

April 2023

Model Evaluation

Gain Calibration Evaluation

8 of 43

Three Separate Research Questions

RQ 1: Predict Inner Block Radiation Damage

RQ 2: Run-to-Run Gain Calibration

RQ 3: Block-to-Block Gain Calibration

Predict a block’s radiation damage due to beam exposure, based on the block’s radius and the integrated beam exposure from the experiment (which differs between experiments)

Predict the gain of all blocks so that the entire FCAL measures the “same” for all runs in a run period, based on a reference run

“Blind Drift Calibration” of sensors: calibration of sensors without a reference sensor

9 of 43

Questions:

  • What do we have?
    • What does it mean?
    • What is useful?
  • How trustworthy is it?

We have:

  • Gains → Target Variable
  • Mapping
    • X, Y, Radius, Ring Number
  • LED Amplitudes
    • Peak, Mean, Width, Integral, Yield, Chi-Squared, Errors
  • Voltages
    • Min, Max, Mean, Sigma
  • Timing Offsets
  • Temperature
  • Time
  • Integrated Beam Current

Understand the Complex Data Ecosystem

10 of 43

Data Quality is Key

11 of 43

Gains Are a Function of Time and Radius

Normalized Gain Calibration By Run Index - PrimeX Run Period

12 of 43

Next Steps

Feature engineering

  • Examine gains per ring, per run
  • Divide this by average gain per ring from baseline run
  • Plot to visualize anomalies

Implement Convolutional Neural Net

  • Predict gains using amplitude data
  • Convert the 5 LED pulse types into input channels

Combine work

  • Use data as input for model
    • add integrated beam exposure for radiation damage effects
    • possibly add voltage

13 of 43

Related Works

https://arxiv.org/pdf/1707.03682.pdf

Proposes a novel deep learning method named projection-recovery network (PRNet) to blindly calibrate sensor measurements online.

The PRNet first projects the drifted data to a feature space, and uses a powerful deep convolutional neural network to estimate drift-free measurements.

Below are the earlier works we consulted for insight into solutions implemented for similar calibration problems.

Literature scraped for Solution Design

Promising Read

14 of 43

  • 2800 individual lead glass modules == 2800 photomultiplier tubes (PMTs).
  • PMTs vary in their measurement of the light emitted by electromagnetic showers
    • (when the FCAL was installed, there was no reference PMT installed measuring a known light source)

  • A gain calibration, a unitless scaling factor (multiplier), can be applied to data to minimize variance
    • The gain calibration may be unique per tube (2800), per experimental time period
      • (usually an ~2 hour chunk of time known as a "run").

  • THESE ARE OUR TARGETS: the gain calibration for each PMT, for each run.

Reminder of FCAL Problem Statement

Model Input:

  • ?????
  • experimental conditions
    • beam current (luminosity/radiation damage to blocks)
    • temperature?
    • atmospheric pressure?
  • block characteristics
    • only obtainable through their relative measurements of 5 different colored LED pulses
  • controls
    • PMT high voltage control

Model Output:

15 of 43

t = time == run by run is our time

r = radius, ring

I_beam = integrated beam current

j = ring number (different number of blocks per ring)

G_i: Gain for block i (primex_gains.csv) - function of time, function of beam intensity.

G_i = g_PMT_i(t) * g_RD(r_i, I_beam)

  • g_PMT_i(t) is the time-dependent component (think of this as the noise component of the gain).
  • g_RD(r_i, I_beam) is a function of radius and integrated beam current

Q_j(t) = average(G_i in ring j)
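The factorization above can be illustrated with synthetic numbers. This is only a sketch: the block count, ring assignment, noise level, and the linear growth of the radiation-damage term are assumptions made up for the example, not JLab values or the measured behavior.

```python
import numpy as np

rng = np.random.default_rng(0)

n_runs = 5
ring = np.repeat(np.arange(3), 4)   # ring number j for each of 12 toy blocks
n_blocks = ring.size

# Assumed g_PMT_i(t): small random fluctuation around 1 for every block and run.
g_pmt = 1.0 + 0.01 * rng.standard_normal((n_runs, n_blocks))

# Assumed g_RD: equal to 1 everywhere except the innermost ring (j = 0),
# where it grows with run index as a stand-in for integrated beam exposure.
g_rd = np.ones((n_runs, n_blocks))
g_rd[:, ring == 0] += 0.05 * np.arange(n_runs)[:, None]

# G_i = g_PMT_i(t) * g_RD(r_i, I_beam)
G = g_pmt * g_rd

# Q_j(t) = average(G_i in ring j): one value per run and per ring.
n_rings = ring.max() + 1
Q = np.stack([G[:, ring == j].mean(axis=1) for j in range(n_rings)], axis=1)
```

With these toy inputs, Q for the outer rings stays near 1 over all runs while Q for the inner ring drifts upward, which is the pattern the ring averages are meant to expose.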

JLab can provide us with:

  • Integrated beam current by run (basically, the increased exposure to beam current for each run in a run period. First run will have 0 beam current exposure) - 1 value per run
    • Run 61321 is the first PrimeX run with both gains (g) and LED data
  • g_RD_j : the per ring, per run contribution towards the gain caused by radiation damage - 1 value per ring per run
  • LED - 5 values per block, per run

Q_j(t) / Q_j(0) = [sum(G_i(t)) / n_blocks] / [sum(G_i(0)) / n_blocks], summing over the blocks in ring j

16 of 43

START OF UNUSED SLIDES FOR 02/23 PRESENTATION

17 of 43

Early Challenges in feature engineering

Discussion: Behavior of gains in outer rings during primex (2019): Ratio to t(0) gradius

18 of 43

Early Challenges in feature engineering

Discussion: negative chi2/ndf values from primex (2019)

19 of 43

Ratio to t(0) gradius

61352, 61353, 61357, 61358, 61359, 61360, 61361, 61362, 61363, 61364, 61365, 61366, 61367, 61368, 61369, 61371,

61493, 61495, 61496, 61498, 61499, 61500, 61501, 61505

61580, 61581, 61582, 61583, 61584, 61585, 61602, 61603, 61606, 61607, 61608, 61609, 61610, 61611

61670, 61671

20 of 43

Related to vmon changes?

61352, 61353, 61357, 61358, 61359, 61360, 61361, 61362, 61363, 61364, 61365, 61366, 61367, 61368, 61369, 61371,

61493, 61495, 61496, 61498, 61499, 61500, 61501, 61505

61580, 61581, 61582, 61583, 61584, 61585, 61602, 61603, 61606, 61607, 61608, 61609, 61610, 61611

61670, 61671

21 of 43

22 of 43

23 of 43

LED Flashes

24 of 43

JLAB Capstone Problem:

Forward Calorimeter (FCAL) Calibration

The GlueX FCAL consists of 2800, 4 cm x 4 cm x 45 cm lead glass blocks stacked in a circular array. Each block is optically coupled to an FEU 84-3 PMT which will be instrumented with flash ADC electronics. GlueX-doc 985,988 and 989 document the GlueX Fcal as presented in the February 2008 Calorimetry Review.

25 of 43

FCAL Problem Statement

The Forward Calorimeter (FCAL) is a component of the GlueX spectrometer made of 2800 individual lead glass modules, each coupled to its own photomultiplier tube (PMT). The FCAL provides timing and energy measurements for photon showers. The Cherenkov light emitted by the electromagnetic showers produced within the lead glass blocks is detected by PMTs. The resulting PMT pulses are digitized using Flash analog-to-digital converters (fADCs) and the timing resolution is measured using a pulsed LED source.

However, PMTs vary in their measurements (possibly sensitive to temperature, humidity, magnetic fields, time, distance from the center, or other conditions), and when the FCAL was installed, no reference PMT was installed.

A gain calibration, or scaling factor, can be applied to minimize the event-to-event variance of the sum over all modules, where the amplitude measured by each module is scaled by a module-dependent gain factor. The gain calibration may be unique per tube, per experimental time period (usually an ~2 hour chunk of time known as a "run"). Using the gain correction factors, the HV of each module is adjusted. The gain correction is essential to obtaining optimal resolution.

We are attempting to find a model that predicts the PMT gain calibrations by run. The assumption is that the conditions that lead to changing gain are stable enough throughout an approximately 2-hour run.

26 of 43

Expected Outcome

  • One of the outcomes is to extract the relevant data pieces and convert them into an appropriate format for analysis.
  • The expected outcome for this project is to use AI/ML techniques to develop a model that predicts the gain correction for each PMT, per experimental time period.

27 of 43

FCAL Data

Location of data files: /work/epsci/roark/FCAL/cpp/

with EPICS/calibrations: cpp_epics_quad0_df.csv

/work/epsci/roark/FCAL/2019_primex

In case interested: CPP is the “charged pion polarizability” experiment. Data was taken in 2022. Dr. Jeske’s (Torri’s) notebook will explain many of the features/columns in the data. I believe there are 733 “runs”/observations in the CPP data.

https://halldweb.jlab.org/wiki/index.php/Charged_Pion_Polarizability_Experiment_in_Hall-D

HOWEVER: we found out today that we need the PrimeX experiment, not the CPP experiment, so that we can compare with the physicist who is doing a physics data-intensive method.

Descriptions of the data can be found in: https://halldweb.jlab.org/DocDB/0027/002770/001/FCAL_Manual.pdf

28 of 43

[Equation figure: ADC amplitude, gain, coupling, and LED amplitude, written for a single module, for all modules, and in vector notation]

What is ADC amplitude?

  • ADC = flash analog-to-digital converter (fADC); the ADC amplitude is the digitized PMT signal. “The analog signal from the PMT is digitized with a newly developed flash analog to digital converter (fADC) from JLAB” - https://halldweb.jlab.org/DocDB/0009/000988/001/fcal.pdf

David Lawrence

i is 1:~700

or 2800

j is 1:30

29 of 43

[Diagram: vectors of size (700), (700), (30), (700), related by elementwise inverses (^-1)]

Train assuming known amplitudes for L and all gains=1 so that we get an α close to the actual coupling matrix.

Retrain with a small learning rate and allow the gains to vary, but keep them in a reasonable range (e.g. add a loss term like (g/1.5)^6).
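The suggested (g/1.5)^6 loss term could look like the sketch below. The function names, the averaging over gains, and the `weight` parameter are assumptions for illustration; the note above only gives the form of the penalty, not how it is combined with the main loss.

```python
import numpy as np

def gain_penalty(g, scale=1.5, power=6):
    """Mean of (g / scale) ** power over all gains.

    Small for gains near 1, equal to 1 at g == scale, and steep beyond it.
    """
    g = np.asarray(g, dtype=float)
    return float(np.mean((g / scale) ** power))

def total_loss(predicted, observed, g, weight=1.0):
    """Hypothetical combined loss: mean squared error plus the gain penalty."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    mse = float(np.mean((predicted - observed) ** 2))
    return mse + weight * gain_penalty(g)
```

Note that with an even power this term only discourages gains of large magnitude; it does not stop gains from drifting toward zero, so a separate lower bound may be needed.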

Idea: Train encoder on its own to begin.

Freeze network.

Then train inverse of g to get the back half.

01/26/23 IDEA: Load up keras/tensorflow.

first group: try out the autoencoder example “images”/matrices example A targets - take a look at weights for “gains”. Recreate one image for one quadrant. experience with an autoencoder.

second group: train Encoder side.

Is it really just a perceptron?

Is this an optimization problem?

David Lawrence

(unknown coupling connectivity)

L: 1-10 Violet, 11-20 Blue, 21-30 Green (see small and large V in speaker notes)

5 inputs of ~700 (2 for blue and violet, 1 for green)

30 of 43

Adapting the Projection-Recovery Network (PRNet): from ‘A Deep Learning Approach for Blind Drift Calibration of Sensor Networks’

For our FCAL implementation:

  • could add “fake blocks” to turn the FCAL into a square. At the corners, impute the amplitude from the neighboring row/column values and assume perfect gain, i.e. 1.
    • would need to determine a strategy for fake blocks during production.
  • no time component, just space.
  • how will we represent v, b, g, V, B (small_violet, small_blue, etc.)? As an extra dimension?
  • the gain is from all 5 pulse types but with unknown contribution
    • Use the Hadamard product and Hadamard division for amplitude and gain to “create” A_drift and A_clean for our “training” data.
    • We can use a Lambda function to impute the gain values with the Hadamard division (i.e. A_output /. A_input == g) if we need to produce the gain constants for consistent CCDC/GlueX processes.
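A minimal sketch of the Hadamard (elementwise) product and division idea, using a small synthetic grid. The array sizes and value ranges are made up, and the convention that the gain multiplies the measured amplitude (so A_drift = A_clean / g) is an assumption chosen to match A_output /. A_input == g.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "clean" amplitudes and "true" gains on a toy 6 x 6 grid.
A_clean = rng.uniform(50.0, 150.0, size=(6, 6))
g_true = rng.uniform(0.8, 1.2, size=(6, 6))

# Assumed convention: calibrated = g * measured, so the drifted (measured)
# amplitudes are the clean ones divided elementwise by the gain.
A_drift = A_clean / g_true

# Stand-in for a network's output: here we pretend it recovers A_clean exactly.
A_output = A_clean

# Recover the gains by Hadamard division of output by input.
g_recovered = A_output / A_drift
```

In a Keras model the final division could be wrapped in a Lambda layer; that wiring is not shown here.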

31 of 43

(2800 x 5)

or (60 x 60 x 5) for a CNN AutoEncoder

(?)

How to recover the gains from the multi input?

[Diagram: five arrays, each (2800) or (60 x 60)]
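One way to move between a per-block layout (n_blocks x 5) and a square (60 x 60 x 5) layout is sketched below. The disc-shaped footprint, the function names, and the constant `fill` value for the fake blocks are assumptions for illustration; the real block map would come from the FCAL mapping (X, Y), and the fake blocks could instead be imputed from neighbors.

```python
import numpy as np

def to_square_grid(col, row, values, size=60, fill=0.0):
    """Scatter per-block values (n_blocks, n_channels) into a square grid.

    Returns the (size, size, n_channels) grid and a boolean mask that is
    True where a real block sits and False for padded "fake" cells.
    """
    values = np.asarray(values, dtype=float)
    grid = np.full((size, size, values.shape[1]), fill, dtype=float)
    mask = np.zeros((size, size), dtype=bool)
    grid[row, col] = values
    mask[row, col] = True
    return grid, mask

def from_square_grid(grid, col, row):
    """Gather per-block values back out of the square grid."""
    return grid[row, col]

# Synthetic circular footprint standing in for the real block map.
size = 60
rr, cc = np.mgrid[0:size, 0:size]
inside = (rr - 29.5) ** 2 + (cc - 29.5) ** 2 <= 29.5 ** 2
row, col = rr[inside], cc[inside]

# Five LED channels per block, random values for the example.
values = np.random.default_rng(2).uniform(size=(row.size, 5))
grid, mask = to_square_grid(col, row, values)
```

The mask makes it possible to ignore the padded cells in a loss function and to read predictions back out for the real blocks only.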

32 of 43

2:31

I think Igal cares about the color of the LED to mainly monitor for radiation damage (that was the initial point of this system). He only recently realized he could use the LEDs for calibration

2:32

there will be quadrant to quadrant differences because the glass was sanded by hand, by humans. we do not know if Igal truly needs 100s of evio files for calibration, so a first step would be to see if I could refit the amplitudes using one file (like the first file as we do for the CDC) and then just looking at the color during that time.

2:33

They have also not looked at the gain correction distribution per run period per block, which is great that you have already started to look at that.

2:34

they are also unsure the "physics" benefit (if you want to call it that) for stabilizing the gain, i.e. if we are off by 1, 5, 10% etc how does that affect the resolution or energy

2:34

which leads me to my final point, if we can improve the timing resolution, we automatically win. apparently that's a big deal in the world of calorimeters.

2:35

the cycling of the LEDs is all the same and the voltages change rarely (hasn't happened in the 2 years Malte has been working with the FCAL).

33 of 43

Example of inner ring radiation degradation (2018 run period)

34 of 43

35 of 43

Research Questions

Three Research Questions Have Been Proposed:

  • David would like us to focus on: Predict a block’s radiation damage due to beam exposure
    • based on block’s ring position and beam intensity
  • Run-By-Run Calibration of the FCAL
    • Predict the gain of all blocks so that the entire FCAL measures the “same” for all runs in a run period
  • Block-to-Block Calibration
    • Blind Drift Calibration Paper Related Problem

36 of 43

Reference Method

t = time == run by run is our time

r = radius, ring

I_beam = integrated beam current

j = ring number (different number of blocks per ring)

G_i: Gain for block i (primex_gains.csv) - function of time, function of beam intensity.

G_i = g_PMT_i(t) * g_RD(r_i, I_beam)

  • g_PMT_i(t) is the time-dependent component (think of this as the noise component of the gain).
  • g_RD(r_i, I_beam) is a function of radius and integrated beam current

Q_j(t) = average(G_i in ring j)

JLab can provide us with:

  • Integrated beam current by run (basically, the increased exposure to beam current for each run in a run period. First run will have 0 beam current exposure) - 1 value per run
    • Run 61321 is the first PrimeX run with both gains (g) and LED data
  • g_RD_j : the per ring, per run contribution towards the gain caused by radiation damage - 1 value per ring per run
  • LED - 5 values per block, per run

Q_j(t) / Q_j(0) = [sum(G_i(t)) / n_blocks] / [sum(G_i(0)) / n_blocks], summing over the blocks in ring j

37 of 43

TODO: For the Reference Method

David Asked For - hopefully as soon as we can, because we will need the values as inputs:

Both:

  • plotted
  • And the values saved by run and by ring

For all rings, j, average ring gain for the run over the average gain for reference run - plot over the run period (on x axis) - y-axis is the ratio.

We hope this is 1 for the "outside" rings.

For inner rings in later runs we hope it is not 1: there we are finding the g_RD.

This is similar to Cullan's plots

These are the input values seen in the bottom flow that we will need g_RD_j=1, j=(ring)

SAID ANOTHER WAY

step 1) define the rings (this is sort of arbitrary but keep it consistent)

step 2) average the gains per ring per run

step 3) divide step 2 by the average gain per ring from run 61321 (this is like our t_0 or “Reference”, if you will)

step 4) plot (and save the value) as a function of run number (aka "time")

We “hope” for outer rings, this should be one. for inner rings, it should not be one. This is the g_RD, or the contribution to G_i of the g_RD. We will use this as input to our model (the part inside the dotted line on slide 15).
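Steps 2 and 3 can be sketched with pandas as below. The table layout and column names (`run`, `block`, `ring`, `gain`) and all gain values are hypothetical; the actual primex_gains.csv may be organized differently, and step 1 (defining the rings) is assumed to be done already.

```python
import pandas as pd

# Hypothetical long-format table: one row per (run, block).
df = pd.DataFrame({
    "run":   [61321] * 4 + [61322] * 4,
    "block": [0, 1, 2, 3] * 2,
    "ring":  [0, 0, 1, 1] * 2,
    "gain":  [1.00, 1.02, 0.98, 1.00,
              1.10, 1.12, 0.98, 1.00],
})

REFERENCE_RUN = 61321  # the t_0 run from step 3

# Step 2: average gain per ring per run (rows = runs, columns = rings).
ring_mean = df.groupby(["run", "ring"])["gain"].mean().unstack("ring")

# Step 3: divide by the per-ring average from the reference run.
ratio = ring_mean / ring_mean.loc[REFERENCE_RUN]
```

For step 4, `ratio.plot()` (with matplotlib installed) draws one line per ring against run number, and `ratio.to_csv(...)` saves the values by run and by ring.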

38 of 43

Generative Method

t = time == run by run is our time (a run is a “time”)

r = radius, ring

I_beam = integrated beam current

Uses r >= R as a non radiation damaged reference

G_i(t) = total Gain (primex_gains.csv) is some function of (r, LED, integrated beam, and “t”)

some convolution of g_i and g_RD_i

g_i = g_i(LED,t) - function of the LED pulses and time

g_RD(r, I_beam)

@ r >= R g_RD(r, I_beam) = 1

@ some radius/ring the contribution towards total gain of the radiation gain is just 1, i.e. there is no contribution. These are past the inner ring area.

So, we could have two models:

MODEL ONE:

  • inputs
    • “time” - one value per run
    • LED (5 peak values for each of the 2800 blocks) for each run
  • output
    • g_i(LED,t) - the non-radiation component of the gain for each block

MODEL TWO:

  • inputs
    • g_i(LED,t)
    • I_beam
    • radius/ring
  • output
    • G_i(t)

And you can see how these could work together into one “flow” in the bottom flowchart.
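The r >= R reference idea can be sketched as follows. This assumes the simplest combination, a plain product G_i = g_i * g_RD_i (as in the reference method), even though the notes above leave the exact combination open; the radii, the cutoff R, and all gain values are made up.

```python
import numpy as np

def split_by_reference_radius(radius, R):
    """Boolean masks for blocks with r >= R (no radiation damage) and r < R."""
    radius = np.asarray(radius, dtype=float)
    outer = radius >= R
    return outer, ~outer

def implied_radiation_factor(G, g_non_rad):
    """Assuming G_i = g_i * g_RD_i, the radiation factor is G_i / g_i."""
    return np.asarray(G, dtype=float) / np.asarray(g_non_rad, dtype=float)

# Toy inputs: four blocks at different radii, cutoff radius R = 4.
radius = np.array([1.0, 2.0, 5.0, 6.0])
R = 4.0

# g_i(LED, t) would come from MODEL ONE; fixed numbers stand in for it here.
g_non_rad = np.array([1.01, 0.99, 1.02, 0.98])

# Synthetic total gain: 10% radiation effect inside R, none outside.
g_rd_true = np.where(radius >= R, 1.0, 1.10)
G = g_non_rad * g_rd_true

outer, inner = split_by_reference_radius(radius, R)
g_rd = implied_radiation_factor(G, g_non_rad)
```

Under this assumption the outer blocks give direct targets for MODEL ONE (G_i equals g_i there), and the ratio on the inner blocks is what MODEL TWO would need to learn from I_beam and radius.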

39 of 43

40 of 43

Prep & Backup

41 of 43

Nov/Dec/Jan: Intro to JLab and Data

  • Access to Jefferson Lab Computing
  • Examine git repo with example code from previous research: https://github.com/dianamJLAB/AIEC_UVaCapstone
  • Examine 2018 (alcohol corrected), 2020, 2021 data from data sets (above).
    • do EDA
    • Past work used pressure (pressure_mean), temp (d1_max) and current (mean_a_mean) (3 features). We would like to examine the difference when using pressure/temp as a single feature and current (2 features).
      • See whether we have better “coverage” across features
      • Compare accuracy
      • Compare uncertainty calibration
      • Stretch task could be to simulate a UQ-informed digital twin of CDC
  • Read about the GlueX FCAL TODO: Add links that David thinks are important

42 of 43

About the intro to JLAB data

  • run number: no data value: id for the observation
  • PRESSURE_MEAN: mean atmospheric pressure for the run
  • D1_MAX: max temperature for the run (Kelvin)
  • MEAN_A_MEAN: mean high voltage board current of the 8 high voltage boards on the A high voltage board.
  • There are many other features available..
  • https://wiki.jlab.org/epsciwiki/index.php/AI_For_Experimental_Controls/Datasets

43 of 43

useful definitions

calorimeter (in physics): detector that measures the energy of particles

photo-multiplier tube (PMT): super sensitive detector used to measure tiny amounts of light

Light emitting diode (LED): device that emits light when current flows through it