1 of 43

Jefferson Lab

Cullan Bedwell, Abhijeet Chawhan, Julie Crowe, Diana McSpadden

Spring 2023

AIEC Capstone

2 of 43

Jefferson Lab (JLab)

11/05/21

Newport News, Virginia, Continuous Electron Beam Accelerator

Experimental halls known as A, B, C, D (and the EIC):
We are focused on Hall D
Calibration of the Forward Calorimeter detector (FCAL)

2800 lead-glass blocks (4 x 4 x 45 cm³ each) and photomultiplier tubes

INTRO: Diana

Our project sponsor is the Thomas Jefferson National Accelerator Facility, also known as Jefferson Lab or JLab, located in Newport News, Virginia. It is home to the Continuous Electron Beam Accelerator and four experimental halls known as A, B, C and D. Our work is focused on Hall D, home of the GlueX detector system seen in the schematic on the left.

We are specifically focused on calibration of the Forward Calorimeter detector, or FCAL.

The FCAL consists of 2800 lead-glass blocks, each 4 x 4 x 45 cm³, stacked in a circular frame.

The FCAL is the last stop for forward particles in the GlueX detector: the last gray box in the schematic on the right.

You can see it being built in the center image, and complete on the right.

Each block is individually read out by a PMT (Photo-Multiplier Tube).

3 of 43

Fig. 1: The GlueX Spectrometer in Hall D at Jefferson Lab, viewed from the downstream side, in October 2017.

https://en.wikipedia.org/wiki/GlueX

The GlueX Detector

4 of 43

2800 individual lead glass modules == 2800 photomultiplier tubes (PMTs).
PMTs vary in their measurement of light emitted by electromagnetic showers
(when the FCAL was installed, no reference PMT measuring a known light source was installed)

A gain calibration, a unitless scaling factor (multiplier), can be applied to data to minimize variance

The gain calibration may be unique per tube (2800), per experimental time period

(usually an ~2 hour chunk of time known as a "run").

TARGETS: the gain calibration for each PMT, for each run.
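A minimal sketch of what "applying a gain calibration" means in practice, assuming one multiplier per block per run. The array names and all numbers are made up for illustration and are not JLab data.

```python
import numpy as np

def apply_gains(amplitudes, gains):
    """Scale raw PMT amplitudes by per-block, per-run gain factors.

    amplitudes: (n_runs, n_blocks) raw amplitudes
    gains:      (n_runs, n_blocks) unitless multipliers
    """
    amplitudes = np.asarray(amplitudes, dtype=float)
    gains = np.asarray(gains, dtype=float)
    if amplitudes.shape != gains.shape:
        raise ValueError("one gain is needed per block, per run")
    return amplitudes * gains  # element-wise scaling

# Toy case: 2 runs x 3 blocks that all saw the same light (true value 100).
raw = np.array([[80.0, 100.0, 125.0],
                [50.0, 200.0, 100.0]])
gains = np.array([[1.25, 1.0, 0.8],
                  [2.0, 0.5, 1.0]])
calibrated = apply_gains(raw, gains)
```

With the right gains, the block-to-block and run-to-run spread in `raw` collapses to a single value in `calibrated`, which is the variance reduction the slide refers to.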

FCAL Problem Statement

Model Input:

experimental conditions

beam current (luminosity/radiation damage to blocks)
temperature - possibly
atmospheric pressure - possibly

block characteristics

only obtainable through their relative measurements of 5 different colored LED pulses

controls

PMT high voltage control - (vmon)

Model Output:

5 of 43

Traditional FCAL calibration method:

Uses reconstructed from LED

π⁰= type of subatomic particle.

π⁰’s not evenly distributed over the FCAL.
Reconstruction of particles takes time and more data

Machine learning method:

Can we predict gain calibrations from experimental conditions and equipment measurements (LED pulses), using previously collected, well-calibrated data?

Why Use Machine Learning?

6 of 43

Timeline

https://www.sudeep.co/data-science/2018/02/09/Understanding-the-Data-Science-Lifecycle.html

7 of 43

Timeline

Jan 2023

Gathering Domain knowledge in progress.

Feb 2023

Data Mining, Data Cleaning, Data Exploration, Data Visualization commenced.

EDA and Data Visualization in progress.

Feature Engineering + Model creation in progress.

March 2023

Data Mining, EDA complete.

Model creation complete.

April 2023

Model Evaluation

Gain Calibration Evaluation

April 27, 2023

Final Model

Conclusions

Iterative approach for solution design.

Gathering Domain knowledge is a continuous process.

8 of 43

Three Separate Research Questions

RQ 1: Predict Inner Block Radiation Damage

Predict a block’s radiation damage due to beam exposure from the block’s radius and the integrated beam exposure, which differs from experiment to experiment.

RQ 2: Run-to-Run Gain Calibration

Predict the gains of all blocks so that the entire FCAL measures the “same” for every run in a run period, relative to a reference run.

RQ 3: Block-to-Block Gain Calibration

“Blind Drift Calibration” of sensors: calibration of sensors without a reference sensor.

9 of 43

Questions:

What do we have?

What does it mean?
What is useful?

How trustworthy is it?

We have:

Gains → Target Variable
Mapping: X, Y, Radius, Ring Number
LED Amplitudes: Peak, Mean, Width, Integral, Yield, Chi-Squared, Errors
Voltages: Min, Max, Mean, Sigma
Timing Offsets
Temperature
Time
Integrated Beam Current

Understand the Complex Data Ecosystem

10 of 43

Data Quality is Key

11 of 43

Gains Are a Function of Time and Radius

Normalized Gain Calibration By Run Index - PrimeX Run Period

12 of 43

Next Steps

Feature engineering

Examine gains per ring, per run
Divide this by average gain per ring from baseline run
Plot to visualize anomalies
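The three feature-engineering steps above can be sketched as follows. This is only an illustration: the function names, the toy arrays, and the 10% anomaly tolerance are assumptions, not values from the project.

```python
import numpy as np

def ring_gain_ratio(gains, ring, baseline_run=0):
    """Mean gain per ring, per run, divided by the baseline run's ring mean.

    gains: (n_runs, n_blocks) gain per block, per run
    ring:  (n_blocks,) ring number of each block
    Returns (rings, ratio) with ratio of shape (n_runs, n_rings).
    """
    gains = np.asarray(gains, dtype=float)
    ring = np.asarray(ring)
    rings = np.unique(ring)
    ring_mean = np.stack(
        [gains[:, ring == r].mean(axis=1) for r in rings], axis=1
    )
    return rings, ring_mean / ring_mean[baseline_run]

def flag_anomalies(ratio, tolerance=0.1):
    """Mark (run, ring) cells whose ratio is further than `tolerance` from 1."""
    return np.abs(ratio - 1.0) > tolerance

# Toy data: 3 runs, 4 blocks in 2 rings; ring 2 jumps in the last run.
gains = np.array([[1.0, 1.0, 1.0, 1.0],
                  [1.1, 0.9, 1.0, 1.0],
                  [1.0, 1.0, 1.5, 1.5]])
ring = np.array([1, 1, 2, 2])
rings, ratio = ring_gain_ratio(gains, ring)
flags = flag_anomalies(ratio)
```

The `ratio` array is what would be plotted per ring against run index; `flags` is one simple way to pick out the runs worth a closer look.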

Implement Convolutional Neural Net

Predict gains using amplitude data
Convert the 5 LED pulse types into input channels

Combine work

Use data as input for model

add integrated beam exposure for radiation damage effects
possibly add voltage

13 of 43

Related Works

https://arxiv.org/pdf/1707.03682.pdf

Proposes a novel deep learning method named projection-recovery network (PRNet) to blindly calibrate sensor measurements online.

The PRNet first projects the drifted data to a feature space, and uses a powerful deep convolutional neural network to estimate drift-free measurements.

Below is the list of earlier works that we consulted to get insights into solutions implemented for similar calibration problems.

Literature scraped for Solution Design

Promising Read

14 of 43

2800 individual lead glass modules == 2800 photomultiplier tubes (PMTs).
PMTs vary in their measurement of light emitted by electromagnetic showers

(when the FCAL was installed, there was no reference PMT installed measuring a known light source)

A gain calibration, a unitless scaling factor (multiplier), can be applied to data to minimize variance

The gain calibration may be unique per tube (2800), per experimental time period

(usually an ~2 hour chunk of time known as a "run").

THESE ARE OUR TARGETS: gain calibrations for the PMTs == the PMT gain calibration for each run.

Reminder of FCAL Problem Statement

Model Input:

?????
experimental conditions

beam current (luminosity/radiation damage to blocks)
temperature?
atmospheric pressure?

block characteristics

only obtainable through their relative measurements of 5 different colored LED pulses

controls

PMT high voltage control

Model Output:

15 of 43

t = time (run by run is our time)

r = radius, ring

I_beam = integrated beam current

j = ring number (different number of blocks per ring)

G_i: Gain for block i (primex_gains.csv) - function of time, function of beam intensity.

G_i = g_PMT_i(t) * g_RD(r_i, I_beam)

g_PMT_i(t): think of this as the noise component of the gain. This is the time-dependent component.
g_RD(r_i, I_beam) is a function of radius and integrated beam current

Q_j(t) = average(G_i in ring j)
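A small simulation of the decomposition above, to show why the ring average Q_j(t) is a useful handle on g_RD. Everything numeric here is hypothetical: the run index stands in for integrated beam current, the damage slopes per ring are invented, and the PMT term is assumed to be independent noise centred on 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n_runs, blocks_per_ring, n_rings = 5, 200, 3
ring = np.repeat(np.arange(n_rings), blocks_per_ring)  # ring index j of block i

# Hypothetical radiation-damage term: grows run by run, strongest in ring 0.
g_rd_ring = 1.0 + np.outer(np.arange(n_runs), [0.05, 0.02, 0.0])  # (runs, rings)
g_rd = g_rd_ring[:, ring]                                          # (runs, blocks)

# Hypothetical per-PMT term: noise around 1 for every block and run.
g_pmt = rng.normal(1.0, 0.05, size=g_rd.shape)

# G_i = g_PMT_i(t) * g_RD(r_i, I_beam)
G = g_pmt * g_rd

# Q_j(t) = average(G_i in ring j)
Q = np.stack([G[:, ring == j].mean(axis=1) for j in range(n_rings)], axis=1)
```

Under these assumptions the per-PMT noise averages out within a ring, so `Q` tracks `g_rd_ring` closely. If the PMT term is not centred on 1, or drifts coherently across a ring, this shortcut no longer isolates the radiation-damage component.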

…

JLab can provide us with:

Integrated beam current by run (basically, the increased exposure to beam current for each run in a run period. First run will have 0 beam current exposure) - 1 value per run

61321 is first primeX run with g and LED

g_RD_j : the per ring, per run contribution towards the gain caused by radiation damage - 1 value per ring per run
LED - 5 values per block, per run

Ratio to t(0) for ring j: (sum of G_i(t) in ring j / n_blocks) / (sum of G_i(0) in ring j / n_blocks) = Q_j(t) / Q_j(0)

16 of 43

START OF UNUSED SLIDES FOR 02/23 PRESENTATION

17 of 43

Early Challenges in feature engineering

06/06/22

Discussion: Behavior of gains in outer rings during primex (2019): Ratio to t(0) g_radius

18 of 43

Early Challenges in feature engineering

06/06/22

Discussion: negative chi2/ndf values from primex (2019)

19 of 43

Ratio to t(0) g_radius

61352, 61353, 61357, 61358, 61359, 61360, 61361, 61362, 61363, 61364, 61365, 61366, 61367, 61368, 61369, 61371

61493, 61495, 61496, 61498, 61499, 61500, 61501, 61505

61580, 61581, 61582, 61583, 61584, 61585, 61602, 61603, 61606, 61607, 61608, 61609, 61610, 61611

61670, 61671

20 of 43

Related to vmon changes?

61352, 61353, 61357, 61358, 61359, 61360, 61361, 61362, 61363, 61364, 61365, 61366, 61367, 61368, 61369, 61371

61493, 61495, 61496, 61498, 61499, 61500, 61501, 61505

61580, 61581, 61582, 61583, 61584, 61585, 61602, 61603, 61606, 61607, 61608, 61609, 61610, 61611

61670, 61671

21 of 43

22 of 43

23 of 43

LED Flashes

https://halldweb.jlab.org/DocDB/0022/002251/001/FCAL-User%27s_Guide.pdf

Monitoring LEDs are placed along the two outer edges of each pane. Light diffuses in the panes and is transmitted to the lead glass blocks behind them. The whole setup sits under a light-tight cover (think of a photography darkroom) because we want to detect very faint amounts of light. Another analogy is looking at stars in the middle of nowhere vs. NYC: with a lot of light pollution, this is much harder. The calorimeter and PMTs themselves are in an actual dark room.

The acrylic and cover were shown (via simulation) not to affect the energy resolution of the calorimeter. This is probably not important for us but could be useful background info in case anyone asks you.

The LED relative gain monitoring system is designed for two main purposes: tracking the stability of the lead glass output signals between calibrations with particles, and checking for radiation damage. The radiation damage comes from the beam of photons. We use LEDs of different colors, specifically green (574 nm), blue (470 nm), and violet (390 nm). The different colors ensure we continuously monitor the whole spectrum where the PMTs are sensitive. We also want to monitor the radiation damage as a function of wavelength (i.e. color).

There’s a hole in the center of the 4x4 acrylic panes so that beam particles can pass through without generating a huge amount of unwanted background, and to reduce radiation damage.

The monitoring system is comprised of a global controller, a local controller for each acrylic quadrant, and 10 4-LED pulser boards per local controller. The 4 LED pulser uses the same circuit, repeated 4 times with the different LEDs. Each has its own trigger but they share a common bias voltage. The ten 4-LED boards of a quadrant are arranged in two sets of five and the boards are clamped on the side of each acrylic pane.

A cable connects each FCAL controller to the global controller carrying with it some useful information like the trigger signals, the LED bias voltage, and the power supply. Each trigger signal identifies one LED type (i.e. color) to be pulsed. Each trigger signal is buffered by the local controller and relayed to all ten LED boards using ribbon cables. This means that all ten LEDs of the same color are always triggered simultaneously on an acrylic quadrant.

It is possible to trigger different colors simultaneously.

During data taking, the LED pulsers follow this pattern:

Violet at 12 V (0-9 minutes)

Blue at 10 V (10-19 minutes)

Green at 29 V (20-29 minutes)

Violet at 22 V (30-39 minutes)

Blue at 15 V (40-49 minutes)

No pulsing (50-59 minutes)
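The pattern above can be written as a lookup table, which is handy when labelling LED data by pulse type. This is a sketch: the table copies the listed values, while the assumption that the pattern repeats every 60 minutes and the names `LED_PATTERN` and `led_state` are ours.

```python
# (start_minute, end_minute, color, bias_volts); None means no pulsing.
LED_PATTERN = [
    (0, 9, "violet", 12),
    (10, 19, "blue", 10),
    (20, 29, "green", 29),
    (30, 39, "violet", 22),
    (40, 49, "blue", 15),
    (50, 59, None, None),
]

def led_state(minute):
    """Return (color, volts) for a minute offset into data taking.

    Assumes the listed pattern simply repeats every 60 minutes.
    """
    m = int(minute) % 60
    for start, end, color, volts in LED_PATTERN:
        if start <= m <= end:
            return color, volts
    raise ValueError("pattern does not cover minute %d" % m)

early = led_state(0)     # ("violet", 12)
second_hour = led_state(95)  # minute 35 of the pattern: ("violet", 22)
```

Keeping the pattern in one table means a change to the sequence or voltages only has to be made in one place.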

** I’m sure the sequence, voltages, and time lengths can and do vary, but I am not sure what would prompt that to happen. I can ask Igal/Mark/Malte.

24 of 43

JLAB Capstone Problem:

Forward Calorimeter (FCAL) Calibration

The GlueX FCAL consists of 2800 lead glass blocks, each 4 cm x 4 cm x 45 cm, stacked in a circular array. Each block is optically coupled to an FEU 84-3 PMT which will be instrumented with flash ADC electronics. GlueX-doc 985, 988 and 989 document the GlueX FCAL as presented in the February 2008 Calorimetry Review.

GlueX Experiment wiki FCAL Page: https://halldweb.jlab.org/wiki/index.php/FCAL
Some FCAL Photos: https://halldweb.jlab.org/wiki/index.php/FCALPhotos
About the FCAL: https://halldweb.jlab.org/DocDB/0009/000988/001/fcal.pdf
A measurement of the energy and timing resolution of GlueX Forward Calorimeter using an electron beam: https://arxiv.org/pdf/1304.4999.pdf
David’s neural network idea: https://docs.google.com/presentation/d/1vNmSSKvgRWWSwVynnnnV7vjsxN3wNOTmWaS4qZV1yqI/edit#slide=id.g1ba89161a95_0_0

https://inspirehep.net/files/96d8804ae69dee3eca20725d0af1eaef

25 of 43

FCAL Problem Statement

The Forward Calorimeter (FCAL) is a component of the GlueX spectrometer made of 2800 individual lead glass modules, each coupled to its own photomultiplier tube (PMT). The FCAL provides timing and energy measurements for photon showers. The Cherenkov light emitted by the electromagnetic showers produced within the lead glass blocks is detected by PMTs. The resulting PMT pulses are digitized using Flash analog-to-digital converters (fADCs) and the timing resolution is measured using a pulsed LED source.

However, PMTs vary in measurement (sensitive to temperature, humidity, magnetic fields????, over time? distance from center? other conditions?), and when the FCAL was installed, there was no reference PMT installed.

A gain calibration, or scaling factor, can be applied: the amplitude measured by each module is scaled by a module-dependent gain factor, chosen to minimize the event-to-event variance of the sum over all modules. The gain calibration may be unique per tube, per experimental time period (usually an ~2 hour chunk of time known as a "run"). Using the gain correction factors, the HV of each module is adjusted. The gain correction is essential to obtaining optimal resolution.

We are attempting to find a model that solves for the PMT gain calibrations by run. The assumption is that the conditions that lead to changing gain are stable enough throughout an approximately 2-hour run.

26 of 43

Expected Outcome

One of the outcomes is to extract the relevant data pieces and convert them into an appropriate format for analysis.
The expected outcome for this project is to use AI/ML techniques to develop a model that predicts the gain correction for the PMT gain calibration per tube, per experimental time period.

27 of 43

FCAL Data

Location of data files: /work/epsci/roark/FCAL/cpp/

with EPICS/calibrations: cpp_epics_quad0_df.csv

/work/epsci/roark/FCAL/2019_primex

In case interested: CPP is the “charged pion polarizability” experiment. Data was taken in 2022. Dr. Jeske’s (Torri’s) notebook will explain many of the features/columns in the data. I believe there are 733 “runs”/observations in the CPP data.

https://halldweb.jlab.org/wiki/index.php/Charged_Pion_Polarizability_Experiment_in_Hall-D

HOWEVER… we found out today that we need the PrimeX experiment, not the CPP experiment, so that we can compare with the physicist who is using a physics data-intensive method.

Descriptions of the data can be found in: https://halldweb.jlab.org/DocDB/0027/002770/001/FCAL_Manual.pdf

28 of 43

ADC amplitude

gain

coupling

LED amplitude

single module

all modules

vector notation

What is ADC amplitude?

ADC amplitude = the amplitude of the PMT signal as digitized by the flash analog-to-digital converter. “The analog signal from the PMT is digitized with a newly developed flash analog to digital converter (fADC) from JLAB” - https://halldweb.jlab.org/DocDB/0009/000988/001/fcal.pdf

David Lawrence

i is 1:~700

or 2800

j is 1:30

ADCs allow for precise measurements of signal arrival times - this is needed for the “timing resolution” (mentioned on the FCAL Problem Statement slide). Precise timing is important because particle identification is the purpose of these detectors and how fast something moves tells you a bit about what type of particle it is. The gain measure is important because the gain measurement also tells you about what type of particle it is.

In the FCAL

2800 lead glass blocks

2800 PMTs

140 LEDs (30 per quadrant)

4 quadrants

~700 lead glass blocks/PMTs per quadrant

30 LEDs per quadrant

30 LEDs affect each module, they are a different 40 LEDs depending on which quadrant.

Top Line

The top line is a way to think about the gain of a single PMT.

“A” represents the ADC (analog to digital converter) amplitude (V?) measured from the PMT.

“g” is the scalar multiplier we are trying to discover for the PMT module.

“j” is 1 to 40 (see the picture on the bottom, there are 40 LEDs that flash, but only 10 flash at a time, but it is the SUM of the 40 flashes that are measured, i.e. the sum amplitude)

“alpha” is the coupling vector (30 values, 1 for each LED)

“L” is the LED vector - the measured LED amplitude (30 values per quadrant == 30 values per PMT module) - voltage.

Middle Line

The middle line is notation for thinking about all the modules in the FCAL where “i” is the block/module number (1:2800) (or 0:2799), j is still 1:30

Bottom Line

Vector Notation for middle line
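Going by the symbol definitions above, the three lines appear to describe A_i = g_i * sum_j(alpha_ij * L_j), i.e. A = g ⊙ (alpha L) in vector form. That exact form is an assumption here, as are the toy numbers; the sketch only shows how the pieces fit together for a handful of blocks and LEDs.

```python
import numpy as np

def predicted_amplitudes(g, alpha, L):
    """Assumed model: A_i = g_i * sum_j alpha[i, j] * L[j].

    g:     (n_blocks,) gain per block
    alpha: (n_blocks, n_leds) coupling of each LED to each block
    L:     (n_leds,) LED amplitudes
    """
    g = np.asarray(g, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    L = np.asarray(L, dtype=float)
    return g * (alpha @ L)  # matrix-vector product, then element-wise gain

# Toy case: 3 blocks, 2 LEDs.
alpha = np.array([[0.5, 0.1],
                  [0.2, 0.4],
                  [0.0, 1.0]])
L = np.array([10.0, 20.0])
g = np.array([1.0, 2.0, 0.5])
A = predicted_amplitudes(g, alpha, L)
```

In this form the "top line" is one row of `alpha`, the "middle line" is the loop over i, and the "bottom line" is the single expression `g * (alpha @ L)`.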

29 of 43

(700)

(30)

(700)

=

elementwise inverses

-1

Train assuming known amplitudes for L and all gains=1 so that we get an α close to the actual coupling matrix.

Retrain with a small learning rate and allow the gains to vary, but keep them in a reasonable range (e.g. add a loss term like (g/1.5)^6).
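A rough NumPy sketch of the two training stages just described, on synthetic data. It is not the keras/tensorflow implementation: stage 1 is swapped for a plain least-squares fit, stage 2 is hand-written gradient descent, and the model form A = g ⊙ (alpha L), the penalty weight, the learning rate, and all sizes are assumptions.

```python
import numpy as np

def fit_coupling(L, A):
    """Stage 1: with every gain fixed at 1, solve A = L @ alpha.T (no bias).

    L: (n_samples, n_leds) known LED amplitudes
    A: (n_samples, n_blocks) measured ADC amplitudes
    Returns alpha with shape (n_blocks, n_leds).
    """
    alpha_T, *_ = np.linalg.lstsq(L, A, rcond=None)
    return alpha_T.T

def fit_gains(L, A, alpha, lr=0.05, steps=500, weight=1e-3):
    """Stage 2: hold alpha fixed and let the gains move in small steps.

    Loss per block: mean((g * pred - A)**2) + weight * (g / 1.5)**6
    """
    pred = L @ alpha.T                   # gain-1 prediction per sample, block
    scale = np.mean(pred ** 2, axis=0)   # per-block step normalisation
    g = np.ones(alpha.shape[0])          # start from the "reference FCAL"
    for _ in range(steps):
        data_grad = 2.0 * np.mean((g * pred - A) * pred, axis=0)
        penalty_grad = weight * 6.0 * g ** 5 / 1.5 ** 6
        g -= lr * (data_grad + penalty_grad) / scale
    return g

# Synthetic check: 6 blocks, 3 LEDs, 20 samples per stage.
rng = np.random.default_rng(0)
alpha_true = rng.uniform(0.1, 1.0, size=(6, 3))
L_ref = rng.uniform(0.5, 1.5, size=(20, 3))
A_ref = L_ref @ alpha_true.T                 # reference data, gains all 1
alpha_hat = fit_coupling(L_ref, A_ref)

g_true = rng.uniform(0.8, 1.2, size=6)
L_new = rng.uniform(0.5, 1.5, size=(20, 3))
A_new = g_true * (L_new @ alpha_true.T)      # later data with drifted gains
g_hat = fit_gains(L_new, A_new, alpha_hat)
```

The sixth-power term is nearly flat for gains near 1 and rises steeply beyond 1.5, so it acts as a soft upper bound. It does nothing to stop gains drifting toward zero, so a lower bound would need its own term.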

Idea: Train encoder on its own to begin.

Freeze network.

Then train inverse of g to get the back half.

01/26/23 IDEA: Load up keras/tensorflow.

first group: try out the autoencoder example “images”/matrices example A targets - take a look at weights for “gains”. Recreate one image for one quadrant. experience with an autoencoder.

second group: train Encoder side.

Is it really just a perceptron?

Is this an optimization problem?

David Lawrence

(unknown coupling connectivity)

L: 1-10 Violet, 11-20 Blue, 21-30 Green (see small and large V in speaker notes)

5 inputs of ~700 (2 for blue and violet, 1 for green)

QUESTION: How can we learn the gain vector?

From the previous slide we have knowns:

A: ADC amplitudes - digital amplitudes from the PMTs - almost have these (01/26/2023: Torri to gather all the hits stored in the LED skims, plot them, fit with gaussian or whatever, and get the peak position)

L: the LED amplitudes - Torri is currently retrieving

gains: /work/epsci/roark/FCAL/cpp_gains_v0.csv
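For the "fit with gaussian and get the peak position" step mentioned for A above, one simple option is to fit a parabola to the log of the histogram counts, since the log of a Gaussian is quadratic. This is only a sketch with simulated hits; the function name, bin count, and sample values are assumptions, and the real LED skims may need a different fit.

```python
import numpy as np

def gaussian_peak(samples, bins=40):
    """Estimate the peak position of a roughly Gaussian distribution.

    Fits log(counts) = a*x**2 + b*x + c over non-empty bins and returns
    the vertex -b / (2a).
    """
    counts, edges = np.histogram(samples, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = counts > 0
    # Weight by sqrt(counts): the variance of log(n) is roughly 1/n.
    a, b, _ = np.polyfit(
        centers[keep], np.log(counts[keep]), 2, w=np.sqrt(counts[keep])
    )
    if a >= 0:
        raise ValueError("histogram is not peaked")
    return -b / (2.0 * a)

# Simulated amplitudes for one block and one LED setting.
rng = np.random.default_rng(0)
hits = rng.normal(loc=100.0, scale=5.0, size=5000)
peak = gaussian_peak(hits)
```

One peak value per block, per pulse type, per run would then form the A input.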

This looks like an Encoder Decoder model.

We have a known TRUTH for our encoded latent space: the LED amplitudes - these will be the EPICS values for the blue, green and violet pulses. Torri will be able to grab min/mean/max/sigma for the 160 LEDs. (Each block/PMT “sees” 40 LEDs, but only 10 at a time).

David’s idea is to train assuming known amplitudes for L and with all gains==1 (create a “reference FCAL”) to get an alpha close to the actual coupling matrix. No bias terms.

Then using that alpha, retrain with a small learning rate and allow the gains to vary, but keep them in a reasonable range (we need to determine what a “reasonable range” is for the gains and how to keep them in a reasonable range).

LED (L) values need to have the ratio of the mean based on the Voltages (small and large for Violet and Blue) see reference:

Violet at 12 V - Violet at 22 V

Blue at 10 V - Blue at 15 V

Green at 29 V

Outstanding Questions:

What are the timing offsets? I (Diana) don’t understand how the timing offsets play in here. - backburner.
I (Diana) am also unclear on the difference between software gains and gains. - functionally the same thing, applied either in hardware or in software.
Are there aspects of the geometry of the FCAL that we should consider in our connections between g and L?
Are there aspects of the LED colors/timing that we need to think about? I need to add Igal’s slides showing how the different color pulses affected gains.

30 of 43

Adapting the Projection-Recovery Network (PRNet): from ‘A Deep Learning Approach for Blind Drift Calibration of Sensor Networks’

https://download.arxiv.org/pdf/1707.03682v1.pdf

For our FCAL implementation:

we could add “fake blocks” to turn the FCAL into a square. At the corners, impute the amplitude from the neighboring row/column values and assume perfect gain, i.e. 1.

we would need to determine a strategy for fake blocks during production.

no time component, just space.
how will we represent v, b, g, V, B (small_violet, small_blue, etc.): as an extra dimension?
the gain is from all 5 pulse types but with unknown contribution

Use the Hadamard product and Hadamard division of amplitude and gain to “create” A_drift and A_clean for our “training” data.
We can use a Lambda function to impute the gain values with the Hadamard division (i.e. A_output ./ A_input == g, with A_clean as the output) if we need to produce the gain constants for consistent CCDC/GlueX processes.
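The fake-block padding and the Hadamard steps above could look like this on a small grid. It is a sketch under several assumptions: an 8 x 8 grid stands in for 60 x 60, the circular footprint ignores the beam hole, fake amplitudes are imputed from the row mean only, and A_clean is taken to equal g ⊙ A_drift.

```python
import numpy as np

def circular_mask(n, radius):
    """True for grid cells whose centre lies within `radius` of the centre."""
    c = np.arange(n) - (n - 1) / 2.0
    xx, yy = np.meshgrid(c, c)
    return xx ** 2 + yy ** 2 <= radius ** 2

def pad_to_square(values, mask, fill):
    """Put real-block values on the square grid; fake blocks get `fill`."""
    grid = np.full(mask.shape, float(fill))
    grid[mask] = values
    return grid

n = 8
mask = circular_mask(n, radius=4.0)          # real blocks
rng = np.random.default_rng(1)
A_clean_blocks = rng.uniform(50.0, 150.0, mask.sum())
g_blocks = rng.uniform(0.8, 1.2, mask.sum())

g_grid = pad_to_square(g_blocks, mask, fill=1.0)   # fake blocks: gain of 1

# Fake-block amplitudes: mean of the real blocks in the same row.
A_clean = pad_to_square(A_clean_blocks, mask, fill=np.nan)
row_mean = np.nanmean(A_clean, axis=1, keepdims=True)
A_clean = np.where(mask, A_clean, row_mean)

A_drift = A_clean / g_grid        # Hadamard division creates the drifted input
g_recovered = A_clean / A_drift   # Hadamard division recovers the gains
```

Because fake blocks carry a gain of exactly 1, they contribute identical drifted and clean values and can be dropped again with `mask` once the gains are recovered.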

31 of 43

Diagram: (2800 x 5), or (60 x 60 x 5) for a CNN AutoEncoder = five inputs, each (2800) or (60 x 60)

How to recover the gains from the multi input? (?)


Outstanding Questions: