1 of 9

AI: Data Quality Monitoring

Thomas Britton

David Lawrence

Naomi Jarvis

2 of 9

From DAQ to ANA

6/16/20

[Diagram labels]

Time scales: Minutes, Hours, Day, Weeks+

Labels: Rootspy; Shift Takers; Incoming data; Detector experts; Brief; Online Monitoring coordinator; Monitoring launch; Analysis launch; Reconstruction; Launch/test; Analysis coordinator; Offline Monitoring coordinator (was Thomas)

3 of 9

The Challenge

Every run initially produces 22 plots. More thorough monitoring is performed offline and produces 109 plots per run. With each run lasting ~3 hours (about 8 runs a day), there are between ~175 and ~875 plots to look at every day.
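
The daily range can be reproduced with a quick back-of-the-envelope calculation. This is a sketch that assumes runs are taken back to back around the clock, which the slide does not state explicitly; the function and constant names are made up for illustration.

```python
HOURS_PER_DAY = 24
RUN_LENGTH_HOURS = 3          # "~3 hours" per run, from the slide
ONLINE_PLOTS_PER_RUN = 22     # initial plots per run
OFFLINE_PLOTS_PER_RUN = 109   # plots from the offline monitoring


def plots_per_day(plots_per_run, run_length_hours=RUN_LENGTH_HOURS):
    """Plots to review per day, assuming continuous back-to-back runs."""
    runs_per_day = HOURS_PER_DAY / run_length_hours
    return runs_per_day * plots_per_run


low = plots_per_day(ONLINE_PLOTS_PER_RUN)    # 8 runs * 22 plots
high = plots_per_day(OFFLINE_PLOTS_PER_RUN)  # 8 runs * 109 plots
print(low, high)
```

Under that assumption the two numbers come out at 176 and 872, matching the rounded ~175 and ~875 quoted above.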

To preserve sanity I looked at closer to 175 plots, but there is no reason a machine couldn’t aid in looking at all of them...

Often a single plot being “off” is not an indication of problems. One needs to look at all the plots to determine cause and severity.

Trigger studies: these often look like big problems but are not. They can be hard to catch when shift logs have scant details.

4 of 9

Introducing Hydra

Hydra aims to be an extensible framework for training and managing A.I. for near real time monitoring

If you need it to tell a dog from a cat, I can have Hydra do that, without system modification, now

Most importantly, Hydra allows me to embrace my inner sloth:

Koboldpress.com

5 of 9

An Anecdote

Both of these look “good” at first glance (both initially labeled good)

The one on the left is actually bad (the A.I. caught it)

A.I. seems to be able to look at subtle differences in shape and maybe even “read” the y-axis

6 of 9

Another anecdote

The labeler was instructed by the detector expert to label any plot containing fewer than 100k events as “NoData”. This is one of several examples in which the labeler labeled a plot “Good” and the A.I. predicted “NoData”, which is the true label given the number of events.
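
The expert's rule is simple enough to check mechanically against the human labels. The sketch below is not Hydra code: the record layout, the function names, and the example numbers are all hypothetical, and only the 100k-event threshold comes from the slide.

```python
NODATA_THRESHOLD = 100_000  # "fewer than 100k events" means "NoData"


def rule_based_label(n_events, human_label):
    """Apply the expert's rule: too few events overrides any other label."""
    if n_events < NODATA_THRESHOLD:
        return "NoData"
    return human_label


def find_label_conflicts(records):
    """Return records whose human label disagrees with the event-count rule."""
    return [
        r for r in records
        if rule_based_label(r["n_events"], r["human_label"]) != r["human_label"]
    ]


# Made-up example records, for illustration only.
records = [
    {"plot": "A", "n_events": 2_500_000, "human_label": "Good"},
    {"plot": "B", "n_events": 40_000, "human_label": "Good"},    # should be NoData
    {"plot": "C", "n_events": 12_000, "human_label": "NoData"},
]
conflicts = find_label_conflicts(records)
print([r["plot"] for r in conflicts])
```

A check like this could flag mislabeled training examples before they reach the model, although here it was the A.I. that surfaced them.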

7 of 9

Hydra Run

8/4/20

HydraRun

near real time look
auto-updating
borders turn red when there might be a problem

8 of 9

Hydra Log

Shows a trailing 24-hour view of all unconfirmed (confidence below a tunable parameter) or “bad” plots
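
The selection behind such a view can be sketched as a simple filter. This is an illustration rather than the Hydra Log implementation: the entry fields, the function names, and the 0.9 default threshold are hypothetical, and the window is assumed to be 24 hours.

```python
from datetime import datetime, timedelta


def needs_review(entry, confidence_threshold):
    """A plot is shown if it is labeled bad or the model is not confident."""
    return entry["label"] == "Bad" or entry["confidence"] < confidence_threshold


def trailing_log(entries, now, confidence_threshold=0.9,
                 window=timedelta(hours=24)):
    """Keep only entries inside the trailing window that need a human look."""
    cutoff = now - window
    return [
        e for e in entries
        if e["time"] >= cutoff and needs_review(e, confidence_threshold)
    ]


# Made-up entries, for illustration only.
now = datetime(2020, 8, 4, 12, 0)
entries = [
    {"plot": "A", "label": "Good", "confidence": 0.99, "time": now - timedelta(hours=1)},
    {"plot": "B", "label": "Good", "confidence": 0.60, "time": now - timedelta(hours=2)},
    {"plot": "C", "label": "Bad", "confidence": 0.95, "time": now - timedelta(hours=3)},
    {"plot": "D", "label": "Bad", "confidence": 0.95, "time": now - timedelta(hours=30)},
]
shown = trailing_log(entries, now)
print([e["plot"] for e in shown])
```

Raising the threshold sends more plots to a human for confirmation; lowering it trusts the model more.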

9 of 9

Conclusion

Modular design of Hydra allows for a bifurcation of inference and action

0MQ reports
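
One way to picture the split between inference and action: the inference side only publishes reports, and a separate consumer decides what to do with them. The sketch below uses Python's standard-library `queue.Queue` as a stand-in for a 0MQ socket, so it is not Hydra's actual messaging code. The message fields, function names, and threshold are hypothetical.

```python
import json
import queue


def publish_report(channel, plot_name, label, confidence):
    """Inference side: serialize a result and send it, nothing more."""
    channel.put(json.dumps(
        {"plot": plot_name, "label": label, "confidence": confidence}
    ))


def act_on_reports(channel, confidence_threshold=0.9):
    """Action side: drain pending reports and decide what to do with each."""
    actions = []
    while True:
        try:
            message = channel.get_nowait()
        except queue.Empty:
            break
        report = json.loads(message)
        if report["label"] == "Bad":
            actions.append((report["plot"], "alert"))
        elif report["confidence"] < confidence_threshold:
            actions.append((report["plot"], "needs confirmation"))
    return actions


# Made-up reports, for illustration only.
channel = queue.Queue()
publish_report(channel, "A", "Good", 0.99)
publish_report(channel, "B", "Bad", 0.97)
publish_report(channel, "C", "Good", 0.55)
actions = act_on_reports(channel)
print(actions)
```

Because the two sides share only the message format, either one can be replaced or extended without touching the other.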

Deployed in Hall D

looking at 8 plots on ~1 min time frame
produces pared down “logs” of only things it isn’t certain on and “bad” plots

Polishing deployment, adding in data (more EPICS data), work on the “body of hydra”

non-trivial: Making determinations with asynchronous and/or incomplete data