1 of 9

AI: Data Quality Monitoring

Thomas Britton

David Lawrence

Naomi Jarvis

2 of 9

From DAQ to ANA

6/16/20


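[Diagram: the path of data from DAQ to analysis, arranged by timescale. Recoverable labels:
  • Timescales: Minutes, Hours, Day, Weeks+
  • Steps: Incoming data, Rootspy, Brief, Monitoring launch, Reconstruction launch/test, Analysis launch
  • People: Shift Takers, Detector experts, Online Monitoring coordinator, Offline Monitoring coordinator (was Thomas), Analysis coordinator]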

3 of 9

The Challenge

  • Every run produces an initial 22 plots. More thorough monitoring is performed offline and produces 109 plots per run. With each run lasting ~3 hours, that is roughly 8 runs per day, or between ~175 and ~875 plots to look at daily.
    • To preserve sanity I looked at closer to 175 plots, but there is no reason a machine couldn’t help look at all of them...
  • Often a single plot being “off” does not indicate a problem. All the plots need to be considered together to determine cause and severity.
    • Trigger studies: these often look like big problems but are not. They can be hard to catch when shift logs have scant details.
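The daily plot counts quoted above follow from simple arithmetic. A minimal sketch, assuming runs are taken back to back around the clock (an assumption; the slide gives only the run length and the per-run plot counts):

```python
# Per-run plot counts and run length come from the slide.
# Continuous, back-to-back running is an assumption made for this estimate.
HOURS_PER_DAY = 24
RUN_LENGTH_HOURS = 3          # "~3 hours" per run
ONLINE_PLOTS_PER_RUN = 22     # initial plots produced for every run
OFFLINE_PLOTS_PER_RUN = 109   # plots from the more thorough offline monitoring


def plots_per_day(plots_per_run: int, run_length_hours: float = RUN_LENGTH_HOURS) -> int:
    """Return the number of plots produced per day for back-to-back runs."""
    runs_per_day = HOURS_PER_DAY / run_length_hours
    return round(runs_per_day * plots_per_run)


low = plots_per_day(ONLINE_PLOTS_PER_RUN)    # 8 runs x 22 plots
high = plots_per_day(OFFLINE_PLOTS_PER_RUN)  # 8 runs x 109 plots
print(low, high)
```

The results, 176 and 872, match the rounded figures of ~175 and ~875 on the slide.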


4 of 9

Introducing Hydra

  • Hydra aims to be an extensible framework for training and managing A.I. for near-real-time monitoring
    • If you need it to tell a dog from a cat, I can have Hydra do that now, without system modification
  • Most importantly, Hydra allows me to embrace my inner sloth:


Koboldpress.com

5 of 9

An Anecdote

  • Both of these look “good” at first glance (both initially labeled good)
    • The one on the left is actually bad (the A.I. caught it)
  • The A.I. seems able to pick up on subtle differences in shape and maybe even “read” the y-axis


6 of 9

Another anecdote

  • The labeler was instructed by the detector expert to label any plot containing fewer than 100k events as “NoData”. This is one of several examples in which the labeler chose “Good” and the A.I. predicted “NoData”, which is the true label given the number of events.
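The detector expert's rule can be written down explicitly and used to audit human labels. This is a minimal sketch, not part of Hydra: the record layout, the plot names, and both function names are hypothetical, and only the 100k-event threshold and the label names come from the slide.

```python
NO_DATA_THRESHOLD = 100_000  # from the detector expert's instruction


def rule_based_label(n_events: int, human_label: str) -> str:
    """Return "NoData" when the plot has too few events, else keep the human label."""
    if n_events < NO_DATA_THRESHOLD:
        return "NoData"
    return human_label


def find_mislabeled(records):
    """Return the names of plots whose human label disagrees with the rule."""
    return [
        r["plot"]
        for r in records
        if rule_based_label(r["n_events"], r["label"]) != r["label"]
    ]


# Hypothetical labeled plots, for illustration only.
records = [
    {"plot": "plot_a", "n_events": 2_500_000, "label": "Good"},
    {"plot": "plot_b", "n_events": 40_000, "label": "Good"},    # should be NoData
    {"plot": "plot_c", "n_events": 10_000, "label": "NoData"},
]
mislabeled = find_mislabeled(records)
print(mislabeled)
```

A check like this only covers the event-count rule; it says nothing about whether a plot with enough events is actually good.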


7 of 9

8/4/20


  • HydraRun
    • near-real-time look
    • auto-updating
    • borders turn red when there might be a problem

8 of 9


  • Shows a trailing 24-hour view of all unconfirmed (confidence below a tunable parameter) or “bad” plots
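The selection behind such a view can be sketched as a simple filter. This is an illustration only, not Hydra's implementation: the `Prediction` record, the `needs_review` function, the plot names, and the default threshold of 0.9 are all hypothetical, and the window is assumed to be 24 hours.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Prediction:
    """Hypothetical record of one model prediction for one plot."""
    plot: str
    label: str
    confidence: float
    timestamp: datetime


def needs_review(predictions, now, threshold=0.9, window=timedelta(hours=24)):
    """Keep recent predictions that are either "Bad" or below the confidence threshold."""
    cutoff = now - window
    return [
        p for p in predictions
        if p.timestamp >= cutoff and (p.label == "Bad" or p.confidence < threshold)
    ]


now = datetime(2020, 8, 4, 12, 0)
predictions = [
    Prediction("plot_a", "Good", 0.99, now - timedelta(hours=1)),   # confident and good: hidden
    Prediction("plot_b", "Good", 0.60, now - timedelta(hours=2)),   # unconfirmed: shown
    Prediction("plot_c", "Bad", 0.95, now - timedelta(hours=3)),    # bad: shown
    Prediction("plot_d", "Bad", 0.95, now - timedelta(hours=30)),   # outside the window: hidden
]
flagged = [p.plot for p in needs_review(predictions, now)]
print(flagged)
```

Raising the threshold sends more plots to a human for confirmation; lowering it trusts the model more.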

9 of 9

Conclusion


  • Modular design of Hydra allows for a bifurcation of inference and action
    • 0MQ reports
  • Deployed in Hall D
    • looking at 8 plots on a ~1 min time frame
    • produces pared-down “logs” containing only the plots it is not certain about and the “bad” plots
  • Next: polishing the deployment, adding in data (more EPICS data), and working on the “body of Hydra”
    • non-trivial: Making determinations with asynchronous and/or incomplete data
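The split between inference and action can be illustrated with a small producer and consumer. This is a sketch under stated assumptions, not Hydra's code: a standard-library `queue.Queue` stands in for the 0MQ transport, and the message fields, function names, plot names, and threshold are hypothetical.

```python
import json
import queue


def publish_report(channel, plot, label, confidence):
    """Inference side: serialize a prediction and send it, knowing nothing of what happens next."""
    channel.put(json.dumps({"plot": plot, "label": label, "confidence": confidence}))


def handle_reports(channel, threshold=0.9):
    """Action side: drain the channel and decide what to do with each report."""
    actions = []
    while not channel.empty():
        report = json.loads(channel.get())
        if report["label"] == "Bad":
            actions.append(("alert", report["plot"]))
        elif report["confidence"] < threshold:
            actions.append(("log_unconfirmed", report["plot"]))
        # Confident "Good" reports need no action and are dropped.
    return actions


channel = queue.Queue()  # stand-in for a 0MQ socket pair
publish_report(channel, "plot_a", "Good", 0.99)
publish_report(channel, "plot_b", "Bad", 0.97)
publish_report(channel, "plot_c", "Good", 0.55)
actions = handle_reports(channel)
print(actions)
```

Because the two sides share only the message format, the action logic can change (for example, from logging to paging an expert) without touching the model side.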