Data Knowledge Base
Torre Wenaus, BNL
PanDA Workshop
Apr 21-22 2016
CERN
DKB - what for?
This is what we have been calling the “data knowledge base” or “data product catalog” for a little over a year (Feb 2015 ADC weekly presentation by TW). The “a lot of effort” it requires has indeed been the sticking point in making much progress
But there has been effort and even some progress
A concrete example of the need
Question posed this week: There are MC15c dijet samples with the 2015 mu profile (r7773)... However, I cannot find the request for these in the many Google Docs... so I cannot tell who requested this sample, and thus I don’t know where to look for the JIRA to get the link to the BigPanda table with the status... Would someone tell me the procedure?
Answered thus: There is no procedure. It needs to be done by hand. I either look through all the CP spreadsheets for it (on google), or I get the taskID from the dataset and then use a bunch of panda pages to find the request.
And "It is not possible to back link from a given dataset to the MC request that made it."
The actual procedure laid out...
Should we be asking physicists to do this?
1) rucio list-dids mc15_13TeV:mc15_13TeV.361025.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ5W.merge.AOD.e3668_s2576_s2132_r7773_r7676
2) Write down one of the dataset's tid's. I took tid07978260
3) Go to https://prodtask-dev.cern.ch/prodtask/task_table/#/?time_period=all&task_type=production
4) Type 7978260 into "Task ID" box and hit "Update table"
5) See the request is 6416
6) Go to https://prodtask-dev.cern.ch/prodtask/inputlist_with_request/6416/
7) You see it was CP FTAG group; the sample is at slice 24 and 100% done.
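The mechanical parts of these steps can be sketched in a few lines of Python. This is only an illustration, not an existing tool: the function names are hypothetical, the `_tid07978260_00` suffix on the dataset name is an assumed form of what `rucio list-dids` returns, and the URL patterns are copied from steps 3 and 6. Step 5 (finding the request for a task) still has to be read off the task table by hand.

```python
import re

# Base URL copied from steps 3 and 6 above.
PRODTASK_BASE = "https://prodtask-dev.cern.ch/prodtask"


def extract_task_id(dataset_name: str) -> int:
    """Return the task ID embedded in a dataset name as '_tid<digits>'.

    Assumption: the dataset name carries a '_tid<digits>' component,
    as in the tid07978260 example of step 2.
    """
    match = re.search(r"_tid(\d+)", dataset_name)
    if match is None:
        raise ValueError(f"no '_tid' component found in {dataset_name!r}")
    return int(match.group(1))  # int() drops the leading zero


def task_table_url() -> str:
    """URL of the task table from step 3 (the task ID is typed in by hand)."""
    return f"{PRODTASK_BASE}/task_table/#/?time_period=all&task_type=production"


def request_page_url(request_id: int) -> str:
    """URL of the request page from step 6."""
    return f"{PRODTASK_BASE}/inputlist_with_request/{request_id}/"


if __name__ == "__main__":
    # Hypothetical full dataset name: the name from step 1 plus an assumed tid suffix.
    dataset = (
        "mc15_13TeV.361025.Pythia8EvtGen_A14NNPDF23LO_jetjet_JZ5W"
        ".merge.AOD.e3668_s2576_s2132_r7773_r7676_tid07978260_00"
    )
    print(extract_task_id(dataset))   # task ID to type into the "Task ID" box
    print(task_table_url())
    print(request_page_url(6416))     # request number read off in step 5
```

Even with such a helper, the link from task to request remains a manual lookup, which is the gap the DKB is meant to close.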
MC Production
Campaign Monitoring
Twiki-based production summaries (autogenerated)
Manually maintained spreadsheets
ATLAS S&C should improve on this:
“Want to know the status of your samples? Ask us.”
“A standard analysis needs ~300-500 samples for their analysis.
We will now follow your recommendations and advise our group to contact atlas-phys-mcprod-team@cern.ch for each sample where they want to get information about the production status.
Maybe you are lucky and won't get any email, maybe you will get 500 mails per day with inquiries- hard to predict ;-)”
DKB Inputs & Elements
Current work
KB Prototype Applications
Proposed DKB objectives, deliverables
DKB deliverable ideas
The long view:
DKB and the Event Streaming Service
Once we have a DKB, it can serve as the information-gathering point and hub behind intelligent, efficient data delivery through the event streaming service
Supplementary
The Event Service 2016
The 2015 Event Service is missing its dataflow component, the Event Streaming Service
The Event Streaming Service (ESS)
Building the ESS
Two primary components: Data Streaming Service
Informed by the Data Knowledge Base providing the intelligence on