Proio: YAIO!
David Blyth
Introduction
Why create YAIO (Yet Another IO scheme)?
In descending order of importance:
Pros and Cons of Protobuf
Pros
Cons
Options for IO Formats
Consider...
IO Features in a Venn Diagram
Assert: blue features are desired for a forward-looking IO scheme
⇒ proio
Venn diagram sets: LCIO, EicMC, ProMC
Venn diagram regions: A, B, C, D, E, F, G
Utilizing Protobuf in proio
Thin proio wrappers
Data Models
LCIO
ProMC
...
Go: Protobuf generated code
Python: Protobuf generated code
C++: Protobuf generated code
Java: Protobuf generated code
Protobuf compiler
Data format managed by thin wrappers
Magic Number (synchronization)
Event Header Size
Payload Size
Event Header
Table of Contents
Lists names, types, and sizes of protobuf messages in payload
Event Payload
MCParticle Collection (e.g.)
SimCalorimeterHit Collection (e.g.)
SimTrackerHit Collection (e.g.)
Magic Number (synchronization)
Event Header Size
Payload Size
Event Header
Table of Contents
Lists names, types, and sizes of protobuf messages in payload
Event Payload
MCParticle Collection (e.g.)
SimCalorimeterHit Collection (e.g.)
SimTrackerHit Collection (e.g.)
Data Stream
New Event
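The event layout above (magic number, header size, payload size, header, payload) can be sketched in a few lines of Python. This is only an illustration of the framing idea, not proio's actual wire format: the magic bytes, the 32-bit little-endian size fields, the text-based table of contents, and the function names `write_event` and `read_event` are all assumptions made for the example. In proio the header and the collections are protobuf messages.

```python
import io
import struct

# Hypothetical 4-byte synchronization marker (assumed, not proio's real value).
MAGIC = b"\xe1\xc1\x00\x00"
PREFIX_SIZE = len(MAGIC) + 8  # magic + two assumed 32-bit size fields


def write_event(stream, collections):
    """Append one event: magic, header size, payload size, header, payload.

    `collections` maps a collection name to its already-serialized bytes.
    """
    # Toy table of contents: one "name:size" line per collection.
    header = "".join(
        f"{name}:{len(data)}\n" for name, data in collections.items()
    ).encode()
    payload = b"".join(collections.values())
    stream.write(MAGIC)
    stream.write(struct.pack("<II", len(header), len(payload)))
    stream.write(header)
    stream.write(payload)


def read_event(stream):
    """Read one event and return {name: bytes}, or None at end of stream."""
    prefix = stream.read(PREFIX_SIZE)
    if len(prefix) < PREFIX_SIZE:
        return None
    if prefix[:len(MAGIC)] != MAGIC:
        raise ValueError("stream is not positioned at an event boundary")
    header_size, payload_size = struct.unpack("<II", prefix[len(MAGIC):])
    header = stream.read(header_size).decode()
    payload = stream.read(payload_size)
    # Slice the payload according to the sizes listed in the table of contents.
    event, offset = {}, 0
    for line in header.splitlines():
        name, size = line.rsplit(":", 1)
        event[name] = payload[offset:offset + int(size)]
        offset += int(size)
    return event
```

Writing two events to an `io.BytesIO` buffer with `write_event`, rewinding it, and calling `read_event` repeatedly returns the events in order and then `None` at the end of the stream.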
Proio data structure vs. ProMC/EicMC
ProMC/EicMC → Proio
MC event oriented → Collection oriented
Entire event is deserialized at once → Each collection is deserialized as a whole
Contains specific structure for evgens → No specific structure beyond basic events/collections
Example of Random Collection Access
Scenario:
Event Header
Table of Contents
Lists names, types, and sizes of protobuf messages in payload
Event Payload
MCParticle Collection (e.g.)
SimCalorimeterHit Collection (e.g.)
SimTrackerHit Collection (e.g.)
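Because the table of contents lists the size of every message in the payload, a reader can jump straight to one collection and skip the others without deserializing them. The sketch below illustrates that idea on a toy event layout. The size fields, the text table of contents, and the helpers `pack_event` and `read_collection` are hypothetical and chosen for the example; they are not the proio API.

```python
import io
import struct


def pack_event(collections):
    """Build a toy event: header size, payload size, table of contents, payload."""
    toc = "".join(
        f"{name}:{len(data)}\n" for name, data in collections.items()
    ).encode()
    payload = b"".join(collections.values())
    return struct.pack("<II", len(toc), len(payload)) + toc + payload


def read_collection(stream, wanted):
    """Return only the bytes of collection `wanted` from the event at the
    current position, leaving the stream at the start of the next event.

    Returns None if the event has no collection with that name.
    """
    toc_size, payload_size = struct.unpack("<II", stream.read(8))
    toc = stream.read(toc_size).decode()
    payload_start = stream.tell()
    offset, result = 0, None
    for line in toc.splitlines():
        name, size = line.rsplit(":", 1)
        if name == wanted:
            # Seek past the unwanted collections instead of reading them.
            stream.seek(payload_start + offset)
            result = stream.read(int(size))
            break
        offset += int(size)
    # Skip whatever remains of the payload.
    stream.seek(payload_start + payload_size)
    return result
```

In this sketch only the table of contents and the requested collection are ever read; the MCParticle and SimCalorimeterHit bytes are skipped with a seek when only SimTrackerHit is requested.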
Proio Streams and Files
Piping:
$ lcio2proio sample.slcio | proio-ls -
Concatenating:
$ cat sample1.proio sample2.proio > allsamples.proio
Cutting:
$ dd if=allsamples.proio of=roughCut.proio bs=1M count=1 skip=1
$ proio-strip -o cleanCut.proio roughCut.proio
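A rough cut with dd will usually start and end in the middle of an event. The synchronization magic number is what makes recovery possible: a reader can scan forward to the next magic number and keep only events that are complete. The sketch below shows one plausible way to do that. It does not describe the internals of proio-strip; the magic value, the 32-bit size fields, and the names `frame` and `clean_cut` are assumptions for illustration.

```python
import struct

# Hypothetical synchronization marker (assumed, not proio's real value).
MAGIC = b"\xe1\xc1\x00\x00"
PREFIX_SIZE = len(MAGIC) + 8  # magic + assumed header-size and payload-size fields


def frame(header, payload):
    """Wrap header and payload bytes in the toy event framing."""
    return MAGIC + struct.pack("<II", len(header), len(payload)) + header + payload


def clean_cut(data):
    """Keep only the complete events found in a roughly cut byte string."""
    kept = []
    pos = data.find(MAGIC)  # resynchronize: skip the leading partial event
    while pos >= 0 and pos + PREFIX_SIZE <= len(data):
        header_size, payload_size = struct.unpack_from("<II", data, pos + len(MAGIC))
        end = pos + PREFIX_SIZE + header_size + payload_size
        if end > len(data):
            break  # trailing partial event: drop it
        kept.append(data[pos:end])
        pos = data.find(MAGIC, end)
    return b"".join(kept)
```

A real tool would need extra validation, since the magic bytes could also occur by chance inside a payload.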
Proio Base Tools
Tools
Proio Data Models
model/
lcio/
promc/
lcio.proto
proio.proto
promc.proto
eicio/
eicio.proto
anl.proto
...
Go installation
$ go get github.com/decibelcooper/proio/go-proio/...
This single command acquires and builds the Go library along with most of the base proio tools. Provided that $GOPATH and $PATH are set up appropriately, the tools are then immediately available.
Installation for other languages
Canonical build systems chosen for each language:
Please see the appropriate subdirectory in https://github.com/decibelcooper/proio for more details
Python example
Install with…
$ pip install --user proio
Python write example
Python read example
File Size Benchmarks
Data set | Proio size | LCIO size | ProMC size | Comments |
Pythia8 35 GeV DIS MC (50K events) | 24 MiB | 67 MiB | 37 MiB | Sparse information (zero-vector position, e.g.) |
Lepto 35 GeV DIS MC (50K events) | 27 MiB | 56 MiB | 33 MiB | Sparse information (zero-vector position, e.g.) |
Pythia8 35 GeV DIS Recon. (500 events) | 24 MiB | 22 MiB | NA | Dense information |
Pythia8 14 TeV t tbar (10K events) | 482 MiB | 390 MiB | 308 MiB | Elaborate ancestry. Many parents/children for some particles. |
Performance Benchmarks (Go only)
Scenario:
Analysis routine for calculating track efficiency. Reading from file with full reconstruction data
Time / Event is dominated by event read time in .proio and .slcio cases for this scenario
Caveat: Go LCIO library is likely not as optimized as C++ LCIO library.
File Format | Time / Event |
.proio | ~200 µs |
.proio.gz | ~2 ms |
.slcio | ~45 ms |
Future Work