1 of 12

ProIO

David Blyth

2 of 12

The Project

  • Inspired by works from S. Chekanov and A. Kiselev
  • Lives at https://github.com/decibelcooper/proio
  • Ooh, shiny badges!
    • Continuous Integration: no code merges without sufficient testing.
    • Unit test coverage goal is to maintain > 90%
    • Automated code “quality” checks
  • Contributions of all kinds are welcome.

3 of 12

ProIO Key Concepts

  • ProIO is for PROS! It’s right in the name…

…J.K., the name has nothing to do with that, and everything to do with Google’s Protocol Buffers (Protobuf)

4 of 12

ProIO Key Concepts

  • Language-neutral I/O for streaming events
  • Thin, native containers for protobuf messages, simply adding the concept of an event
  • protobuf + event structure = ProIO
  • Serialized output can be read efficiently from an archival file or from a stream

5 of 12

Event Data Models in ProIO

[Diagram: data models (EIC, LCIO, ...) pass through the Protobuf compiler, which emits Protobuf generated code for Go, Python, C++, and Java; each language adds a thin proio wrapper on top.]

6 of 12

Data Model Messages

  • Pure protobuf messages
  • Written in a syntax that is simple and familiar
  • Can be modified and added to without writing any language-specific code
  • Do NOT have to be part of the ProIO repo!

7 of 12

Event Structure

  • Entries
    • Each entry is an arbitrary protobuf message with a unique, persistent ID
  • Tags
    • Primary means of (non-linear) event data organization
    • Each tag is a mapping from a string to a list of entry IDs
  • Metadata
    • Key-value pairs that are shared among events
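The three pieces above can be pictured with a small toy model. This is only a sketch: `ToyEvent`, `add_entry`, and `tagged_entries` are hypothetical names chosen for illustration, plain dictionaries stand in for protobuf messages, and none of it is the actual proio API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyEvent:
    """Toy stand-in for an event (hypothetical, not the proio API)."""
    entries: Dict[int, Any] = field(default_factory=dict)     # entry ID -> message
    tags: Dict[str, List[int]] = field(default_factory=dict)  # tag -> list of entry IDs
    metadata: Dict[str, bytes] = field(default_factory=dict)  # shared key-value pairs
    _next_id: int = 1

    def add_entry(self, message: Any, *tags: str) -> int:
        """Store a message under a fresh ID and file that ID under each tag."""
        entry_id = self._next_id
        self._next_id += 1
        self.entries[entry_id] = message
        for tag in tags:
            self.tags.setdefault(tag, []).append(entry_id)
        return entry_id

    def tagged_entries(self, tag: str) -> List[int]:
        """Return the entry IDs filed under a tag (empty if the tag is unknown)."""
        return list(self.tags.get(tag, []))


event = ToyEvent()
# Dictionaries stand in for protobuf messages here.
hit_id = event.add_entry({"x": 1.0, "y": 2.0}, "Tracker", "Hits")
particle_id = event.add_entry({"pdg": 11}, "Truth")

# One entry can be reached through several tags, which is what makes the
# organization non-linear.
print(event.tagged_entries("Hits"), event.tagged_entries("Tracker"))
```

Because tags only hold IDs, the same entry can belong to any number of groupings without being copied.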

8 of 12

Bucket Structure

  • Buckets are the quantum of ProIO data “on the wire”
  • Configurable for payload size and compression type (gzip, lz4, or none)
  • Carries metadata to be attached to events
    • Metadata stored as key-value pairs
    • Each key-value pair is associated with all future events until it is overridden
  • Provides resynchronization in the case of corrupt data
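A rough sketch of how a bucket-like frame can support both compression and resynchronization is given below. The marker bytes, header layout, and function names are assumptions made for illustration and do not reflect the actual ProIO wire format; only gzip and "none" are shown because lz4 is not in the Python standard library.

```python
import gzip
import struct
import zlib

MAGIC = b"\xfeTOYBKT\xfe"       # hypothetical sync marker, not ProIO's real value
HEADER = struct.Struct("<BI")   # compression flag (0 = none, 1 = gzip), payload length


def write_bucket(payload: bytes, compress: bool = True) -> bytes:
    """Frame a payload as marker + header + (optionally compressed) body."""
    body = gzip.compress(payload) if compress else payload
    return MAGIC + HEADER.pack(1 if compress else 0, len(body)) + body


def read_buckets(stream: bytes):
    """Yield payloads, scanning ahead to the next marker after bad data."""
    pos = 0
    while True:
        pos = stream.find(MAGIC, pos)
        if pos < 0:
            return
        start = pos + len(MAGIC)
        try:
            flag, length = HEADER.unpack_from(stream, start)
            body = stream[start + HEADER.size:start + HEADER.size + length]
            if flag not in (0, 1) or len(body) != length:
                raise ValueError("truncated or malformed bucket")
            payload = gzip.decompress(body) if flag == 1 else body
        except (struct.error, ValueError, OSError, EOFError, zlib.error):
            pos += 1  # give up on this marker and resynchronize on the next one
            continue
        yield payload
        pos = start + HEADER.size + length


good = write_bucket(b"event-1")
damaged = bytearray(write_bucket(b"event-2"))
damaged[len(MAGIC) + HEADER.size] ^= 0xFF  # corrupt the first byte of the gzip body
stream = good + bytes(damaged) + write_bucket(b"event-3", compress=False)

recovered = list(read_buckets(stream))
print(recovered)
```

In this toy the damaged bucket is skipped and reading continues at the next marker, so only the data inside the corrupt bucket is lost.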

9 of 12

Metadata

  • Intended to support things like attaching MC parameters, GDML, and magnetic field configurations
  • As with event entry tagging, adopting conventions for EIC is encouraged.
  • E.g., GDML may be injected into the ProIO stream with the “geometry” key.
  • Reconstruction should watch for this key to be attached to events.

[Diagram: a stream in which “geometry” A is followed by a run of events, then by “geometry” B.]
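The way a “geometry” key follows events through a stream can be sketched as below. This is a toy model under stated assumptions: the stream is a plain Python list, events are just names, the GDML payloads are placeholders, and `attach_metadata` is a hypothetical helper rather than part of proio.

```python
from typing import Dict, Iterable, Iterator, List, Tuple, Union

StreamItem = Union[Tuple[str, bytes], str]  # a (key, value) metadata pair, or an event name


def attach_metadata(items: Iterable[StreamItem]) -> Iterator[Tuple[str, Dict[str, bytes]]]:
    """Pair each event with the metadata in force when it was written."""
    current: Dict[str, bytes] = {}
    for item in items:
        if isinstance(item, tuple):
            key, value = item
            current[key] = value         # overrides any earlier value for this key
        else:
            yield item, dict(current)    # snapshot, so later overrides do not leak back


stream: List[StreamItem] = [
    ("geometry", b"<gdml>A</gdml>"),     # placeholder payloads, not real GDML
    "event-1",
    "event-2",
    ("geometry", b"<gdml>B</gdml>"),
    "event-3",
]
events = list(attach_metadata(stream))

# Reconstruction side: watch the "geometry" key and note where it changes.
reloads = []
last_geometry = None
for name, metadata in events:
    geometry = metadata.get("geometry")
    if geometry != last_geometry:
        reloads.append(name)
        last_geometry = geometry

print(reloads)
```

A consumer written this way only needs to reload geometry at the events where the attached value differs from the previous one.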

10 of 12

Notes on MPI

  • Any HPC administrator will push the use of message passing.
    • They have good reasons for this.
  • MPI can benefit from an event container that is self-serializing.
  • Protobuf and ProIO provide, IMO, an elegant solution to this
    • ProIO events have value even if we don’t use ProIO streams.

11 of 12

Command Line Tools

  • Written in Go
    • proio-summary
    • proio-ls
    • proio-strip
    • lcio2proio

Try these out by pulling

docker://electronioncollider/anl-base,

or by setting up a simple Go environment and doing a “go get”:

go get github.com/decibelcooper/proio/go-proio/...

12 of 12

Future Work

  • The last bits of the APIs will be added in the near future; the APIs are already nearly stable.
    • Note: the ProIO data format is already stable! The remaining API additions will not break it!
  • Proposed JLab LDRD may put ProIO to the test in a streaming readout context
  • Summer student will work on
    • Graphical data browser implemented in Python
    • Generating MC events directly into ProIO