
The BaBar Long Term Data Preservation and Computing Infrastructure

Marcus Ebert
BaBar Computing Coordinator
on behalf of BaBar

CHEP 2024, October 21


The BaBar Experiment

BaBar collaboration: 1993 - ?

  • collider experiment at SLAC
  • BaBar founded 1993
  • data taking 1999-2008


BaBar Status

  • BaBar stopped data taking in 2008 and was anticipated to do data analyses until 2018
    • but is still actively doing analyses (local, no Grid usage)
      • 223 active authors from 14 countries
    • 27 new analysis publications since 2018 (more than 60 incl. conference proceedings)
    • 5 analyses published in 2023 (not incl. conference proceedings)

  • Beginning of 2021: support for infrastructure at SLAC finally stopped
    • support extended from 2018 to beginning of 2021

  • To be able to continue, everything still needed had to be moved away from SLAC
    • very tight integration of SLAC and BaBar services, grown over the years

  • Analysis system and documentation moved to University of Victoria


What is needed for the new system?

  • Data
    • collected collision data and generated MC events (~1.5PB)
      • all in ROOT files
    • metadata stored in a MySQL database (a small query sketch follows after this list)
      • number of events per ROOT file, dataset, ...

  • Analysis environment
    • software is 32-bit; users usually write C++ code and compile their analysis modules
      • does not compile on 64-bit-only systems
    • depends on older software releases, e.g. perl, xrootd, ...
      • latest verified system: SL6.3, gcc 4.4.x, kernel 2.6, ...

  • Documentation
    • new users still join, sometimes just for a single analysis
    • preserving the documentation is the only way to get someone successfully started

  • Collaboration tools
    • calendar, analysis review, mailing lists, meeting organizer, ...
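
To make the metadata role concrete, below is a minimal sketch of such a per-dataset lookup in Python. The host, credentials, schema, table, and column names are hypothetical placeholders; only the idea of a MySQL/MariaDB metadata database holding per-file event counts is taken from this slide.

    # Minimal sketch of a metadata lookup against a MySQL/MariaDB database.
    # All names (host, user, schema, table, columns) are hypothetical placeholders.
    import mysql.connector

    conn = mysql.connector.connect(
        host="metadata-db.example.org",      # hypothetical metadata DB host
        user="babar_reader",
        password="********",
        database="babar_metadata",
    )
    cur = conn.cursor()
    # e.g. number of files and total number of events for one dataset
    cur.execute(
        "SELECT COUNT(*), SUM(n_events) FROM file_catalog WHERE dataset = %s",
        ("AllEvents-example",),
    )
    n_files, n_events = cur.fetchone()
    print(f"{n_files} files, {n_events} events")
    conn.close()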


Analysis Environment - Overview

  • Users need to compile locally -> user accounts/management

  • Users need to run over thousands of data files -> batch system

  • Batch system jobs need to access local user environment -> shared file system

  • Everything needs to run in an outdated, unsecured environment -> isolation

  • Users want to take their job output home -> data transfer machine

  • Jobs need to access data in root files -> XRootD system

  • Hardware replacement uncertain -> redundancy needs to be built in


Isolation

  • OS and tools are frozen, without security updates for a long time
    • BaBar already used a VM-based system at SLAC

  • login node reachable from the outside
    • current OS that gets security updates
  • interactive VM can only be accessed from the login node; limited access to the outside
    • interactive VM based on BaBar’s approved image

[Diagram: login node -> interactive BaBar VM (no access to the interactive VM from anywhere else)]

BaBar-To-Go is an alternative to using the UVic system.


Batch System

  • BaBar used Torque/Maui/LSF before leaving SLAC
  • The HEP-RC group uses HTCondor to start OpenStack VMs on demand as worker nodes
    • needed to write wrapper scripts (a minimal sketch follows at the end of this slide)
      • framework/users -> Torque/Maui/LSF commands -> wrapper script -> HTCondor commands
      • HTCondor command output -> wrapper script -> Torque/Maui/LSF-style output -> framework/users

  • OpenStack VMs are also isolated: very limited access to anything outside, no public IP address

[Diagram: login node and batch system head node; OpenStack cloud providing worker nodes using the BaBar VM image; worker node VMs are started on demand by cloudscheduler (https://csv2.heprc.uvic.ca/public/)]
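
To illustrate the wrapper-script idea, here is a minimal sketch of a qsub-style wrapper in Python that maps a Torque option onto an HTCondor submit description, calls condor_submit, and prints a Torque-style job id. The option mapping and file layout are simplifying assumptions, not BaBar's actual wrapper.

    #!/usr/bin/env python3
    # Minimal sketch of a qsub-style wrapper around condor_submit.
    # The option mapping is a simplifying assumption, not BaBar's actual wrapper.
    import argparse
    import subprocess
    import tempfile

    parser = argparse.ArgumentParser(prog="qsub")
    parser.add_argument("script")                             # the user's job script
    parser.add_argument("-l", dest="resources", default="")   # e.g. "mem=2gb"
    args = parser.parse_args()

    mem = "2GB"
    for item in args.resources.split(","):
        if item.startswith("mem="):
            mem = item.split("=", 1)[1]

    # build an HTCondor submit description for the job
    submit_lines = [
        f"executable     = {args.script}",
        f"request_memory = {mem}",
        f"output         = {args.script}.out",
        f"error          = {args.script}.err",
        f"log            = {args.script}.log",
        "queue",
    ]

    with tempfile.NamedTemporaryFile("w", suffix=".sub", delete=False) as f:
        f.write("\n".join(submit_lines) + "\n")
        subfile = f.name

    # condor_submit prints e.g. "1 job(s) submitted to cluster 1234."
    out = subprocess.run(["condor_submit", subfile],
                         capture_output=True, text=True, check=True).stdout
    cluster = out.rsplit("cluster", 1)[-1].strip(" .\n")
    # print a Torque-style job id so the calling framework sees familiar output
    print(f"{cluster}.batch.example.org")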


Shared File System

  • AFS at SLAC
    • all of BaBar’s software in a well-defined directory structure
    • NFS is used on the new system

[Diagram: login node and batch system head node, OpenStack worker nodes using the BaBar VM image, and the interactive BaBar VM all use NFS mounts under the old AFS paths]


User Accounts and Management

  • everyone in BaBar had an account at SLAC
    • that cannot be done here; most people do not need it anymore
    • local accounts only for active analyses
    • using local NIS

[Diagram: same setup as above, now with an NIS server added]


Data Access

  • access is needed from the interactive machine and from the worker nodes
    • BaBar uses XRootD, built into the framework (users do not need to care where the data is); an access sketch follows at the end of this slide
      • the XRootD redirector needs to be specified in the BaBar environment
      • data input: streamed

[Diagram: same setup as above, now with an XRootD redirector and XRootD server added for data access]
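
To show what this looks like from the user side, here is a minimal PyROOT sketch that opens a file through a redirector. The redirector host name and file path are hypothetical placeholders; only the root:// streaming access pattern is taken from this slide.

    # Minimal sketch of redirector-based read access, assuming PyROOT is available.
    # Redirector host and file path are hypothetical placeholders.
    import ROOT

    # the redirector resolves the file to whichever data server actually holds it
    url = "root://xrootd-redirector.example.org//store/babar/AllEvents/file001.root"
    f = ROOT.TFile.Open(url)        # data is streamed, not copied locally
    if not f or f.IsZombie():
        raise RuntimeError("could not open " + url)
    f.ls()                          # list the objects stored in the file
    f.Close()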


Data

  • GridKa offered to store the data and MC files from the latest processing run (AllEvents, skims, conditions db, ...) for active usage

  • GridKa also continues to host the metadata db (MariaDB)

  • IN2P3 has for a long time hosted a second copy of all BaBar data, incl. raw data, as a backup (not for active usage) and agreed to continue to do so

  • CERN offered to also host a copy of all data


Data Access

  • Data available to analyses: ~1.5PB

  • Framework at UVic needs to access data at GridKa via streaming...
    • works surprisingly well for normal event data
      • workflow: read event, process, read event, process, ...
    • but conditions data is also read via streaming
      • large amount of data each job needs to read...

Doable, but very slow processing

---> use XRootD proxy system


Data Access

  • direct access to GridKa ---> access via local cache system

[Diagram: the interactive BaBar VM and the worker nodes on OpenStack go through the XRootD redirector to four XRootD disk proxies, which fetch data from the GridKa XRootD access point]

Documentation

  • different systems used:
    • HTML web pages: kept in AFS within a well-defined directory structure, r/w rights via ACLs; every BaBar user had a SLAC account and edited the HTML files directly in AFS

    • Wiki: added ~2012 to have a self-contained system editable by anyone in the collaboration via a web browser

  • HTML web pages: visible to the public or to specific groups via .htaccess files; difficult to maintain, kept for historical purposes

  • Wiki: visible only to BaBar members, easy to maintain, the main BaBar documentation

  • two new web servers at UVic and a single public web page (the rest has access restricted to BaBar members)


Collaboration Tools

  • SLAC-based mailing lists ---> Caltech mailing lists
    • only the lists that are still needed were created

  • old meeting agendas were HTML pages, registration based on SLAC systems
    ---> switched to CERN Indico

  • Hypernews was deeply integrated into SLAC
    • sending emails for posts to SLAC addresses, notifying SLAC systems in case of issues, people joining needed a SLAC UNIX account, ... - but all content of posts is in text files
    ---> moved Hypernews to UVic, made it read-only, and removed any mailing feature -> still readable and an archive of all past communication
    ---> replacement: CERN e-groups
      • also nicely integrated with CERN Indico for accessing BaBar meetings


Redundancy/Reliability

Hardware overview:

  • XRootD proxy servers: old machines (old machine == out of warranty)
  • XRootD redirector: VM on an old machine
  • login machine: VM on an old machine
  • BaBar interactive VM: VM on an old machine
  • NIS server: VM on an old machine
  • web server: VM on an old machine
  • BaBar wiki: VM on an old machine
  • BaBar Hypernews: VM on an old machine
  • NFS servers: one new server, multiple old machines

Redundancy/Reliability:

  • protect against disk failure
  • protect against server failure


Redundancy/Reliability

Server hosting the login machine VM, NIS server VM, interactive VM, and XRootD redirector VM:
  • hardware RAID1 for the OS
  • ZFS mirror for the data disks
  • spare server set up the same way
  • ZFS send/receive (a replication sketch follows at the end of this section)


XRootD proxy servers:
  • hardware RAID1 for the OS
  • ZFS raidz3 for the data disks
  • multiple servers available
  • just a cache, so no data can be lost


Server hosting the web documentation VM, Wiki VM, and Hypernews (HN) VM:
  • hardware RAID1 for the OS
  • ZFS raidz3 for the data disks
  • web content on NFS
  • HN content on NFS
  • images backed up
  • daily MySQL dump to NFS


4 NFS servers (NFS $HOME, NFS job output, NFS framework, NFS documentation):
  • all use ZFS raidz2/3 for the data disks and hardware RAID1 for the OS
  • spare server set up in the same way
  • ZFS send/receive
  • extra backup of framework and documentation
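
As an illustration of keeping a spare server in sync with ZFS send/receive, here is a minimal sketch in Python driving the standard zfs command-line tools over ssh. Pool, dataset, snapshot, and host names are hypothetical, and the incremental-snapshot scheme is an assumption, not necessarily the actual procedure used.

    # Minimal sketch of ZFS replication to a spare server via send/receive.
    # Dataset, snapshot, and host names are hypothetical placeholders.
    import datetime
    import subprocess

    dataset = "tank/babar"                      # hypothetical ZFS dataset
    spare = "spare-server.example.org"          # hypothetical spare host
    prev = f"{dataset}@replica-previous"        # last snapshot already on the spare
    snap = f"{dataset}@replica-{datetime.date.today()}"

    # take a new snapshot of the primary dataset
    subprocess.run(["zfs", "snapshot", snap], check=True)

    # stream only the changes since the previous snapshot to the spare server
    send = subprocess.Popen(["zfs", "send", "-i", prev, snap], stdout=subprocess.PIPE)
    subprocess.run(["ssh", spare, "zfs", "receive", "-F", dataset],
                   stdin=send.stdout, check=True)
    send.stdout.close()
    if send.wait() != 0:
        raise RuntimeError("zfs send failed")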


Summary

To run an old and outdated analysis environment on current infrastructure:

  • Keep the analysis and documentation framework in a well-defined directory structure

  • Outdated analysis environments can be preserved in a VM image

  • Running on clouds can make use of such a VM image
  • Running on clouds avoids depending on specific hardware/worker node machines

  • Data access via XRootD gives good choices for data server setups

  • Keeping a tape backup of framework and documentation
  • Keeping data backups at independent sites

  • Using mirrored server infrastructure to account for old hardware


Conclusion

Running analyses in an old and outdated environment is possible and can be done safely and very well using current infrastructure solutions like clouds.

Big thanks to the GridKa, CERN, IN2P3, INSPIRE, Caltech, and UVic HEP-RC groups!


Other Collaboration Tools

  • Analysis documents, notes, and analysis metadata
    • old content archived to INSPIRE
    • new documents will be added as well for long-term preservation

new system for active analyses and management:

    • Google Drive folder for each analysis
      • for documents and other information
    • Google Sheets for the metadata of each analysis
    • review done using CERN e-groups (each analysis has its own)
    • specific folders for SpeakersBureau, PublicationBoard, ...


Open Data

  • making data openly available is possible, but not useful by itself

  • to make use of the data one also needs
    • Analysis framework
    • Documentation
    • Communication with collaboration members

‘BaBar Associates’ open-access:

  • anyone can join (== data access for anyone)
    • full access to communications and documentation tools and archives
    • analyses for publication are to be done within the BaBar publication framework
      • e.g. going through the full review process
    • https://babar.heprc.uvic.ca/www/join_BaBar.html

Access to BaBar framework: analysis system at UVic, BaBar-To-Go (VM) at home