ATLAS’ Wish List on Xrootd Proxy Cache

Wei Yang

The Xrootd Workshop 2016

What is it and Why?

Squid-like cache proxy on the surface

  • Use disk to cache data
  • Work around firewall
  • Easy to use: like http_proxy; in the future, xroot_proxy=root://mycompany.edu:port/
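The proposed `xroot_proxy` setting would work like `http_proxy`: the client rewrites its target URL so the request goes through the cache. Below is a minimal sketch. It assumes the variable name from the slide, which is a wish-list item and not an existing client feature. It also assumes the proxy runs in XRootD's forwarding mode, where the origin URL is embedded in the path (`root://proxy:port//root://origin//path`). The helper `proxied_url` and the port 1094 in the example are illustrative.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit


def proxied_url(url: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Route an xroot URL through the cache named by xroot_proxy (hypothetical variable).

    Assumes a forwarding-mode proxy that accepts
    root://proxy:port//root://origin:port//path.
    """
    env = os.environ if env is None else env
    proxy = env.get("xroot_proxy", "").strip()
    if not proxy:
        return url  # no proxy configured: talk to the origin directly
    if urlsplit(url).scheme not in ("root", "xroot"):
        return url  # only xroot-protocol URLs are redirected
    # Embed the full origin URL in the path component of the proxy URL.
    return proxy.rstrip("/") + "//" + url


if __name__ == "__main__":
    env = {"xroot_proxy": "root://mycompany.edu:1094/"}
    # prints root://mycompany.edu:1094//root://origin.example.org//atlas/file.root
    print(proxied_url("root://origin.example.org//atlas/file.root", env))
```

As with `http_proxy`, an unset variable means direct access, so the same job works with or without a cache.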

Different under the hood:

  • For static, large files
  • Multi-thread to handle data intensive load
  • Capable of both whole-file caching and file-block caching (focus on the latter)
  • Protocols to client: xroot, http, and (add your protocol plugin here)
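With block caching, a client read only needs the fixed-size blocks that overlap its byte range. Only the missing blocks are fetched from the data source. Below is a minimal sketch of that bookkeeping. It assumes fixed-size blocks, and the 1 MiB block size is chosen purely for illustration. `blocks_for_range` and `blocks_to_fetch` are hypothetical names, not XRootD internals.

```python
from typing import Iterable, List


def blocks_for_range(offset: int, length: int, block_size: int) -> List[int]:
    """Indices of the fixed-size cache blocks covering bytes [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size  # block holding the final byte
    return list(range(first, last + 1))


def blocks_to_fetch(offset: int, length: int, block_size: int,
                    cached: Iterable[int]) -> List[int]:
    """Blocks this read still needs from the data source (the cache misses)."""
    have = set(cached)
    return [b for b in blocks_for_range(offset, length, block_size) if b not in have]


if __name__ == "__main__":
    BLOCK = 1 << 20  # 1 MiB: illustrative block size, not a recommended value
    # A 3 MiB read starting at 0.5 MiB touches blocks 0..3; blocks 1 and 2 are cached.
    print(blocks_to_fetch(BLOCK // 2, 3 * BLOCK, BLOCK, cached={1, 2}))  # [0, 3]
```

Caching blocks rather than whole files suits sparse access to large files, where jobs often read only part of each file.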

Can be a transparent, (mostly) configuration-free layer

At large scale, it is unmanaged storage.

Goal: improve data access efficiency, reduce data management overhead


Wei Yang The Xrootd Workshop @ ICEPP, Tokyo Univ. 2016-11-08

Lessons learned from the FAX project

Need to tolerate infrastructure fluctuations

  • Do not build a rigid system
  • Consider failure scenarios: define the expected behavior when data sources and/or disks fail
  • Most importantly, don’t use it as a rigid system
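One way to write down the expected behavior when disks or data sources fail is to treat a cache-disk error as a cache miss and fall through to the data sources, trying each in turn. Below is a minimal sketch. `read_block`, `cache_read`, and `fetch` are hypothetical callables that stand in for the real cache and client I/O. Reporting failures as `OSError` is also an assumption.

```python
from typing import Callable, List, Optional, Sequence


def read_block(cache_read: Callable[[], Optional[bytes]],
               sources: Sequence[str],
               fetch: Callable[[str], bytes]) -> bytes:
    """Serve a block from the cache if possible, else from the first working source."""
    try:
        data = cache_read()
        if data is not None:
            return data  # cache hit
    except OSError:
        pass  # cache disk failed: degrade to a pass-through proxy instead of erroring
    errors: List[str] = []
    for src in sources:
        try:
            return fetch(src)  # first source that answers wins
        except OSError as exc:
            errors.append(f"{src}: {exc}")  # remember why, then try the next source
    raise OSError("all data sources failed: " + "; ".join(errors))
```

With this design, a failed disk or a failed source costs performance but not job failures. The job only fails when every source is unreachable.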

Better to keep the baseline system independent of the experiment’s DM and WFM

  • Make it transparent to ATLAS
  • Not ATLAS-specific: a standalone product applicable outside of HEP

Interaction with the DDM system needs to be modularized as add-ons

  • DM and WFM still have a long way to go before they can utilize the data in the cache


How do we want to use the proxy cache?

We: ATLAS pilot, Athena, prun, facilities, etc., not users

  • For users: it should be transparent

An efficient cache, of course

  • Can a proxy cache with a dozen SSDs (or a cluster of them) match the performance of a Tier 2 HDD farm?

Cloudy Tier 2

  • Unmanaged storage space? Not limited to a Tier 2

Can the cache figure out where the data sources are?

  • RUCIO provides metalink
  • Federation in another way
  • Complementary to the RUCIO/metalink “federation”
  • Makes it possible to access data via the gLFN, which is easier to use
    • Important: simultaneously support gLFN and PFN
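Using the RUCIO metalink, the cache could learn the data sources for a file from a standard document. Below is a minimal sketch of parsing one. It assumes the Metalink 4 format (RFC 5854: `<file name=...>` elements containing `<url priority=...>` children, where a lower priority value is more preferred). It also assumes the file `name` attribute serves as the lookup key. Placing URLs without a priority last is this sketch's own choice. `sources_from_metalink` is a hypothetical helper, and the sample host names are placeholders.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Metalink 4 namespace (RFC 5854).
NS = {"ml": "urn:ietf:params:xml:ns:metalink"}


def sources_from_metalink(xml_text: str) -> Dict[str, List[str]]:
    """Map each <file name=...> to its replica URLs, most preferred first."""
    root = ET.fromstring(xml_text)
    result: Dict[str, List[str]] = {}
    for f in root.findall("ml:file", NS):
        ranked = []
        for u in f.findall("ml:url", NS):
            # Lower value = more preferred; missing priority sorts last (sketch's choice).
            priority = int(u.get("priority", "999999"))
            ranked.append((priority, (u.text or "").strip()))
        ranked.sort(key=lambda pair: pair[0])  # stable sort: ties keep document order
        result[f.get("name", "")] = [url for _, url in ranked]
    return result


if __name__ == "__main__":
    sample = """<metalink xmlns="urn:ietf:params:xml:ns:metalink">
      <file name="data15_13TeV:DAOD_SUSY1.0._1.pool.root.1">
        <url priority="2">root://se2.example.org//some/path</url>
        <url priority="1">root://se1.example.org//some/path</url>
      </file>
    </metalink>"""
    print(sources_from_metalink(sample))
```

On a miss, the cache would try these URLs in order, which complements (rather than replaces) redirector-based federation.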


Some kind of (reverse) N2N or plugin?

How does the cache identify the same file from different data sources?

/pnfs/bnl/atlasdatadisk/rucio/data15_13TeV/91/c7/DAOD_SUSY1.0._1.pool.root.1

/gpfs/atlas/scratchdisk/rucio/data15_13TeV/91/c7/DAOD_SUSY1.0._1.pool.root.1

/dpm/lal/localgroupdisk/rucio/data15_13TeV/91/c7/DAOD_SUSY1.0._1.pool.root.1

Cache the file using the RUCIO DID: data15_13TeV:DAOD_SUSY1.0._1.pool.root.1

Note this is an N → 1 mapping.
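The N → 1 mapping can be sketched as a function that reduces every path form to the RUCIO DID used as the cache key. Because it accepts both path forms, the same function also covers the requirement to support gLFN and PFN simultaneously. Below is a minimal sketch under two assumptions. The first is Rucio's default deterministic layout (`<scope>/<xx>/<yy>/<name>`, with user/group scopes split on dots into directories). The second is the ATLAS gLFN form `/atlas/rucio/<scope>:<name>`. `cache_key` is a hypothetical name. In a real deployment this logic would more likely live in the (reverse) N2N or plugin layer asked about above.

```python
import re
from typing import Optional

# gLFN form (assumed): /atlas/rucio/<scope>:<name>
GLFN = re.compile(r"^/atlas/rucio/(?P<scope>[^/:]+):(?P<name>[^/]+)$")
# Rucio deterministic PFN tail (assumed): .../rucio/<scope as dirs>/<xx>/<yy>/<name>
PFN = re.compile(r"/rucio/(?P<scope>.+)/[0-9a-f]{2}/[0-9a-f]{2}/(?P<name>[^/]+)$")


def cache_key(path: str) -> Optional[str]:
    """Map a gLFN or a site PFN to the Rucio DID 'scope:name' (N -> 1)."""
    m = GLFN.match(path)
    if m:
        return f"{m['scope']}:{m['name']}"
    m = PFN.search(path)
    if m:
        # user/group scopes are stored as directories (user.jdoe -> user/jdoe): undo that.
        scope = m["scope"].replace("/", ".")
        return f"{scope}:{m['name']}"
    return None  # not a recognizable Rucio path; caller could fall back to the PFN


if __name__ == "__main__":
    for p in ("/pnfs/bnl/atlasdatadisk/rucio/data15_13TeV/91/c7/DAOD_SUSY1.0._1.pool.root.1",
              "/dpm/lal/localgroupdisk/rucio/data15_13TeV/91/c7/DAOD_SUSY1.0._1.pool.root.1",
              "/atlas/rucio/data15_13TeV:DAOD_SUSY1.0._1.pool.root.1"):
        print(cache_key(p))  # data15_13TeV:DAOD_SUSY1.0._1.pool.root.1 each time
```

Keying the cache by DID means a block fetched via one site's PFN can serve a later request made through another site's PFN or through the gLFN.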
