XCache
Deployment
Integration
Performance
Ilija Vukotic
December 4, 2018
WBS 2.3.5.4 Intelligent Data Delivery
Current situation
Current understanding of data reuse
Production inputs are more cacheable (52% of accesses and 67% of data volume) than analysis inputs (35% of accesses and 37% of data volume).
Different file types have very different access patterns (e.g. HITS, EVNT, and payload files are very cacheable; DAODs, panda* files, and AODs less so).
As expected, it rarely happens that the same file is accessed at two different sites.
Claim: even a cache of 100 TB per site would be sufficient to deliver roughly half of the accesses and data volume.
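These numbers come from replaying recorded accesses. A minimal sketch of how such reuse fractions can be computed, in Python; the trace format and names are illustrative, not the actual analysis code:

# Fraction of accesses (and of bytes) that touch a previously seen file,
# i.e. that an infinitely large per-site cache could have served.
# Input: (filename, size_bytes) records for one site, e.g. from Rucio traces.
def reuse_stats(accesses):
    seen = set()
    hits = hit_bytes = total_bytes = 0
    for fname, size in accesses:
        total_bytes += size
        if fname in seen:
            hits += 1
            hit_bytes += size
        else:
            seen.add(fname)
    return hits / len(accesses), hit_bytes / total_bytes

# A file read three times is cacheable on its 2nd and 3rd access:
trace = [("EVNT.01", 2e9), ("DAOD.07", 4e9), ("EVNT.01", 2e9), ("EVNT.01", 2e9)]
print(reuse_stats(trace))  # (0.5, 0.4): 2 of 4 accesses, 4 GB of 10 GB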
Current understanding of US scale cache
A multilayer cache would not significantly help.
Throughput generated between sites would be reasonable.
Can XCache deliver what is needed?
Can SLATE deliver what is needed?
Try it and find out!
XCache TODO list from November 7th
All four items on the list are DONE.
Hospital queues
Queues that receive jobs not accepted anywhere else. They are configured to use remote SEs for input, but all inputs go through the local XCache (see the sketch below); outputs are handled the same way as for regular jobs.
Set up at MWT2 (production and analysis) and AGLT2 (production). Thanks, Judith and Wenjing!
We “mounted” them with US DATADISKs, not without issues.
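A sketch of the input redirection, assuming the usual XRootD forwarding-proxy URL convention (prefixing the remote URL with the local cache endpoint); the host names here are hypothetical:

# Rewrite a remote root:// URL so the read goes through the site-local XCache.
# The endpoint below is a hypothetical MWT2 cache host, not a real one.
LOCAL_XCACHE = "root://xcache.mwt2.example:1094//"

def via_xcache(remote_url):
    return LOCAL_XCACHE + remote_url

print(via_xcache("root://dcache.bnl.example:1094//atlas/rucio/mc16/EVNT.pool.root"))
# root://xcache.mwt2.example:1094//root://dcache.bnl.example:1094//atlas/rucio/...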
XCaches for Hospital queues
MWT2 and AGLT2 (plots)
Monitoring
XCache reporter
Panda job reports
Rucio Traces
K8s monitoring
XCache findings
SLATE findings
Very easy to deploy, redeploy, and remove an application instance. I basically issue three commands (listed below) and I'm done in 30 seconds flat.
I was able to manage the AGLT2 XCache completely without Shawn's or Wenjing's involvement, except for the initial SLATE and hardware setup.
SLATE instance monitoring needs a bit of improvement (logs, per-container/pod instance metrics and events), but even without it SLATE is completely usable for production-level applications.
./slate instance list
./slate instance delete <instance name>
./slate app install --vo atlas-xcache --cluster uchicago-prod --conf MWT2.yaml xcache
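In practice, redeploying after a configuration change is just an instance delete followed by an app install with the updated values file; instance list shows what is currently running on the cluster.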
Next steps
Reminder: slides
XCache simulation
All simulations replay recorded file accesses against a model cache:
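A minimal sketch of such a trace-driven simulation, assuming LRU eviction with high/low watermark purges (the way XCache cleans up); the cache size and watermarks are illustrative parameters, not the ones behind the tables below:

from collections import OrderedDict

# Replay (filename, size_bytes) accesses against one cache. A purge runs
# whenever usage passes the high watermark and evicts least-recently-used
# files until usage drops below the low watermark.
def simulate(trace, cache_bytes, hi=0.95, lo=0.90):
    cache = OrderedDict()              # filename -> size, in LRU order
    used = hits = hit_bytes = cleanups = req_bytes = 0
    for fname, size in trace:
        req_bytes += size
        if fname in cache:
            cache.move_to_end(fname)   # refresh LRU position
            hits += 1
            hit_bytes += size
            continue
        cache[fname] = size
        used += size
        if used > hi * cache_bytes:
            cleanups += 1
            while cache and used > lo * cache_bytes:
                _, evicted = cache.popitem(last=False)  # drop the oldest file
                used -= evicted
    return {"requests": len(trace), "cache hits": hits, "cleanups": cleanups,
            "hit rate (requests)": hits / len(trace),
            "hit rate (data)": hit_bytes / req_bytes}

The returned fields mirror the columns of the per-site tables that follow.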
Access overlaps between sites
PRODUCTION
site1 | site2 | shared files as % of site1 unique files | shared files as % of site2 unique files | files accessed at both sites |
MWT2 | AGLT2 | 2.82% | 11.21% | 67538 |
MWT2 | NET2 | 1.75% | 4.03% | 42032 |
MWT2 | SWT2 | 5.74% | 9.77% | 137555 |
MWT2 | BNL | 9.62% | 7.00% | 230618 |
AGLT2 | NET2 | 5.33% | 3.08% | 32117 |
AGLT2 | SWT2 | 6.75% | 2.89% | 40668 |
AGLT2 | BNL | 15.80% | 2.89% | 95171 |
NET2 | SWT2 | 2.06% | 1.52% | 21423 |
NET2 | BNL | 7.14% | 2.26% | 74427 |
SWT2 | BNL | 8.96% | 3.83% | 126087 |
ANALYSIS
site1 | site2 | shared files as % of site1 unique files | shared files as % of site2 unique files | files accessed at both sites |
MWT2 | AGLT2 | 0.79% | 1.97% | 26761 |
MWT2 | NET2 | 0.58% | 2.26% | 19756 |
MWT2 | SWT2 | 0.68% | 1.27% | 22974 |
MWT2 | BNL | 2.99% | 1.32% | 101525 |
AGLT2 | NET2 | 1.00% | 1.56% | 13605 |
AGLT2 | SWT2 | 1.09% | 0.82% | 14854 |
AGLT2 | BNL | 3.33% | 0.59% | 45354 |
NET2 | SWT2 | 1.80% | 0.87% | 15727 |
NET2 | BNL | 6.87% | 0.78% | 60137 |
SWT2 | BNL | 3.14% | 0.74% | 56809 |
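The counts and percentages also give the per-site trace sizes. For example, in the MWT2/AGLT2 production row, the 67538 shared files are 2.82% of MWT2's unique files and 11.21% of AGLT2's, so MWT2 saw roughly 67538 / 0.0282 ≈ 2.4M unique production files and AGLT2 roughly 67538 / 0.1121 ≈ 0.6M.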
Let's try simulating a two-layer cache: numbers
site | cleanups | avg. accesses | requests | cache hits | requested [TB] | delivered from cache [TB] | hit rate (requests) | hit rate (data) |
xc_AGLT2 | 110 | 8.0 | 2350045 | 1305904 | 4553.6 | 2654.9 | 55.57% | 58.30% |
xc_BNL | 149 | 3.3 | 9926097 | 5113342 | 18178.2 | 10600.2 | 51.51% | 58.31% |
xc_MWT2 | 301.5 | 5.9 | 8027849 | 4303754 | 13685.7 | 8720.7 | 53.61% | 63.72% |
xc_NET2 | 124.75 | 14.5 | 3549113 | 2144678 | 7307.7 | 5166.3 | 60.43% | 70.70% |
xc_SWT2 | 178 | 6.2 | 4275897 | 2233458 | 7542.2 | 4551.5 | 52.23% | 60.35% |
xc_Int2 | 237.8 | 1.1 | 13027865 | 731289 | 19573.8 | 1072.1 | 5.61% | 5.48% |
ORIGIN | | | 12296576 | 12296576 | 18501.7 | 18501.7 | | |
Production Inputs, August and September
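The two rate columns are derived from the counts: for xc_AGLT2, 1305904 / 2350045 ≈ 55.57% of requests hit the cache, and 2654.9 / 4553.6 ≈ 58.30% of the requested volume was delivered from cache. Note that the shared second-layer cache xc_Int2 hits only 5.61% of the requests that reach it, which is what makes a second layer unattractive.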
Let's try simulating a two-layer cache: plots
Production Inputs, August and September
Let's try simulating a two-layer cache: numbers
Analysis Inputs, August and September
site | cleanups | avg. accesses | requests | cache hits | requested [TB] | delivered from cache [TB] | hit rate (requests) | hit rate (data) |
xc_AGLT2 | 70.75 | 1.8 | 2514129 | 938976 | 2336.8 | 1060.0 | 37.35% | 45.36% |
xc_BNL | 174 | 1.9 | 15487492 | 5437455 | 14821.1 | 6038.6 | 35.11% | 40.74% |
xc_Int2 | 182.8 | 1.0 | 18878388 | 309758 | 14832.3 | 459.4 | 1.64% | 3.10% |
xc_MWT2 | 162.25 | 2.7 | 6868473 | 2695981 | 5099.0 | 2361.3 | 39.25% | 46.31% |
xc_NET2 | 26 | 2.0 | 1588378 | 632108 | 1005.4 | 446.3 | 39.80% | 44.39% |
xc_SWT2 | 83.5 | 1.9 | 3138866 | 1014430 | 2571.2 | 1095.1 | 32.32% | 42.59% |
ORIGIN | | | 18568630 | 18568630 | 14372.9 | 14372.9 | | |
Let's try simulating a two-layer cache: plots
Analysis Inputs, August and September
Adding 350 TB of cache to the system
servers × size [TB] | sites |
4 × 10 | MWT2, AGLT2, NET2, SWT2 |
4 × 30 | BNL |
5 × 100 | Int2 |
9 × 10 | MWT2, AGLT2, NET2, SWT2 |
9 × 30 | BNL |
5 × 30 | Int2 |
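Reading each row as (number of servers) × (server size in TB), the first three rows total 160 TB across the T2s, 120 TB at BNL, and 500 TB at Int2; the last three total 360 TB, 270 TB, and 150 TB. Both configurations sum to 780 TB, so the second one effectively moves 350 TB out of the second-layer (Int2) cache and into the site caches (+200 TB across the T2s, +150 TB at BNL).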
What kind of traffic would XCache generate?
We assume each client consumes ⅓ of a 1 Gbps link (43 MB/s). This is an upper limit on what our code can read (and decompress), and roughly twice the average read speed measured from LSM monitoring. We assumed the enlarged T2 caches.
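For scale: 43 MB/s is about 0.34 Gbps, so roughly 30 concurrent reads saturate a 10 Gbps link, and 1000 clients reading remote (uncached) inputs simultaneously would generate up to ~340 Gbps; the simulated traffic plots below should be read against this per-client upper limit.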
Simulated traffic 1 (plot)
Simulated traffic 2 (plot)
Simulated traffic 3 (plot)
Simulated traffic 4 (plot)
MWT2 traffic
All our dCache servers.
Some admixture of OSG jobs...
~5 Gbps ingress (FTS ingress + job output)
~25 Gbps egress (FTS egress + job input)
MWT2 current FTS
Egress: 2.6 Gbps
Ingress: 3 Gbps
Containerized XCache service basics
XCache needs:
XCache-related information
ATLAS information
Why?
We have a subscription, defined more than three years ago, to keep complete replicas of all DAODs at the US sites:
https://rucio-ui.cern.ch/subscription?name=DAODs%20to%20US%20T2%20DATADISK&account=ddmadmin https://its.cern.ch/jira/browse/ATLDDMOPS-5089
Over the last few days we accumulated a huge backlog of files to transfer at a few sites:
Nb files | Bytes | RSE |
7705227 | 270458200008683 | SLACXRD_DATADISK |
542134 | 127768052728015 | MWT2_DATADISK |
498550 | 161189127282917 | NET2_DATADISK |
466033 | 162974407287848 | AGLT2_DATADISK |
These transfers compete with other transfers like Production Input or T0 export.
Is this subscription still needed, taking into account that a huge volume of DAODs is never touched? Or can we disable it and think about smarter placement (e.g. only send an extra copy of the DAODs that were accessed at least once)?
Cheers,
Cedric
CAVEAT EMPTOR
Cache performance will depend entirely on how we use the caches.