1 of 25

XCache on RU-DataLake (on a dedicated server and on distributed nodes): deployment and testing with ATLAS tools

Andrey Zarochentsev, Aleksandr Alekseev, Stephane Jezequel,

Andrey Kiryanov, Alexei Klimentov, Tatiana Korchuganova, Danila Oleynik

2 of 25

Russian DataLake RnD

  • New site - MEPhI (Moscow)
  • New Torque (Local Resource Management System) with task affinity patches
  • New ARC CE deployment
  • New monitoring (ELK)
  • Addition of node-local tests
  • dCache upgrade/downgrade at JINR
  • Network backbone reconfiguration in Moscow
  • Work on an EOS-based write buffer system has started
  • Distributed XCache on nodes

3 of 25

This report is about Distributed XCache on nodes

Services for Distributed xCache on nodes:

  • On all nodes: cmsd (server), xrootd (server)
  • On one node: cmsd (manager), xrootd (manager)

[Diagram: worker nodes (WN) in three layouts: Direct Access without cache, Distributed xCache on nodes, Dedicated xCache]
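
The service layout on this slide (cmsd and xrootd as servers on every node, plus one manager pair) could be written down roughly as follows. This is a minimal sketch that renders per-node XRootD clustering directives from Python. The directives all.role, all.manager and all.export are standard XRootD configuration; the host name, the port 1213 and the export path are assumptions made for illustration. The cache-specific settings of the actual RU-DataLake deployment are not shown in the slides and are left out.

```python
VALID_ROLES = ("manager", "server")


def render_cluster_config(role: str,
                          manager_host: str,
                          manager_port: int = 1213,   # assumed cmsd port
                          export_path: str = "/") -> str:
    """Return a minimal clustering stanza shared by cmsd and xrootd on one node.

    role         -- "manager" on the single redirector node, "server" elsewhere
    manager_host -- hypothetical host name of the manager node
    """
    if role not in VALID_ROLES:
        raise ValueError(f"role must be one of {VALID_ROLES}, got {role!r}")
    lines = [
        f"all.role {role}",                            # manager or server
        f"all.manager {manager_host}:{manager_port}",  # where servers report to
        f"all.export {export_path}",                   # namespace made visible
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical host name; every worker node would get the "server" variant.
    print(render_cluster_config("manager", "xcache-manager.example.org"))
    print(render_cluster_config("server", "xcache-manager.example.org"))
```

Only the all.role line differs between the two node types, which is why a single template per site is usually enough.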

4 of 25

Technical characteristics

  • Worker nodes at PNPI: 4x4 cores, Xeon E5-2680, 12GB RAM/node (VM)

[Network diagram: PNPI and JINR traffic directions (PNPI ←, JINR ←, PNPI →, JINR →) with link labels: 10Gbps; 3Gbps, ~5.3 s latency; 1.4Gbps, ~8.4 s latency]

5 of 25

HammerCloud settings for tests

  • New HC templates for tests (thanks to Stephane):
      • 1132 (copy2scratch), 1129 (direct_access_lan)
      • Category: stress
      • Jobtemplate: CeleryProd/Datalakes/Derivation_21233_datalakes_folder_Russia.tpl
      • Input file size per event is higher than with HITS (one 4 GB file, i.e. 1000 events)
      • NUM DATASETS PER BULK: 1, MIN QUEUE DEPTH: 24, MAX RUNNING JOBS: 16
      • Test duration: 1 day
  • PanDA queues:
      • PNPI-TEST2 (direct access without XCache)
      • PNPI_XCACHE-TEST (dedicated XCache)
      • PNPI_XCACHE-NODE (distributed XCache on WNs)
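
For scale, the ideal time to move one 4 GB input file can be estimated from the link rates quoted on the technical characteristics slide (10, 3 and 1.4 Gbps). This is a back-of-the-envelope sketch: it assumes 1 GB = 10^9 bytes, a fully available link, and no latency or protocol overhead, none of which the slides state.

```python
def ideal_transfer_seconds(size_gb: float, bandwidth_gbps: float) -> float:
    """Lower bound on transfer time.

    size_gb        -- file size in gigabytes (assumed 10^9 bytes each)
    bandwidth_gbps -- link rate in gigabits per second
    """
    if bandwidth_gbps <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb * 8 / bandwidth_gbps  # 8 bits per byte


FILE_SIZE_GB = 4.0                  # one input file, from the HC settings
LINK_RATES_GBPS = (10.0, 3.0, 1.4)  # rates quoted on the network diagram

if __name__ == "__main__":
    for rate in LINK_RATES_GBPS:
        seconds = ideal_transfer_seconds(FILE_SIZE_GB, rate)
        print(f"{rate:>4} Gbps: {seconds:.1f} s")
```

Under these assumptions the lower bounds are roughly 3 s, 11 s and 23 s per file; measured download times will be higher.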

6 of 25

1st test: Distributed xCache on nodes vs Dedicated XCache vs Direct Access (tests run simultaneously). Test copy2scratch.

[Figure: histograms and corresponding Gaussian curves; panels: download input file time*, totaltime*, payload (Athena) running time*]

* Walltime metrics reported by Pilot: time to fetch job, setup, stage in, run payload, stage out. totaltime is the sum of these.

7 of 25

1st test: Distributed xCache on nodes vs Dedicated XCache vs Direct Access (tests run simultaneously). Test copy2scratch.

[Figure: histogram and corresponding Gaussian curves, download input file time]

8 of 25

1st test: Distributed xCache on nodes vs Dedicated XCache vs Direct Access (tests run simultaneously). Test copy2scratch.

[Figure: histogram and corresponding Gaussian curves, totaltime]

9 of 25

1st test: Distributed xCache on nodes vs Dedicated XCache vs Direct Access (tests run simultaneously). Test copy2scratch.

[Figure: histogram and corresponding Gaussian curves, payload (Athena) running time]

10 of 25

1st test: Distributed xCache on nodes vs Dedicated XCache vs Direct Access (tests run simultaneously). Test copy2scratch.

Metric                      Direct access      Distributed XCache    Single XCache server
                            (PNPI-TEST2)       (PNPI_XCACHE-NODE)    (PNPI_XCACHE-TEST)
Download input file time    μ=137, σ=79        μ=30, σ=11            μ=32, σ=13
Athena running time         μ=321, σ=33        μ=411, σ=31           μ=335, σ=41
totaltime                   μ=472, σ=81        μ=460, σ=43           μ=384, σ=50
Number of completed tests   121                107                   120
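
Since totaltime is the sum of all Pilot phases, the figures on this slide can be decomposed further. The sketch below subtracts the download and Athena means from the totaltime mean to estimate the remaining phases (fetch job, setup, stage out), and expresses each totaltime relative to direct access. It assumes the three means of a queue were computed over the same set of jobs, so that means add up, and that the values are in seconds; the slide states neither.

```python
# Mean values (μ) from the first-test results on this slide.
FIRST_TEST_MEANS = {
    "PNPI-TEST2":       {"download": 137, "athena": 321, "total": 472},
    "PNPI_XCACHE-NODE": {"download": 30,  "athena": 411, "total": 460},
    "PNPI_XCACHE-TEST": {"download": 32,  "athena": 335, "total": 384},
}
BASELINE_QUEUE = "PNPI-TEST2"  # direct access, used as the reference


def other_phases(means):
    """Mean time not spent downloading input or running Athena."""
    return means["total"] - means["download"] - means["athena"]


def relative_change(value, baseline):
    """Fractional change of value with respect to baseline."""
    return (value - baseline) / baseline


def summarize(table=FIRST_TEST_MEANS, baseline_queue=BASELINE_QUEUE):
    base_total = table[baseline_queue]["total"]
    return {
        queue: {
            "other": other_phases(means),
            "total_vs_baseline": relative_change(means["total"], base_total),
        }
        for queue, means in table.items()
    }


if __name__ == "__main__":
    for queue, row in summarize().items():
        print(f"{queue:18s} other={row['other']:3d}  "
              f"total vs direct={row['total_vs_baseline']:+.1%}")
```

The remainder comes out at 14, 19 and 17 for the three queues, so the differences in totaltime are driven almost entirely by the download and Athena phases. Relative to direct access, totaltime is about 2.5% lower with the distributed XCache and about 18.6% lower with the single XCache server.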

11 of 25

2nd test: Distributed XCache on nodes vs Dedicated XCache vs Direct Access (tests run separately). Test copy2scratch.

[Figure: histograms and corresponding Gaussian curves; panels: download input file time, totaltime, payload (Athena) running time]

12 of 25

Zabbix monitoring

[Figure: Zabbix plots for the PNPI_XCACHE-NODE, PNPI_XCACHE-TEST and PNPI-TEST2 queues]

13 of 25

Zabbix monitoring

14 of 25

Kibana XCache monitoring

Monitoring xCache with details for one of the files saved on node v010.

15 of 25

2nd test: Distributed xCache on nodes vs Dedicated XCache vs Direct Access (tests run separately)

Metric                      PNPI-TEST2         PNPI_XCACHE-NODE      PNPI_XCACHE-TEST
Download input file time    μ=290s, σ=677      μ=114s, σ=587         μ=27s, σ=7
Athena running time         μ=309s, σ=19       μ=340s, σ=82          μ=311s, σ=21
totaltime                   μ=616s, σ=678      μ=469s, σ=607         μ=354s, σ=28
Number of completed tests   771                528                   1109
CPU user time               μ=30%, σ=14        μ=31%, σ=12           μ=40%, σ=16
Incoming network            μ=46Mb/s, σ=31     μ=109Mb/s, σ=34       μ=74Mb/s, σ=28
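
The download-time spreads for PNPI-TEST2 and PNPI_XCACHE-NODE are several times larger than their means, so it helps to ask how well each mean is determined. A rough way to do that is the standard error of the mean, σ/√n. The sketch below applies it to the download-time row. It assumes that σ is the sample standard deviation over independent jobs and that n is the number of completed tests; the slides do not say how σ was obtained, so treat the result as indicative only.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean for n independent samples with spread sigma."""
    if n <= 0:
        raise ValueError("n must be positive")
    return sigma / math.sqrt(n)


# (mean in s, sigma in s, completed tests) for download input file time,
# taken from the second-test results on this slide.
SECOND_TEST_DOWNLOAD = {
    "PNPI-TEST2":       (290, 677, 771),
    "PNPI_XCACHE-NODE": (114, 587, 528),
    "PNPI_XCACHE-TEST": (27, 7, 1109),
}

if __name__ == "__main__":
    for queue, (mean, sigma, n) in SECOND_TEST_DOWNLOAD.items():
        se = standard_error(sigma, n)
        print(f"{queue:18s} mean={mean:4d} s  standard error={se:5.1f} s")
```

With these inputs the standard errors are on the order of 25 s for the first two queues and well under 1 s for PNPI_XCACHE-TEST, so the ordering of the three mean download times is unlikely to be a sampling effect, even though individual jobs vary widely.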

16 of 25

Results and near-term plans

  • Distributed XCache on worker nodes was implemented.
  • The efficiency gain from using Distributed XCache (with 10 Gbps internal/external connectivity) is minimal compared with the other configurations, Dedicated xCache and Direct Access.
  • We expect the efficiency gain to go up on a site with more imbalance between internal and external connectivity, and we plan to run the same tests on a site with a different configuration.

17 of 25

Monitoring

RU-DataLake HC tests monitoring

  • Django-based application
  • Data source: a local Elasticsearch store that collects HC test data provided by the BigPanDA API
  • Wide searching and filtering capabilities, e.g. by HC test ID, PanDA queue, etc.
  • Various interactive visualizations
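
Filtering by HC test ID and PanDA queue, as listed above, maps naturally onto an Elasticsearch bool/filter query. The sketch below only builds the query body as a plain dictionary, so it runs without a cluster. The field names hc_test_id and panda_queue are hypothetical, since the slides do not describe the index schema, and the sample value 1132 is borrowed from the copy2scratch template number purely for illustration.

```python
from typing import Any, Dict, List, Optional


def build_hc_query(test_id: Optional[int] = None,
                   panda_queue: Optional[str] = None) -> Dict[str, Any]:
    """Build an Elasticsearch query body for HC test documents.

    Both field names used below are assumptions, not the real schema.
    """
    filters: List[Dict[str, Any]] = []
    if test_id is not None:
        filters.append({"term": {"hc_test_id": test_id}})        # hypothetical field
    if panda_queue is not None:
        filters.append({"term": {"panda_queue": panda_queue}})   # hypothetical field
    if not filters:
        return {"query": {"match_all": {}}}                      # no filter: everything
    # Filter context: exact matches, no relevance scoring.
    return {"query": {"bool": {"filter": filters}}}


if __name__ == "__main__":
    import json
    body = build_hc_query(test_id=1132, panda_queue="PNPI_XCACHE-NODE")
    print(json.dumps(body, indent=2))
```

The resulting dictionary could be passed as the request body of a search call; how the monitoring application actually queries its store is not shown in the slides.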

Data Lake infrastructure monitoring based on the ELK stack:

  • XRootD. Logs are the main data source; they provide, among other things, the number of hits to files in the cache and the size of the files in the cache
  • Billing. Data is extracted from the billing database at JINR and contains information about operations (billinginfo) and requests for operations (doorinfo)
  • Jobs. Contains different plots for jobs (only for the RU cloud)
  • Accounting. Plots based on information from the CREAM CE accounting DB

18 of 25

Thanks

19 of 25

Backup

20 of 25

1st test: Distributed xCache on nodes vs Dedicated xCache vs Direct Access (tests run simultaneously)

21 of 25

2nd test: Distributed xCache on nodes vs Dedicated xCache vs Direct Access (tests run separately)

22 of 25

Russian DataLake 2019 (phase 1)

[Diagram: JINR SE (dCache) and four sites, each with a site CE and an xCache. Data paths shown: reading through xCache, direct writing]

23 of 25

Russian Data Lake Phase 2 (2020–2021)

[Diagram: JINR SE (EOS mgm), four sites (each with a site CE and an xCache) and four sets of EOS pools. Data paths shown: reading through xCache, writing to closest pool, replication on demand]

24 of 25

Russian Data Lake testbed in 2020

25 of 25

Testbed changes

  • New site - MEPhI (Moscow)
  • New Torque with task affinity patches
  • New ARC CE deployment
  • New monitoring (ELK)
  • Addition of node-local tests
  • dCache upgrade/downgrade at JINR
  • Network backbone reconfiguration in Moscow
  • Distributed xCache on nodes
