Federated Varnish & XCache Deployment with SLATE
Service Deployment Models
There are services that we would like to run at most ATLAS sites (e.g. perfSONAR).
The standard way of doing this is to ask site sysadmins to run them.
Adding a service this way incurs a significant cost in sysadmin time:
People operating a service need to communicate with each site, explain the changes needed, and debug things that break due to skipped updates; this can take months.
This makes for an unreliable service, security issues, slow rollout of new features, and stifled innovation.
Service Deployment Models - cont’d
Federated way of doing it:
SLATE: Services Layer At The Edge
Trusted image registries
SLATE - Adding a service
Assuming an old-fashioned app…
SLATE - Managing a service
$ slate instance list
$ slate instance delete <instance name>
$ slate app install --group atlas-xcache --cluster uchicago-prod --conf MWT2.yaml xcache
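The `--conf` file passed to `slate app install` is a Helm-style values file for the application. As a sketch only — the key names below are illustrative, not the actual xcache chart schema:

```yaml
# Hypothetical contents of MWT2.yaml (key names invented for illustration;
# consult the chart's values.yaml for the real schema).
Instance: MWT2
SiteConfig:
  Name: MWT2              # site name the cache reports to Rucio
CacheConfig:
  RamSize: 8g             # in-memory block cache
  Disks:                  # disk mounts provided by the site
    - /xcache/disk1
    - /xcache/disk2
```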
Web interface
CLI
Kibana monitoring
XCache
ATLAS has two situations where data is remotely accessed:
Both practically require caching input data for faster subsequent accesses.
XCache in SLATE
A rather complex application with multiple containers:
Special requirements:
Deploying XCache
Once a site has approved the application, told us the disk mounts and IP, and labeled the node, we:
$ slate app install --group atlas-xcache --cluster esnet-lbl --conf ESnet.yaml xcache
$ slate secret create --group atlas-xcache --cluster esnet-lbl \
    --from-file userkey=xcache.key.pem --from-file usercert=xcache.crt.pem xcache-cert-secret
$ xrdcp -f root://198.129.248.94:1094//root://fax.mwt2.org:1094//pnfs/uchicago.edu/atlasdatadisk/rucio/data15_13TeV/3b/5d/AOD.11227489._001118.pool.root.1 /dev/null
XCache in SLATE
[Timeline diagram: SLATE creates secrets and the XCache deployment on the cluster → Kubernetes objects instantiated → XCache container download → pod starts up and registers itself in Rucio; each step takes between < 5 min and 5-10 min.]
A data caching network deployed in less than 20 minutes.
Upgrades are as simple as re-deploying.
XCache monitoring
Several sources:
Squid
[Diagram: a Squid at the client site in front of remote Frontier and Stratum 1 servers.]
Varnish
[Diagram: a Varnish at the client site in front of remote Frontier and Stratum 1 servers.]
Serving Frontier requests
Serving CVMFS requests
vcl 4.1;
import dynamic;
import directors;

{{- range $nindex, $be := .Values.backends }}
backend {{ $be.name }} {
    .host = "{{ $be.host }}";
    .port = "{{ $be.port }}";
}
{{- end }}

acl local {
    {{ .Values.acl | nindent 4 }}
}

sub vcl_recv {
    if (!(client.ip ~ local)) { return (synth(405)); }
    if (req.method != "GET" && req.method != "HEAD") {
        return (pipe);
    }
    {{- range $nindex, $be := .Values.backends }}
    if (req.restarts == {{ $nindex }}) {
        set req.backend_hint = {{ $be.name }};
    }
    {{- end }}
}

sub vcl_backend_fetch { unset bereq.http.host; }

sub vcl_backend_response {
    if (beresp.status == 404) {
        {{ $lb := last .Values.backends }}
        if (bereq.backend != {{ get $lb "name" }}) {
            set beresp.uncacheable = true;
            return (deliver);
        } else {
            set beresp.ttl = 180s;
        }
    }
}

sub vcl_deliver {
    if (resp.status == 404) {
        if (obj.uncacheable) { return (restart); }
    }
}
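For concreteness, with two hypothetical backends (names and hosts invented for illustration, not from the actual deployment), the Helm-templated parts above would render to plain VCL along these lines — each request restart moves to the next backend in the list:

```vcl
# Rendered sketch for hypothetical backends "frontier1" and "frontier2".
backend frontier1 { .host = "frontier1.example.org"; .port = "8000"; }
backend frontier2 { .host = "frontier2.example.org"; .port = "8000"; }

sub vcl_recv {
    # First attempt goes to frontier1; a restart (e.g. on a 404) retries on frontier2.
    if (req.restarts == 0) { set req.backend_hint = frontier1; }
    if (req.restarts == 1) { set req.backend_hint = frontier2; }
}
```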
Varnish deployment
Performance Varnish - SNMP
Varnish for the Frontier node.
The file-descriptor count is 0 since it doesn't use disk storage.
[Panels: request/fetch, data in/out, objects, CPU, file descriptors.]
Performance Squid - SNMP
One of the Squid nodes, serving both Frontier and CVMFS.
[Panels: request/fetch, data in/out, objects, CPU, file descriptors.]
Performance in Elasticsearch
Squid
Varnish
Testing it - CVMFS
Squid (empty cache):
Transactions: 101391 hits
Availability: 100.00 %
Elapsed time: 235.63 secs
Data transferred: 7059.98 MB
Response time: 0.05 secs
Transaction rate: 430.30 trans/sec
Throughput: 29.96 MB/sec
Concurrency: 22.02
Successful transactions: 93525
Failed transactions: 0
Longest transaction: 3.37
Shortest transaction: 0.03

Varnish (under regular production load):
Transactions: 101391 hits
Availability: 100.00 %
Elapsed time: 42.66 secs
Data transferred: 6894.09 MB
Response time: 0.01 secs
Transaction rate: 2376.72 trans/sec
Throughput: 161.61 MB/sec
Concurrency: 16.04
Successful transactions: 96796
Failed transactions: 0
Longest transaction: 4.01
Shortest transaction: 0.00
Varnish was under regular production load.
Squid was completely empty.
x6 faster!
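The x6 figure follows directly from the two transaction rates siege reported; a quick check, with the numbers copied from the results above:

```python
# Transaction rates reported by siege (trans/sec), from the CVMFS test above.
squid_rate = 430.30      # Squid, empty cache
varnish_rate = 2376.72   # Varnish, under regular production load

speedup = varnish_rate / squid_rate
print(f"Varnish is ~{speedup:.1f}x faster")  # ~5.5x, i.e. roughly x6
```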
Conclusions
Extras
Squid: A caching proxy for the Web supporting HTTP, HTTPS, FTP, and more. Squid reduces bandwidth and improves response times by caching and reusing frequently requested web pages. Squid has extensive access controls and makes a great server accelerator. It runs on most available operating systems, including Windows, and is licensed under the GNU GPL.
Varnish: A high-performance HTTP accelerator. Varnish Cache is a web application accelerator, also known as a caching HTTP reverse proxy. You install it in front of any server that speaks HTTP and configure it to cache the contents. Varnish Cache is really, really fast: it typically speeds up delivery by a factor of 300-1000x, depending on your architecture.
Testing it - Frontier
Harder to test (but will be done).
Transactions: 177602 hits
Availability: 99.96 %
Elapsed time: 37.07 secs
Data transferred: 1232.79 MB
Response time: 0.00 secs
Transaction rate: 4790.99 trans/sec
Throughput: 33.26 MB/sec
Concurrency: 16.55
Successful transactions: 177602
Failed transactions: 68
Longest transaction: 4.97
Shortest transaction: 0.00

Transactions: 177595 hits
Availability: 99.96 %
Elapsed time: 1612.89 secs
Data transferred: 1232.78 MB
Response time: 0.26 secs
Transaction rate: 110.11 trans/sec
Throughput: 0.76 MB/sec
Concurrency: 28.34
Successful transactions: 177595
Failed transactions: 75
Longest transaction: 24.38
Shortest transaction: 0.21
Varnish was under regular production load.
Squid was completely empty.
Huge difference!