Linux Clusters Institute: Monitoring
J.D. Maloney | Sr. HPC Storage Engineer
National Center for Supercomputing Applications (NCSA)
malone12@illinois.edu
May 1-5, 2023
This document is a result of work by volunteer LCI instructors and is licensed under CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/).
Purposes Behind Monitoring
External Sources of Monitoring Deliverables
Internal Sources of Monitoring Deliverables
High Level “Layers” to the Monitoring Stack
High Level “Layers” to the Monitoring Stack
What to Collect (Metrics)
What to Collect (Metrics)
Compute Infrastructure
Network Infrastructure
Network Infrastructure (cont.)
Storage Infrastructure
What to Collect (Metrics)
Scheduler/Job Related
Security Related
Common Infrastructure
Collection Tools
Collection Intervals
Metric Analysis and Storage
Visualization/Reporting
Notifications
Notification
Notification
Notification
Notification
Log Management
Log Management
A Handful of Monitoring Examples
Example: Grafana
Example: Grafana
Example: Kibana
Example: Kibana
Example: Zabbix
The Future Side of Monitoring
Questions?