Linux Clusters Institute: Storage Scale & Ceph
J.D. Maloney | Lead HPC Storage Engineer
Storage Enabling Technologies Group (SET)
National Center for Supercomputing Applications (NCSA)
malone12@illinois.edu
Oklahoma University, May 13th – 17th 2024
Storage Scale (GPFS) Overview
highly reliable disks presented by redundant controllers
Quick History of Storage Scale
Image Credit: spectrumscale.org
Stand Out Storage Scale Features
Storage Scale Weaknesses
Storage Scale Appliances
IBM/Lenovo
Dell
HPE
Storage Scale Hardware
Storage Scale Concepts
Key Definitions
Scaling Out
Cluster vs Scatter
Storage Scale NSD Server
Storage Scale Architecture
Image Credit: ibm.com
File Sets
Storage Scale Tuning
Tuning Parameters
File System Block Size
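The block size is set when the file system is created (mmcrfs -B) and cannot be changed afterward. A minimal sketch of checking an existing file system and creating one with a 4 MiB block size (fs0 and nsd.stanza are illustrative names, not recommendations):
# mmlsfs fs0 -B
# mmcrfs fs0 -F nsd.stanza -B 4M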
Tuning Parameters
Page Pool
Tuning Parameters
maxMBpS
Tuning Parameters
maxFilesToCache
Tuning Parameters
maxStatCache
Tuning Parameters
nsdMaxWorkerThreads
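A minimal sketch of inspecting and adjusting these attributes with mmlsconfig, mmdiag, and mmchconfig (the values and node names are illustrative, not recommendations; several attributes only take effect after GPFS is restarted on the affected nodes):
# mmlsconfig pagepool
# mmdiag --config | grep -E 'maxFilesToCache|maxStatCache|nsdMaxWorkerThreads|maxMBpS'
# mmchconfig pagepool=16G,maxFilesToCache=100000 -N nsd01,nsd02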
Storage Scale Node Classes
GPFS Node Classes
Creating a Node Class
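A minimal sketch of creating a node class and adding a member later (the class and node names are hypothetical):
# mmcrnodeclass computeNodes -N compute01,compute02,compute03
# mmchnodeclass computeNodes add -N compute04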
List of Node Classes
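Listing node classes, assuming the hypothetical class created above:
# mmlsnodeclass
# mmlsnodeclass computeNodes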
Storage Scale Snapshots
What Is A Snapshot
Snapshot Types
File System Snapshot
Fileset Snapshot
Snapshot Storage
Snapshot Creation
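A minimal sketch of creating snapshots with mmcrsnapshot; fs0, the snapshot name, and the projects fileset are illustrative, and the fileset:snapshot form is used for fileset-level snapshots on recent releases:
# mmcrsnapshot fs0 daily-20240513
# mmcrsnapshot fs0 projects:daily-20240513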
Listing Snapshots
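Listing snapshots with mmlssnapshot; adding -d also reports the storage each snapshot consumes (fs0 is illustrative):
# mmlssnapshot fs0
# mmlssnapshot fs0 -d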
Snapshot Deletion
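Deleting snapshots with mmdelsnapshot, using the same illustrative names as above; the fileset:snapshot form mirrors the creation command:
# mmdelsnapshot fs0 daily-20240513
# mmdelsnapshot fs0 projects:daily-20240513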
File Level Restore from Snapshot
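For a single file, a restore can be as simple as copying it back out of the snapshot directory (paths are illustrative and assume the default .snapshots directory name):
# ls /fs0/.snapshots/
# cp -p /fs0/.snapshots/daily-20240513/projects/important.dat /fs0/projects/important.dat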
Snapshot Restore Utility
# mmsnaprest -h
GPFS Restore From Snapshot
Please note: This utility uses rsync style processing for directories. If
you are unsure of how that matching works, you may want to play
with it in a test area. There are examples in the EXAMPLES
section of this help screen.
Usage: mmsnaprest [-D|--debug] [-u|--usage] [-v|--verbose] [-h|--help]
[--dry-run] [-ls SOURCE] [-s SOURCE -t TARGET]
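An illustrative invocation based on the usage above; running with --dry-run first previews what would be restored (the paths are hypothetical):
# mmsnaprest --dry-run -s /fs0/.snapshots/daily-20240513/projects -t /fs0/projects
# mmsnaprest -s /fs0/.snapshots/daily-20240513/projects -t /fs0/projects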
Storage Scale Cluster Export Services
CES – Cluster Export Services
High availability
Monitoring
Protocol support
Common CES Commands
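A few of the commonly used ones on a CES-enabled cluster (a quick sketch, not an exhaustive list):
# mmces node list
# mmces address list
# mmces service list -a
# mmces state show -a
# mmces events active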
Storage Scale Policy Engine
Policy Engine
Example Policy Run #1
# cat rules.txt
RULE 'listall' list 'all-files'
SHOW( varchar(kb_allocated) || ' ' || varchar(file_size) || ' ' || varchar(user_id) || ' ' || fileset_name )
WHERE PATH_NAME LIKE '/fs0/projects/%'
Example Policy Run #1
Sample output from a policy run:
# mmapplypolicy fs0 -f /fs0/tmp/ -P rules.txt -I defer
[I] GPFS Current Data Pool Utilization in KB and %
Pool_Name KB_Occupied KB_Total Percent_Occupied
archive 131072 41934848 0.312561047%
data 192512 41934848 0.459074038%
system 0 0 0.000000000% (no user data)
[I] 4422 of 502784 inodes used: 0.879503%.
[W] Attention: In RULE 'listall' LIST name 'all-files' appears but there is no corresponding "EXTERNAL LIST 'all-files' EXEC ... OPTS ..." rule to specify a program to process the matching files.
[I] Loaded policy rules from rules.txt.
Evaluating policy rules with CURRENT_TIMESTAMP = 2017-07-25@15:34:38 UTC
Parsed 1 policy rules.
RULE 'listall' list 'all-files'
SHOW( varchar(kb_allocated) || ' ' || varchar(file_size) || ' ' || varchar(user_id) || ' ' || fileset_name )
WHERE PATH_NAME LIKE '/fs0/projects/%'
[I] 2017-07-25@15:34:39.041 Directory entries scanned: 385.
[I] Directories scan: 362 files, 23 directories, 0 other objects, 0 'skipped' files and/or errors.
[I] 2017-07-25@15:34:39.043 Sorting 385 file list records.
[I] Inodes scan: 362 files, 23 directories, 0 other objects, 0 'skipped' files and/or errors.
Example Policy Run #1
Sample output from a policy run (continued):
[I] 2017-07-25@15:34:40.954 Policy evaluation. 385 files scanned.
[I] 2017-07-25@15:34:40.956 Sorting 360 candidate file list records.
[I] 2017-07-25@15:34:41.024 Choosing candidate files. 360 records scanned.
[I] Summary of Rule Applicability and File Choices:
Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule
0 360 61184 360 61184 0 RULE 'listall' LIST 'all-files' SHOW(.) WHERE(.)
[I] Filesystem objects with no applicable rules: 25.
[I] GPFS Policy Decisions and File Choice Totals:
Chose to list 61184KB: 360 of 360 candidates;
Predicted Data Pool Utilization in KB and %:
Pool_Name KB_Occupied KB_Total Percent_Occupied
archive 131072 41934848 0.312561047%
data 192512 41934848 0.459074038%
system 0 0 0.000000000% (no user data)
[I] 2017-07-25@15:34:41.027 Policy execution. 0 files dispatched.
[I] A total of 0 files have been migrated, deleted or processed by an EXTERNAL EXEC/script;
0 'skipped' files and/or errors.
#
Example Policy Run #1
Sample output from a policy run:
# wc -l /fs0/tmp/list.all-files
360 /fs0/tmp/list.all-files
# head -n 10 /fs0/tmp/list.all-files
402432 374745509 0 3584 1741146 0 projects -- /fs0/projects/dar-2.4.1.tar.gz
402434 229033036 0 0 1217 1000 projects -- /fs0/projects/dar-2.4.1/README
402435 825781038 0 256 43668 1000 projects -- /fs0/projects/dar-2.4.1/config.guess
402436 1733958940 0 256 18343 1000 projects -- /fs0/projects/dar-2.4.1/config.rpath
402437 37654404 0 0 371 1000 projects -- /fs0/projects/dar-2.4.1/INSTALL
402438 1471382967 0 0 435 1000 projects -- /fs0/projects/dar-2.4.1/TODO
402440 398210967 0 0 376 1000 projects -- /fs0/projects/dar-2.4.1/misc/batch_cygwin
402441 292549403 0 0 738 1000 projects -- /fs0/projects/dar-2.4.1/misc/README
402442 1788675584 0 256 3996 1000 projects -- /fs0/projects/dar-2.4.1/misc/dar_ea.rpm.proto
402443 637382920 0 256 4025 1000 projects -- /fs0/projects/dar-2.4.1/misc/dar64_ea.rpm.proto
#
Example Policy Run #2
RULE 'purge_30days' DELETE
FOR FILESET ('scratch')
WHERE CURRENT_TIMESTAMP - MODIFICATION_TIME > INTERVAL '30' DAYS and
CURRENT_TIMESTAMP - CREATION_TIME > INTERVAL '30' DAYS and
CURRENT_TIMESTAMP - ACCESS_TIME > INTERVAL '30' DAYS and
PATH_NAME LIKE '/gpfs/iccp/scratch/%'
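An illustrative way to drive this rule with mmapplypolicy: run with -I test first to preview what would be deleted, then -I yes to execute (the scan path matches the rule above; helper options such as -N and -g are omitted):
# mmapplypolicy /gpfs/iccp/scratch -P scratch.purge.policy -I test
# mmapplypolicy /gpfs/iccp/scratch -P scratch.purge.policy -I yes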
Example Policy Run #2
Sample output from a policy run:
[I] GPFS Current Data Pool Utilization in KB and %
Pool_Name KB_Occupied KB_Total Percent_Occupied
data 1006608482304 2621272227840 38.401523948%
system 0 0 0.000000000% (no user data)
[I] 378536926 of 689864704 inodes used: 54.871183%.
[I] Loaded policy rules from scratch.purge.policy.
Evaluating policy rules with CURRENT_TIMESTAMP = 2019-04-12@16:00:02 UTC
Parsed 1 policy rules.
RULE 'purge_30days' DELETE
FOR FILESET ('scratch')
WHERE CURRENT_TIMESTAMP - MODIFICATION_TIME > INTERVAL '30' DAYS and
CURRENT_TIMESTAMP - CREATION_TIME > INTERVAL '30' DAYS and
CURRENT_TIMESTAMP - ACCESS_TIME > INTERVAL '30' DAYS and
PATH_NAME LIKE '/gpfs/iccp/scratch/%'
[I] 2019-04-12@16:00:04.045 Directory entries scanned: 0.
[I] 2019-04-12@16:00:19.026 Directory entries scanned: 1376623.
[I] 2019-04-12@16:00:34.027 Directory entries scanned: 1376623.
[I] 2019-04-12@16:00:37.104 Directory entries scanned: 8576323.
[I] Directories scan: 4132091 files, 3713818 directories, 730414 other objects, 0 'skipped' files and/or errors.
Example Policy Run #2
Sample output from a policy run (continued):
[I] 2019-04-12@16:00:37.145 Parallel-piped sort and policy evaluation. 0 files scanned.
[I] 2019-04-12@16:00:42.975 Parallel-piped sort and policy evaluation. 8576323 files scanned.
[I] 2019-04-12@16:00:43.523 Piped sorting and candidate file choosing. 0 records scanned.
[I] 2019-04-12@16:00:43.647 Piped sorting and candidate file choosing. 90047 records scanned.
[I] Summary of Rule Applicability and File Choices:
Rule# Hit_Cnt KB_Hit Chosen KB_Chosen KB_Ill Rule
0 90047 1078304928 90047 1078304928 0 RULE 'purge_30days' DELETE FOR FILESET(.) WHERE(.)
[I] Filesystem objects with no applicable rules: 8486148.
[I] GPFS Policy Decisions and File Choice Totals:
Chose to delete 1078304928KB: 90047 of 90047 candidates;
Predicted Data Pool Utilization in KB and %:
Pool_Name KB_Occupied KB_Total Percent_Occupied
data 1005533405024 2621272227840 38.360510379%
system 0 0 0.000000000% (no user data)
[I] 2019-04-12@16:00:43.732 Policy execution. 0 files dispatched.
[I] 2019-04-12@16:00:49.027 Policy execution. 65886 files dispatched.
[I] 2019-04-12@16:00:51.069 Policy execution. 90047 files dispatched.
[I] A total of 90047 files have been migrated, deleted or processed by an EXTERNAL EXEC/script;
0 'skipped' files and/or errors.
Storage Scale Storage Pools & File Placement
Storage Pools
Example Storage Pool Configuration
Flash Tier
Capacity Tier
Information Lifecycle Management
Storage Tiering
Using Tiered Storage
Tiered Storage: Example Config
%nsd:
nsd=nvme01
usage=metadataOnly
pool=system
%nsd:
nsd=nvme02
usage=metadataOnly
pool=system
Flash Metadata Pool
%nsd:
nsd=sata01
usage=dataOnly
pool=data
%nsd:
nsd=sata02
usage=dataOnly
pool=data
Bulk Data Pool
%nsd:
nsd=sas01
usage=dataOnly
pool=hippa
%nsd:
nsd=sas02
usage=dataOnly
pool=hippa
Encrypted Data Pool
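An illustrative sequence for putting stanzas like these into service on an existing file system (pools.stanza and fs0 are hypothetical names), then verifying the pools and their capacity:
# mmcrnsd -F pools.stanza
# mmadddisk fs0 -F pools.stanza
# mmdf fs0
# mmlsdisk fs0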
File Placement Policies
# cat policy
rule 'default' set pool 'data'
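A slightly fuller placement sketch using the pools from the earlier stanza example; the 'secure' fileset is hypothetical, and the catch-all default rule comes last:
# cat policy
rule 'hippa_data' set pool 'hippa' for fileset ('secure')
rule 'default' set pool 'data'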
Installing File Placement Policies
# Usage: mmchpolicy Device PolicyFilename
[-t DescriptiveName] [-I {yes|test}]
Testing the policy before installing it is good practice:
# mmchpolicy fs0 policy -I test
Validated policy 'policy': Parsed 1 policy rules.
No errors on the policy, so let's install it:
# mmchpolicy fs0 policy
Validated policy 'policy': Parsed 1 policy rules.
Policy `policy' installed and broadcast to all nodes.
Viewing Installed Policies
# Usage: mmlspolicy Device
List the file placement policies to verify the prior policy installed successfully:
# mmlspolicy fs0
Policy for file system '/dev/fs0':
Installed by root@ss-demo1.os.ncsa.edu on Fri Apr 12 09:26:10 2019.
First line of policy 'policy' is:
rule 'default' set pool 'data'
Storage Scale Monitoring
Monitoring with mmpmon
Sample output from mmpmon (human readable)
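An illustrative way to produce that output: mmpmon reads requests such as io_s or fs_io_s from standard input or an input file, and -p switches to machine-parseable output:
# echo io_s | mmpmon
# echo fs_io_s | mmpmon -p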
Other Storage Scale Monitoring
Resources
What is Ceph?
Scaling Up Ceph
Underlying concepts
Conceptual Diagram
Terminology
How does this affect you as an Engineer (Storage Admin)?
Locality Customization
Failure Domains
Redundancy
Scalability features
Ceph has some useful features that greatly aid in its ability to scale well across hundreds to thousands of storage servers
Performance features
Ceph has some performance-related features, developed over the years, that are worth noting
Ceph Monitoring
Before we dive into debugging, it’s helpful to touch on monitoring Ceph
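A few quick health checks worth running first:
# ceph -s
# ceph health detail
# ceph df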
Ceph Tools
Some commands to have on hand when debugging a cluster or getting the “lay of the land” on a cluster you inherit
# ceph -w
# ceph osd df
# ceph daemon osd.XX config show
# ceph osd tree
Crushmaps
# ceph osd getcrushmap -o /root/crushmap_raw
# crushtool -d /root/crushmap_raw -o /root/crushmap.txt
# crushtool -c /root/crushmap.txt -o /root/crushmap_new
# ceph osd setcrushmap -i /root/crushmap_new
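Before injecting a recompiled map, it can be sanity-checked with crushtool's test mode (the rule number, replica count, and input range here are illustrative):
# crushtool -i /root/crushmap_new --test --show-mappings --rule 0 --num-rep 3 --min-x 0 --max-x 9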
Ceph PGs (Placement Groups)
Some useful pg-related commands:
# ceph pg ls
# ceph pg 1.0 query
# ceph pg dump_stuck [stale, inactive, unclean]
# ceph pg repair 1.0
Ceph Troubleshooting
Resources to Know About
Acknowledgements
Questions