Linux Clusters Institute: Basics of Data Management for HPC

J.D. Maloney | Lead HPC Storage Engineer

Storage Enabling Technologies Group (SET)

National Center for Supercomputing Applications (NCSA)

malone12@illinois.edu

Mississippi State, August 21st – 25th, 2023

Topic Coverage

Going over “less technical” basics of running storage in an HPC environment
Things like:

Storage Classifications
Quotas
Permissions
Data Sharing
Monitoring

This is stuff you’ll need to know after hardware arrives, as you’re configuring it

The exception is the “Storage Classifications” section; that’s handy to know before you procure hardware

Aug 21st - 25th 2023

Common HPC Storage Classifications

Storage Classifications

Many HPC environments, in the US and internationally, share a somewhat common set of storage “areas” that users rely on for different purposes

You don’t have to have all of them
You may have additional special ones for your users

Good to understand them so that, if users ask, you can point them toward the relevant storage in your environment
Classifying data into certain areas influences hardware choices and policies

Storage Classifications

Adhering to these (at least some of them) will help keep things familiar for your users

Especially if your users are also using national level resources or systems at other institutions

What you end up implementing will need to fit with the workflows that you support on the systems you run

Storage Classifications

Classification	ENV Variable	Description
home	$HOME	User home directories, where they land via SSH; generally used for scripts, personal software, job logs, etc.
software (or apps)	$SW or $APPS	Area where system provided software comes from/is stored
project (or work)	$PROJECT or $WORK	Area where teams store shared data, be it software, raw data, result data, etc.
scratch	$SCRATCH	Area where temporary job data and checkpoints are written; data stored here can be regenerated.

Storage Classifications

Classification	Performance Characteristics	Reliability Characteristics
home	Slower throughput performance, decent metadata performance	Very reliable
software (or apps)	Slower throughput performance, decent metadata performance	Very reliable
project (or work)	Decent throughput and metadata performance	Very reliable
scratch	High throughput and metadata performance	Less reliable

Storage Classifications

Having different classes of storage also helps administratively as it allows for different policies to be set up on the different areas
Different storage classifications don’t have to be actual separate file systems

Could be different filesets in Spectrum Scale or projects in Lustre
Could be on different NSDs or OSTs based on policy or PFL
Can be hosted by different storage servers within the file system (servers that host the specific NSDs or OSTs)
Can have different QOS policies if your file system allows

In the end this is to help workflow and encourage better data management by end users/teams

Common HPC Storage Policies

Quotas

Limit the amount of space or inodes that users or groups of users consume on the storage resource
Quotas can be set as:

a matter of fair-share policy
based on a granted allocation
to limit usage of the resource to the amount the end user/group is paying for

Setting up quota policies early on in a system’s life is much easier as you don’t have to merge existing use into a new quota scheme

Quotas

There are two main types of quota:

Explicit --> a certain amount of TB or inodes set at a “custom” level, either based on investment level or allocation granted
Default --> an amount of TB or inodes set globally for all users or all groups based on a system’s policy (or at least default policy)

Default quotas are handy for things like:

Home directory quotas
Baseline project or scratch capacity granted as part of access to the storage system

Explicit quotas are handy for things like:

Overriding the defaults based on allocation or investment
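A minimal sketch of how default and explicit quotas can fit together: look for an explicit quota first and fall back to the site default. The group name, the limits, and the `Quota` type are all hypothetical; a real file system tracks this itself through its own quota commands.

```python
from dataclasses import dataclass

TIB = 1024 ** 4


@dataclass(frozen=True)
class Quota:
    bytes_limit: int
    inode_limit: int


# Hypothetical site-wide default for project space.
DEFAULT_PROJECT_QUOTA = Quota(bytes_limit=1 * TIB, inode_limit=1_000_000)

# Hypothetical explicit quotas, keyed by group, set from an allocation or investment.
EXPLICIT_QUOTAS = {
    "smith_lab": Quota(bytes_limit=50 * TIB, inode_limit=20_000_000),
}


def effective_quota(group: str) -> Quota:
    """Return the explicit quota for a group if one is set, else the default."""
    return EXPLICIT_QUOTAS.get(group, DEFAULT_PROJECT_QUOTA)
```

Groups without an entry in `EXPLICIT_QUOTAS` simply get the default, which is the behavior described above for baseline project or scratch capacity.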

Quotas

Setting quotas on inodes can be very useful, especially in areas of:

AI workloads
Bio workloads
Some geospatial workloads
Almost any python driven workload ;)
Probably even more workloads, I haven’t seen them all

If your metadata is on a separate flash pool you want to make sure it doesn’t fill up too fast
Also keeps users from doing something silly and filling up your metadata space (this can cause problems)

Quotas

In the end quotas are there to:

Enforce fair use of the resource by users/teams
Act as a guide for good data management techniques

Even just enabling quota tracking will help you as an admin have a window into what’s happening on the system

Watching data growth/shrinkage per user can let you know who is pushing the FS
Forecast where (and from whom) demand for storage is coming
Keeps you from having to “du” a bunch of stuff all the time to get reporting data

Purge

The (usually automated) removal of data from certain parts of the file system based on a pre-defined set of criteria
Most common use in HPC is the purge of /scratch

Usually based on file mtime or atime
Commonly data older than 30 days (though can definitely vary based on site needs/resources)
Usually run on a daily or weekly basis

Another tool to keep file system areas from filling up and/or getting so full that they slow down
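As a rough sketch of the selection criteria described above, the following walks a tree and yields files whose mtime (or atime) is older than a cutoff. The function name and parameters are made up for illustration, and the 30-day default only mirrors the common value mentioned above. On large parallel file systems a policy engine or file-system-specific scan is usually a better fit than a plain directory walk.

```python
import os
import time
from pathlib import Path
from typing import Iterator, Optional


def purge_candidates(root: str, max_age_days: int = 30,
                     use_atime: bool = False,
                     now: Optional[float] = None) -> Iterator[Path]:
    """Yield files under root whose mtime (or atime) is older than max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.lstat()  # lstat: judge a symlink itself, not its target
            except FileNotFoundError:
                continue  # file vanished while we were scanning
            stamp = st.st_atime if use_atime else st.st_mtime
            if stamp < cutoff:
                yield path
```

This only lists candidates; nothing is deleted, which makes it usable as a dry run while a purge policy is still being worked out.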

Purge

Needs to be a well documented and communicated policy

You are permanently deleting user data, in a generally automated fashion without direct human oversight
Well documented/communicated policy won’t stop all “help my /scratch data is gone” tickets but it will reduce them and importantly give you justification for why the data was removed

Things to consider when setting up purge:

How often is it run
How old data must be before it is removed
When is it run
Will you entertain exemptions to purge
What specific FS areas have purge enabled

Purge

Some handy tips we’ve found useful with regard to purge:

Keep a full list of files/directories that were purged during each run

Helps you figure out whether purge deleted something or a user accidentally deleted it themselves

Track how much data/how many files were purged on each run

Just handy information to have for analysis

If at all possible, automate the process

Keeps it fresh in users’ heads that whatever area is subject to purge…actually gets purged
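The first two tips can be sketched together: write every purged path to a per-run manifest and return totals for later analysis. The function name, the tab-separated manifest format, and the `delete` switch are assumptions for illustration, not a description of any particular site's tooling.

```python
from pathlib import Path
from typing import Dict, Iterable


def record_purge_run(paths: Iterable, manifest_path: str,
                     delete: bool = False) -> Dict[str, int]:
    """Log each path (size, then path) to a manifest and return run totals.

    With delete=False this acts as a dry run: the manifest is written
    but no files are removed.
    """
    total_files = 0
    total_bytes = 0
    with open(manifest_path, "w") as manifest:
        for path in paths:
            path = Path(path)
            try:
                size = path.lstat().st_size
            except FileNotFoundError:
                continue  # already gone; nothing to record
            manifest.write(f"{size}\t{path}\n")  # record before removing
            if delete:
                path.unlink()
            total_files += 1
            total_bytes += size
    return {"files": total_files, "bytes": total_bytes}
```

Keeping one manifest per run (for example, named by date) makes it straightforward to answer a “did purge take my file?” ticket by searching the manifests.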

Data Management and Sharing

Common File/Dir Permissions

The Linux Foundation has a good tutorial overview on basic Linux permissions (not covering that here)

https://www.linuxfoundation.org/blog/blog/classic-sysadmin-understanding-linux-file-permissions

Setting up a good permissions structure on your file systems is very important
Use a central authentication system (LDAP, AD, etc.)

Don’t just do local accounts

Lay out how you want permissions to work and how that plays into sharing

Which we’ll discuss soon

Common File/Dir Permissions

Home Directories

Effectively you want 700 permissions here; user access only
Having “not the user” get access to certain files (.bashrc, .bash_profile, ~/.ssh, etc.) can have security implications
Working against this, users seem to have an incessant desire to share data out of their home directories

You should strongly advise against this behavior
What we do is set home directories the following way:

Perms 700
Ownership is root:root
User gets access via an ACL (we’ll talk about these next)
Now the user can’t modify their directory’s permissions since they aren’t owner

Common File/Dir Permissions

Project/Work Directories

A more open area than home directories, an area meant for collaboration
The owning user is usually either root or the PI of the project that the space is for
The owning group is the LDAP/AD group for the team the project belongs to
Permissions here are 770 (so group has full access)

Generally you also set the setgid bit (chmod g+s) so that group ownership inherits all the way down

Some ACL best practices here too that we’ll shortly discuss
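A small sketch of the project directory setup described above: mode 770 plus the setgid bit, which is 2770 in octal. The function name is hypothetical. Ownership changes are left out because they need root and real user/group names.

```python
import os
import stat
from pathlib import Path


def make_project_dir(path, mode: int = 0o2770) -> Path:
    """Create a project directory with rwx for user and group plus setgid."""
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)
    # mkdir's mode argument is filtered by the umask, so set the mode explicitly.
    os.chmod(path, mode)
    return path
```

With the setgid bit on the directory, new files and subdirectories created inside it pick up the directory's group rather than the creating user's primary group.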

Common File/Dir Permissions

Scratch Directories

A lot of variation here; your site-specific policies for scratch can have major implications
If quota is per-user then individual user directories similar to home could be set up

Though allowing a user to share out of /scratch is much more acceptable than /home

If quota is per-group then something similar to projects is the way to go
Just make sure that the stock setup doesn’t let users see each other’s data

ACLs

Stands for “Access Control List”
Allows for more fine-grained permission settings on files and directories, including ones that override the basic permissions
Can be a bit of a pain

Make sure that when migrating data you also migrate ACLs (use the right rsync flags, such as -A/--acls, etc.)
Make sure to check for ACLs when debugging permissions issues

However they do wield some power that can be handy in classic HPC situations, which we’ll go over here
Note: GPFS has built in commands for ACLs (the mm[get/put]acl commands), check those out if relevant to you

ACLs

Reading the ACLs of a file/directory with getfacl, here’s a standard directory with no ACLs

This matches what you would expect: it all lines up with the POSIX permissions on the directory
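To illustrate how the base ACL entries line up with the POSIX mode bits, here is a sketch that derives the three entries getfacl reports for an object with no extended ACLs. The helper name is made up, and this is not a reproduction of the original screenshot; real getfacl output also includes comment lines for the file name, owner, and group.

```python
import stat
from typing import List


def base_acl_entries(mode: int) -> List[str]:
    """Map POSIX mode bits to the user::, group::, and other:: ACL entries."""
    def rwx(r: int, w: int, x: int) -> str:
        return (("r" if mode & r else "-")
                + ("w" if mode & w else "-")
                + ("x" if mode & x else "-"))

    return [
        "user::" + rwx(stat.S_IRUSR, stat.S_IWUSR, stat.S_IXUSR),
        "group::" + rwx(stat.S_IRGRP, stat.S_IWGRP, stat.S_IXGRP),
        "other::" + rwx(stat.S_IROTH, stat.S_IWOTH, stat.S_IXOTH),
    ]
```

For example, `base_acl_entries(0o755)` gives `user::rwx`, `group::r-x`, and `other::r-x`, the same information `ls -l` shows as `rwxr-xr-x`.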

ACLs

Common scenario #1: Project Directories

Commonly a project’s PI outlasts many users in the project

Ex. Professor Smith has a project space for her lab group but she has grad students that will come/go as they progress through their graduate studies; when they leave they often don’t clean up all their data and/or re-permission it to a relevant person. The Smith group’s space gets close to quota and no one can clean up this orphaned data….halp! (you get a ticket ☺)

Solution: Set a default ACL on project spaces when you create them so the PI has rwx on all files beneath that space; then they have the ability to sort through/clean up the data

ACLs

Common scenario #1: Project Directories

The setgid bit that we discussed earlier helps, as all data is owned by the right group, but if a file/directory has 700 or 750 permissions the PI can’t clean it up
Here’s what a default ACL on that directory looks like with the ACL in place to give the PI full access to all data:

ACLs

Common scenario #1: Project Directories

Setting that ACL is done with the following command

If data already exists in the project directory, setting the ACL/default ACL only works for new data

You’ll need to set both the ACL and the default ACL with the -R flag so they apply to existing data (single-threaded, slow on tons of files/directories)
You may want to break it out into threads if you have a lot of data to fix retroactively
This is why it’s really nice to get things set up correctly from the start
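A hedged sketch of the commands involved, built as argument lists rather than executed: one setfacl call for the access ACL and one with -d for the default ACL, with -R added when existing data needs fixing. The path and user name are hypothetical, and this is not necessarily the exact command from the original slide; check the behavior on your own file system before running anything recursively.

```python
from typing import List


def pi_acl_commands(project_dir: str, pi_user: str,
                    recursive: bool = False) -> List[List[str]]:
    """Build setfacl commands that give the PI rwx via access and default ACLs."""
    entry = f"u:{pi_user}:rwx"
    flags = ["-R"] if recursive else []
    return [
        # Access ACL: applies to the directory (and, with -R, existing contents).
        ["setfacl", *flags, "-m", entry, project_dir],
        # Default ACL: inherited by files/directories created later.
        ["setfacl", *flags, "-d", "-m", entry, project_dir],
    ]
```

The lists can be handed to `subprocess.run(cmd, check=True)` by a provisioning script running with sufficient privileges. For a large existing tree, generating one pair of commands per top-level subdirectory and running those concurrently is one way to split the work.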

ACLs

Common scenario #2: Home Directories

Users like to muck with their home directory permissions (to open them up for sharing usually); we (admins) like to avoid security incidents on our systems
Let’s give a user access to their home directory only via an ACL so that they can’t mess with the permissions

Doesn’t have to be a default ACL in this case, we don’t want this to inherit below

ACLs

Common scenario #2: Home Directories

Here’s what it looks like when done:

Issue solved; do these steps when making home directories during the account provisioning process
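The provisioning steps can be sketched as an ordered list of commands: root ownership, mode 700, then a non-default ACL entry for the user. The helper name and example path are hypothetical. The order matters because a chmod run after the ACL is in place rewrites the ACL mask and can cut off the user's entry.

```python
from typing import List


def home_dir_commands(home: str, user: str) -> List[List[str]]:
    """Build the commands that lock down a home directory and grant ACL access."""
    return [
        ["chown", "root:root", home],              # user is no longer the owner
        ["chmod", "700", home],                    # set the mode before adding the ACL
        ["setfacl", "-m", f"u:{user}:rwx", home],  # access ACL only, no default ACL
    ]
```

Since no default ACL is set, nothing here is inherited by files the user creates inside the directory, which matches the intent described for this scenario.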

Data Sharing on a System

Data is usually shared on systems via project or work directories

Users sharing data amongst team members makes sense

This is a big reason why having a “projects” or “work” space is so important
Can use default ACLs as mentioned in prior section to ensure whole group has access to data in projects

Without this, data in these areas can be inaccessible even to members of the same group

Sometimes, though, data needs to be shared with people outside a given project

Data Sharing on a System

Adding the “new” users to the existing ownership group is possible but not always feasible

The only need is to share data, so the new users shouldn’t:

be able to use that group’s compute allocation
have privileges to anything else that group has access to

Many times it’s only a small subset of the group’s data that needs wider access
Data access for these users is commonly read-only, whereas the original owning group is read/write

Data Sharing on a System

There are two common methods for dealing with this:

Create a new project directory for a new group only given storage access

This group has the new combination of users
Group exists solely to own this data and has no other permissions
Downside is data kept in multiple places if also staying in original project directory; or original group has to do more data management to shuffle data back and forth

Use ACLs to give the “new” users access to the data they need

More of a pain to apply after the fact but can make data management easier for users

Data Sharing outside a System

Sometimes data needs to be shared with collaborators outside of your organization for many reasons

Colleague left your institution for another but is still involved in a project
A project has members from a cross-institutional team that is collaborating as part of a grant or program
Data being produced is meant to be published/accessible to the broader science community for further consumption

Key things you’ll need to figure out:

Who precisely needs access
For how long
Read-Only or Read/Write
Restrictions on the method of access

Data Sharing outside a System

These different scenarios can have a variety of solutions; to an extent they’ll depend on:

Institution policies
Your system’s authentication capabilities
End user preferences
The exact situation at hand

However, we’re going to go over some common ways to support this type of data sharing
This is not an exhaustive list by any means, there are tons of tools to help with this need

Data Sharing outside a System

Method #1: Guest Account

Your institution may support guest accounts that are sponsored by a faculty/staff member
Getting credentials at your local institution for the people who need access lets you grant that access the same way you would for “Data Sharing on a System”
Especially easy for:

Short list of people (who are known)
Ex-Faculty/Staff who need access for collab with existing faculty/staff

Data Sharing outside a System

Method #2: Globus Shared Collection

Globus lets you allow users to create “Shared Collections” on your primary endpoint
Users can add access for accounts that need access to the data; with individual user level read-only or read/write
Can be pretty much self-service for users to manage
Great if:

All people involved have Globus accounts
Your site has a Globus license

Data Sharing outside a System

Method #3: Web Server

Especially handy for situations where data needs to be shared with the community in a read-only fashion
Easy for end users to use a huge variety of tools to pull down the data for use on their systems
Also fairly easy to set up via Nginx or Apache on bare metal or via Kubernetes
Need to watch out that:

Data is available read only
Uses https
Credentials (even shared) are set up if possible to discourage abuse from bots/malicious entities

Data Sharing outside a System

Method #4: Sync to a Cloud Solution

Research groups or your institution may have a preferred cloud storage vendor
Tools like Globus can sync data to Google Drive/Box/S3/etc. backends
Data can then be shared via tools that those platforms support
Requires the syncing of data which can be a pain

Though could be automated by you or the user potentially depending on exact situation

Storage Monitoring

Storage Monitoring

Understanding the behavior/characteristics that your file systems are displaying is incredibly helpful for:

Day-to-day troubleshooting
Understanding what users are doing on the file system
Alerting you to potential issues before or as they happen
Planning future procurement, since needs are better known
Fulfilling reporting obligations you may have
Gaining insight for potential sharing with the community via talks/posters/papers/etc.

When evaluating storage solutions, make sure monitoring capabilities are part of your selection criteria

Storage Monitoring

There are a lot of tools out there you can use to monitor storage infrastructure
Biggest suggestion is to choose one tool set for all of your infrastructure and use that to collect storage metrics

Will help when correlating storage metrics with metrics from other sources (network, schedulers, compute, etc.)

Some examples of tools to gather the metrics are:

Telegraf
Prometheus
Nagios
Many more

Storage Monitoring

There are 3 main “areas” to monitor when monitoring storage:

Performance

How fast/slow are all components running
Are there hot spots in the infrastructure/are things balanced
What trends do you see/what do they tell you about workloads running

Quota

Measures of consumption (bytes/inodes)
Again very handy for trend analysis/reporting in many situations

Health

Is all infrastructure healthy
Are all servers healthy and running as expected
Are errors getting logged

Performance Monitoring

Things worth gathering (not a fully encompassing list; gather all you can in your environment):

Read/Write bytes per server/LUN/disk
Read/Write operations per server/LUN/disk
Metadata operations per server/LUN
All of the above on a per-client basis
If desired and possible, gather I/O perf stats on a per-job basis (e.g., Lustre job stats and equivalents for other solutions)

Frequency ideally is at least once per minute

Balance quick sampling with performance impact on system
Make sure to account for sampling frequency when reading counters
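Many storage counters are cumulative, so a rate only falls out once you divide the change between two samples by the time between them. A minimal sketch, with a hypothetical function name, that also treats a decreasing counter as a reset (for example after a server restart):

```python
from typing import Optional


def counter_rate(prev_value: float, prev_time: float,
                 cur_value: float, cur_time: float) -> Optional[float]:
    """Convert two samples of a cumulative counter into a per-second rate.

    Returns None when the counter went backwards, which usually means it
    was reset between samples and the interval should be skipped.
    """
    elapsed = cur_time - prev_time
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = cur_value - prev_value
    if delta < 0:
        return None
    return delta / elapsed
```

Using the actual timestamps of the two samples, rather than the nominal collection interval, keeps the rate correct when a collection run is late or skipped.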

Performance Monitoring

Tools like Telegraf and Prometheus have community driven options for gathering data from Lustre, Ceph, Spectrum Scale, etc.

Generally for various file systems poke around with the following to get insight into perf stats:

Lustre – Counters up in /proc/fs/lustre/obdfilter
Spectrum Scale – Use mmpmon or mmperfmon
Ceph – has built in metrics exporter

Performance Monitoring

Some real life examples to look at (some plots we have at NCSA for some of our systems):

Lustre – here

Spectrum Scale – here

Ceph – here

When plotting things out always make sure your units are correct for what you are gathering
Heat maps and line charts are great for these types of metrics

Quota Monitoring

Things worth gathering:

Bytes and inode usage per user
Bytes and inode usage per group
Bytes and inode usage per project
Bytes and inodes per user/group within a project (if possible)
Overall FS usage for bytes/inodes

Really important to analyze growth trends for capacity planning and debugging issues
Get this information out to users too so they can manage their usage
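As one simple way to turn stored quota samples into a capacity-planning number, the sketch below fits a straight line through the first and last samples and estimates the days left before a limit is reached. The function name and sample format are assumptions; real growth is rarely linear, so treat the result as a rough indicator only.

```python
from typing import Optional, Sequence, Tuple


def days_until_full(samples: Sequence[Tuple[float, float]],
                    capacity_bytes: float) -> Optional[float]:
    """Estimate days until capacity from (day, used_bytes) samples sorted by day.

    Returns None if usage is flat or shrinking over the sampled window.
    """
    (first_day, first_used), (last_day, last_used) = samples[0], samples[-1]
    if last_day <= first_day:
        raise ValueError("need at least two samples on different days")
    growth_per_day = (last_used - first_used) / (last_day - first_day)
    if growth_per_day <= 0:
        return None
    return max(0.0, (capacity_bytes - last_used) / growth_per_day)
```

The same calculation works per user, per group, or for the whole file system, depending on which usage series is passed in.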

Quota Monitoring

Gather and store the data on as frequent an interval as possible so users can get quicker feedback

Our default interval is every 15 minutes

Presenting the data via dashboards is great for trend analysis and/or viewing tables of information for all users and doing sorts
Presenting via a CLI command/script is handy for instant/in the moment information

Can even have this print out every time the user logs into the system
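A sketch of the kind of one-line summary a CLI quota script or login message might print. The function names and output layout are made up for illustration; the usage and limit values would come from whatever your file system's quota tooling reports.

```python
def format_bytes(num_bytes: float) -> str:
    """Render a byte count using binary units."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if num_bytes < 1024:
            return f"{num_bytes:.1f} {unit}"
        num_bytes /= 1024
    return f"{num_bytes:.1f} PiB"


def quota_line(area: str, used: float, limit: float) -> str:
    """Build a single human-readable usage line for one storage area."""
    percent = 100 * used / limit
    return f"{area:<8} {format_bytes(used)} of {format_bytes(limit)} ({percent:.0f}%)"
```

Printing one such line per storage area (home, project, scratch) at login gives users the quick feedback mentioned above without them having to run anything.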

Quota Monitoring

Some real life examples from our environments at NCSA

Lustre – here

Spectrum Scale – here

Ceph – here

One of the most “user-facing” aspects of storage monitoring
Remind users of the sampling delay of collection, whatever it ends up being for you

Health Monitoring

Very key to noticing issues on your file systems before/at the same time that your users notice

Users can/will find issues with your FS very quickly since so much rides on it

What you monitor for this is going to be very dependent on the file system you are running and even what vendor supplies that system
Don’t forget about some of the peripheral “health” aspects

Power metrics (is it stable/redundant/etc.)
Temperature metrics (is everything staying cool)
Fans/Power Supplies/Low-Level hardware

Health Monitoring

A helpful way to stay on top of issues is to monitor file system “Quality of Service” (QOS)

How long does an “ls” take of certain areas
How long does a “stat” take of a file
Do quota commands respond
Track this stuff over time to learn best what is “normal” and what is an indication of a problem

Check the hardware health across all layers of the stack

Drive health
Cable link health
Server health
Network health
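The file system QOS checks above (how long an “ls” or a “stat” takes) can be sketched as a tiny probe that times a directory listing and a stat call. The function names are hypothetical; the directory and file to probe are whatever representative paths make sense on your system.

```python
import os
import time
from typing import Callable, Dict


def time_call(func: Callable, *args) -> float:
    """Return the wall-clock seconds a single call takes."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


def probe_filesystem(directory: str, file_path: str) -> Dict[str, float]:
    """Time a directory listing and a stat, similar in spirit to ls and stat."""
    return {
        "listdir_seconds": time_call(os.listdir, directory),
        "stat_seconds": time_call(os.stat, file_path),
    }
```

Feeding these timings into the same metrics pipeline as everything else, on a fixed schedule, is what lets you learn what “normal” looks like and alert when a probe gets slow or hangs.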

Health Monitoring

Check the FS health across all layers of its stack; this will vary from FS to FS

Look at service health for all FS related services
Look at output of FS health commands
Parse through logs to look for errors being logged by the file system

Some examples from NCSA’s stack

Lustre – here
Spectrum Scale – here
Ceph – here

Storage Monitoring

Storage is often one of the more fragile pieces of HPC infrastructure so good monitoring of it is key
Use the tools best suited to your environment, what you’re familiar with, and what you can keep up and running
Monitoring in general (not just for storage) is one of my passions; feel free to check out our Telegraf work at:

Acknowledgements

Members of the SET group at NCSA for slide content & review
Members of the steering committee for slide review
