1 of 53

Linux Clusters Institute: Basics of Data Management for HPC

J.D. Maloney | Lead HPC Storage Engineer

Storage Enabling Technologies Group (SET)

National Center for Supercomputing Applications (NCSA)

malone12@illinois.edu

Mississippi State, August 21st – 25th 2023

2 of 53

Topic Coverage

  • Going over “less technical” basics of running storage in an HPC environment
  • Things like:
    • Storage Classifications
    • Quotas
    • Permissions
    • Data Sharing
    • Monitoring
  • This is stuff you’ll need to know after hardware arrives, as you’re configuring it
    • Other than the “Storage Classifications” section; that’s handy to know before procuring hardware

Aug 21st - 25th 2023

3 of 53

Common HPC Storage Classifications


4 of 53

Storage Classifications

  • Many HPC environments, in the US and internationally, have somewhat common storage “areas” that users use for different purposes
    • You don’t have to have all of them
    • You may have additional special ones for your users
  • It’s good to understand them so that, if users ask, you can point them toward the relevant storage in your environment
  • Classifying data into certain areas influences hardware choices and policies


5 of 53

Storage Classifications

  • Adhering to these (at least some of them) will help keep things familiar for your users
    • Especially if your users are also using national level resources or systems at other institutions
  • What you end up implementing will need to fit with the workflows that you support on the systems you run


6 of 53

Storage Classifications


| Classification | ENV Variable | Description |
|---|---|---|
| home | $HOME | User home directories, where users land when they log in via SSH; generally used for scripts, personal software, job logs, etc. |
| software (or apps) | $SW or $APPS | Area where system-provided software is installed and served from |
| project (or work) | $PROJECT or $WORK | Area where teams store shared data, be it software, raw data, result data, etc. |
| scratch | $SCRATCH | Area where temporary job data and checkpoints are written; only data that can be regenerated should live here |

7 of 53

Storage Classifications


| Classification | Performance Characteristics | Reliability Characteristics |
|---|---|---|
| home | Slower throughput performance, decent metadata performance | Very reliable |
| software (or apps) | Slower throughput performance, decent metadata performance | Very reliable |
| project (or work) | Decent throughput and metadata performance | Very reliable |
| scratch | High throughput and metadata performance | Less reliable |

8 of 53

Storage Classifications

  • Having different classes of storage also helps administratively as it allows for different policies to be set up on the different areas
  • Different storage classifications don’t have to be actual separate file systems
    • Could be different filesets in Spectrum Scale or projects in Lustre
    • Could be on different NSDs or OSTs based on placement policy or Lustre PFL (Progressive File Layouts)
    • Can be hosted by different storage servers within the file system (servers that host the specific NSDs or OSTs)
    • Can have different QOS policies if your file system allows
  • In the end, this is to help workflows and encourage better data management by end users/teams


9 of 53

Common HPC Storage Policies


10 of 53

Quotas

  • Limit the amount of space or inodes that users or groups of users consume on the storage resource
  • Quotas can be set as:
    • a matter of fair-share policy
    • based on a granted allocation
    • to limit usage of the resource to the amount the end user/group is paying for
  • Setting up quota policies early on in a system’s life is much easier as you don’t have to merge existing use into a new quota scheme


11 of 53

Quotas

  • There are two main types of quota:
    • Explicit --> a certain amount of TB or inodes set at a “custom” level, either based on investment level or allocation granted
    • Default --> an amount of TB or inodes set globally for all users or all groups based on a system’s policy (or at least default policy)
  • Default quotas are handy for things like:
    • Home directory quotas
    • Baseline project or scratch capacity granted as part of access to the storage system
  • Explicit quotas are handy for things like:
    • Overriding defaults based on allocation or investment
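
The precedence between the two types boils down to “explicit wins, otherwise default.” Below is a minimal Python sketch of that lookup only; the group names, limits, and in-memory dictionaries are hypothetical stand-ins for whatever allocation or investment records your site keeps (the file system itself still enforces the real quotas).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quota:
    bytes_limit: int
    inode_limit: int


# Hypothetical site-wide default applied to every group (e.g. baseline project space).
DEFAULT_GROUP_QUOTA = Quota(bytes_limit=1 * 1024**4, inode_limit=1_000_000)

# Hypothetical explicit quotas, e.g. loaded from an allocation/investment database.
EXPLICIT_GROUP_QUOTAS = {
    "smith_lab": Quota(bytes_limit=200 * 1024**4, inode_limit=20_000_000),
}


def effective_quota(group: str,
                    explicit: dict = EXPLICIT_GROUP_QUOTAS,
                    default: Quota = DEFAULT_GROUP_QUOTA) -> Quota:
    """An explicit quota overrides the default; everyone else gets the default."""
    return explicit.get(group, default)


if __name__ == "__main__":
    for g in ("smith_lab", "jones_lab"):
        q = effective_quota(g)
        print(f"{g}: {q.bytes_limit / 1024**4:.0f} TiB, {q.inode_limit:,} inodes")
```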


12 of 53

Quotas

  • Setting quotas on inodes can be very useful, especially in areas of:
    • AI workloads
    • Bio workloads
    • Some geospatial workloads
    • Almost any python driven workload ;)
    • Probably even more workloads, I haven’t seen them all
  • If your metadata is on a separate flash pool, you want to make sure it doesn’t fill up too fast
  • Also keeps users from doing something silly and filling up your metadata space (this can cause problems)
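
When an inode quota (or the metadata pool) is filling up, the next question is usually which directory is responsible. A rough Python sketch, assuming you can walk the user’s or project’s directory: it counts entries under each immediate subdirectory, since every file, directory, and symlink consumes an inode. Hard-linked files are counted once per link, so treat the numbers as approximate, and expect a walk like this to be slow on very large trees (your file system’s own quota reports are far cheaper for totals).

```python
import os
from collections import Counter


def inode_counts_by_subdir(root: str) -> Counter:
    """Approximate inode usage under each immediate child of root."""
    counts = Counter()
    with os.scandir(root) as top:
        for entry in top:
            counts[entry.name] += 1  # the child entry itself
            if entry.is_dir(follow_symlinks=False):
                # os.walk does not follow symlinks by default; symlinked
                # directories still appear in dirnames and are counted once.
                for _dirpath, dirnames, filenames in os.walk(entry.path):
                    counts[entry.name] += len(dirnames) + len(filenames)
    return counts
```

Calling inode_counts_by_subdir(path).most_common(10) gives the ten heaviest subtrees.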


13 of 53

Quotas

  • In the end quotas are there to:
    • Enforce fair use of the resource by users/teams
    • Act as a guide for good data management techniques
  • Even just enabling quota tracking will help you as an admin have a window into what’s happening on the system
    • Watching data growth/shrinkage per user can let you know who is pushing the FS
    • Forecast where/who demand for storage is coming from
    • Keeps you from having to “du” a bunch of stuff all the time to get reporting data
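
Once usage is tracked, even a simple trend line turns those numbers into a forecast. A minimal sketch, assuming you already have periodic (timestamp, bytes used) samples for a user or group: it fits an ordinary least-squares line and projects, from the latest sample, how many days remain until the limit. Real growth is rarely linear, so treat the result as a rough signal rather than a promise.

```python
from datetime import datetime, timedelta


def days_until_full(samples, limit_bytes):
    """Estimate days until usage reaches limit_bytes.

    samples: list of (datetime, used_bytes), oldest first.
    Returns None when the fitted trend is flat or shrinking.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    t0 = samples[0][0]
    xs = [(t - t0).total_seconds() / 86400 for t, _ in samples]  # days since first sample
    ys = [used for _, used in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("samples need distinct timestamps")
    # Least-squares slope in bytes per day.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    if slope <= 0:
        return None
    return max(0.0, (limit_bytes - ys[-1]) / slope)


if __name__ == "__main__":
    start = datetime(2023, 8, 21)
    tib = 1024**4
    history = [(start + timedelta(days=d), (40 + 2 * d) * tib) for d in range(7)]
    print(days_until_full(history, 100 * tib))
```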


14 of 53

Purge

  • The (usually automated) removal of data from certain parts of the file system based on a pre-defined set of criteria
  • Most common use in HPC is the purge of /scratch
    • Usually based on file mtime or atime
    • Commonly data older than 30 days (though can definitely vary based on site needs/resources)
    • Usually run on a daily or weekly basis
  • Another tool to keep file system areas from filling up, or from getting so full that they slow down
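
Selecting what a purge would remove is essentially a walk of the area comparing each file’s mtime (or atime) against a cutoff. A minimal single-threaded Python sketch, assuming a 30-day policy; on a large parallel file system you would more likely lean on the file system’s own tooling (e.g. lfs find on Lustre, or the Spectrum Scale policy engine via mmapplypolicy) than on a Python walk. If you purge on atime, check your mount options first: with noatime, reads never update atime, and relatime updates it at most about once a day.

```python
import os
import stat
import time


def purge_candidates(root, max_age_days=30, use_atime=False, now=None):
    """Yield (path, size_bytes, age_days) for regular files older than the cutoff.

    Age is measured from mtime by default, or atime if use_atime is True.
    Symlinks are never followed, so the walk cannot escape root.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except FileNotFoundError:
                continue  # deleted between listing and stat
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, FIFOs, etc.
            stamp = st.st_atime if use_atime else st.st_mtime
            if stamp < cutoff:
                yield path, st.st_size, (now - stamp) / 86400
```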


15 of 53

Purge

  • Needs to be a well documented and communicated policy
    • You are permanently deleting user data, in a generally automated fashion without direct human oversight
    • A well documented/communicated policy won’t stop all “help my /scratch data is gone” tickets, but it will reduce them and, importantly, give you justification for why the data was removed
  • Things to consider when setting up purge:
    • How often it runs
    • How old data must be before it is removed
    • When it runs
    • Whether you will entertain exemptions to purge
    • Which specific FS areas have purge enabled


16 of 53

Purge

Some handy tips we’ve found useful with regard to purge:

  • Keep a full list of files/directories that were purged during each run
    • Helps you figure out (and defend) whether purge deleted something or a user actually deleted it accidentally
  • Track how much data/how many files were purged on each run
    • Just handy information to have for analysis
  • If at all possible, automate the process
    • Keeps it fresh in users’ heads that whatever area is subject to purge…actually gets purged
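
The first two tips (keep the list, track the totals) come naturally if every deletion goes through one place that records it. A minimal Python sketch, assuming the candidate paths come from whatever selection step you use and that the manifest lives somewhere users can’t write: each removed file becomes one JSON line, followed by a summary line with file and byte counts. It defaults to a dry run, so the same code can produce a “what would be purged” report.

```python
import json
import os
import time


def run_purge(paths, manifest_path, dry_run=True):
    """Delete each path (unless dry_run), logging it; return (files, bytes)."""
    files = 0
    total_bytes = 0
    with open(manifest_path, "a", encoding="utf-8") as manifest:
        for path in paths:
            try:
                size = os.lstat(path).st_size
                if not dry_run:
                    os.unlink(path)
            except FileNotFoundError:
                continue  # already gone, e.g. removed by the user
            # JSON keeps paths containing spaces, tabs, or newlines unambiguous.
            manifest.write(json.dumps({"path": path, "bytes": size}) + "\n")
            files += 1
            total_bytes += size
        summary = {
            "summary": True,
            "finished": time.strftime("%Y-%m-%dT%H:%M:%S%z"),
            "dry_run": dry_run,
            "files": files,
            "bytes": total_bytes,
        }
        manifest.write(json.dumps(summary) + "\n")
    return files, total_bytes
```

For example, run_purge(candidates, "/var/log/purge/scratch.jsonl", dry_run=False) (a hypothetical manifest path) once the dry-run output looks right.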


17 of 53

Data Management and Sharing


18 of 53

Common File/Dir Permissions

The Linux Foundation has a good tutorial overview on basic Linux permissions (not covering that here)

  • Setting up a good permissions structure on your file systems is very important
  • Use a central authentication system (LDAP, AD, etc.)
    • Don’t just do local accounts
  • Lay out how you want permissions to work and how that plays into sharing
    • Which we’ll discuss soon


19 of 53

Common File/Dir Permissions

Home Directories

  • Effectively you want 700 permissions here; user access only
  • Having “not the user” get access to certain files (.bashrc, .bash_profile, ~/.ssh, etc.) can have security implications
  • Complicating this, users seem to have an incessant desire to share data out of their home directories
    • You should strongly advise against this behavior
    • What we do is set home directories the following way:
      • Perms 700
      • Ownership is root:root
      • User gets access via an ACL (we’ll talk about these next)
      • Now the user can’t modify their directory’s permissions since they aren’t owner
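
A periodic audit catches home directories that drift from this layout (e.g. ones created by hand). A minimal Python sketch, assuming home directories sit directly under one root: it flags any directory not owned by root and any that grants permissions to “other.” It deliberately does not insist on an exact 700 mode: once a named-user ACL entry exists, the group bits reported by stat (and ls -l) show the ACL mask, so a correctly set up directory typically looks like 770. Checking the ACL entries themselves would need getfacl.

```python
import os
import stat


def audit_home_dirs(home_root="/home", expected_uid=0):
    """Return (path, problem) pairs for home directories that break the layout."""
    problems = []
    with os.scandir(home_root) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if not entry.is_dir(follow_symlinks=False):
                continue
            st = entry.stat(follow_symlinks=False)
            if st.st_uid != expected_uid:
                problems.append((entry.path, f"owner uid {st.st_uid}, expected {expected_uid}"))
            if st.st_mode & stat.S_IRWXO:  # any r/w/x bit for "other"
                problems.append((entry.path, f"'other' has access: {stat.filemode(st.st_mode)}"))
    return problems
```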


20 of 53

Common File/Dir Permissions

Project/Work Directories

  • A more open area than home directories, meant for collaboration
  • The user owner is usually either root or the PI of the project the space is for
  • The group owner is the LDAP/AD group for the project
  • Permissions here are 770 (so group has full access)
    • Generally you also set the setgid bit (chmod g+s) so that group ownership inherits all the way down
  • Some ACL best practices here too that we’ll shortly discuss
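
Creating a project directory this way comes down to a few system calls. A minimal Python sketch, assuming it runs as root during provisioning (the path, owner, and group names are hypothetical). The mode passed to mkdir is filtered by the umask, so the sketch sets the final 2770 mode explicitly with chmod.

```python
import grp
import os
import pwd
import stat

# rwx for owner and group, nothing for other, plus setgid so new files and
# subdirectories inherit the directory's group: 0o2770.
PROJECT_MODE = stat.S_ISGID | 0o770


def create_project_dir(path, group, owner="root", do_chown=True):
    """Create a project directory owned by owner:group with mode 2770."""
    os.mkdir(path)  # any mode given here would be masked by the umask
    if do_chown:
        # Names resolve through the system's user/group databases (e.g. LDAP/AD via NSS).
        uid = pwd.getpwnam(owner).pw_uid
        gid = grp.getgrnam(group).gr_gid
        os.chown(path, uid, gid)  # requires root
    os.chmod(path, PROJECT_MODE)  # set the mode last, once ownership is final
```

For example, create_project_dir("/projects/smith_lab", group="smith_lab", owner="smith") would give the PI ownership and the lab group full access.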


21 of 53

Common File/Dir Permissions

Scratch Directories

  • There’s a lot of variation here; your site-specific policies for scratch have major implications
  • If quota is per-user, then individual user directories similar to home could be set up
    • Though allowing a user to share out of /scratch is much more acceptable than /home
  • If quota is per-group then something similar to projects is the way to go
  • Just make sure that the stock setup doesn’t let users see each other’s data


22 of 53

ACLs

  • Stands for “Access Control List”
  • Allows for more fine-grained permissions on files and directories that can override the standard owner/group/other permissions
  • Can be a bit of a pain
    • Make sure if migrating data you are migrating ACLs (use the right rsync flags, e.g. -A/--acls, etc.)
    • Make sure to check for ACLs when debugging permissions issues
  • However they do wield some power that can be handy in classic HPC situations, which we’ll go over here
  • Note: GPFS has built in commands for ACLs (the mm[get/put]acl commands), check those out if relevant to you


23 of 53

ACLs

  • Reading the ACLs of a file/directory with getfacl, here’s a standard directory with no ACLs


  • The output matches what you would expect; it all lines up with the POSIX permissions on the directory
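
When debugging permissions across many paths, it helps to check programmatically whether anything beyond the basic owner/group/other entries is present (interactively, the trailing + in ls -l output gives the same hint). A small Python sketch that parses getfacl’s text output, assuming the usual [default:]tag:qualifier:perms entry format; comment lines and #effective annotations are ignored.

```python
import subprocess


def parse_getfacl(text):
    """Parse getfacl output into (is_default, tag, qualifier, perms) tuples."""
    entries = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop "# file:" headers and "#effective:" notes
        if not line:
            continue
        is_default = line.startswith("default:")
        if is_default:
            line = line[len("default:"):]
        tag, qualifier, perms = line.split(":")
        entries.append((is_default, tag, qualifier, perms))
    return entries


def has_extended_acl(entries):
    """True if anything beyond user::, group::, and other:: is present."""
    return any(is_default or qualifier or tag == "mask"
               for is_default, tag, qualifier, _perms in entries)


def path_has_acl(path):
    """Run getfacl on one path and report whether it carries extended entries."""
    out = subprocess.run(["getfacl", path], capture_output=True, text=True, check=True)
    return has_extended_acl(parse_getfacl(out.stdout))
```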

24 of 53

ACLs

Common scenario #1: Project Directories

  • Commonly a project’s PI outlasts many users in the project
    • Ex. Professor Smith has a project space for her lab group but she has grad students that will come/go as they progress through their graduate studies; when they leave they often don’t clean up all their data and/or re-permission it to a relevant person. The Smith group’s space gets close to quota and no one can clean up this orphaned data….halp! (you get a ticket ☺)
  • Solution 🡪 When you create project spaces, set a default ACL giving the PI rwx on all files beneath that space; then they have the ability to sort through/clean up the data


25 of 53

ACLs

Common scenario #1: Project Directories

  • The setgid bit we discussed earlier helps, as all data is owned by the right group, but if a file/directory has 700 or 750 permissions the PI still can’t clean it up
  • Here’s what a default ACL on that directory looks like with the ACL in place to give the PI full access to all data:


26 of 53

ACLs

Common scenario #1: Project Directories

  • Setting that ACL is done with the following command


  • If data already exists in the project directory, setting the ACL/default ACL on the top-level directory doesn’t change that existing data; it only applies to new data
    • You’ll need to set both the ACL and the default ACL with the -R flag so they apply to existing data (single-threaded, slow on tons of files/directories)
    • You may want to break it out into threads if you have a lot of data to fix retroactively
    • This is why it’s really nice to get it set up correctly from the start
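
The exact command isn’t reproduced here, but a sketch of the idea using standard setfacl options looks like this. It’s written in Python, assuming setfacl is installed and the PI’s username is passed in (names are hypothetical): it sets the access ACL plus a default ACL for the PI on the top-level directory, then applies them recursively with -R, one top-level entry per worker thread, as one way to “break it out into threads.” Two choices worth calling out: the access entry uses rwX (capital X grants execute only on directories and on files that are already executable), and top-level regular files get only the access entry, because only directories can carry a default ACL. Since deleting a file requires write and execute on its parent directory, the directory entries are what actually let the PI clean up.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def pi_acl_commands(project_dir, pi_user):
    """Build setfacl calls giving pi_user full access to everything under project_dir."""
    access = f"u:{pi_user}:rwX"
    with_default = f"{access},d:u:{pi_user}:rwx"  # "d:" = default ACL, inherited by new entries
    cmds = [["setfacl", "-m", with_default, project_dir]]  # the top-level directory itself
    with os.scandir(project_dir) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_symlink():
                continue  # don't follow links out of the project space
            # Only directories can carry a default ACL; files get the access entry only.
            spec = with_default if entry.is_dir() else access
            cmds.append(["setfacl", "-R", "-m", spec, entry.path])
    return cmds


def run_parallel(cmds, workers=8):
    """Run the setfacl calls concurrently; return the ones that failed."""
    def run(cmd):
        return cmd, subprocess.run(cmd, capture_output=True, text=True).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [cmd for cmd, rc in pool.map(run, cmds) if rc != 0]
```

Usage would be something like run_parallel(pi_acl_commands("/projects/smith_lab", "smith")). Parallelism here is only across top-level entries, so a single enormous subdirectory still runs single-threaded; splitting deeper is the next step if that becomes the bottleneck.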

27 of 53

ACLs

Common scenario #2: Home Directories

  • Users like to muck with their home directory permissions (to open them up for sharing usually); we (admins) like to avoid security incidents on our systems
  • Let’s give a user access to their home directory only via an ACL so that they can’t mess with the permissions
    • Doesn’t have to be a default ACL in this case, we don’t want this to inherit below


28 of 53

ACLs

Common scenario #2: Home Directories

  • Here’s what it looks like when done:


  • Issue solved, do these steps when making home directories during the account provisioning process
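
The provisioning steps can be sketched in a few lines of Python, assuming the script runs as root and setfacl is available (paths and usernames are hypothetical). One thing that surprises people: after the named-user entry is added, setfacl recalculates the ACL mask, and ls -l then shows the mask in the group bits (e.g. drwxrwx---+), even though the owning group entry (group::---) still grants nothing.

```python
import os
import subprocess


def home_acl_command(home_dir, user):
    """setfacl call granting the user rwx on their home directory.

    Access ACL only (no "d:" default entry), so nothing inherits below this directory.
    """
    return ["setfacl", "-m", f"u:{user}:rwx", home_dir]


def provision_home(home_dir, user):
    """Create a root:root, mode 700 home directory the user reaches via an ACL."""
    os.mkdir(home_dir)
    os.chown(home_dir, 0, 0)   # root:root, so the user cannot chmod it (requires root)
    os.chmod(home_dir, 0o700)  # explicit, since mkdir's mode is filtered by the umask
    subprocess.run(home_acl_command(home_dir, user), check=True)
```

For example, provision_home("/home/alice", "alice") during account creation.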

29 of 53

Data Sharing on a System

  • Data is usually shared on systems via project or work directories
    • Users sharing data amongst team members makes sense
  • This is a big reason why having a “projects” or ”work” space is so important
  • Can use default ACLs as mentioned in the prior section to ensure the whole group has access to data in projects
    • Without this, data in these areas can be inaccessible even to members of the same group
  • Sometimes, though, data needs to be shared with people outside a given project


30 of 53

Data Sharing on a System

  • Adding the “new” users to the existing ownership group is possible but not always feasible
    • Often the only need is to share data; the new users:
      • shouldn’t be able to use that group’s compute allocation
      • shouldn’t have privileges to anything else that group has access to
    • Many times it’s only a small subset of the group’s data that needs wider access
    • Data access for these users is commonly read-only, whereas the original owning group has read/write


31 of 53

Data Sharing on a System

There are two common methods for dealing with this:

  • Create a new project directory owned by a new group that is only granted storage access
    • This group has the new combination of users
    • Group exists solely to own this data and has no other permissions
    • The downside is that data is kept in multiple places if it also stays in the original project directory, or the original group has to do more data management to shuffle data back and forth
  • Use ACLs to give the “new” users access to the data they need
    • More of a pain to apply after the fact but can make data management easier for users
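
For the ACL route, one step is easy to miss: the new user needs execute (traverse) permission on every directory between the project root and the shared data, or they can’t reach it even with read access on the data itself. A Python sketch that builds the setfacl calls, assuming the shared path sits inside the project root and read-only access is what’s wanted (usernames and paths are hypothetical). Ancestors get --x only, so the user can pass through them but not list them.

```python
import os


def share_readonly_commands(shared_path, project_root, user):
    """Build setfacl calls giving user read-only access to shared_path."""
    shared = os.path.abspath(shared_path)
    root = os.path.abspath(project_root)
    if os.path.commonpath([shared, root]) != root:
        raise ValueError("shared_path must be inside project_root")

    # Every directory from project_root down to (but not including) shared_path.
    ancestors = []
    cur = shared
    while cur != root:
        cur = os.path.dirname(cur)
        ancestors.append(cur)
    cmds = [["setfacl", "-m", f"u:{user}:--x", d] for d in reversed(ancestors)]

    access = f"u:{user}:rX"  # X: execute only on directories / already-executable files
    if os.path.isdir(shared):
        # Default entry so files created later are readable too.
        cmds.append(["setfacl", "-R", "-m", f"{access},d:u:{user}:rx", shared])
    else:
        cmds.append(["setfacl", "-m", access, shared])
    return cmds
```

For example, share_readonly_commands("/projects/smith_lab/release", "/projects/smith_lab", "collab1") returns a traverse entry for the project root followed by a recursive read-only entry for release.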


32 of 53

Data Sharing outside a System

  • Sometimes data needs to be shared with collaborators outside of your organization for many reasons
    • Colleague left your institution for another but is still involved in a project
    • A project has members from a cross-institutional team that is collaborating as part of a grant or program
    • Data being produced is meant to be published/accessible to the broader science community for further consumption
  • Key things you’ll need to figure out:
    • Who precisely needs access
    • For how long
    • Read-Only or Read/Write
    • Restrictions on method of access


33 of 53

Data Sharing outside a System

  • These different scenarios can have a variety of solutions; to an extent, they’ll depend on:
    • Institution policies
    • Your system’s authentication capabilities
    • End user preferences
    • The exact situation at hand
  • However, we’re going to go over some common ways to support this type of data sharing
  • This is not an exhaustive list by any means, there are tons of tools to help with this need


34 of 53

Data Sharing outside a System

Method #1: Guest Account

  • Your institution may support guest accounts that are sponsored by a faculty/staff member
  • Once the people who need access have credentials at your institution, you can grant access the same way you would for “Data Sharing on a System”
  • Especially easy for:
    • Short list of people (who are known)
    • Ex-Faculty/Staff who need access for collab with existing faculty/staff


35 of 53

Data Sharing outside a System

Method #2: Globus Shared Collection

  • Globus lets you allow users to create “Shared Collections” on your primary endpoint
  • Users can grant access to the accounts that need the data, with per-user read-only or read/write permissions
  • Can be pretty much self-service for users to manage
  • Great if:
    • All people involved have Globus accounts
    • Your site has a Globus license


36 of 53

Data Sharing outside a System

Method #3: Web Server

  • Especially handy for situations where data needs to be shared with the community in a read-only fashion
  • Easy for end users, who can use a huge variety of tools to pull down the data for use on their systems
  • Also fairly easy to set up via Nginx or Apache on bare metal or via Kubernetes
  • Need to make sure that:
    • Data is available read-only
    • It uses HTTPS
    • Credentials (even shared ones) are set up, if possible, to discourage abuse from bots/malicious entities


37 of 53

Data Sharing outside a System

Method #4: Sync to a Cloud Solution

  • Research groups or your institution may have a preferred cloud storage vendor
  • Tools like Globus can sync data to Google Drive/Box/S3/etc. backends
  • Data can then be shared via tools that those platforms support
  • Requires syncing data, which can be a pain
    • Though it could potentially be automated by you or the user, depending on the exact situation


38 of 53

Storage Monitoring


39 of 53

Storage Monitoring

  • Understanding the behavior/characteristics that your file systems are displaying is incredibly helpful for:
    • Day-to-day troubleshooting
    • Understanding what users are doing on the file system
    • Alerting you to potential issues before or as they happen
    • Planning future procurements, as needs become better known
    • Fulfilling reporting obligations you may have
    • Gaining insight you can share with the community via talks/posters/papers/etc.
  • When evaluating storage solutions, make sure monitoring capabilities are part of the selection criteria


40 of 53

Storage Monitoring

  • There are a lot of tools out there you can use to monitor storage infrastructure
  • Biggest suggestion is to choose one tool set for all of your infrastructure and use it to collect storage metrics
    • Will help when correlating storage metrics with metrics from other sources (network, schedulers, compute, etc.)
  • Some examples of tools to gather the metrics are:
    • Telegraf
    • Prometheus
    • Nagios
    • Many more


41 of 53

Storage Monitoring

There are 3 main “areas” to monitor when monitoring storage:

  • Performance
    • How fast/slow are all components running
    • Are there hot spots in the infrastructure/are things balanced
    • What trends do you see/what do they tell you about workloads running
  • Quota
    • Measures of consumption (bytes/inodes)
    • Again very handy for trend analysis/reporting in many situations
  • Health
    • Is all infrastructure healthy
    • Are all servers healthy and running as expected
    • Are errors getting logged


42 of 53

Performance Monitoring

Things worth gathering (not an exhaustive list; gather all you can in your environment):

    • Read/Write bytes per server/LUN/disk
    • Read/Write operations per server/LUN/disk
    • Metadata operations per server/LUN
    • All of the above on a per-client basis
    • If desired and possible, gather I/O perf stats on a per-job basis (e.g., Lustre job stats and equivalents for other solutions)
  • Frequency ideally is at least once per minute
    • Balance quick sampling with performance impact on system
    • Make sure to account for sampling frequency when reading counters
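
Most of these values are exposed as ever-increasing counters, so the useful number is the difference between two samples divided by the time between them. A minimal Python sketch, assuming (timestamp, counter value) samples: dividing by the actual elapsed time rather than the nominal interval keeps late or missed samples from distorting the rate, and any interval where the counter goes backwards (server restart, counter reset) is skipped. Collectors and query languages often do this for you (Prometheus’s rate() function handles counter resets, for instance), but it’s worth knowing what’s happening underneath.

```python
def counter_rates(samples):
    """Convert cumulative counter samples [(unix_time, value), ...] into per-second rates.

    Uses the real elapsed time between samples and skips intervals where the
    counter decreased (reset) or timestamps did not advance.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0 or v1 < v0:
            continue
        rates.append((t1, (v1 - v0) / dt))
    return rates


if __name__ == "__main__":
    # Hypothetical read_bytes counter sampled roughly once a minute, reset at the end.
    samples = [(0, 0), (60, 6_000_000), (121, 12_100_000), (180, 512)]
    for t, rate in counter_rates(samples):
        print(f"t={t}s  {rate / 1e6:.1f} MB/s")
```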


43 of 53

Performance Monitoring

Tools like Telegraf and Prometheus have community driven options for gathering data from Lustre, Ceph, Spectrum Scale, etc.

Generally for various file systems poke around with the following to get insight into perf stats:

  • Lustre – Counters up in /proc/fs/lustre/obdfilter
  • Spectrum Scale – Use mmpmon or mmperfmon
  • Ceph – has built in metrics exporter
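
As an illustration for Lustre, the per-OST stats files under /proc/fs/lustre/obdfilter are plain text. A Python sketch of a parser, assuming lines made up of a name, a count, the word samples, a [unit], and optionally min, max, and sum (some versions append a sum of squares). Verify the layout against your Lustre version, since newer releases moved these files; lctl get_param obdfilter.*.stats reads them wherever they live. Lines such as snapshot_time are skipped.

```python
def parse_lustre_stats(text):
    """Parse a Lustre stats file into {name: {"samples": n, "unit": u, ...}}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "samples":
            continue  # snapshot_time and other non-counter lines
        entry = {"samples": int(fields[1]), "unit": fields[3].strip("[]")}
        if len(fields) >= 7:
            entry["min"], entry["max"], entry["sum"] = (int(v) for v in fields[4:7])
        stats[fields[0]] = entry
    return stats
```

The read_bytes and write_bytes sums are cumulative, so successive samples of them still need to be turned into rates before plotting.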


44 of 53

Performance Monitoring

Some real life examples to look at (some plots we have at NCSA for some of our systems):

Lustre – here

Spectrum Scale – here

Ceph – here

  • When plotting things out always make sure your units are correct for what you are gathering
  • Heat maps and line charts are great for these types of metrics


45 of 53

Quota Monitoring

Things worth gathering:

    • Bytes and inode usage per user
    • Bytes and inode usage per group
    • Bytes and inode usage per project
    • Bytes and inodes per user/group within a project (if possible)
    • Overall FS usage for bytes/inodes

  • Really important to analyze growth trends for capacity planning and debugging issues
  • Get this information out to users too so they can manage their usage
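
Most of the rollups in that list are the same aggregation keyed differently. A minimal Python sketch, assuming you have already turned your file system’s quota reports (e.g. from lfs quota or mmrepquota) into records describing one user’s usage within one project, with user, group, project, bytes, and inodes fields; those field names are hypothetical.

```python
from collections import defaultdict

# Which record fields each rollup is keyed on.
ROLLUPS = {
    "user": ("user",),
    "group": ("group",),
    "project": ("project",),
    "project_user": ("project", "user"),
}


def summarize_usage(records):
    """Aggregate records into {rollup: {key: (bytes, inodes)}} plus an overall total."""
    out = {name: defaultdict(lambda: [0, 0]) for name in ROLLUPS}
    total = [0, 0]
    for rec in records:
        for name, fields in ROLLUPS.items():
            key = tuple(rec[f] for f in fields)
            out[name][key][0] += rec["bytes"]
            out[name][key][1] += rec["inodes"]
        total[0] += rec["bytes"]
        total[1] += rec["inodes"]
    result = {name: {k: tuple(v) for k, v in d.items()} for name, d in out.items()}
    result["total"] = tuple(total)
    return result
```

Storing each run’s result with a timestamp is what makes the growth-trend analysis possible later.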


46 of 53

Quota Monitoring

  • Gather and store usage at as frequent an interval as possible so users get quicker feedback
    • Our default interval is every 15 minutes
  • Presenting the data via dashboards is great for trend analysis and/or viewing tables of information for all users and sorting them
  • Presenting via a CLI command/script is handy for instant/in the moment information
    • Can even have this print out every time the user logs into the system
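
A login-time quota summary mostly comes down to formatting. A Python sketch, assuming the usage numbers come from your quota collection (the area names and values are hypothetical). It prints the sample time in the header so users can see how old the numbers are, and treats a limit of 0 as “no limit.”

```python
from datetime import datetime, timezone

UNITS = ("B", "KiB", "MiB", "GiB", "TiB", "PiB")


def human(n):
    """Format a byte count with binary units."""
    value = float(n)
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


def quota_table(rows, sampled_at):
    """rows: (area, used_bytes, limit_bytes, used_inodes, limit_inodes); 0 = no limit."""
    lines = [
        f"Quota usage as of {sampled_at:%Y-%m-%d %H:%M %Z} (collected periodically, may lag)",
        f"{'Area':<10}{'Used':>12}{'Limit':>12}{'Use%':>7}{'Files':>13}{'File limit':>13}",
    ]
    for area, used, limit, inodes, inode_limit in rows:
        limit_s = human(limit) if limit else "none"
        pct = f"{100 * used / limit:.1f}%" if limit else "-"
        ilimit_s = f"{inode_limit:,}" if inode_limit else "none"
        lines.append(f"{area:<10}{human(used):>12}{limit_s:>12}{pct:>7}{inodes:>13,}{ilimit_s:>13}")
    return "\n".join(lines)


if __name__ == "__main__":
    rows = [
        ("home", 12 * 1024**3, 50 * 1024**3, 180_000, 500_000),
        ("scratch", 3 * 1024**4, 0, 2_400_000, 0),
    ]
    print(quota_table(rows, datetime.now(timezone.utc)))
```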


47 of 53

Quota Monitoring

Some real life examples from our environments at NCSA

Lustre – here

Spectrum Scale – here

Ceph – here

  • One of the most “user-facing” aspects of storage monitoring
  • Remind users of the sampling delay of collection, whatever it ends up being for you


48 of 53

Health Monitoring

  • Key to noticing issues on your file systems before (or at the same time as) your users do
    • Users can/will find issues with your FS very quickly since so much rides on it
  • What you monitor for this is going to be very dependent on the file system you are running and even what vendor supplies that system
  • Don’t forget about some of the peripheral “health” aspects
    • Power metrics (is it stable/redundant/etc.)
    • Temperature metrics (is everything staying cool)
    • Fans/Power Supplies/Low-Level hardware


49 of 53

Health Monitoring

  • A helpful way to stay on top of issues is to monitor file system “Quality of Service” (QOS)
    • How long does an “ls” take of certain areas
    • How long does a “stat” take of a file
    • Do quota commands respond
    • Track this stuff over time to learn best what is “normal” and what is an indication of a problem
  • Check the hardware health across all layers of the stack
    • Drive health
    • Cable link health
    • Server health
    • Network health


50 of 53

Health Monitoring

  • Check the FS health across all layers of its stack, will vary from FS to FS
    • Look at service health for all FS related services
    • Look at output of FS health commands
    • Parse through logs to look for errors being logged by the file system
  • Some examples from NCSA’s stack
    • Lustre – here
    • Spectrum Scale – here
    • Ceph – here
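
Parsing logs for errors can start as simple pattern counting. A Python sketch, assuming plain-text logs (e.g. the kernel log on Lustre servers, where Lustre error messages carry a LustreError: prefix, or Spectrum Scale’s mmfs.log). The patterns are placeholders to tune for your file system and vendor, not a complete list.

```python
import re
from collections import Counter

# Placeholder patterns; tune per file system and vendor.
PATTERNS = {
    "lustre_error": re.compile(r"LustreError"),
    "io_error": re.compile(r"\bI/O error\b"),
    "generic": re.compile(r"\b(error|failed|failure)\b", re.IGNORECASE),
}


def scan_log(lines, patterns=PATTERNS):
    """Count matching lines per pattern and keep the first example of each."""
    counts = Counter()
    examples = {}
    for line in lines:
        for name, regex in patterns.items():
            if regex.search(line):
                counts[name] += 1
                examples.setdefault(name, line.rstrip("\n"))
    return counts, examples
```

Since scan_log accepts any iterable of lines, an open file object can be passed directly; alerting on changes in the counts between runs tends to be more useful than alerting on any single match.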


51 of 53

Storage Monitoring

  • Storage is often one of the more fragile pieces of HPC infrastructure so good monitoring of it is key
  • Use the tools best suited to your environment, what you’re familiar with, and what you can keep up and running
  • Monitoring in general (not just for storage) is one of my passions; feel free to check out our Telegraf work at:


52 of 53

Acknowledgements

  • Members of the SET group at NCSA for slide content & review
  • Members of the steering committee for slide review


53 of 53

Questions
