1 of 29

UTIA Computational Resources Orientation

Dr. Margaret Staton and Ryan Kuster

Updated 2025.07.03

2 of 29

What You’ll Learn:

  • Code of conduct
  • Computational Infrastructure
    • Equipment
    • Software Management
  • Data management
    • Project directories
    • Documentation

3 of 29

Code of conduct

We are committed to a diverse, welcoming, and inclusive environment. We welcome students, postdoctoral research associates, visiting scholars, and others regardless of age, appearance, disability status, gender, gender identity, geographic background, marital/partnered status, political affiliation, race, religion, sexual orientation, and all other characteristics that make each of us unique. We continually work to create an inclusive environment that reflects the diversity of society in general. We aim to cultivate an environment built on mentorship, encouragement, tolerance, and mutual respect. We believe diversity brings together a wide range of abilities, experiences, perspectives, and world views that are crucial to enriching experiences and addressing challenging research questions.

4 of 29

Introduction - Who we are

  • The Staton lab is partly dedicated to helping other labs with bioinformatics.
  • If you have questions about running a process, don’t hesitate to contact us!
    • If we are unavailable at the time, we will try to get someone else from the Staton lab to help you.

Meg Staton: mstaton@utk.edu

Ryan Kuster: rkuster@utk.edu

5 of 29

Introduction - statonutia Shared Computational Resources

Storage:

  • /pickett_centaur: 224 TB, iSCSI
  • /pickett_sphinx: 101 TB, iSCSI
  • /sphinx_local: 70 TB, SSD, local

Servers:

  • centaur.ag.utk.edu: 96 logical CPUs, 512 GB RAM
  • sphinx.ag.utk.edu: 64 logical CPUs, 512 GB RAM

Sphinx is faster than Centaur and has faster storage, but it has fewer CPUs. Use it judiciously and move all files off Sphinx once you are done with them. We reserve the right to remove unattended files from Sphinx if they take up too much storage space.

6 of 29

Introduction - statonutia Shared Computational Resources

While logged in to the UTIA-CR systems, you’ll be connected to the ISAAC-NG filesystem. Your home directory will be in /nfs/home/<netid> and you will have access to scratch at /lustre/isaac24/scratch/<netid>.

Storage:

  • /pickett_centaur: 224 TB, iSCSI
  • /pickett_sphinx: 101 TB, iSCSI
  • /sphinx_local: 70 TB, SSD, local
  • /nfs/home/<netid>: 50 GB, nfs
  • /lustre/isaac24/scratch/<netid>: 10 TB, lustre

Servers: centaur, sphinx

7 of 29

Introduction - statonutia Shared Computational Resources

Note: if you log in to isaac-ng, you will not have access to the /pickett or /sphinx_local storage!

Available from isaac-login:

  • /nfs/home/<netid>: 50 GB, nfs
  • /lustre/isaac24/scratch/<netid>: 10 TB, lustre

Not available from isaac-login:

  • /pickett_centaur: 224 TB, iSCSI
  • /pickett_sphinx: 101 TB, iSCSI
  • /sphinx_local: 70 TB, SSD, local

8 of 29

Introduction - statonrg Shared Computational Resources

Storage:

  • /pickett_flora: 224 TB, iSCSI

Server:

  • flora.ag.utk.edu: 96 logical CPUs, 512 GB RAM

When you log in to flora, you will have access to the pickett_flora storage.

9 of 29

Introduction - statonrg Shared Computational Resources

While logged in to the UTIA-CR systems, you’ll be connected to the ISAAC-NG filesystem.

Your home directory will be in /nfs/home/<netid>, and you will have access to scratch at /lustre/isaac24/scratch/<netid> and project UTK033[01] at /lustre/isaac24/proj/UTK0330/.

Storage:

  • /nfs/home/<netid>: 50 GB, nfs
  • /lustre/isaac24/scratch/<netid>: 10 TB, lustre
  • /lustre/isaac24/proj/UTK033[01]: 115 TB, lustre
  • /pickett_flora: 224 TB, iSCSI

Server: flora

10 of 29

Introduction - statonrg Shared Computational Resources

While logged in to the UTIA-CR systems, you’ll be connected to the ISAAC-NG filesystem:

  • home at /nfs/home/<netid>
  • scratch at /lustre/isaac24/scratch/<netid>
  • project UTK033[01] at /lustre/isaac24/proj/UTK0330/

Storage:

  • /nfs/home/<netid>: 50 GB, nfs
  • /lustre/isaac24/scratch/<netid>: 10 TB, lustre
  • /lustre/isaac24/proj/UTK033[01]: 115 TB, lustre

Server: mandrake

11 of 29

Servers overview

Filesystem | Directory | Project Name | Group ID | SSH | Management
/lustre/isaac24 | …/proj/UTK0032 | ACF-UTK0032 | tug2137 | login.isaac.utk.edu | slurm
*/lustre/isaac24 | …/proj/UTK0330 | ISAAC-UTK0330 | isaac2627, tug2106 | mandrake.ag.utk.edu, flora.ag.utk.edu | calendar
*/lustre/isaac24 | …/proj/UTK0331 | ISAAC-UTK0331 | isaac2628 | centaur.ag.utk.edu, sphinx.ag.utk.edu | calendar
/lustre/isaac24 | …/scratch/<netid> | server-dependent

* The pickett storage system is linked to these directories, but is distinct (/pickett rather than lustre).

12 of 29

Getting Logged On

  1. Request an OIT ISAAC account here.
  2. We will then add you to the project “statonutia” (ISAAC-UTK0331) via the ISAAC user portal.

13 of 29

How to Log In?

From home, use the VPN:

Install and activate the Ivanti VPN software: https://utk.teamdynamix.com/TDClient/2277/OIT-Portal/KB/ArticleDet?ID=122938

Open a terminal window and run ssh:

ssh <yourusername>@centaur.ag.utk.edu

or

ssh <yourusername>@sphinx.ag.utk.edu

14 of 29

Server Organization

/nfs/home/<netid> (your home directory*)

/lustre/isaac24/scratch/<netid> (scratch directory)

/pickett_centaur/projects (centaur login)

/pickett_sphinx/projects (sphinx login)

/sphinx_local/projects (sphinx login)

*When you first log in, you will be in your home directory (/nfs/home/<netid>). Do not put analysis files here; it is tiny!

15 of 29

Sharing the Servers

  • It can get a bit crowded!
  • Commands to track server usage:
    • htop
    • ps -ef | grep <yourusername>
  • There should always be at least 1 CPU free so the server doesn’t crash and other users can log in.
  • Realistically, it’s hard to know whether your job or others’ jobs will reach this limit, so if CPU use is around 90%, consider using a different server.
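
One rough way to apply the 90% guideline above is to compare the 1-minute load average with the number of logical CPUs. This is only a sketch, assuming a Linux server that provides /proc/loadavg and nproc. The function names load_percent and report_server_load are hypothetical, and load average is an approximation of CPU use, so htop remains the better view.

```shell
# Rough check of how busy a server is (assumes Linux: /proc/loadavg and nproc).

# load_percent LOAD NCPU -> prints LOAD as a whole-number percentage of NCPU.
# (Hypothetical helper name, not part of any existing tool.)
load_percent() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%d\n", (load / ncpu) * 100 }'
}

# report_server_load -> prints the current load and a hint at or above 90%.
report_server_load() {
    local load ncpu pct
    load=$(cut -d ' ' -f 1 /proc/loadavg)   # 1-minute load average
    ncpu=$(nproc)                           # number of logical CPUs
    pct=$(load_percent "$load" "$ncpu")
    echo "1-min load ${load} on ${ncpu} CPUs (~${pct}%)"
    if [ "$pct" -ge 90 ]; then
        echo "Server looks busy; consider another server or waiting."
    fi
}
```

Calling report_server_load before launching a large job gives a one-line summary. It does not replace checking the calendar.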

16 of 29

The UTIA Computational Resources Calendar

  • After orientation, you will be added to the UTIA Computational Resources Calendar.
  • This offers an opportunity to book RAM/cores and let others know when you will be using the server for an intensive process.
  • Always update the calendar whenever you start a new process!
  • If a job finishes earlier than expected, update the calendar as well so other users know they can run their jobs.

17 of 29

Data Management is

ESSENTIAL

18 of 29

Project Organization

  • Everything for one project will go in one master folder
  • Give this folder a sensible, descriptive name

The organization inside this folder varies from lab to lab. One strategy is outlined in Noble (2009), “A Quick Guide to Organizing Computational Biology Projects.”

Optional approach: Cookiecutter Data Science

Just be consistent.

19 of 29

The Staton Lab Approach to Project Organization

supercoolproject

raw_data

analysis

code

analysis_20200325_dex

analysis_20190605_genetic_map

1_trimming

2_assembly

3_functional_annotation

(scripts)

We like to use this method in the lab to keep things in order; while we can’t tell you how to organize your analyses, we do suggest this method!
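
A layout like the one above could be created with a small helper. This is a sketch under assumptions: it places the dated analysis folders under analysis/ and the numbered step folders inside each dated folder, which is one reading of the diagram. The function name make_project is hypothetical.

```shell
# Create a project skeleton (sketch; the layout is one reading of the slide).
# Usage: make_project PROJECT_DIR ANALYSIS_NAME [STEP ...]
make_project() {
    local project="$1" name="$2"
    shift 2
    local analysis_dir
    # Dated analysis folder, e.g. analysis_20200325_dex
    analysis_dir="${project}/analysis/analysis_$(date +%Y%m%d)_${name}"

    mkdir -p "${project}/raw_data" "${project}/code" "${analysis_dir}"

    # Numbered step folders, e.g. 1_trimming, 2_assembly
    local i=1 step
    for step in "$@"; do
        mkdir -p "${analysis_dir}/${i}_${step}"
        i=$((i + 1))
    done

    # Print the analysis folder so the caller can cd into it
    echo "${analysis_dir}"
}
```

For example, make_project supercoolproject genetic_map trimming assembly functional_annotation creates raw_data, code, and a dated analysis folder with three numbered steps.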

20 of 29

Raw Data

  • Always save the raw data in the raw_data folder AND on Google Drive or OneDrive
  • Include a README with the origin of the data
    • Where did it come from? Who gave it to you?
    • Date of download
    • Version

Because you never know what might happen!
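
The README fields listed above can be recorded at download time so they are not forgotten. The sketch below is illustrative only: the function name write_raw_data_readme and the exact field labels are assumptions, not a lab standard.

```shell
# Append a provenance entry to RAW_DATA_DIR/README (sketch; labels are illustrative).
# Usage: write_raw_data_readme RAW_DATA_DIR SOURCE PROVIDER VERSION
write_raw_data_readme() {
    local dir="$1" source="$2" provider="$3" version="$4"
    mkdir -p "$dir"
    {
        echo "Source: ${source}"
        echo "Provided by: ${provider}"
        echo "Date of download: $(date +%Y-%m-%d)"
        echo "Version: ${version}"
        echo                                  # blank line between entries
    } >> "${dir}/README"
}
```

Because the entry is appended, running the function once per dataset builds up a simple log of where each file came from.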

21 of 29

Software Management

Download binary > Conda OR Apptainer (FKA Singularity) > Direct Install

/pickett_centaur/software

or

/sphinx_local/software

You are welcome to install software using your own conda instance.

For direct installs, email or Slack us.

22 of 29

Permissions Management

  • We can set read/write/execute permissions on Linux folders
  • You will start out in your own very limited group. You will not be able to make changes to any folder but your own.
  • For now, we plan to keep it that way to prevent users from accidentally deleting others’ data.

23 of 29

Storage Management

  • Due to the nature of the data that we work with, do not take storage for granted: storage can fill up quickly.
  • The best solution is to be proactive and establish best practices that reduce excessive storage usage, so that we do not have to purge files without your permission.

24 of 29

Storage Good Citizenship

  • What to keep?
    • Raw data
      • Do not keep raw data that is publicly available. Instead, note in the README where to find it. (Save the fastq-dump command!)
    • Scripts
    • Final analysis outputs
  • COMPRESS
    • .tar.gz is smaller than either alone
    • SAM -> BAM
    • Compress files or small folders, not whole project directories
  • What to throw away?
    • Intermediate analysis files
    • Files from abandoned analysis
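
To decide what to compress or throw away, it helps to see which files take the most space. The sketch below assumes GNU find and sort, as found on typical Linux servers. The function name largest_files is hypothetical.

```shell
# List the N largest files under a directory, largest first.
# Assumes GNU find (for -printf). Usage: largest_files DIR [N]
largest_files() {
    local dir="$1" n="${2:-10}"
    # %s = size in bytes, %p = path; sort numerically, largest first
    find "$dir" -type f -printf '%s\t%p\n' | sort -rn | head -n "$n"
}
```

Running largest_files on a project folder prints a size in bytes and a path per line, which makes large intermediate files easy to spot.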

25 of 29

Example

How do I check the size of a file?

ls -lh

How do I check the size of a folder?

du -sh <folder>

How do I compress a file?

tar -cvzf file.tar.gz file.txt

How do I decompress a file?

tar -xvzf file.tar.gz

What about a bam file?

See samtools documentation
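
The tar commands above also work on a small folder. The sketch below compresses a folder and then checks that the archive can be listed before the original is removed. The function name compress_folder is hypothetical, and deleting the original folder is deliberately left to the user.

```shell
# Compress FOLDER into FOLDER.tar.gz and confirm the archive is readable.
# (Sketch; it does not delete the original folder.)
# Usage: compress_folder FOLDER
compress_folder() {
    local folder="${1%/}"                     # drop a trailing slash, if any
    local parent name
    parent=$(dirname "$folder")
    name=$(basename "$folder")
    # -C changes into the parent so the archive stores relative paths
    tar -czf "${folder}.tar.gz" -C "$parent" "$name" || return 1
    # Listing the contents fails if the archive is damaged
    tar -tzf "${folder}.tar.gz" > /dev/null || return 1
    echo "${folder}.tar.gz"
}
```

If the function prints the archive name and returns successfully, the archive could be listed without errors.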

26 of 29

Best Practices for Files

  • Use sensible, descriptive names for all folders and files
  • No spaces
  • Use file extensions (.txt, .fasta, .sam)

What about backups?

  • ISAAC: RAID only
  • Google Drive/OneDrive: Rclone
  • Back up your laptop on a hard drive every day

27 of 29

Transferring Data

  • If you need to transfer huge files between computers or hard drives, consider these technologies:
    • Globus (ISAAC for the most part)
    • scp
    • rsync
    • Rclone (Google Drive, Microsoft OneDrive, Dropbox)
  • You should use checksums!
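
One way to use checksums is to record them before a transfer and verify them afterwards. The sketch below assumes GNU coreutils sha256sum is available on both ends. The function names and the checksums.sha256 file name are hypothetical.

```shell
# Record and verify SHA-256 checksums around a transfer
# (assumes GNU coreutils sha256sum and GNU find).

# make_checksums DIR -> writes DIR/checksums.sha256 covering every file in DIR
make_checksums() {
    ( cd "$1" && find . -type f ! -name checksums.sha256 \
        -exec sha256sum {} + > checksums.sha256 )
}

# verify_checksums DIR -> returns non-zero if any file is missing or changed
verify_checksums() {
    ( cd "$1" && sha256sum --quiet -c checksums.sha256 )
}
```

Run make_checksums on the source folder, transfer the folder including checksums.sha256, then run verify_checksums on the destination copy.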

28 of 29

GitHub Wiki

  • Upon completing orientation, you will be given access to the UTIA Computational Resources Wiki.
  • This Wiki gives an in-depth review of the topics we have covered during this talk.
  • If you have new documentation for a common process, feel free to add it to the Wiki!

29 of 29

Any Questions?