1 of 40

Annual PI & User Meeting

HPC Center

--

Time and date:

2:00 - 3:30 PM, April 27, 2018

Location:

Genomics Building Auditorium 1102A

--

Presented by:

Jordan Hayes, Charles Forsyth, Thomas Girke

2 of 40

Outline

  • Overview
    • Organization
    • Usage stats
    • Services
      • Hardware
      • Software
      • Events and training
    • Recharging
      • Current rates
      • New rates, e.g. storage
      • Extended ownership options
    • News
      • Cloud HPC (e.g. AWS, GCP, XSEDE)
      • Miscellaneous Updates
  • Planning
    • Upgrades and expansions
    • Additional services
    • Funding: grants and donations
  • Discussion

3 of 40

Mission

  • Campus-wide research HPC infrastructure and planning
  • Networking with other HPC resources at UCR, other UCs and beyond
  • Training in HPC, big data processing and cloud computing

4 of 40

Historical Context

  • HPCC started in first half of 2017
  • Evolved from IIGB bioinformatics facility
  • Motivation: a campus-wide facility is more efficient and minimizes duplication
  • Bioinformatics facility is now a user of HPCC

5 of 40

HPCC Website

  • URL: http://hpcc.ucr.edu
  • Example pages
    • Access info
    • Events
    • Manuals
    • Facility description for grants
    • ...

6 of 40

Management

  • As a campus-wide facility, HPCC reports directly to RED
  • Staff
    • Jordan Hayes and Charles Forsyth (HPC Systems Administrators)
    • Student HPC admin assistant(s): Austin Leong
    • Thomas Girke (Director)
  • Advisory board: faculty and technical experts with hands-on HPC experience
    • Current members: see the list on the HPCC website
  • Office location: 1207G/1208 Genomics Building, 3401 Watkins Drive, University of California, Riverside, CA 92521

7 of 40

HPCC Server Room

Main server room

  • Genomics Building, Rm 1120A
  • For future expansions: additional server room under discussion

Backup system

  • CoLo facility in ITS server room (SOM Education Building, Rm 1601B)
  • A geographically separated location provides better protection in case of a disaster

8 of 40

Funding

  • Recharging: offsets staff salaries and provides an operating budget for repairs and basic infrastructure, e.g. network switches, AWS/GCP accounts
  • Remaining funds from RED (in the past from CNAS)
  • Recent Equipment Grants
    • NIH-S10-2014: $652,816
    • NSF-MRI-2014: $783,537
    • NSF-CC*IIE-2014: $499,893
  • Seed funding by UCR/RED for HPCC
    • RED-2016: $320,000
  • Future grants
    • Equipment grants later this year

9 of 40

Communication Among Users and Staff

  • Website: hpcc.ucr.edu
  • Request Tracker ticket system where users/PIs email support@biocluster.ucr.edu to request:
    • User accounts
    • Software installs/upgrades
    • Troubleshooting inquiries
    • Meetings for in-person training
    • Other help
  • Communication beyond email: Twitter, Slack & GitHub

10 of 40

Outline

  • Overview
    • Organization
    • Usage stats
    • Services
      • Hardware
      • Software
      • Events and training
    • Recharging
      • Current rates
      • New rates, e.g. storage
      • Extended ownership options
    • News
      • Cloud HPC (e.g. AWS, GCP, XSEDE)
      • Miscellaneous Updates
  • Planning
    • Upgrades and expansions
    • Additional services
    • Funding: grants and donations
  • Discussion

11 of 40

Usage Stats: Labs and Users

UCR labs

  • 104 registered UCR labs (>140 over the past 8 years) from 22 departments in 3 colleges (BCOE, CNAS, SOM)
  • Growth rate >10 new labs per year

External labs

  • 10 registered external labs, often from other UC campuses or labs that have moved but still have an affiliation with UCR

Number of users

  • >350; >150 highly active

12 of 40

Usage Stats: Signups

13 of 40

Usage Stats: Labs by Department and College

Chart: distribution of registered labs by department and college (BCOE, CNAS, SOM).

14 of 40

Usage Stats: Disk Usage

15 of 40

Usage Stats: Software Installs

Installs per week for the last 365 days: GitHub Activity Graph

16 of 40

Tracking Usage

  • squeue (see also the sketch after this list)
    • squeue -A girkelab -t PD --start
    • squeue -A girkelab -t R
  • jobMonitor (activity report)
  • Ganglia
  • Dashboard
    • Create/Disable Accounts
    • Storage Usage
    • Cluster Load
    • CPU Usage
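
A minimal sketch of complementary checks from a login shell; these are generic Slurm commands assumed to work on a standard Slurm setup, not HPCC-specific tools:

  squeue -u $USER        # all pending and running jobs of the current user
  squeue -A girkelab     # all jobs of a lab account (account name from the examples above)
  sacct -u $USER --starttime=$(date +%F) --format=JobID,JobName,Partition,Elapsed,State
                         # accounting summary of the current user's jobs started today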

17 of 40

Outline

  • Overview
    • Organization
    • Usage stats
    • Services
      • Hardware
      • Software
      • Events and training
    • Recharging
      • Current rates
      • New rates, e.g. storage
      • Extended ownership options
    • News
      • Cloud HPC (e.g. AWS, GCP, XSEDE)
      • Miscellaneous Updates
  • Planning
    • Upgrades and expansions
    • Additional services
    • Funding: grants and donations
  • Discussion

18 of 40

Current Hardware Infrastructure

For both

  • Research
  • Teaching

Diagram: users connect through head nodes to the systems below.

Computer cluster

  • 106 nodes (Intel, AMD, GPU)
  • ~6,500 CPU cores
  • 512-1024 GB RAM per node
  • GPU: ~60,000 CUDA cores
  • InfiniBand (IB) network @ 56 Gb/s

Big data storage cluster

  • ~1.6PB GPFS storage (scales to >50PB)
  • Home directories on dedicated system

Backup system

  • ~1.2PB GPFS storage
  • Separate server room
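
As a sketch, the node and partition layout described above can be inspected with standard Slurm commands (generic Slurm usage, not HPCC-specific tools; the node name is a placeholder):

  sinfo                          # list partitions, node counts, and node states
  sinfo -N -l                    # per-node view with CPUs, memory, and features
  scontrol show node <nodename>  # detailed hardware info for a single node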

19 of 40

Queues for Different CPU/GPU Architectures

  • AMD batch queue: 48 AMD nodes with 3,072 CPU cores and 512 GB RAM per node
  • Intel batch queue: 48 Intel nodes with 3,072 CPU cores and 512 GB RAM per node
  • Highmem queue: 6 Intel nodes with 192 CPU cores and 1 TB RAM per node
  • GPU queue: 12 NVIDIA Tesla K80 boards with 59,904 CUDA cores
  • Interconnect: FDR InfiniBand network @ 56 Gb/s
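
As a sketch, a Slurm batch script targeting one of the queues above might look as follows; the partition name, resource requests, and module are placeholders for illustration, not confirmed HPCC settings:

  #!/bin/bash
  #SBATCH --partition=highmem      # placeholder: pick one of the queues listed above
  #SBATCH --nodes=1
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=16
  #SBATCH --mem=256G               # RAM request; quotas are queue specific
  #SBATCH --time=1-00:00:00        # one day of walltime
  #SBATCH --job-name=example_job

  module load samtools             # hypothetical module name
  samtools --version               # replace with the actual analysis command

Such a script would be submitted with sbatch and monitored with squeue, as shown on the Tracking Usage slide.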

20 of 40

Software

  • Large stack of over 1000 free and/or open-source software tools
  • The facility installs new software upon user request on a daily basis, often with less than 24-hour turnaround time; difficult installs may take longer.
  • Commercial tools only where necessary:
    • GPFS for parallel storage system
    • Gaussian
    • MATLAB (ask to be added to the license before use)
    • Intel Parallel Suite
    • SAS
  • Additional commercial software can be installed if funding for the license can be arranged

21 of 40

How to Check What Software Is Installed?

  • HPCC website
  • Module system (example commands below)
    • module avail
  • GitHub Repo
    • HPCC Modules
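
As referenced above, a few standard Environment Modules commands for browsing and loading the stack (bowtie2 is just an illustrative placeholder name):

  module avail                   # list all installed software modules
  module avail bowtie            # narrow the listing to matching module names
  module load bowtie2            # load a module into the current shell
  module list                    # show currently loaded modules
  module unload bowtie2          # remove it from the environment again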

22 of 40

Online Software: Alternatives to Terminal Access

Users with an HPCC account also have access to:

  • Galaxy - Bioinformatics suite, fully integrated with cluster resources
  • RStudio Server - R editor, schedule cluster jobs via code
  • Jupyter-Hub - Python/R editor, schedule cluster jobs via code
  • Shiny - TBA

Additional access (specialized account):

  • SMRT-Portal - PacBio suite, fully integrated with cluster resources
  • HTS - Genomics portal for managing and delivering sequencing data

23 of 40

User Training

  • Short monthly events on Linux, scheduler, parallelization, data carpentry, reproducible research, etc.
  • Longer annual events: on the to-do list; possibly jointly with the Data Science Center, on HPC/cloud usage, big data analysis programming, etc.
  • In-person training available upon request

24 of 40

Online Manuals

  • Available on HPCC website
    • Linux basics
    • HPC Cluster
    • HPCC Cloud/External
      • AWS
      • GCP: to be developed
      • XSEDE: under construction

25 of 40

Outline

  • Overview
    • Organization
    • Usage stats
    • Services
      • Hardware
      • Software
      • Events and training
    • Recharging
      • Current rates
      • New rates, e.g. storage
      • Extended ownership options
    • News
      • Cloud HPC (e.g. AWS, GCP, XSEDE)
      • Miscellaneous Updates
  • Planning
    • Upgrades and expansions
    • Additional services
    • Funding: grants and donations
  • Discussion

26 of 40

Recharging and Quotas

Subscription fee

  • Subscription-based access model: $1000 per lab per year
  • Big data storage: $1000 per 10TB of backed-up data storage
  • Ownership models for compute nodes and data storage; attractive to labs that need 24/7 access to hundreds of CPU cores and more than 30TB of storage

Quota

  • Maximum CPU core usage limited to 512 CPU cores per lab and 256 CPU cores per user per partition
  • Note: there is no charge for CPU hours!
  • RAM quotas are queue specific

27 of 40

Recharging Updates for 2018/2019

  • Proposed option to rent storage in smaller increments: $25/100GB/yr ($250/TB/yr)
  • One-month test account for new labs: restricted to one user, with a reduced CPU quota and no big data access.
  • Any additional suggestions?

28 of 40

Ownership Options

Ownership models are available for compute nodes and data storage. They are attractive to labs that need 24/7 access to hundreds of CPU cores and more than 30TB of storage.

Storage

    • Purchase hard drives at current market price; they will be added to the GPFS storage system
    • Annual maintenance fee: ¼ of the rental price
    • Often more cost-effective for storage needs ≥30TB

Compute nodes

    • Purchase nodes at current market price with compatible network cards
    • Administered under a priority queueing system that gives users from an owner lab priority via a private queue and also increases that lab's overall CPU quota by the number of owned CPU cores

    • Currently, no annual extra cost

29 of 40

Outline

  • Overview
    • Organization
    • Usage stats
    • Services
      • Hardware
      • Software
      • Events and training
    • Recharging
      • Current rates
      • New rates, e.g. storage
      • Extended ownership options
    • News
      • Cloud HPC (e.g. AWS, GCP, XSEDE)
      • Miscellaneous Updates
  • Planning
    • Upgrades and expansions
    • Additional services
    • Funding: grants and donations
  • Discussion

30 of 40

Cloud HPC

Through AWS tools and custom HPCC configurations we enable users to:

  • Build a private cluster in ~15 min.
    • Any number of nodes with auto-scaling (up and down); limited to 20 nodes to start
    • Many types of compute nodes
      • High memory
      • High CPU
      • GPUs
      • Choose HPCC base type
    • For only as long as you need it
    • Familiar interface and job scheduler
    • HPCC can easily install any software you need, if it is not already available
  • Build as many clusters as you need (even at the same time)
  • Pay only for the time you use it (per-minute billing)

31 of 40

Cloud HPC

Easy-to-use custom controls with the HPCC software stack installed, along with support and consulting by HPCC staff.

Diagram: an AWS-hosted cluster consisting of a head node instance running the HPCC software stack and compute node instances in an auto-scaling group, connected by a 10G network. Clusters are created, listed, checked, and deleted from HPCC with the hpcc_cloud create, list, status, and delete commands, and accessed via ssh/scp (see the command sketch below).
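
A sketch of the workflow implied by the diagram; the hpcc_cloud subcommands come from the slide, while the argument style and host names are assumptions for illustration:

  hpcc_cloud create mycluster      # build a private cloud cluster (cluster name is illustrative)
  hpcc_cloud list                  # show existing clusters
  hpcc_cloud status mycluster      # check whether the cluster is ready
  ssh user@<head-node-address>     # log in to the head node and submit jobs as usual
  scp data.tar.gz user@<head-node-address>:~/   # copy data in or out
  hpcc_cloud delete mycluster      # tear the cluster down when finished to stop billing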

32 of 40

Cloud HPC

Billing and Cost Control

  • Users are billed directly by Amazon through a PO associated with an FAU.
  • Costs based on:
    • Data Transferred (UC/AWS Enterprise Agreement - egress waiver)
    • Data Storage
    • Compute and Head node types used
      • These vary based on job requirements
        • High memory
        • High CPU single/multi-node
        • GPUs
      • Can use spot pricing to lower costs in some cases
  • Budget alerts and built-in limits can be used to monitor and control spending

33 of 40

XSEDE

XSEDE is an NSF-funded virtual organization that integrates and coordinates the sharing of advanced digital services - including supercomputers and high-end visualization and data analysis resources - with researchers to support science.

  • Website
    • https://www.xsede.org/
  • Getting Started
  • Campus Champion Allocation
    • https://www.xsede.org/web/campus-champions/ccalloc
    • We have a sizable campus champion allocation of about 50,000 compute hours on multiple supercomputers, which can be used for testing. This includes resources such as the latest Intel CPUs, high-memory nodes, storage, Jetstream and GPUs.

34 of 40

Miscellaneous Updates

  • Conda - Python package and environment management
  • Singularity - Docker alternative (singularity-hub); see the usage sketch below
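
A brief usage sketch for both tools, assuming they are made available through the module system; module, environment, and image names are placeholders:

  # Conda: create and activate an isolated Python environment
  module load miniconda3                     # placeholder module name
  conda create -n myenv python=3.6 numpy
  conda activate myenv                       # or "source activate myenv" on older conda versions

  # Singularity: run software from a container image instead of a local install
  module load singularity                    # placeholder module name
  singularity pull shub://example/tool       # placeholder singularity-hub path
  singularity exec <downloaded-image-file> python --version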

35 of 40

Outline

  • Overview
    • Organization
    • Usage stats
    • Services
      • Hardware
      • Software
      • Events and training
    • Recharging
      • Current rates
      • New rates, e.g. storage
      • Extended ownership options
    • News
      • Cloud HPC (e.g. AWS, GCP, XSEDE)
      • Miscellaneous Updates
  • Planning
    • Upgrades and expansions
    • Additional services
    • Funding: grants and donations
  • Discussion

36 of 40

Upgrades and Expansions

  • Additional Intel and GPU nodes → a major upgrade will require an equipment grant
  • Potentially a custom cluster support model?

37 of 40

Additional Services

  • Additional web-based access to cluster similar to RStudio Server
  • Additional workshops
  • Please send us your suggestions and ideas

38 of 40

Funding: Grants and Donations

Equipment grant options

  • NSF-MRI
  • NIH-S10
  • Other

Type of grant

  • Strategically, a GPU cluster request would make a strong application
  • Other ideas?

39 of 40

Outline

  • Overview
    • Organization
    • Usage stats
    • Services
      • Hardware
      • Software
      • Events and training
    • Recharging
      • Current rates
      • New rates, e.g. storage
      • Extended ownership options
    • News
      • Cloud HPC (e.g. AWS, GCP, XSEDE)
      • Miscellaneous Updates
  • Planning
    • Upgrades and expansions
    • Additional services
    • Funding: grants and donations
  • Discussion

40 of 40

Discussion

?
