1 of 30

Deep Learning at MSI

Part II

Ham Lam and Mo Myat

Spring 2025

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

2 of 30

    • AI models/tools
    • Hardware (GPUs)
    • Deploy AI models on Agate
    • RAG

Agenda

Prerequisite

Level: Intermediate

  • Linux/BASH
  • Software install
  • Python

Deep Learning II at MSI

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

3 of 30

Goal: What you will learn

  • Generative AI Tools & Agate Integration
  • Deploy an AI model for inference on Agate
  • A RAG application on Agate

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

4 of 30

Generative AI Models

The science of creating NEW content from learned patterns

→ Text, Images, Video, Code, etc

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

5 of 30

Generative AI Models

  • New AI models are released constantly with 3 common increases:
    • Increased capability (e.g. multimodal)
    • Increased complexity (e.g. architecture and size)
    • Increased computational demand (e.g. GPUs)

* DeepSeek R1 released on Jan, 2025

* Google released Gemma 3 on March 12th 2025

* Meta released Llama 4 on April 5th, 2025

Model card

Model Details: Brief description of model

Model Developers: Meta

Variations: Sizes (8B, 70B, etc), pretrained, instruction tuned, etc

Input: text only

Output: text and code only

Architecture: auto-regressive etc.

GPU Compute resources

Generative AI Models

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

6 of 30

Industry AI models

  • Larger scale LLMs
  • Pricing options (including free tiers)
    • open-weight options are available (DeepSeek)
  • Demand large compute resources
  • Easily accessible with a web browser
    • user friendly !

Industry

AI Model

Google

Gemini, Gemma

Microsoft CoPilot

Utilizes ChatGPT series

OpenAI

ChatGPT 5

Meta

LlaMa series

IBM

Granite series

Amazon

Nova

ANTROPIC

Claude

DeepSeek

DeepSeek series

….

…….

UMN-Licensed AI tools: Gemini, CoPilot, NotebookLM, and Zoom AI Companion

Generative AI Models

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

7 of 30

Community AI models

  • Huge model choices (see HuggingFace site) but
  • Models are typically smaller (model parameters)
    • limited capability but are fast catching up (e.g. DeepSeek, gpt-oss)
  • Relatively small compute resource requirement
  • No Pricing option No cost
  • Not user-friendly
  • standalone apps available just now!
  • Good for developers but not good for ‘everyday’ users
  • HuggingFace ecosystem
    • AI models, data, and software libraries download

Generative AI Models

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

8 of 30

  1. Standalone

> GUI application: LM Studio

> Web based: Open_WebUI

  1. Model servers (for inference)

> Ollama (server + models)

> vLLM

  1. Programming frameworks/Libraries etc

> Langchain, HuggingFace transformer library, etc..

> Build AI based applications

> Good for developers but not good for ‘everyday’ users

Generative AI Models

Community AI models and Tools

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

9 of 30

GPU Compute resources

Generative AI Models: System Prompt

A system prompt (most are hidden from users) is an instruction given to an AI model to set the context, behavior, or tone for how it should respond during a conversation.

  • Purpose: The system prompt pre-define the AI’s personality, role, or response style.�
  • Where it's used: Set by developers or the platform. �
  • Not visible to users (usually): It’s different from the text you type — it’s in the background.

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

10 of 30

GPU Compute resources

Generative AI Models

System Prompt: OpenAI ChatGPT

GPU Compute resources

Generative AI Models: System Prompt

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

11 of 30

GPU Compute resources

Generative AI Models

openai/gpt-oss-20b

You are ChatGPT, a large language model trained by OpenAI.

Your task is to assist users in a friendly, clear, and concise manner while adhering to the following guidelines:

1. **Role & Tone**

- Act as an expert tutor/consultant in the user’s requested domain (e.g., coding, math, writing, travel).

- Use a conversational tone that is approachable yet professional.

- Begin each response with a brief greeting and end with an invitation for further questions.

2. **Content Constraints**

- Keep responses short: aim for 1–3 sentences per answer unless the user explicitly requests more detail.

- Avoid jargon; if technical terms are necessary, explain them briefly in plain language.

- Do not mention that you are an AI or reference your training data.

3. **Safety & Ethics**

- Refuse to provide instructions that facilitate wrongdoing (e.g., hacking, fraud).

- If a user asks for disallowed content (hate speech, explicit material), respond with a refusal and a brief apology.

- When uncertain about an answer, say “I’m not sure” and offer to try again or suggest resources.

4. **Formatting & Structure**

- Use bullet points for lists, numbered steps for processes, and code blocks for programming examples.

- For math or scientific queries, provide concise explanations; full derivations are optional unless requested.

- Keep code snippets short (≤ 20 lines) and runnable in a typical environment.

5. **Interaction Flow**

- If the user’s question is ambiguous, ask one clarifying question before providing an answer.

- Do not add unsolicited advice beyond what the user asks for.

- Always check for follow‑up needs after giving an answer (e.g., “Does that help?”).

6. **Response Quality**

- Ensure correctness; double‑check facts or code logic when possible.

- Avoid filler words (“um”, “like”) and keep sentences crisp.

- Use proper grammar, punctuation, and capitalization.

You should apply these rules to every user query during the conversation session.

End each response with a friendly prompt for further assistance, e.g., “Let me know if you’d like more details or have another question!”

GPU Compute resources

Generative AI Models: System Prompt

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

12 of 30

Deep Learning at MSI

Generative AI models: Accuracy vs Cost

Enhancing AI models accuracy

  • Pre-training from scratch → Very high cost
  • Fine-tuning (refines pre-trained model using domain specific data) → high cost
  • Retrieval-Augmented Generation (RAG) → acceptable cost
  • Prompt Engineering (Uses prompts to use pretrain model directly) → cheap
    • Users put more data, more detail, and more context in the prompts!

Shifting the responsibility to users!

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

13 of 30

Agate GPU Compute resources for AI jobs

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

14 of 30

GPU Compute resources

AI model size vs GPU memory (for inference)

Larger models (e.g. 70B) require significantly more GPU memory

Quantization ( e.g. 8-bit or 4-bit) reduces memory demand allowing larger models to fit on the GPUs. But can impact model accuracy

Offload larger models to host memory ‘doable’ but drastically increases latency

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

15 of 30

GPU Compute resources

GPU type

GPU Partition Names

Node sharing?

GPU Memory

Suitable AI model size*

Cores per node

Walltime limit

Total node memory

Local scratch

Max. nodes per job

H100

msigpu

Yes

80GB

> 70 b

128

24:00:00

768 GB

850 GB

4

A100

a100-4, msigpu

Yes

40GB

< 40 b**

64

96:00:00

499 GB

850 GB

4

A100

a100-8, msigpu

Yes

40GB

< 40 b**

128

24:00:00

1002 GB

850 GB

1

L40S

msigpu

Yes

48GB

< 10 b

128

24:00:00

768 GB

850 GB

4

A40

preempt-gpu

Yes

48GB

< 10 b

128

24:00:00

499 GB

850 GB

1

A40, L40S

interactive-gpu

Yes

48GB

< 10 b

128

24:00:00

60 GB

228 GB

1

V100

v100, msigpu

Yes

32GB

< 10 b

24

24:00:00

374 GB

859 GB

1

Agate GPU partitions

* Number of model parameters is in the billions (b)

** Model offload to multiple GPUs

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

16 of 30

AI model and Agate

Typical AI jobs setup

Backend

→ GPU resources

→ Enough storage space for AI model weights and user data

→ LLM serving engine

Frontend

→ User interface (Web GUI or Commandline?)

→ Tasks (e.g. chat-completion, RAG, data analysis)

Deep Learning at MSI

Generative AI Models and Agate

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

17 of 30

AI models and Agate and SLURM

Job Types

Interactive AI model inference (immediate results)

srun or salloc GPU backed terminals

Open OnDemand Interactive App

GPU: Desktop or JupyterNotebook

Batch AI model inference

→ Scripted large AI workloads

a job script, user data, AI model, and Prompts!

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

18 of 30

    • Personal $HOME directory
      • 200GB and 1 million files limit
    • Project space: share among group members (up to 20TB & 5 million file count)
      • /home/PROJECT/
      • /home/PROJECT/shared
      • /home/PROJECT/public
    • Scratch global (40TB & 13.2e6 file count)
      • /scratch.global # 30 days
      • local scratch (/tmp on compute nodes)

https://msi.umn.edu/our-resources/knowledge-base/new-home-directories

Staging AI jobs

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

19 of 30

Deploy a LLM on Agate using community tools

  • Ollama server

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

20 of 30

  • Open-source LLMs serving engine
  • Allow users to run LLMs locally (e.g. on Agate!)
  • Support many popular LLMs!

Deep Learning at MSI

Ollama: An AI model server

Copy and extract the package:

mkdir $HOME/Ollama

cd $HOME/Ollama

cp /common/tutorials/DeepLearning2/ollama-linux-amd64.tgz $HOME/Ollama

tar xvf ollama-linux-amd64.tgz

Model

Parameters

Size

Download

Gemma 3

1B

815MB

ollama run gemma3:1b

DeepSeek-R1

7B

4.7GB

ollama run deepseek-r1

Llama 3.3

70B

43GB

ollama run llama3.3

Llama 3.2

1B

1.3GB

ollama run llama3.2:1b

Llama 3.2 Vision

11B

7.9GB

ollama run llama3.2-vision

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

21 of 30

Deep Learning at MSI

Ollama: An AI model server

  • Ollama formatted AI models are stored in
    • $HOME/.ollama directory

  • Start the Ollama server (# prefer to use a seperate terminal)
    • ollama serve

  • Bring up help menu
    • ollama --help

  • Pull down a model from a registry
    • ollama pull <model_name>

e.g. ollama pull llama3.2

  • List models stored in your $HOME/.ollama directory
    • ollama list

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

22 of 30

Retrieval Augmented Generation (RAG)

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

23 of 30

Deep Learning at MSI

Generative AI Tools and Agate

A Large Language Model’s knowledge is limited

  • Data that is too new - current events, just about any content created after the LLM training data
  • Data that is not public - personal, internal, secret data etc.

Deep Learning at MSI

Pretrained LLMs vs RAG

User: Where are the AEDs located inside the Walter library building?

I have no idea!

RAG: A way to add your “own data” to the prompt that you pass into a LLM.

Advantages:

  • Data privacy and protection are significant concerns
  • Provides up-to-date domain specific context (your own data)
  • Improves accuracy of generated response by grounding them in retrieved facts
  • Reduces hallucinations common in standalone LLMs
  • Allows easy updates to the knowledge base without retraining the model

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

24 of 30

Deep Learning at MSI

Retrieval Augmented Generation (RAG)

Question → Retriever → Large Language Model → Response

User provided CONTEXT

LLM can access up-to-date and specific information beyond its training data, making it more effective!

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

25 of 30

Deep Learning at MSI

Retrieval Augmented Generation (RAG)

Question → Retriever → Large Language Model → Response

User provided CONTEXT

Typical flow for a RAG system is:

  1. Prompt: A user generates a query.
  2. Embedding Model: The prompt is converted into vectors
  3. Vector Database Search: After a user’s prompt is embedded into a vector, the system searches a vector database filled with contextually relevant data chunks.
  4. Reranking: The retrieved data chunks are reranked to prioritize the most relevant data.
  5. LLM: The LLM generates responses informed by the retrieved data

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

26 of 30

Backend

Purpose

Ollama engine

serves LLM

Ollama supported models

LLM model (llama3.2)

Frontend

Purpose

Langchain

code library

HuggingFace

Use an embedding model

Chroma vector DB

Store chuncked data

unstructured[all-docs]

Parse documents

sentence-transformers

Create embedding in high dim space

Python

rag script

Deep Learning at MSI

RAG app: Software stack

Building a RAG app on Agate

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

27 of 30

Deep Learning at MSI

Frontend Software Stack

Deep Learning at MSI

RAG app: Frontend Software Stack Install

## Use a “Terminal” from a GPU backed interactive session

# unload all loaded modules to start with a clean env

> module purge

> module load miniforge/24.3

> mamba create -n rag_env_test # will create the environment in your $HOME/.conda/envs

# if create the environment else where

>mamba create -p /scratch.global/<user_id>/rag_env_test

# activate the environment

>source activate /scratch.global/<user_id>/rag_env_test

# install python first

>mamba install python==3.11

# use pip to install langchain packages

>pip install langchain

>pip install langchain-community

>pip install langchain_huggingface langchain_ollama

# then install 3 more packages

>pip install "unstructured[all-docs]"

>pip install sentence-transformers

>pip install chromadb

# create a jupyter kernel

>python3 -m ipykernel install --user --name rag_env_test --display-name "rag_env_test"

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

28 of 30

Deep Learning at MSI

Frontend Software Stack

Deep Learning at MSI

RAG app

Copy the rag scripts:

mkdir $HOME/RAG

cd $HOME/RAG

cp /common/tutorials/DeepLearning2/rag.py $HOME/RAG

# jupyter notebook script

cp /common/tutorials/DeepLearning2/rag.ipynb $HOME/RAG

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

29 of 30

Deep Learning at MSI

Frontend Software Stack

Deep Learning at MSI

# Run the rag.py script

# Start the Ollama server

$HOME/Ollama/bin/ollama serve

# On a second fresh terminal, activate the rag_env_test environment

source activate rag_env_test

cd $HOME/RAG

python rag.py

Run the RAG script

RAG pipeline implementation detail

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.

30 of 30

  • General MSI Help (HPC systems, OpenOnDemand, Report issues, etc)
    • help@msi.umn.edu

  • Deep Learning Tutorial
    • Ham Lam: lamx0031@umn.edu
    • Mo Myat: mo000007@umn.edu

How to contact us

Minnesota Supercomputing Institute

© 2020 Regents of the University of Minnesota. All rights reserved.