Deep Learning at MSI
Part II
Ham Lam and Mo Myat
Spring 2025
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
Agenda
Prerequisite
Level: Intermediate
Deep Learning II at MSI
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
Goal: What you will learn
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
Generative AI Models
The science of creating NEW content from learned patterns
→ Text, Images, Video, Code, etc
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
Generative AI Models
* DeepSeek R1 released on Jan, 2025
* Google released Gemma 3 on March 12th 2025
* Meta released Llama 4 on April 5th, 2025
Model card
Model Details: Brief description of model
Model Developers: Meta
Variations: Sizes (8B, 70B, etc), pretrained, instruction tuned, etc
Input: text only
Output: text and code only
Architecture: auto-regressive etc.
GPU Compute resources
Generative AI Models
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
Industry AI models
Industry | AI Model |
Gemini, Gemma | |
Microsoft CoPilot | Utilizes ChatGPT series |
OpenAI | ChatGPT 5 |
Meta | LlaMa series |
IBM | Granite series |
Amazon | Nova |
ANTROPIC | Claude |
DeepSeek | DeepSeek series |
…. | ……. |
UMN-Licensed AI tools: Gemini, CoPilot, NotebookLM, and Zoom AI Companion
Generative AI Models
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
Community AI models
Generative AI Models
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
> GUI application: LM Studio
> Web based: Open_WebUI
> Ollama (server + models)
> vLLM
> Langchain, HuggingFace transformer library, etc..
> Build AI based applications
> Good for developers but not good for ‘everyday’ users
Generative AI Models
Community AI models and Tools
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
GPU Compute resources
Generative AI Models: System Prompt
A system prompt (most are hidden from users) is an instruction given to an AI model to set the context, behavior, or tone for how it should respond during a conversation.
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
GPU Compute resources
Generative AI Models
System Prompt: OpenAI ChatGPT
GPU Compute resources
Generative AI Models: System Prompt
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
GPU Compute resources
Generative AI Models
openai/gpt-oss-20b
You are ChatGPT, a large language model trained by OpenAI.
Your task is to assist users in a friendly, clear, and concise manner while adhering to the following guidelines:
1. **Role & Tone**
- Act as an expert tutor/consultant in the user’s requested domain (e.g., coding, math, writing, travel).
- Use a conversational tone that is approachable yet professional.
- Begin each response with a brief greeting and end with an invitation for further questions.
2. **Content Constraints**
- Keep responses short: aim for 1–3 sentences per answer unless the user explicitly requests more detail.
- Avoid jargon; if technical terms are necessary, explain them briefly in plain language.
- Do not mention that you are an AI or reference your training data.
3. **Safety & Ethics**
- Refuse to provide instructions that facilitate wrongdoing (e.g., hacking, fraud).
- If a user asks for disallowed content (hate speech, explicit material), respond with a refusal and a brief apology.
- When uncertain about an answer, say “I’m not sure” and offer to try again or suggest resources.
4. **Formatting & Structure**
- Use bullet points for lists, numbered steps for processes, and code blocks for programming examples.
- For math or scientific queries, provide concise explanations; full derivations are optional unless requested.
- Keep code snippets short (≤ 20 lines) and runnable in a typical environment.
5. **Interaction Flow**
- If the user’s question is ambiguous, ask one clarifying question before providing an answer.
- Do not add unsolicited advice beyond what the user asks for.
- Always check for follow‑up needs after giving an answer (e.g., “Does that help?”).
6. **Response Quality**
- Ensure correctness; double‑check facts or code logic when possible.
- Avoid filler words (“um”, “like”) and keep sentences crisp.
- Use proper grammar, punctuation, and capitalization.
You should apply these rules to every user query during the conversation session.
End each response with a friendly prompt for further assistance, e.g., “Let me know if you’d like more details or have another question!”
GPU Compute resources
Generative AI Models: System Prompt
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
Deep Learning at MSI
Generative AI models: Accuracy vs Cost
Enhancing AI models accuracy
Shifting the responsibility to users!
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
Agate GPU Compute resources for AI jobs
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
GPU Compute resources
AI model size vs GPU memory (for inference)
→ Larger models (e.g. 70B) require significantly more GPU memory
→ Quantization ( e.g. 8-bit or 4-bit) reduces memory demand allowing larger models to fit on the GPUs. But can impact model accuracy
→ Offload larger models to host memory ‘doable’ but drastically increases latency
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
GPU Compute resources
GPU type | GPU Partition Names | Node sharing? | GPU Memory | Suitable AI model size* | Cores per node | Walltime limit | Total node memory | Local scratch | Max. nodes per job |
H100 | msigpu | Yes | 80GB | > 70 b | 128 | 24:00:00 | 768 GB | 850 GB | 4 |
A100 | a100-4, msigpu | Yes | 40GB | < 40 b** | 64 | 96:00:00 | 499 GB | 850 GB | 4 |
A100 | a100-8, msigpu | Yes | 40GB | < 40 b** | 128 | 24:00:00 | 1002 GB | 850 GB | 1 |
L40S | msigpu | Yes | 48GB | < 10 b | 128 | 24:00:00 | 768 GB | 850 GB | 4 |
A40 | preempt-gpu | Yes | 48GB | < 10 b | 128 | 24:00:00 | 499 GB | 850 GB | 1 |
A40, L40S | interactive-gpu | Yes | 48GB | < 10 b | 128 | 24:00:00 | 60 GB | 228 GB | 1 |
V100 | v100, msigpu | Yes | 32GB | < 10 b | 24 | 24:00:00 | 374 GB | 859 GB | 1 |
Agate GPU partitions
* Number of model parameters is in the billions (b)
** Model offload to multiple GPUs
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
AI model and Agate
Typical AI jobs setup
Backend
→ GPU resources
→ Enough storage space for AI model weights and user data
→ LLM serving engine
Frontend
→ User interface (Web GUI or Commandline?)
→ Tasks (e.g. chat-completion, RAG, data analysis)
Deep Learning at MSI
Generative AI Models and Agate
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
AI models and Agate and SLURM
Job Types
Interactive AI model inference (immediate results)
→ srun or salloc GPU backed terminals
→ Open OnDemand Interactive App
GPU: Desktop or JupyterNotebook
Batch AI model inference
→ Scripted large AI workloads
a job script, user data, AI model, and Prompts!
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
https://msi.umn.edu/our-resources/knowledge-base/new-home-directories
Staging AI jobs
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
Deploy a LLM on Agate using community tools
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
Deep Learning at MSI
Ollama: An AI model server
Copy and extract the package:
mkdir $HOME/Ollama
cd $HOME/Ollama
cp /common/tutorials/DeepLearning2/ollama-linux-amd64.tgz $HOME/Ollama
tar xvf ollama-linux-amd64.tgz
Model | Parameters | Size | Download |
Gemma 3 | 1B | 815MB | ollama run gemma3:1b |
DeepSeek-R1 | 7B | 4.7GB | ollama run deepseek-r1 |
Llama 3.3 | 70B | 43GB | ollama run llama3.3 |
Llama 3.2 | 1B | 1.3GB | ollama run llama3.2:1b |
Llama 3.2 Vision | 11B | 7.9GB | ollama run llama3.2-vision |
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
Deep Learning at MSI
Ollama: An AI model server
e.g. ollama pull llama3.2
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
Retrieval Augmented Generation (RAG)
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
Deep Learning at MSI
Generative AI Tools and Agate
A Large Language Model’s knowledge is limited
Deep Learning at MSI
Pretrained LLMs vs RAG
User: Where are the AEDs located inside the Walter library building?
I have no idea!
RAG: A way to add your “own data” to the prompt that you pass into a LLM.
Advantages:
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
Deep Learning at MSI
Retrieval Augmented Generation (RAG)
Question → Retriever → Large Language Model → Response
User provided CONTEXT
LLM can access up-to-date and specific information beyond its training data, making it more effective!
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
Deep Learning at MSI
Retrieval Augmented Generation (RAG)
Question → Retriever → Large Language Model → Response
User provided CONTEXT
Typical flow for a RAG system is:
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
Backend | Purpose |
Ollama engine | serves LLM |
Ollama supported models | LLM model (llama3.2) |
Frontend | Purpose |
Langchain | code library |
HuggingFace | Use an embedding model |
Chroma vector DB | Store chuncked data |
unstructured[all-docs] | Parse documents |
sentence-transformers | Create embedding in high dim space |
Python | rag script |
Deep Learning at MSI
RAG app: Software stack
Building a RAG app on Agate
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
Deep Learning at MSI
Frontend Software Stack
Deep Learning at MSI
RAG app: Frontend Software Stack Install
## Use a “Terminal” from a GPU backed interactive session
# unload all loaded modules to start with a clean env
> module purge
> module load miniforge/24.3
> mamba create -n rag_env_test # will create the environment in your $HOME/.conda/envs
# if create the environment else where
>mamba create -p /scratch.global/<user_id>/rag_env_test
# activate the environment
>source activate /scratch.global/<user_id>/rag_env_test
# install python first
>mamba install python==3.11
# use pip to install langchain packages
>pip install langchain
>pip install langchain-community
>pip install langchain_huggingface langchain_ollama
# then install 3 more packages
>pip install "unstructured[all-docs]"
>pip install sentence-transformers
>pip install chromadb
# create a jupyter kernel
>python3 -m ipykernel install --user --name rag_env_test --display-name "rag_env_test"
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
Deep Learning at MSI
Frontend Software Stack
Deep Learning at MSI
RAG app
Copy the rag scripts:
mkdir $HOME/RAG
cd $HOME/RAG
cp /common/tutorials/DeepLearning2/rag.py $HOME/RAG
# jupyter notebook script
cp /common/tutorials/DeepLearning2/rag.ipynb $HOME/RAG
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
Deep Learning at MSI
Frontend Software Stack
Deep Learning at MSI
# Run the rag.py script
# Start the Ollama server
$HOME/Ollama/bin/ollama serve
# On a second fresh terminal, activate the rag_env_test environment
source activate rag_env_test
cd $HOME/RAG
python rag.py
Run the RAG script
RAG pipeline implementation detail
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.
How to contact us
Minnesota Supercomputing Institute
© 2020 Regents of the University of Minnesota. All rights reserved.