Building Your Own
Hyper-Specialized LLM
Google Colab · Gemma 3 · QLoRA
Tasha Penwell, MISM
Assistant Professor of Instruction — Analytics & Information Systems
Ohio University College of Business · Doctoral Student, Instructional Technology
CODESTOCK · ROOM 300A · THURSDAY 3:00 PM
prettynerdydigitalmarketing.com · online-thrift.com
https://linktr.ee/penwellcodestock
Before we start
I am not an AI researcher.
I am an educator at a College of Business in rural Ohio. I am building a platform for thrift stores. What I am going to show you today, I learned by doing it — in the same free browser tab I am going to hand to my students.
That is not a disclaimer. That is the point.
If I can build this — so can students and other young professionals.
TERMINOLOGY — WE WILL USE THESE WORDS TODAY
Keep this in mind as we go
You do not need to memorize these now.
Parameters / Weights: the numbers inside a model that encode learned patterns
Loss: how wrong the model's predictions are — should decrease during training
Tokens: word fragments — the unit of text a model processes
Gradients: the signal telling the model which direction to adjust its weights
Fine-Tuning: retraining a model to specialize it for a specific domain
Overfitting: memorizing training data instead of learning generalizable patterns
PEFT: Parameter-Efficient Fine-Tuning — train a tiny fraction, freeze the rest
Quantization: compressing model precision (16-bit → 4-bit) to reduce memory
LoRA: Low-Rank Adaptation — the adapter method that makes PEFT practical
QLoRA: Quantized LoRA — compresses the model to 4-bit so it runs on free hardware
THE TOOLS THAT MAKE THIS POSSIBLE
PEFT and QLoRA
Two techniques that collapsed the cost of fine-tuning from enterprise hardware to a free browser tab.
PEFT
Parameter-Efficient Fine-Tuning
Freeze most of the model. Train only a small set of new adapter layers on top.
The base model already knows language. You are only teaching it your domain.
Result: same powerful foundation, new specialized behavior.
QLoRA
Quantized Low-Rank Adaptation
PEFT + compression. The base model is squeezed from 16-bit to 4-bit precision before the adapters are trained.
Before QLoRA: fine-tuning even a small model in full precision meant $10k–$50k of dedicated GPU hardware.
After QLoRA: a free Colab browser tab.
Dettmers et al. (2023) — QLoRA: Efficient Finetuning of Quantized LLMs — University of Washington
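To make "train a tiny fraction" concrete, here is the back-of-envelope math, a sketch with illustrative layer sizes (not Gemma's real dimensions):

# Illustration only: hypothetical layer sizes.
# A full weight matrix has d_out * d_in parameters; a rank-r LoRA adapter
# adds just r * (d_in + d_out) trainable parameters on top of the frozen matrix.
d_in, d_out, r = 2048, 2048, 16
full_params = d_in * d_out               # 4,194,304 frozen weights
adapter_params = r * (d_in + d_out)      # 65,536 trainable weights
print(f"adapter = {adapter_params / full_params:.2%} of the layer")  # ~1.56%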
THE HOOK
Two Versions of the Same Problem
IN THE CLASSROOM
Students learn to USE AI.
Very few learn to BUILD it.
Fear of breaking things. Fear of cloud costs. Fear of needing a $10k GPU cluster.
A College of Business — not CS — is pushing past that.
IN THE WORKPLACE
General AI knows everything about everywhere.
A fine-tuned model knows everything about
here.
Your data. Your voice. Your domain.
Permanently.
THE FRAMEWORK
Three questions. Same logic.
CODE or USE AI?
Use the simpler tool when: task varies, speed > consistency, good enough works
Go deeper when: consistent behavior, proprietary logic, ownership matters
PROMPT or FINE-TUNE?
Use the simpler tool when: one-off task, fast setup, low-stakes output
Go deeper when: hundreds of outputs, domain data, brittle prompt results
GENERAL or SPECIALIST?
Use the simpler tool when: broad coverage needed, no proprietary data
Go deeper when: deep local knowledge — your data is your moat
All three questions ask: when does a tool stop being sufficient — and when do you need to go deeper?
AI BASICS
AI: What's Happening Inside
A Large Language Model (LLM) is like a massive network of interconnected number tables (weights) that work together to transform text and predict what comes next.
Not a database. Not a search engine. A probabilistic pattern matcher.
Parameters / Weights: the numbers inside the model that encode learned patterns
Tokens: word fragments — the unit of text a model processes
Base Model: the pre-trained foundation — Gemma 3 1B has ~1,000,000,000 parameters
Students who understand this stop treating the tool like a person and start using it strategically.
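A quick way to show students what "tokens" means, a minimal sketch assuming the transformers library and access to a Gemma checkpoint (the model id is illustrative, and Gemma is gated, so log in first):

# Tokens are word fragments, not whole words.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")
ids = tok.encode("Vintage denim jacket, pearl snap buttons")
print(ids)                              # the token IDs the model actually sees
print(tok.convert_ids_to_tokens(ids))   # the fragments those IDs stand for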
KEY CONCEPT
Prompting vs Fine-Tuning
PROMPTING
✓ Fast setup, no training
✓ Works for one-off tasks
✓ Good enough for broad queries

✗ Memory is platform-managed and clearable
✗ Domain knowledge re-established each time
✗ Behavior varies when memory is incomplete
FINE-TUNING
✓ Domain knowledge is inside the weights
✓ Consistent across hundreds of outputs
✓ Works even without prompting memory

✗ Requires training data and setup time
✗ Not right for every task or every team
Note: platforms like ChatGPT and Claude do offer memory features — but that memory lives outside the model. Fine-tuning puts the knowledge inside it.
KEY CONCEPT
Your data is the moat.
The most valuable ingredient is not the model. It's the data nobody else has.
Exclusive Data Moat
Their inventory history, pricing patterns, tone. No general model has this. It is the only thing that cannot be replicated.
Consistency Moat
A fine-tuned model produces consistent output at scale: the same voice on the first listing and the five-hundredth.
Network Effect Moat
More stores → better patterns → better model → more stores. Data compounds over time.
Gemma is freely available to everyone. Your data is what nobody else has.
THE REAL-WORLD USE CASE
online-thrift.com
THE PROBLEM
Small thrift stores have real inventory, real community value, and real customers.
What many lack is an online presence. Not for lack of wanting, but because listing products at scale demands writing time and fluency a solo operator simply doesn't have.
THE AI LAYER
The fine-tuning pipeline you are about to see is what I am building for this platform.
The Chrome extension card in the demo is a working prototype of the end product — the extension itself has not been built yet.
The pipeline is real. The extension is next.
Live in beta · actively recruiting test partners · If you know a thrift store, talk to me after.
THE CLASSROOM TOOL
Google Colab: Consequence-Free Learning
No installation
No local setup, no broken environments, no IT tickets
No hardware cost
A free T4 GPU with ~15 GB of video RAM (VRAM) — in a browser tab
No fear
Students can run this, crash it, restart — zero cost, zero consequence
Portfolio-ready
Save to GitHub → version history + reproducibility + shareable link
In rural America, the gap between using technology and understanding it isn't just an academic gap. It's an economic one.
TRAINING DATA
Input → Output pairs
GIGO
Garbage In
Garbage Out
38 pairs across 8 categories.
Stored in thrift_training_data.json
Separate from the notebook — swap any store's data without changing the code.
You don't need to read this. The model does.
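For a sense of the shape, here is one hypothetical pair; the real schema of thrift_training_data.json may differ, and the text is taken from the demo later in this talk:

# Hypothetical shape of one input -> output pair in thrift_training_data.json.
pair = {
    "input": "Vintage denim jacket. Women's M. Lee brand. Faded wash. "
             "Pearl snap buttons. 1980s. Good condition.",
    "output": "Pure 1980s character in this vintage Lee denim jacket. "
              "Women's Medium. The faded wash is authentic...",
}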
CELL 1–3 — ENVIRONMENT SETUP
Step 1: Install, Authenticate, Check GPU
Install unsloth, transformers, peft, trl, bitsandbytes
Red warnings are harmless — check for ✅ at the bottom
Authenticate with Hugging Face token
Confirm NVIDIA Tesla T4 GPU with 15.6 GB VRAM
Teaching note: Runtime → Change Runtime Type → T4 GPU. Do this first, every time.
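Roughly what the install cell runs (exact packages and version pins in the real notebook may differ):

# Cell 1, approximately. Quiet install; red dependency warnings are normal.
!pip install -q unsloth transformers peft trl bitsandbytes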
CELL 2 — AUTHENTICATION
Step 2: Hugging Face Login
Hugging Face = GitHub for AI models
Gemma is gated — accept terms first
Token = read-only API key
huggingface.co/settings/tokens
Modern LLM development depends on shared model hubs, not training from scratch.
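A minimal sketch of the login cell, assuming a read-only token created at huggingface.co/settings/tokens (accept the Gemma license on the model page first):

# Cell 2, approximately. Prompts for your token inside the notebook.
from huggingface_hub import notebook_login
notebook_login()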
CELL 3 — GPU CHECK
Step 3: Confirm GPU + VRAM
Tesla T4 confirmed: 15.6 GB VRAM
Gemma 3 1B in 4-bit needs ~4 GB during training
We have plenty of room for training
VRAM = the critical training bottleneck
VRAM is where model weights, gradients, and activations all have to fit simultaneously.
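A sketch of the check using standard PyTorch calls:

# Cell 3, approximately: fail fast if the GPU runtime wasn't selected.
import torch
assert torch.cuda.is_available(), "Runtime -> Change runtime type -> T4 GPU"
print(torch.cuda.get_device_name(0))  # expect: Tesla T4
print(f"{torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB VRAM")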
CELL 4–5 — TRAINING DATA
Step 4: Upload data & format prompts
Upload thrift_training_data.json
38 input/output pairs validated
4 categories auto-detected
Formatted into Gemma chat template
Separate data file = swap any store's inventory without changing the notebook.
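A sketch of the formatting step, assuming a list of {"input", "output"} records and the tokenizer's built-in chat template; the notebook's actual code may differ:

# Cells 4-5, approximately: load the pairs, wrap them in Gemma's chat format.
import json
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")  # id illustrative

with open("thrift_training_data.json") as f:
    pairs = json.load(f)

def to_chat(pair):
    msgs = [{"role": "user", "content": pair["input"]},
            {"role": "assistant", "content": pair["output"]}]
    return tok.apply_chat_template(msgs, tokenize=False)

texts = [to_chat(p) for p in pairs]
print(len(texts), "formatted examples")   # expect 38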
CELL 6 — LOAD MODEL
Step 5: Load Gemma 3 1B in 4-bit
1,000,000,000 parameters loaded
NF4 quantization: 16-bit → 4-bit
Only ~0.5 GB VRAM for model weights
Math still done in BFloat16 for stability
Like unzipping a file, doing the work, then re-zipping. Quality loss is real but small.
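The notebook loads through Unsloth's wrapper; the same idea in plain transformers + bitsandbytes looks roughly like this:

# Cell 6, approximately. NF4 storage, BF16 compute: the QLoRA recipe.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # 16-bit weights stored as 4-bit NF4
    bnb_4bit_compute_dtype=torch.bfloat16,  # math still done in BFloat16
)
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-it",                 # id illustrative; model is gated
    quantization_config=bnb, device_map="auto",
)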
CELL 7 — BASELINE
Step 6: Base model output (before training)
Verbose, over-formatted output
Asks for image placeholder
Markdown asterisks everywhere
Sounds like a chatbot, not a thrift store
Save this output — you'll compare it after fine-tuning. That difference IS the entire value proposition.
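Capturing that baseline is a single generate call, roughly (tok and model from the earlier sketches):

# Cell 7, approximately: one generation BEFORE training, saved for comparison.
prompt = ("Write a product listing: Vintage denim jacket. Women's M. "
          "Lee brand. Faded wash. Pearl snap buttons. 1980s. Good condition.")
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
print(tok.decode(out[0], skip_special_tokens=True))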
CELL 8 — LORA CONFIG
Step 7: Configure LoRA Adapters
trainable%: 1.2879 ← the PEFT proof
r=16, lora_alpha=32
lora_dropout=0.1 — overfitting guard
Target: attention layers (style & tone)
13M trainable out of 1B total. Less than 1.3% — yet behavior changes completely.
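With peft, the configuration on this slide is a few lines; the target module names below are the usual attention projections and may vary by architecture:

# Cell 8, approximately: wrap the frozen 4-bit model in trainable adapters.
from peft import LoraConfig, get_peft_model

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.1,   # values from this slide
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)          # model from Cell 6
model.print_trainable_parameters()           # trainable%: ~1.29, the proof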
CELL 9 — TRAINING
Step 8: Watch the loss. It tells you everything.
HEALTHY RANGE
Final loss: 0.913 ✓
Target: 1.0 – 1.5
Steps: 60 (max_steps=60)
RED FLAGS
↗ Loss going UP — unstable
→ Loss flat — not learning
↘ Loss near 0 — overfitting
Lower is NOT always better with small datasets. A final loss near 0 means the model memorized — not learned.
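The training cell, roughly, using trl's SFTTrainer; max_steps=60 matches this slide, while the batch size and learning rate below are typical defaults, not necessarily the notebook's actual values:

# Cell 9, approximately: supervised fine-tuning over the 38 formatted examples.
from datasets import Dataset
from trl import SFTConfig, SFTTrainer

dataset = Dataset.from_dict({"text": texts})   # texts from Cells 4-5

trainer = SFTTrainer(
    model=model,                               # LoRA-wrapped model from Cell 8
    train_dataset=dataset,
    args=SFTConfig(max_steps=60, per_device_train_batch_size=2,
                   learning_rate=2e-4, logging_steps=5, output_dir="outputs"),
)
trainer.train()                                # watch the loss column as it runs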
THE PROOF
Same model. Same question. 38 examples. 20 minutes. Free GPU.
BASE GEMMA 3
Vintage denim jacket. Women's M. Lee brand. Faded wash. Pearl snap buttons. 1980s. Good condition.
"Okay! Let's do this! Here's a draft product description we can tweak — aiming for engaging, informative, and highlighting the key details you provided:
**Product Title:** Vintage Denim Jacket – Classic 1st Printing (M.Lee)
**(Image of the jacket here!)**
Step back in time with this stunning vintage denim jacket..."
FINE-TUNED MODEL ✓
Vintage denim jacket. Women's M. Lee brand. Faded wash. Pearl snap buttons. 1980s. Good condition.
"Pure 1980s character in this vintage Lee denim jacket. Women's Medium. The faded wash is authentic — earned, not manufactured — and the pearl snap buttons are all present and working perfectly. The kind of jacket that looks better the more you wear it."
TEACHING MOMENT — THIS IS WHAT ACTUALLY HAPPENED
The real output. And why it matters.
What went wrong:
The model memorized 38 examples instead of learning the pattern. Loss dropped too fast — 4.1 → 0.13 in 60 steps.
This is overfitting. The infrastructure works. The model does not.
The fix: more examples, better variety, tuned hyperparameters.
"The pipeline doesn't change. The data does. This is not a failure — this is the most important teaching moment in the talk."
CELL 11 — SAVE THE ADAPTER
You saved the intelligence, not the whole brain.
LoRA adapter saved: ~5–10 MB
Full base model: 700+ MB
Parameters trained: 1.28%
Reload time: seconds, vs hours to retrain
Load Gemma fresh → snap adapter on top → back to your specialist model. No retraining.
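In code, saving and reloading is roughly (bnb config from Cell 6; the adapter directory name is illustrative):

# Cell 11, approximately: the adapter saves in megabytes, not gigabytes.
model.save_pretrained("thrift-adapter")        # adapter weights only

# Later, in a fresh session: base model + adapter, no retraining.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-1b-it", quantization_config=bnb, device_map="auto")
specialist = PeftModel.from_pretrained(base, "thrift-adapter")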
TEACHING PRACTICE
Documentation as a discipline
Every cell is a documentation opportunity.
1. What data was loaded? What version? How many examples?
2. Who trained it and when? What was the final loss?
3. What hyperparameters were used? r=16, alpha=32, max_steps=60?
4. What changed between v1 and v2 of the training data?
Save to GitHub → version history + reproducibility + portfolio in one step.
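One way to make that habit concrete: a hypothetical run card written next to the adapter, where every field answers one of the questions above:

# Hypothetical run card; field names are illustrative.
import datetime, json

run_card = {
    "data_file": "thrift_training_data.json", "data_version": "v1",
    "examples": 38, "trained_by": "student name",
    "date": str(datetime.date.today()),
    "final_loss": 0.913, "r": 16, "lora_alpha": 32, "max_steps": 60,
}
with open("thrift-adapter/run_card.json", "w") as f:
    json.dump(run_card, f, indent=2)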
THE EDUCATOR CASE
Why This Belongs in the Curriculum
This is not a computer science argument. This is a workforce readiness argument.
In rural America, the gap between using technology and understanding it isn't just an academic gap. It's an economic one.
The question is not "should business students learn this?"
The question is: "what happens to the ones who don't?"
WHY THIS MATTERS BEYOND THE CLASSROOM
THE INFRASTRUCTURE QUESTION
The data center buildout consuming rural land, water, and power grids is falling hardest on communities that did not ask for it.
That is a real cost. It deserves to be named.
THE EFFICIENT PATH
QLoRA and PEFT are the alternative. Less compute. Less energy. Less pressure on the land.
Teaching this approach is not separate from caring about rural communities. It is part of it.
The most powerful AI isn't the biggest one.
It does not need a warehouse to run. It does not need a river to cool it.
It needs good data, clear purpose, and someone who understands how to build it.
That person can come from Appalachian Ohio or rural Tennessee.
FOR DEVELOPERS
All resources at the Linktree below
FOR EDUCATORS
Put Colab in your next course
FOR THRIFT STORES
online-thrift.com · beta partners wanted
prettynerdydigitalmarketing.com · online-thrift.com
https://linktr.ee/penwellcodestock
Thank you!