1 of 32

Building Your Own

Hyper-Specialized LLM

Google Colab · Gemma 3 · QLoRA

Tasha Penwell, MISM

Assistant Professor of Instruction — Analytics & Information Systems

Ohio University College of Business · Doctoral Student, Instructional Technology

CODESTOCK · ROOM 300A · THURSDAY 3:00 PM

prettynerdydigitalmarketing.com · online-thrift.com

https://linktr.ee/penwellcodestock

2 of 32

Before we start

I am not an AI researcher.

I am an educator at a College of Business in rural Ohio. I am building a platform for thrift stores. What I am going to show you today I learned by doing it — in the same free browser tab I am going to hand to my students.

That is not a disclaimer. That is the point.

If I can build this — so can students and other young professionals.

Ohio University College of Business · Doctoral Student, Instructional Technology · prettynerdydigitalmarketing.com

3 of 32

TERMINOLOGY — WE WILL USE THESE WORDS TODAY

Keep this in mind as we go

You do not need to memorize these now.

Parameters / Weights: the numbers inside a model that encode learned patterns

Loss: how wrong the model's predictions are — should decrease during training

Tokens: word fragments — the unit of text a model processes

Gradients: the signal telling the model which direction to adjust its weights

Fine-Tuning: retraining a model to specialize it for a specific domain

Overfitting: memorizing training data instead of learning generalizable patterns

PEFT: Parameter-Efficient Fine-Tuning — train a tiny fraction, freeze the rest

Quantization: compressing model precision (16-bit → 4-bit) to reduce memory

LoRA: Low-Rank Adaptation — the adapter method that makes PEFT practical

QLoRA: Quantized LoRA — compresses the model to 4-bit so it runs on free hardware

4 of 32

THE TOOLS THAT MAKE THIS POSSIBLE

PEFT and QLoRA

Two techniques that collapsed the cost of fine-tuning from enterprise hardware to a free browser tab.

PEFT

Parameter-Efficient Fine-Tuning

Freeze most of the model. Train only a small set of new adapter layers on top.

The base model already knows language. You are only teaching it your domain.

Result: same powerful foundation, new specialized behavior.

QLoRA

Quantized Low-Rank Adaptation

PEFT + compression. The base model is squeezed from 16-bit to 4-bit precision before the adapters are trained.

Before QLoRA: fine-tuning a 1B model required a $10k–$50k GPU cluster.

After QLoRA: a free Colab browser tab.

Dettmers et al. (2023) — QLoRA: Efficient Finetuning of Quantized LLMs — University of Washington
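
A minimal sketch of the compression half of that recipe, using the transformers/bitsandbytes configuration popularized by the QLoRA paper. The exact settings are an assumption, not the notebook's verbatim code:

```python
import torch
from transformers import BitsAndBytesConfig

# QLoRA's compression step: store weights as 4-bit NF4, keep the math in 16-bit.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # the NF4 data type from the paper
    bnb_4bit_compute_dtype=torch.bfloat16,   # compute stays at higher precision
    bnb_4bit_use_double_quant=True,          # also quantize the quantization constants
)
```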

5 of 32

THE HOOK

Two Versions of the Same Problem

IN THE CLASSROOM

Students learn to USE AI.

Very few learn to BUILD it.

Fear of breaking things. Fear of cloud costs. Fear of needing a $10k GPU cluster.

A College of Business — not CS — is pushing past that.

IN THE WORKPLACE

General AI knows everything about everywhere.

A fine-tuned model knows everything about

here.

Your data. Your voice. Your domain.

Permanently.

6 of 32

THE FRAMEWORK

Three questions. Same logic.

CODE or USE AI?

Use the simpler tool when: task varies, speed > consistency, good enough works

Go deeper when: consistent behavior, proprietary logic, ownership matters

PROMPT or FINE-TUNE?

Use the simpler tool when: one-off task, fast setup, low stakes output

Go deeper when: hundreds of outputs, domain data, brittle prompt results

GENERAL or SPECIALIST?

Use the simpler tool when: broad coverage needed, no proprietary data

Go deeper when: deep local knowledge — your data is your moat

All three questions ask: when does a tool stop being sufficient — and when do you need to go deeper?

7 of 32

AI BASICS

AI: What's Happening Inside

A Large Language Model (LLM) is like a massive network of interconnected number tables (weights) that work together to transform text and predict what comes next.

Not a database. Not a search engine. A probabilistic pattern matcher.

Parameters / Weights: the numbers inside the model that encode learned patterns

Tokens: word fragments — the unit of text a model processes

Base Model: the pre-trained foundation — Gemma 3 1B has ~1,000,000,000 parameters

Students who understand this stop treating the tool like a person and start using it strategically.

8 of 32

KEY CONCEPT

Prompting vs Fine-Tuning

PROMPTING

Fast setup, no training
Works for one-off tasks
Good enough for broad queries

Memory is platform-managed and clearable
Domain knowledge re-established each time
Behavior varies when memory is incomplete

FINE-TUNING ✓

Domain knowledge is inside the weights
Consistent across hundreds of outputs
Works even without prompting memory

Requires training data and setup time
Not right for every task or every team

Note: platforms like ChatGPT and Claude do offer memory features — but that memory lives outside the model. Fine-tuning puts the knowledge inside it.

9 of 32

KEY CONCEPT

Your data is the moat.

The most valuable ingredient is not the model. It's the data nobody else has.

Exclusive Data Moat

Their inventory history, pricing patterns, tone. No general model has this. It is the only thing that cannot be replicated.

Consistency Moat

A fine-tuned model produces consistent output at scale. No two prompts behave differently.

Network Effect Moat

More stores → better patterns → better model → more stores. Data compounds over time.

Gemma is freely available to everyone. Your data is what nobody else has.

10 of 32

THE REAL-WORLD USE CASE

online-thrift.com

THE PROBLEM

Small thrift stores have real inventory, real community value, and real customers.

What many lack: an online presence. Not for lack of wanting. Listing products at scale demands a volume of writing that a solo operator simply doesn't have the bandwidth for.

THE AI LAYER

The fine-tuning pipeline you are about to see is what I am building for this platform.

The Chrome extension card in the demo is a mock-up of the end product. The extension itself has not been built yet.

The pipeline is real. The extension is next.

Live in beta · actively recruiting test partners · If you know a thrift store, talk to me after.

11 of 32

THE CLASSROOM TOOL

Google Colab: Consequence-Free Learning

No installation

No local setup, no broken environments, no IT tickets

No hardware cost

A free T4 GPU with 15GB Video RAM (VRAM) — in a browser tab

No fear

Students can run this, crash it, restart — zero cost, zero consequence

Portfolio-ready

Save to GitHub → version history + reproducibility + shareable link

In rural America, the gap between using technology and understanding it isn't just an academic gap. It's an economic one.

12 of 32

TRAINING DATA

Input → Output pairs

GIGO

Garbage In

Garbage Out

38 pairs across 8 categories.

Stored in thrift_training_data.json

Separate from the notebook — swap any store's data without changing the code.

You don't need to read this. The model does.
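
For illustration, one pair might look like the sketch below. The field names ("input", "output") are an assumption about the JSON schema, and the text reuses the jacket example from later in the talk:

```python
# One hypothetical entry from thrift_training_data.json, shown as a Python dict.
example_pair = {
    "input": (
        "Vintage denim jacket. Women's M. Lee brand. Faded wash. "
        "Pearl snap buttons. 1980s. Good condition."
    ),
    "output": (
        "Pure 1980s character in this vintage Lee denim jacket. Women's Medium. "
        "The faded wash is authentic, and the pearl snap buttons all work."
    ),
}
```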

13 of 32

CELL 1–3 — ENVIRONMENT SETUP

Step 1: Install, Authenticate, Check GPU

Install unsloth, transformers, peft, trl, bitsandbytes

Red warnings are harmless — check for ✅ at the bottom

Authenticate with Hugging Face token

Confirm NVIDIA Tesla T4 GPU with 15.6 GB VRAM

Teaching note: Runtime → Change Runtime Type → T4 GPU. Do this first, every time.
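
A minimal sketch of what that first cell can look like, assuming the package list on this slide (versions not pinned):

```python
# Colab Cell 1 (sketch): install the fine-tuning stack named on this slide.
# Run it after setting Runtime → Change runtime type → T4 GPU.
# Red pip warnings are normal; look for a clean finish at the bottom.
!pip install -q unsloth transformers peft trl bitsandbytes
```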

14 of 32

15 of 32

CELL 2 — AUTHENTICATION

Step 2: Hugging Face Login

Hugging Face = GitHub for AI models

Gemma is gated — accept terms first

Token = read-only API key

huggingface.co/settings/tokens

Modern LLM development depends on shared model hubs, not training from scratch.
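
The authentication cell can be this small; calling `login()` with no arguments prompts for the token inside Colab:

```python
from huggingface_hub import login

# Gemma is gated: accept the model's terms on Hugging Face first, then paste the
# read-only token from huggingface.co/settings/tokens when prompted.
login()
```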

16 of 32

CELL 3 — GPU CHECK

Step 3: Confirm GPU + VRAM

Tesla T4 confirmed: 15.6 GB VRAM

Gemma 3 1B in 4-bit needs ~4 GB

We have plenty of room for training

VRAM = the critical training bottleneck

VRAM is where model weights, gradients, and activations all have to fit simultaneously.
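
A GPU-check sketch in plain PyTorch; the memory math in the comments follows the numbers on this slide:

```python
import torch

# Confirm a GPU is attached and report its VRAM.
assert torch.cuda.is_available(), "No GPU: set Runtime → Change runtime type → T4 GPU"
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")  # Tesla T4: ~15.6 GB

# Rough budget: 1B parameters at 4 bits is about 0.5 GB of weights; adapters,
# gradients, optimizer states, and activations take the rest, well inside ~15.6 GB.
```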

17 of 32

CELL 4–5 — TRAINING DATA

Step 4: Upload data & format prompts

Upload thrift_training_data.json

38 input/output pairs validated

4 categories auto-detected

Formatted into Gemma chat template

Separate data file = swap any store's inventory without changing the notebook.
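
A sketch of Cells 4–5, assuming "input"/"output" field names and Gemma's turn markup; the notebook's exact template call may differ:

```python
import json

# Load the uploaded pairs and wrap each one in Gemma's chat-turn markup.
with open("thrift_training_data.json") as f:
    pairs = json.load(f)

def format_example(pair):
    return (
        f"<start_of_turn>user\n{pair['input']}<end_of_turn>\n"
        f"<start_of_turn>model\n{pair['output']}<end_of_turn>\n"
    )

texts = [format_example(p) for p in pairs]
print(f"{len(texts)} examples formatted")  # expect 38
```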

18 of 32

CELL 6 — LOAD MODEL

Step 5: Load Gemma 3 1B in 4-bit

1,000,000,000 parameters loaded

NF4 quantization: 16-bit → 4-bit

Only ~0.5 GB VRAM for model weights

Math still done in BFloat16 for stability

Like unzipping a file, doing the work, then re-zipping. Quality loss is real but small.
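
A loading sketch using Unsloth; the checkpoint name and sequence length are assumptions, the 4-bit flag is the point:

```python
from unsloth import FastLanguageModel

# Cell 6 sketch: load Gemma in 4-bit (NF4) so the weights take roughly 0.5 GB of VRAM.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-1b-it",  # assumed checkpoint name
    max_seq_length=1024,                 # assumed; long enough for product listings
    load_in_4bit=True,                   # the QLoRA compression step
    dtype=None,                          # let Unsloth pick the compute dtype for the GPU
)
```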

19 of 32

CELL 7 — BASELINE

Step 6: Base model output (before training)

Verbose, over-formatted output

Asks for image placeholder

Markdown asterisks everywhere

Sounds like a chatbot, not a thrift store

Save this output — you'll compare it after fine-tuning. That difference IS the entire value proposition.
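
A baseline-generation sketch; `model` and `tokenizer` come from the loading cell, and the prompt is the jacket example used on the comparison slide:

```python
# Cell 7 sketch: generate from the untouched base model and keep the output.
prompt = ("Vintage denim jacket. Women's M. Lee brand. Faded wash. "
          "Pearl snap buttons. 1980s. Good condition.")

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=200)
baseline = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(baseline)  # save this; it is the "before" half of the comparison
```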

20 of 32

CELL 8 — LORA CONFIG

Step 7: Configure LoRA Adapters

trainable%: 1.2879 ← the PEFT proof

r=16, lora_alpha=32

  • r is the rank of the adapter matrices; 16 is a standard starting point
  • lora_alpha is the scaling factor, usually set to 2x the rank

lora_dropout=0.1 — overfitting guard

Target: attention layers (style & tone)

13M trainable out of 1B total. Less than 1.3% — yet behavior changes completely.
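
A configuration sketch with the values on this slide; the target-module list is a typical attention-layer set and is an assumption about the notebook:

```python
from unsloth import FastLanguageModel

# Cell 8 sketch: attach LoRA adapters; only these small matrices will be trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                # rank of the adapter matrices
    lora_alpha=32,       # scaling factor, 2x the rank
    lora_dropout=0.1,    # overfitting guard
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention layers
)
model.print_trainable_parameters()  # expect roughly 1.29% trainable
```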

21 of 32

CELL 9 — TRAINING

Step 8: Watch the loss. It tells you everything.

HEALTHY RANGE

Final loss: 0.913 ✓

Target: 1.0 – 1.5

Steps: 60 (max_steps=60)

RED FLAGS

↗ Loss going UP — unstable
→ Loss flat — not learning
↘ Loss near 0 — overfitting

Lower is NOT always better with small datasets. A final loss near 0 means the model memorized — not learned.
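
A training sketch in the older Unsloth/TRL style. Argument names shift between trl versions; the batch size and learning rate are assumptions, while max_steps=60 comes from the slide:

```python
from datasets import Dataset
from transformers import TrainingArguments
from trl import SFTTrainer

# Cell 9 sketch: supervised fine-tuning over the formatted examples from Cell 4-5.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=Dataset.from_dict({"text": texts}),
    dataset_text_field="text",
    max_seq_length=1024,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # assumed
        gradient_accumulation_steps=4,   # assumed
        max_steps=60,
        learning_rate=2e-4,              # assumed
        logging_steps=5,                 # prints the loss column to watch
        output_dir="outputs",
    ),
)
trainer.train()  # watch the loss: healthy runs settle around 1.0 to 1.5 here
```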

22 of 32

THE PROOF

Same model. Same question. 38 examples. 20 minutes. Free GPU.

BASE GEMMA 3

Vintage denim jacket. Women's M. Lee brand. Faded wash. Pearl snap buttons. 1980s. Good condition.

"Okay! Let's do this! Here's a draft product description we can tweak — aiming for engaging, informative, and highlighting the key details you provided:

**Product Title:** Vintage Denim Jacket – Classic 1st Printing (M.Lee)

**(Image of the jacket here!)**

Step back in time with this stunning vintage denim jacket..."

FINE-TUNED MODEL ✓

Vintage denim jacket. Women's M. Lee brand. Faded wash. Pearl snap buttons. 1980s. Good condition.

"Pure 1980s character in this vintage Lee denim jacket. Women's Medium. The faded wash is authentic — earned, not manufactured — and the pearl snap buttons are all present and working perfectly. The kind of jacket that looks better the more you wear it."

23 of 32

TEACHING MOMENT — THIS IS WHAT ACTUALLY HAPPENED

The real output. And why it matters.

What went wrong:

The model memorized 38 examples instead of learning the pattern. Loss dropped too fast — 4.1 → 0.13 in 60 steps.

This is overfitting. The infrastructure works. The model does not.

The fix: more examples, better variety, tuned hyperparameters.

"The pipeline doesn't change. The data does. This is not a failure — this is the most important teaching moment in the talk."

24 of 32

25 of 32

26 of 32

CELL 11 — SAVE THE ADAPTER

You saved the intelligence, not the whole brain.

~5–10 MB

LoRA adapter saved

700+ MB

Full base model

1.28%

Of parameters trained

Seconds

To reload vs hours to retrain

Load Gemma fresh → snap adapter on top → back to your specialist model. No retraining.
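
A save-and-reload sketch; the adapter directory name is an assumption:

```python
# Cell 11 sketch: save only the LoRA adapter (a few MB), not the full base model.
model.save_pretrained("thrift_lora_adapter")
tokenizer.save_pretrained("thrift_lora_adapter")

# Later, in a fresh session: load the 4-bit base again and snap the adapter on top.
from unsloth import FastLanguageModel
from peft import PeftModel

base, tok = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-1b-it", load_in_4bit=True
)
specialist = PeftModel.from_pretrained(base, "thrift_lora_adapter")
```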

27 of 32

28 of 32

TEACHING PRACTICE

Documentation as a discipline

Every cell is a documentation opportunity.

1

What data was loaded? What version? How many examples?

2

Who trained it and when? What was the final loss?

3

What hyperparameters were used? r=16, alpha=32, steps=25?

4

What changed between v1 and v2 of the training data?

Save to GitHub → version history + reproducibility + portfolio in one step.
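
One lightweight way to practice this is a run-log saved next to the notebook, so every question on this slide has a recorded answer. The fields below are illustrative except the numbers taken from earlier slides; the author and date values are hypothetical placeholders:

```python
import json

# Hypothetical run log: one entry per training run, committed with the notebook.
run_log = {
    "data_file": "thrift_training_data.json",
    "num_examples": 38,
    "trained_by": "your-github-handle",   # hypothetical
    "date": "2025-01-01",                 # hypothetical
    "final_loss": 0.913,
    "hyperparameters": {"r": 16, "lora_alpha": 32, "max_steps": 60},
    "changes_from_last_run": "describe what changed in the training data",  # fill in
}

with open("run_log.json", "w") as f:
    json.dump(run_log, f, indent=2)
```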

29 of 32

THE EDUCATOR CASE

Why This Belongs in the Curriculum

This is not a computer science argument. This is a workforce readiness argument.

In rural America, the gap between using technology and understanding it isn't just an academic gap. It's an economic one.

The question is not "should business students learn this?"

The question is: "what happens to the ones who don't?"

30 of 32

WHY THIS MATTERS BEYOND THE CLASSROOM

THE INFRASTRUCTURE

QUESTION

The data center buildout consuming rural land, water, and power grids is falling hardest on communities that did not ask for it.

That is a real cost. It deserves to be named.

THE EFFICIENT PATH

QLoRA and PEFT are the alternative. Less compute. Less energy. Less pressure on the land.

Teaching this approach is not separate from caring about rural communities. It is part of it.

That person can come from Appalachian Ohio or rural Tennessee — not just from San Francisco.

31 of 32

The most powerful AI isn't the biggest one.

It does not need a warehouse to run. It does not need a river to cool it.

It needs good data, clear purpose, and someone who understands how to build it.

That person can come from Appalachian Ohio or rural Tennessee.

prettynerdydigitalmarketing.com · online-thrift.com

32 of 32

FOR DEVELOPERS

Resources at the Linktree link below

FOR EDUCATORS

Put Colab in your next course

FOR THRIFT STORES

online-thrift.com · beta partners wanted

prettynerdydigitalmarketing.com · online-thrift.com

https://linktr.ee/penwellcodestock

Thank you!