1 of 31

2 of 31

Mark Hinkle

Expert in AI and Emerging Technologies

Founder and Publisher

The Artificially Intelligent Enterprise

Co-founder

All Things AI (formerly All Things Open AI)

LinkedIn – https://www.linkedin.com/in/markrhinkle/

X – x.com/mrhinkle

Email – mrhinkle@peripety.com

A Legacy of Emerging Tech

3 of 31

The AIE Network (TheAIE.net)

We Help People Work Smarter, Not Harder with AI

The Artificially Intelligent Enterprise (AIE) Network is a collection of AI-focused newsletters and resources connecting business leaders and professionals with actionable insights and tools to maximize the impact of AI in their organizations.

  • 50,000 subscribers: AI tips and applications for marketing and sales professionals.
  • 155,000 subscribers: curated AI business news in four minutes, two times per week.
  • Foundation training program for AI.
  • 1,000 subscribers: letters to the CIO on AI from an enterprise software veteran.
  • 21,000 subscribers: AI strategy and productivity advice for enterprises.


4 of 31

An AI practitioners and users conference focused on technologies, processes, and people

5 of 31


My AI Event Hack

  • Warning: this isn’t an open source hack.
  • Open the ChatGPT mobile app on your phone.
  • Start a chat.
  • Use the prompt: “This is a chat of what I am learning at SCALE 22X. I will type notes, and when I say ‘Summarize this chat,’ you will summarize all the notes and inputs from this chat into a conference brief in an executive summary format.”
  • Then take pictures and recordings with your phone if allowed (you are allowed in my talk).
  • Then add notes; even short, choppy notes will likely work.
  • When finished adding data, type: “Summarize this chat”

6 of 31

How AI Works: Understanding the underlying mechanisms that power AI tools.

AI Models for NLP and Vision: Examples include NLP models like Meta’s LLaMA, Google Gemini, and Anthropic’s Claude, alongside image generation models such as DALL-E, Midjourney, Stable Diffusion, and others.

Importance of Understanding Large Language Models (LLMs): Knowing how these models work is crucial for accurate results.

Key Considerations for Using AI: Privacy, ethics, and practical considerations for integrating AI into your workflow.

I am going to cram 8 hours of content into one hour; that’s why it’s a “Crash Course.”

How AI Works

7 of 31

Data Collection

The foundation of AI

AI systems rely on large amounts of data to learn and make decisions. This data can come from structured sources (e.g., databases) or unstructured ones (e.g., text, images, video). High-quality and diverse datasets are critical for accuracy and performance.

This is why OpenAI and Google license data from sources such as the New York Times and Reddit to train their models.

8 of 31

Data Processing

Building the knowledge base for AI

Before AI can use data, it must be cleaned, organized, and converted into usable formats. Tasks include handling missing information, removing outliers, translating text or images into numbers that algorithms can process, and potentially labeling that data.
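As a minimal sketch of these steps in pandas (the column names and values are made up for illustration), cleaning might look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with a missing value and an implausible outlier
df = pd.DataFrame({
    "age": [34, 29, np.nan, 41, 250],
    "city": ["Raleigh", "Durham", "Cary", "Raleigh", "Durham"],
})

# Handle missing information: impute with the column median
df["age"] = df["age"].fillna(df["age"].median())

# Remove outliers outside a plausible range
df = df[df["age"].between(0, 120)]

# Translate text into numbers the algorithm can process
df["city_code"] = df["city"].astype("category").cat.codes

print(df)
```

The labeling step is the part the tools on the next slide automate at scale.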

9 of 31

Data Processing

Processing and Labeling Data

Type | Key Functions | Notable Features / Use Cases
Open Source | Data labeling for text, audio, images, video | Extensible with Python SDK; integrates with ML pipelines
Open Source | Programmatic data labeling, weak supervision | Label data without manual effort using weak supervision
Open Source | Data wrangling, cleaning, missing value imputation | Standard for structured data preprocessing in Python
Open Source | Data cleaning, transformation, reconciliation | Ideal for cleaning messy, irregular tabular data
Open Source | Automated feature engineering | Entity-aware feature generation from relational datasets
Open Source | Detect and correct label errors in datasets | Integrates with common ML stacks (PyTorch, TensorFlow)
Proprietary | Data labeling, QA, embedding exploration | AI-augmented human-in-the-loop annotation environment
Proprietary | Scalable data labeling with active learning | Combines machine assistance and human review
Proprietary | Data prep, visual pipelines, auto-cleaning | Supports full ML lifecycle, collaboration

10 of 31

Vector Databases

Where you store your data

A vector database is a specialized system designed to store, index, and retrieve high-dimensional vectors—numerical representations of data that capture semantic information. Unlike traditional databases that handle structured data in rows and columns, vector databases manage unstructured data such as text, images, audio, and video by converting them into vector embeddings.
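Under the hood, the core operation is nearest-neighbor search over embeddings. A toy sketch with NumPy (hand-made three-dimensional "embeddings" for illustration; real systems use model-generated vectors with hundreds or thousands of dimensions):

```python
import numpy as np

docs = ["cats are pets", "dogs are pets", "stocks fell today"]
vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
])

def top_k(query_vec, vectors, k=2):
    # Cosine similarity = dot product of L2-normalized vectors
    q = query_vec / np.linalg.norm(query_vec)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q
    return np.argsort(sims)[::-1][:k]  # indices of the k closest docs

query = np.array([0.85, 0.15, 0.05])  # imagined embedding of "household animals"
for i in top_k(query, vectors):
    print(docs[i])
```

A production vector database adds indexing structures (e.g., approximate nearest-neighbor indexes) so this search stays fast at billions of vectors.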

11 of 31

Vector Databases

Where you store your data

Type | Key Functions | Notable Features / Use Cases
Open Source | Vector similarity search, hybrid queries | Highly scalable, supports billion-scale vector indexing
Open Source | Semantic search, hybrid vector search | Built-in modules for transformers, RESTful and GraphQL APIs
Open Source | Nearest neighbor search, vector storage | High-performance, filtering, persistent storage engine
Open Source | Efficient similarity search | Optimized for large-scale datasets, developed by Meta
Open Source | In-memory vector DB, embeddings management | Lightweight, integrates easily with LangChain and LLM pipelines
Proprietary | Fully managed vector database | Real-time indexing and filtering with high availability
Proprietary | Vector search integration with document store | Combines full-text, metadata, and vector search in one platform

12 of 31

Machine Learning

Teaching computers to learn from data and improve over time

Machine Learning (ML): ML enables computers to learn patterns from data and improve their performance on tasks without explicit programming. It relies on statistical techniques to refine predictions or decisions over time.

13 of 31

Deep Learning

Neural networks powering advanced pattern recognition and automation

A specialized form of ML that uses multi-layered neural networks to process and analyze vast amounts of data, excelling in tasks like image recognition and natural language processing.

14 of 31

Machine Learning Models

Algorithms that learn from data

AI uses algorithms to identify patterns in data.

These models fall into three categories:

  • Supervised Learning: Uses labeled data for predictions.
  • Unsupervised Learning: Finds hidden structures in unlabeled data.
  • Reinforcement Learning: Learns through trial and error, optimizing based on feedback.
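As a toy illustration of supervised learning, a one-nearest-neighbor classifier copies the label of the closest labeled example (the data here is made up):

```python
import numpy as np

# Labeled training data (supervised learning): feature -> label
X_train = np.array([[1.0], [1.2], [4.8], [5.0]])   # e.g., a measured length
y_train = np.array(["small", "small", "large", "large"])

def predict(x):
    # 1-nearest-neighbor: return the label of the closest training example
    distances = np.abs(X_train[:, 0] - x)
    return y_train[np.argmin(distances)]

print(predict(1.1))
print(predict(4.9))
```

Unsupervised learning would drop the labels and find the two clusters on its own; reinforcement learning would instead learn from reward signals over many trials.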

15 of 31

Machine Learning Models

Algorithms that learn from data

Type | Key Functions | Notable Features / Use Cases
Open Source | Deep learning model training and deployment | Dynamic computation graph, strong support for research and production
Open Source | End-to-end machine learning platform | Graph-based execution, TensorBoard visualization, TF Lite for mobile
Open Source | High-performance numerical computing and autodiff | Composability, XLA compilation, popular in research and experimentation
Open Source | Pretrained transformer models for NLP, vision, and multimodal tasks | Easy fine-tuning, large model zoo, integrates with PyTorch and TensorFlow
Open Source | High-level wrapper for PyTorch | Simplifies training, great for fast prototyping and education
Open Source | Modular deep learning framework | Decouples engineering from research, scales PyTorch to production
Proprietary | Managed service for training and deploying ML models | Distributed training, Autopilot for AutoML, integration with other AWS services
Proprietary | Unified ML platform for training and serving | Supports custom training, AutoML, model monitoring and pipelines
Proprietary | Enterprise platform for ML lifecycle | Automated training, MLOps tools, integrates with Azure ecosystem

16 of 31

Generative AI Models

Creating information with AI

Generative AI models are a class of artificial intelligence designed to create new data, including text, images, audio, and even code. Unlike traditional AI models that classify or predict based on existing patterns, generative AI learns underlying structures from large datasets and produces original outputs that mimic human creativity.

These models power applications such as chatbots, AI-generated art, synthetic voices, and automated content creation.

17 of 31

DeepSeek Hype

The open source model that set the world on fire

  • Multi-Head Latent Attention: Imagine traditional AI attention as looking directly at words in a sentence. DeepSeek's approach is more like understanding the underlying concepts and connections between ideas rather than just the words themselves—like reading between the lines. This helps it grasp meaning more effectively.

  • Better Expert Management: DeepSeek uses a "committee of experts" approach (called Mixture-of-Experts or MoE) where different neural networks specialize in different tasks. Their innovation makes these experts work together more efficiently without needing complicated rules to balance their workload—like a self-organizing team that naturally distributes tasks without a manager.

  • Multi-Token Prediction: Instead of predicting one word at a time, DeepSeek can predict multiple words simultaneously. Think of it as the difference between a person who needs to complete each thought before starting the next versus someone who can see several steps ahead in a conversation.

  • FP8 Precision Training: This is about doing more with less. By using a more efficient way to store numbers in the model (8-bit instead of 16 or 32-bit), DeepSeek dramatically reduces memory usage and speeds up processing while maintaining quality—like compressing a high-resolution image without losing important details.
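The low-precision idea behind FP8 training can be sketched with simple 8-bit integer quantization — not the actual FP8 number format, but the same "fewer bits per number" trade-off:

```python
import numpy as np

weights = np.array([-0.82, 0.13, 0.55, -0.07, 0.91], dtype=np.float32)

# Map 32-bit floats onto 256 integer levels (1 byte each instead of 4)
scale = np.abs(weights).max() / 127
quantized = np.round(weights / scale).astype(np.int8)

# Recover approximate originals: the error is bounded by half a quantization step
restored = quantized.astype(np.float32) * scale

print(quantized)
print(np.max(np.abs(weights - restored)))  # small rounding error
```

The memory drops 4x while the values stay close enough for the model to keep working — the "compressed image" analogy in code.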

18 of 31

AI Models that are “Open Source”

Popular open source AI LLM Models

Model | Developer | Parameters | License | Notable Features | Performance Benchmarks | Deployment Options
LLaMA 3 | Meta | 8B – 70B | Custom (research-focused) | High efficiency; research-oriented | High performance in multilingual tasks | Research platforms
Mistral | Mistral AI | 7B – 13B | Apache 2.0 | Optimized for efficiency; modular design | Competitive performance with reduced resource usage | Cloud, edge devices
Falcon | Technology Innovation Institute | 1B – 40B | Apache 2.0 | Emphasis on Arabic language support; versatile | Strong performance in multilingual benchmarks | Cloud, on-premises
Bloom | BigScience | 176B | RAIL (Responsible AI License) | Multilingual capabilities; collaborative development | High performance in diverse languages | Cloud, research
Granite | IBM | Varies | Custom (enterprise-friendly) | Enterprise-grade; focus on compliance and security | Excels in enterprise NLP tasks | On-premises, hybrid cloud
DeepSeek V3 | DeepSeek | 671B | MIT License | Advanced reasoning; cost-effective training | Outperforms LLaMA 3.1 and Qwen 2.5; approaches GPT-4o and Claude 3.5 Sonnet | On-premises, cloud

19 of 31

Neural Networks

Mimicking the human brain

Neural networks are layered structures of interconnected nodes that simulate how the human brain processes information. They enable AI to handle complex tasks like image recognition, natural language understanding, and decision-making.

Though ironically, we don’t really know how the human brain works.
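A minimal sketch of what a "layer" means in practice: each layer is a matrix multiplication followed by a nonlinearity. The weights here are random and untrained, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny 2-layer network: 3 inputs -> 4 hidden units -> 1 output
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def forward(x):
    h = np.maximum(0, x @ W1 + b1)   # hidden layer with ReLU activation
    return h @ W2 + b2               # output layer (no activation)

x = np.array([0.5, -1.0, 2.0])
print(forward(x))
```

Training (next slide) is the process of adjusting W1, b1, W2, b2 so the output becomes useful.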

20 of 31

Training

Teaching the AI Model

Training involves feeding data into the model and adjusting its internal parameters to reduce errors. Using techniques like gradient descent, the model learns from the data to improve its predictions.

  • Pretraining: Builds a general-purpose model with broad capabilities. This phase usually involves unsupervised or self-supervised learning with large datasets.
  • Training (Fine-Tuning): Adapts the pretrained model to specific use cases using smaller, task-specific datasets. This often employs supervised learning.
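The gradient-descent loop can be sketched on a one-parameter model: fit the slope of a line (the learning rate and step count here are illustrative):

```python
import numpy as np

# Training data for the line y = 2x — the "answer" the model must learn
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

w = 0.0     # model parameter, starts wrong
lr = 0.01   # learning rate

for step in range(200):
    pred = w * x
    grad = 2 * np.mean((pred - y) * x)   # gradient of the mean squared error
    w -= lr * grad                       # gradient descent: step downhill

print(round(w, 3))  # close to 2.0
```

An LLM does the same thing with billions of parameters instead of one, and the "error" is how badly it predicts the next token.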

21 of 31

Distillation

One model training another model, e.g., OpenAI and DeepSeek

Distillation is a technique used to transfer knowledge from a larger, more complex machine learning model to a smaller, more efficient version. By leveraging the predictions of a larger "teacher" model, the smaller "student" model is trained to mimic its behavior, maintaining high performance while reducing the computational cost. This approach is essential for deploying AI models in resource-constrained environments, enabling faster inference speeds and lower memory requirements without sacrificing accuracy. Distillation is particularly beneficial in real-time applications and mobile devices, where computational resources are limited but model efficiency remains critical.
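A common form of distillation trains the student to match the teacher's temperature-softened output distribution. A minimal sketch of that loss with made-up logits:

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature T > 1 softens the distribution, exposing the teacher's
    # relative confidence across wrong answers ("dark knowledge")
    z = np.exp((logits - np.max(logits)) / T)
    return z / z.sum()

teacher_logits = np.array([8.0, 2.0, 1.0])   # large "teacher" model's outputs
student_logits = np.array([6.0, 3.0, 1.5])   # small "student" model's outputs

T = 4.0
teacher_soft = softmax(teacher_logits, T)
student_soft = softmax(student_logits, T)

# Distillation loss: cross-entropy between soft teacher and student outputs;
# training minimizes this so the student mimics the teacher
loss = -np.sum(teacher_soft * np.log(student_soft))
print(teacher_soft.round(3), round(loss, 3))
```

In practice this soft-target loss is combined with the ordinary loss on the true labels.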

22 of 31

Inference

Applying What AI Has Learned

After training, the model is ready to make predictions or decisions on new, unseen data.

This stage is where AI moves from the lab to solving real-world problems.

This is when the model “thinks,” or calculates, what the response should be.

Right now, longer thinking cycles at inference time are being used to improve AI capability.

23 of 31

Test Time Compute

How Long a Model “Thinks” About a Query

Test-time compute refers to the computational resources and processes required to run a trained AI model during inference—i.e., when the model is used to make predictions or generate outputs after training is complete. This includes CPU/GPU usage, memory requirements, and energy consumption during deployment.

24 of 31

Tokens

How ChatGPT Converts Language to Digestible Units

ChatGPT doesn’t read language directly; it converts it into tokens. These tokens are converted into numerical representations (embeddings) for processing by the model.

Every interaction with ChatGPT consumes tokens, and a maximum number of tokens can be used in one conversation.

  • GPT-4 (8K): Has a token limit of 8,192 tokens
  • GPT-4 (32K): Has a token limit of 32,768 tokens
  • ChatGPT Enterprise: Can process up to 128,000 tokens

1 token ≈ 4 characters in English ≈ ¾ of a word; 100 tokens ≈ 75 words.
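The rule of thumb above is easy to turn into a rough estimator (for exact counts, OpenAI's tiktoken library tokenizes text the same way the models do):

```python
def estimate_tokens(text: str) -> int:
    # Rule of thumb: roughly 4 characters per token in English
    return max(1, round(len(text) / 4))

prompt = "Summarize this chat"
print(estimate_tokens(prompt))
```

Estimates like this are handy for budgeting a conversation against the limits listed above before sending anything to the API.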

25 of 31

Context Windows

How much text the model can consider at once

A context window is the amount of text, measured in tokens, that a language model like ChatGPT can process at once, including both input and output. For example, a 4,096-token window limits the total size of the conversation and response. Larger context windows, such as GPT-4’s 32,768 tokens, allow for handling longer documents or conversations without losing coherence. Managing the context window ensures efficient and accurate interactions.
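One common way to stay inside the window is to keep only the most recent messages that fit the token budget. A sketch using the ~4-characters-per-token rule of thumb (the message history and budget are made up):

```python
def fit_context(messages, max_tokens, count_tokens=lambda m: len(m) // 4 + 1):
    # Walk the history newest-first, keeping messages until the budget is spent
    kept, total = [], 0
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order

history = ["first question", "first answer", "follow-up", "latest question"]
print(fit_context(history, max_tokens=10))
```

Real chat frontends do a version of this silently, which is why very long conversations start "forgetting" their earliest turns.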

(Diagram: the context window spanning the previous tokens and the last tokens of the conversation.)

26 of 31

Feedback Loops

A feedback loop is the process of feeding AI system outputs back into the model for evaluation, correction, and improvement. It allows models to adapt to real-world changes, learn from mistakes, and optimize their performance over time.

  • Explicit Feedback: Users directly correct or validate outputs, such as marking an email as spam or rating a recommendation.

  • Implicit Feedback: Observing user behavior, such as clicks, scrolls, or time spent on content, provides indirect feedback for model adjustments.

  • System Metrics: Automated monitoring of model performance metrics (e.g., error rates, accuracy) helps detect weaknesses without user input.

  • Human-in-the-Loop (HITL): Human reviewers actively validate or correct model decisions, ensuring quality and accuracy, especially in critical applications like content moderation.

27 of 31

AI Agents

Autonomous systems that sense, decide, and act in dynamic environments.

AI agents perceive their environment, make decisions, and take actions autonomously, often adapting to new information to optimize outcomes in dynamic settings.

Today they really are more like smart workflows with the intelligence of a 4-year-old.
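The sense-decide-act loop can be sketched with a toy environment — here the "world" is just a number the agent must drive to a target. A real agent would ask an LLM to choose the action and could call external tools:

```python
def run_agent(state, target, max_steps=20):
    for _ in range(max_steps):
        if state == target:                    # sense: goal reached?
            return state
        action = 1 if state < target else -1   # decide: pick the next action
        state += action                        # act: change the environment
    return state

print(run_agent(state=3, target=7))
```

Everything interesting about agent frameworks lives in the "decide" step: planning, tool selection, memory, and coordination between multiple agents.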

28 of 31

AI Agent Frameworks

Type | Key Functions | Notable Features / Use Cases
Open Source | Collaborative AI agent orchestration | Multi-agent workflows, Python-native, supports LLM chaining
Open Source | Build and manage AI agents integrating LLMs, APIs, tools | Customizable agent behavior, workflow automation
Open Source | Agent-based workflow management | Enterprise-ready, supports IBM Granite, Llama 3.x integration
Open Source | Multi-agent conversation and task execution | Built by Microsoft, easy to chain agents with tools and memory
Open Source | Graph-based framework for LLM agent state management | Integrates with LangChain, supports parallel and dynamic routing
Open Source | Serverless AI agent framework | Cloud-native, plug-and-play tools, persistent memory support


29 of 31

Enterprise GPT

AI infrastructure that provides complete control by the enterprise user.

Enterprise GPT refers to a customized large language model (LLM) deployment tailored for business use, offering secure, private, and role-specific generative AI capabilities. It integrates with enterprise data, tools, and workflows to support tasks like content generation, summarization, customer support, and analytics. Unlike public GPT, it prioritizes compliance, data governance, and scalability across organizational environments.
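Stripped to its essentials, the retrieval-augmented pattern behind such a deployment is: embed the query, retrieve the most similar documents, and pack them into the prompt. A toy sketch with a deliberately crude, hypothetical embed() — a real stack would use an embedding model like BGE and a vector database like Milvus:

```python
import numpy as np

# Toy enterprise corpus: document title -> text
corpus = {
    "vacation policy": "Employees receive 20 days of paid vacation.",
    "expense policy": "Expenses over $50 require a receipt.",
}

def embed(text):
    # Hypothetical embedding: normalized letter-frequency vector,
    # for illustration only — not a real embedding model
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    return vec / (np.linalg.norm(vec) or 1)

def retrieve(query, k=1):
    # Rank documents by cosine similarity to the query embedding
    scores = {title: embed(doc) @ embed(query) for title, doc in corpus.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

query = "How many vacation days do I get?"
context = " ".join(corpus[t] for t in retrieve(query))
prompt = f"Answer using only this context: {context}\n\nQuestion: {query}"
print(prompt)
```

The final prompt is what gets sent to the LLM, which is why the model can answer from private enterprise data it was never trained on.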

30 of 31

Enterprise GPT

A ChatGPT-like experience, built on an index of your own data

Component Category | Opinionated Choice | Role in Stack | Why This Choice?
1. RAG Framework | LlamaIndex | Orchestrates data loading, indexing, retrieval from the vector DB, and interaction with the LLM for generation. | Highly focused specifically on RAG; robust data handling/indexing features; strong community; integrates well with the Python ecosystem.
2. Vector Database | Milvus | Stores and indexes vector embeddings of enterprise data for efficient similarity search and retrieval. | Scalable, performant, cloud-native, dedicated open source vector DB; widely adopted for RAG; integrates smoothly with LlamaIndex.
3. LLM (Base Model) | IBM Granite 3.2 8B Instruct (ibm-granite/granite-3.2-8b-instruct) | Understands user queries and generates accurate, context-aware responses based on retrieved data. | Open source (Apache 2.0); designed for enterprise; strong RAG/instruction-following performance for its size; supports multiple languages; large context window.
4. Chatbot UI (Frontend) | Chatbot UI | Provides the web-based user interface for interacting with the RAG chatbot. | Popular open source ChatGPT-like UI; customizable; familiar interface; connects to various LLM backends via API.
5. Embedding Model | BGE-Large-EN-v1.5 (or similar) | Converts text data (documents, queries) into vector embeddings for storage and search in Milvus. | Top-performing open source text embedding model (check the MTEB leaderboard); runs locally for privacy; integrates easily.
6. Vector DB Management | Attu | Provides a GUI for managing, monitoring, and interacting directly with the Milvus database instance. | Official GUI tool for Milvus; essential for administration and debugging.
7. Deployment | Docker / Kubernetes (with LlamaIndex backend and LLM/embedding inference via Ollama/vLLM) | Containerizes services; Kubernetes manages deployment and scaling; the backend service exposes an API for Chatbot UI; inference servers like Ollama/vLLM serve models efficiently. | Standard enterprise deployment; a backend is needed to connect the UI to the RAG logic; Ollama/vLLM support open source Granite models.

31 of 31

Have Questions?

Let’s Talk!

Mark Hinkle

CEO and Founder

Phone: 919.522.3520

Email: mrhinkle@peripety.com