Mark Hinkle
Expert in AI and Emerging Technologies
Founder and Publisher
The Artificially Intelligent Enterprise
Co-founder
All Things AI (formerly All Things Open AI)
LinkedIn – https://www.linkedin.com/in/markrhinkle/
X – x.com/mrhinkle
Email – mrhinkle@peripety.com
A Legacy of Emerging Tech
The AIE Network (TheAIE.net)
We Help People Work Smarter, Not Harder with AI
The Artificially Intelligent Enterprise (AIE) Network
is a collection of AI-focused newsletters and resources connecting business leaders and professionals with actionable insights and tools to maximize the impact of AI in their organizations.
50,000 Subscribers - AI tips and applications for marketing and sales professionals.
155,000 Subscribers - Curated AI business news in four minutes, twice per week.
Foundation training program for AI.
1,000 Subscribers - Letters to the CIO on AI from an enterprise software veteran.
21,000 Subscribers - AI strategy and productivity advice for enterprises.
An AI practitioners and users conference focused on technologies, processes, and people
My AI Event Hack
How AI Works: Understanding the underlying mechanisms that power AI tools.
AI Models for NLP and Vision: Examples include NLP models like Meta’s LLaMA, Google Gemini, and Anthropic’s Claude, alongside vision models such as DALL-E, Midjourney, Stable Diffusion, and others.
Importance of Understanding Large Language Models (LLMs): Knowing how these models work is crucial for accurate results.
Key Considerations for Using AI: Privacy, ethics, and practical considerations for integrating AI into your workflow.
I am going to cram eight hours of content into one hour; that's why it's a "Crash Course."
How AI Works
Data Collection
The foundation of AI
AI systems rely on large amounts of data to learn and make decisions. This data can come from structured sources (e.g., databases) or unstructured ones (e.g., text, images, video). High-quality and diverse datasets are critical for accuracy and performance.
This is why OpenAI and Google buy data from publishers and platforms like the New York Times and Reddit to train their models.
Data Processing
Building the knowledge base for AI
Before AI can use data, it must be cleaned, organized, and converted into usable formats. Tasks include handling missing information, removing outliers, translating text or images into numbers that algorithms can process, and labeling the data where needed.
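The cleaning steps above can be sketched in a few lines of pandas (the standard Python preprocessing library from the table below). The dataset and values here are made up purely for illustration:

```python
import pandas as pd
import numpy as np

# Toy dataset with the problems the text mentions: a missing value and an outlier.
df = pd.DataFrame({
    "age": [34, 29, np.nan, 31, 460],  # NaN = missing info, 460 = obvious outlier
    "sentiment": ["positive", "negative", "positive", "neutral", "positive"],
})

# 1. Handle missing information: fill with the column median.
df["age"] = df["age"].fillna(df["age"].median())

# 2. Remove outliers: keep rows inside 1.5x the interquartile range.
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# 3. Translate text into numbers the algorithm can process (label encoding).
df["sentiment_code"] = pd.factorize(df["sentiment"])[0]
```

After these steps the 460-year "age" row is gone, the missing value is filled, and the text labels are integers a model can consume.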
Data Processing
Processing and Labeling Data
| Tool Name | Type | Key Functions | Notable Features / Use Cases |
| --- | --- | --- | --- |
|  | Open Source | Data labeling for text, audio, images, video | Extensible with Python SDK; integrates with ML pipelines |
|  | Open Source | Programmatic data labeling, weak supervision | Label data without manual effort using weak supervision |
|  | Open Source | Data wrangling, cleaning, missing value imputation | Standard for structured data preprocessing in Python |
|  | Open Source | Data cleaning, transformation, reconciliation | Ideal for cleaning messy, irregular tabular data |
|  | Open Source | Automated feature engineering | Entity-aware feature generation from relational datasets |
|  | Open Source | Detect and correct label errors in datasets | Integrates with common ML stacks (PyTorch, TensorFlow) |
|  | Proprietary | Data labeling, QA, embedding exploration | AI-augmented human-in-the-loop annotation environment |
|  | Proprietary | Scalable data labeling with active learning | Combines machine assistance and human review |
|  | Proprietary | Data prep, visual pipelines, auto-cleaning | Supports full ML lifecycle, collaboration |
Vector Databases
Where you store your data
A vector database is a specialized system designed to store, index, and retrieve high-dimensional vectors—numerical representations of data that capture semantic information. Unlike traditional databases that handle structured data in rows and columns, vector databases manage unstructured data such as text, images, audio, and video by converting them into vector embeddings.
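The core operation of a vector database is similarity search over embeddings. This toy sketch uses hand-made 3-dimensional vectors and cosine similarity; real systems store embeddings with hundreds or thousands of dimensions and use approximate indexes for speed:

```python
import math

# Toy "vector database": each item is stored as an embedding.
database = {
    "cat":   [0.9, 0.1, 0.0],
    "dog":   [0.8, 0.2, 0.1],
    "stock": [0.0, 0.9, 0.4],
}

def cosine_similarity(a, b):
    # Cosine similarity: dot product normalized by vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vector, k=1):
    """Return the k stored items most similar to the query vector."""
    ranked = sorted(database,
                    key=lambda name: cosine_similarity(query_vector, database[name]),
                    reverse=True)
    return ranked[:k]

# A query embedding close to the animal vectors retrieves animals, not finance.
print(search([0.85, 0.15, 0.05], k=2))  # → ['cat', 'dog']
```

Because similarity is semantic (nearby vectors mean similar content), this retrieves by meaning rather than by exact keyword match, which is what makes vector databases useful for unstructured data.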
Vector Databases
Where you store your data
| Tool Name | Type | Key Functions | Notable Features / Use Cases |
| --- | --- | --- | --- |
|  | Open Source | Vector similarity search, hybrid queries | Highly scalable, supports billion-scale vector indexing |
|  | Open Source | Semantic search, hybrid vector search | Built-in modules for transformers, RESTful and GraphQL APIs |
|  | Open Source | Nearest neighbor search, vector storage | High-performance, filtering, persistent storage engine |
|  | Open Source | Efficient similarity search | Optimized for large-scale datasets, developed by Meta |
|  | Open Source | In-memory vector DB, embeddings management | Lightweight, integrates easily with LangChain and LLM pipelines |
|  | Proprietary | Fully managed vector database | Real-time indexing and filtering with high availability |
|  | Proprietary | Vector search integration with document store | Combines full-text, metadata, and vector search in one platform |
Machine Learning
Teaching computers to learn from data and improve over time
Machine Learning (ML): ML enables computers to learn patterns from data and improve their performance on tasks without explicit programming. It relies on statistical techniques to refine predictions or decisions over time.
Deep Learning
Neural networks powering advanced pattern recognition and automation
A specialized form of ML that uses multi-layered neural networks to process and analyze vast amounts of data, excelling in tasks like image recognition and natural language processing.
Machine Learning Models
Algorithms that learn from data
AI uses algorithms to identify patterns in data.
These models fall into three categories: supervised, unsupervised, and reinforcement learning.
Machine Learning Models
Algorithms that learn from data
| Tool Name | Type | Key Functions | Notable Features / Use Cases |
| --- | --- | --- | --- |
|  | Open Source | Deep learning model training and deployment | Dynamic computation graph, strong support for research and production |
|  | Open Source | End-to-end machine learning platform | Graph-based execution, TensorBoard visualization, TF Lite for mobile |
|  | Open Source | High-performance numerical computing and autodiff | Composability, XLA compilation, popular in research and experimentation |
|  | Open Source | Pretrained transformer models for NLP, vision, and multimodal tasks | Easy fine-tuning, large model zoo, integrates with PyTorch and TensorFlow |
|  | Open Source | High-level wrapper for PyTorch | Simplifies training, great for fast prototyping and education |
|  | Open Source | Modular deep learning framework | Decouples engineering from research, scales PyTorch to production |
|  | Proprietary | Managed service for training and deploying ML models | Distributed training, Autopilot for AutoML, integration with other AWS services |
|  | Proprietary | Unified ML platform for training and serving | Supports custom training, AutoML, model monitoring and pipelines |
|  | Proprietary | Enterprise platform for ML lifecycle | Automated training, MLOps tools, integrates with Azure ecosystem |
Generative AI Models
Creating information with AI
Generative AI models are a class of artificial intelligence designed to create new data, including text, images, audio, and even code. Unlike traditional AI models that classify or predict based on existing patterns, generative AI learns underlying structures from large datasets and produces original outputs that mimic human creativity.
These models power applications such as chatbots, AI-generated art, synthetic voices, and automated content creation.
DeepSeek Hype
The open source model that set the world on fire
AI Models that are “Open Source”
Popular open source AI LLM Models
Model | Developer | Parameters | License | Notable Features | Performance Benchmarks | Deployment Options |
LLaMA 3 | Meta | 8B – 70B | Llama 3 Community License | High efficiency; research-oriented | High performance in multilingual tasks | Research platforms |
Mistral | Mistral AI | 7B – 13B | Apache 2.0 | Optimized for efficiency; modular design | Competitive performance with reduced resource usage | Cloud, edge devices |
Falcon | Technology Innovation Institute | 1B – 40B | Apache 2.0 | Emphasis on Arabic language support; versatile | Strong performance in multilingual benchmarks | Cloud, on-premises |
Bloom | BigScience | 176B | RAIL (Responsible AI License) | Multilingual capabilities; collaborative development | High performance in diverse languages | Cloud, research |
Granite | IBM | Varies | Custom (enterprise-friendly) | Enterprise-grade; focus on compliance and security | Excels in enterprise NLP tasks | On-premises, hybrid cloud |
DeepSeek V3 | DeepSeek | 671B | MIT License | Advanced reasoning; cost-effective training | Outperforms LLaMA 3.1 and Qwen 2.5; approaches GPT-4o and Claude 3.5 Sonnet | On-premises, cloud |
Neural Networks
Mimicking the human brain
Neural networks are layered structures of interconnected nodes that simulate how the human brain processes information. They enable AI to handle complex tasks like image recognition, natural language understanding, and decision-making.
Though, ironically, we don't really know how the human brain works.
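A neural network's layers are just weighted sums passed through nonlinearities. This minimal two-layer forward pass uses arbitrary illustrative weights, with no training involved:

```python
def relu(values):
    # ReLU nonlinearity: negative signals are zeroed, like a neuron not firing.
    return [max(0.0, v) for v in values]

def layer(inputs, weights, biases):
    # One layer of neurons: each neuron takes a weighted sum of all inputs plus a bias.
    return [sum(w * i for w, i in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

x = [1.0, 2.0]                                             # input features
h = relu(layer(x, [[0.5, -0.2], [0.3, 0.8]], [0.0, 0.1]))  # hidden layer (2 neurons)
y = layer(h, [[1.0, -1.0]], [0.0])                         # output layer (1 neuron)
print(y)
```

Stacking many such layers, with weights learned from data rather than hand-picked, is what lets deep networks handle image recognition and language tasks.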
Training
Teaching the AI Model
Training involves feeding data into the model and adjusting its internal parameters to reduce errors. Using techniques like gradient descent, the model learns from the data to improve its predictions.
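Gradient descent can be shown end-to-end on a one-parameter model. Here we fit y = w·x to data generated with a true weight of 2.0; each step nudges w against the gradient of the squared error:

```python
# (x, y) training pairs generated from the true relationship y = 2x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w = 0.0                # initial guess for the model's single parameter
learning_rate = 0.05

for step in range(100):
    # Gradient of mean squared error (w*x - y)^2 with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad   # step against the gradient to reduce the error

print(round(w, 3))  # → 2.0
```

Real models repeat exactly this loop, only with billions of parameters and gradients computed by automatic differentiation instead of by hand.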
Distillation
One model training another, e.g., OpenAI and DeepSeek
Distillation is a technique used to transfer knowledge from a larger, more complex machine learning model to a smaller, more efficient version. By leveraging the predictions of a larger "teacher" model, the smaller "student" model is trained to mimic its behavior, maintaining high performance while reducing the computational cost. This approach is essential for deploying AI models in resource-constrained environments, enabling faster inference speeds and lower memory requirements without sacrificing accuracy. Distillation is particularly beneficial in real-time applications and mobile devices, where computational resources are limited but model efficiency remains critical.
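The mechanics of distillation can be sketched with softmax and cross-entropy. The logits below are made-up values; the point is that the student trains against the teacher's softened probability distribution rather than a hard one-hot label:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing how the teacher
    # ranks the wrong classes, which is the "knowledge" being transferred.
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(predicted_probs, target_probs):
    # Distillation loss: how far the student's distribution is from the teacher's.
    return -sum(t * math.log(p) for t, p in zip(target_probs, predicted_probs))

# Teacher's raw outputs (logits) for one example over three classes.
teacher_logits = [4.0, 1.0, 0.5]
soft_targets = softmax(teacher_logits, temperature=2.0)

# The smaller student model's outputs for the same example; during training,
# its weights are updated to drive this loss down.
student_probs = softmax([3.0, 1.5, 0.8], temperature=2.0)
loss = cross_entropy(student_probs, soft_targets)
```

In practice this soft-target loss is usually combined with the ordinary loss on ground-truth labels, but the soft targets are what let the student approximate the teacher with far fewer parameters.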
Inference
Applying What AI Has Learned
After training, the model is ready to make predictions or decisions on new, unseen data.
This stage is where AI moves from the lab to solving real-world problems.
This is when the model “thinks” or calculates what the response should be.
Right now we use longer thinking cycles to improve AI capability.
Test Time Compute
How Long a Model “Thinks” About a Query
Test-time compute refers to the computational resources and processes required to run a trained AI model during inference—i.e., when the model is used to make predictions or generate outputs after training is complete. This includes CPU/GPU usage, memory requirements, and energy consumption during deployment.
Tokens
How ChatGPT Converts Language to Digestible Units
ChatGPT doesn't read language directly; it converts it into tokens. These tokens are then converted into numerical representations (embeddings) for processing by the model.
Every interaction with ChatGPT consumes tokens, and a maximum number of tokens can be used in one conversation.
1 token ≈ 4 characters in English; 1 token ≈ ¾ of a word; 100 tokens ≈ 75 words.
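Those rules of thumb are enough for a back-of-the-envelope token estimator. This is only an approximation; exact counts require the model's real tokenizer (e.g., OpenAI's tiktoken library):

```python
def estimate_tokens(text: str) -> int:
    """Rough token count: average the 4-chars-per-token and
    0.75-words-per-token rules of thumb from the slide above."""
    by_chars = len(text) / 4            # 1 token ≈ 4 characters
    by_words = len(text.split()) / 0.75 # 1 token ≈ 3/4 of a word
    return round((by_chars + by_words) / 2)

prompt = "Summarize this quarterly report in three bullet points."
print(estimate_tokens(prompt))  # → 12
```

Estimates like this are handy for budgeting prompts against a model's token limits before sending anything to an API.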
Context Windows
How much text the model can consider at once
A context window is the amount of text, measured in tokens, that a language model like ChatGPT can process at once, including both input and output. For example, a 4,096-token window limits the total size of the conversation and response. Larger context windows, such as GPT-4’s 32,768 tokens, allow for handling longer documents or conversations without losing coherence. Managing the context window ensures efficient and accurate interactions.
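Managing the window usually means dropping the oldest messages once the conversation outgrows it. This sketch keeps the system prompt and walks backward from the newest message; the per-message token counts are illustrative stand-ins for a real tokenizer's output:

```python
def trim_to_window(messages, max_tokens):
    """Keep the system prompt plus as many of the most recent
    messages as fit within the token budget."""
    system, rest = messages[0], messages[1:]
    budget = max_tokens - system["tokens"]
    kept = []
    for msg in reversed(rest):      # newest to oldest
        if msg["tokens"] > budget:
            break                   # this and everything older is dropped
        kept.append(msg)
        budget -= msg["tokens"]
    return [system] + list(reversed(kept))

conversation = [
    {"role": "system", "tokens": 20},
    {"role": "user", "tokens": 300},       # oldest, dropped first
    {"role": "assistant", "tokens": 500},
    {"role": "user", "tokens": 100},       # newest
]
trimmed = trim_to_window(conversation, max_tokens=700)
```

More sophisticated strategies summarize the dropped history instead of discarding it, but the budget arithmetic is the same.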
(Diagram: the context window covers only the most recent tokens of a conversation; earlier tokens fall outside the window.)
Feedback Loops
A feedback loop is the process of feeding AI system outputs back into the model for evaluation, correction, and improvement. It allows models to adapt to real-world changes, learn from mistakes, and optimize their performance over time.
AI Agents
Autonomous systems that sense, decide, and act in dynamic environments.
AI agents perceive their environment, make decisions, and take actions autonomously, often adapting to new information to optimize outcomes in dynamic settings.
Today they really are more like smart workflows with the intelligence of a 4-year-old.
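The sense → decide → act cycle behind every agent fits in a few lines. This toy thermostat "agent" is a deliberately simple stand-in: real agents use an LLM for the decide step and tools or APIs for the act step, but the loop structure is the same:

```python
def sense(environment):
    # Perceive: read the current state of the environment.
    return environment["temperature"]

def decide(temperature, target):
    # Decide: pick an action based on the observation and the goal.
    if temperature < target:
        return "heat"
    if temperature > target:
        return "cool"
    return "stop"

def act(environment, action):
    # Act: change the environment; the next loop iteration senses the result.
    if action == "heat":
        environment["temperature"] += 1
    elif action == "cool":
        environment["temperature"] -= 1

env = {"temperature": 18}
steps = 0
while True:
    action = decide(sense(env), target=21)
    if action == "stop":
        break
    act(env, action)
    steps += 1
```

The feedback from each action into the next observation is what distinguishes an agent loop from a one-shot prompt.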
AI Agent Frameworks
| Tool Name | Type | Key Functions | Notable Features / Use Cases |
| --- | --- | --- | --- |
|  | Open Source | Collaborative AI agent orchestration | Multi-agent workflows, Python-native, supports LLM chaining |
|  | Open Source | Build and manage AI agents integrating LLMs, APIs, tools | Customizable agent behavior, workflow automation |
|  | Open Source | Agent-based workflow management | Enterprise-ready, supports IBM Granite, Llama 3.x integration |
|  | Open Source | Multi-agent conversation and task execution | Built by Microsoft, easy to chain agents with tools and memory |
|  | Open Source | Graph-based framework for LLM agent state management | Integrates with LangChain, supports parallel and dynamic routing |
|  | Open Source | Serverless AI agent framework | Cloud-native, plug-and-play tools, persistent memory support |
Enterprise GPT
AI infrastructure that provides complete control by the enterprise user.
Enterprise GPT refers to a customized large language model (LLM) deployment tailored for business use, offering secure, private, and role-specific generative AI capabilities. It integrates with enterprise data, tools, and workflows to support tasks like content generation, summarization, customer support, and analytics. Unlike public GPT, it prioritizes compliance, data governance, and scalability across organizational environments.
Enterprise GPT
A ChatGPT-like experience that you build on your own data
Component Category | Opinionated Choice | Role in Stack | Why this Choice? |
1. RAG Framework | LlamaIndex | Orchestrates data loading, indexing, retrieval from Vector DB, and interaction with the LLM for generation. | Highly focused specifically on RAG, robust data handling/indexing features, strong community, integrates well with Python ecosystem. |
2. Vector Database | Milvus | Stores and indexes vector embeddings of enterprise data for efficient similarity search and retrieval. | Scalable, performant, cloud-native, dedicated open-source vector DB. Widely adopted for RAG. Integrates smoothly with LlamaIndex. |
3. LLM (Base Model) | IBM Granite 3.2 8B Instruct (ibm-granite/granite-3.2-8b-instruct) | Understands user queries and generates accurate, context-aware responses based on retrieved data. | Open source (Apache 2.0), designed for enterprise, strong RAG/instruction following performance for its size, supports multiple languages, large context window. |
4. Chatbot UI (Frontend) | Chatbot UI | Provides the web-based user interface for interacting with the RAG chatbot. | Popular open-source ChatGPT-like UI, customizable, familiar interface, connects to various LLM backends via API. |
5. Embedding Model | BGE-Large-EN-v1.5 (or similar) | Converts text data (documents, queries) into vector embeddings for storage and search in Milvus. | Top-performing open-source text embedding model (check MTEB leaderboard), runs locally for privacy, integrates easily. |
6. Vector DB Management | Attu | Provides a GUI for managing, monitoring, and interacting directly with the Milvus database instance. | Official GUI tool for Milvus, essential for administration and debugging. |
7. Deployment | Docker / Kubernetes (w/ LlamaIndex backend & LLM/Embedding inference via Ollama/vLLM) | Containerizes services; K8s manages deployment/scaling. Backend service exposes API for Chatbot UI. Inference servers like Ollama/vLLM serve models efficiently. | Standard enterprise deployment. Backend needed to connect UI to RAG logic. Ollama/vLLM confirmed to support open-source Granite models. |
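The RAG flow that stack implements (index documents → retrieve relevant context → assemble a prompt for the LLM) can be sketched without any of those components. The documents are invented, retrieval here is simple word overlap standing in for Milvus vector search, and the final string is what would be sent to the Granite model via LlamaIndex:

```python
# Toy enterprise corpus (real deployments index thousands of documents).
documents = {
    "hr-policy": "Employees accrue 15 vacation days per year.",
    "it-policy": "Laptops must be encrypted and patched monthly.",
}

def retrieve(query, k=1):
    """Retrieval step: rank documents by word overlap with the query.
    A vector database would rank by embedding similarity instead."""
    query_words = set(query.lower().split())
    def overlap(doc_id):
        return len(query_words & set(documents[doc_id].lower().split()))
    ranked = sorted(documents, key=overlap, reverse=True)
    return [documents[d] for d in ranked[:k]]

def answer(query):
    # Augmentation step: stuff the retrieved context into the prompt.
    context = "\n".join(retrieve(query))
    # Generation step would send this prompt to the LLM; here we just
    # return it to show what the model receives.
    return f"Context:\n{context}\n\nQuestion: {query}"

prompt = answer("How many vacation days do employees get?")
print(prompt)
```

The key property shown: the model is grounded in retrieved enterprise data, so its answer comes from your documents rather than from whatever its training data happened to contain.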
Have Questions?
Let’s Talk!
Mark Hinkle
CEO and Founder
Phone: 919.522.3520
Email: mrhinkle@peripety.com