1 of 31

2 of 31

Mark Hinkle

Expert in AI and Emerging Technologies

Founder and Publisher

The Artificially Intelligent Enterprise

Co-founder

All Things AI (formerly All Things Open AI)

LinkedIn – https://www.linkedin.com/in/markrhinkle/

X – x.com/mrhinkle

Email – mrhinkle@peripety.com

A Legacy of Emerging Tech

3 of 31

The AIE Network (TheAIE.net)

We Help People Work Smarter, Not Harder with AI

The Artificially Intelligent Enterprise (AIE) Network is a collection of AI-focused newsletters and resources connecting business leaders and professionals with actionable insights and tools to maximize the impact of AI in their organizations.

  • 50,000 subscribers: AI tips and applications for marketing and sales professionals.
  • 155,000 subscribers: curated AI business news in four minutes, two times per week.
  • Foundation training program for AI.
  • 1,000 subscribers: letters to the CIO on AI from an enterprise software veteran.
  • 21,000 subscribers: AI strategy and productivity advice for enterprises.


4 of 31

An AI practitioners and users conference focused on technologies, processes, and people

5 of 31


My AI Event Hack

  • Warning: this isn’t an open source hack.
  • Open the ChatGPT mobile app on your phone.
  • Start a chat.
  • Use the prompt: “This is a chat of what I am learning at SCALE 22X. I will type notes, and when I say ‘Summarize this chat,’ you will summarize all the notes and inputs from this chat into a conference brief in an executive summary format.”
  • Then take pictures and recordings with your phone if allowed (you are allowed in my talk).
  • Then add notes; even short, choppy notes will likely work.
  • When finished adding data, type: “Summarize this chat”

6 of 31

How AI Works: Understanding the underlying mechanisms that power AI tools.

AI Models for NLP and Vision: Examples include NLP models like Meta’s LLaMA, Google Gemini, and Anthropic’s Claude, alongside image generation models such as DALL-E, Midjourney, Stable Diffusion, and others.

Importance of Understanding Large Language Models (LLMs): Knowing how these models work is crucial for accurate results.

Key Considerations for Using AI: Privacy, ethics, and practical considerations for integrating AI into your workflow.

I am going to cram 8 hours of content into one hour; that’s why it’s a “Crash Course.”

How AI Works

7 of 31

Data Collection

The foundation of AI

AI systems rely on large amounts of data to learn and make decisions. This data can come from structured sources (e.g., databases) or unstructured ones (e.g., text, images, video). High-quality and diverse datasets are critical for accuracy and performance.

This is why OpenAI and Google license data from sources such as the New York Times and Reddit to train their models.

8 of 31

Data Processing

Building the knowledge base for AI

Before AI can use data, it must be cleaned, organized, and converted into usable formats. Tasks include handling missing information, removing outliers, translating text or images into numbers that algorithms can process, and potentially labeling that data.
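As a minimal sketch of these steps in pandas (the column names and values are made up for illustration), cleaning might look like this:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with a missing value and an implausible outlier
df = pd.DataFrame({
    "age": [34, 29, np.nan, 41, 250],
    "city": ["Raleigh", "Durham", "Cary", "Raleigh", "Durham"],
})

# Handle missing information: impute with the column median
df["age"] = df["age"].fillna(df["age"].median())

# Remove outliers outside a plausible range
df = df[df["age"].between(0, 120)]

# Translate text into numbers the algorithm can process
df["city_code"] = df["city"].astype("category").cat.codes

print(df)
```

The labeling step is the part the tools on the next slide automate at scale.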

9 of 31

Data Processing

Processing and Labeling Data

Type | Key Functions | Notable Features / Use Cases
Open Source | Data labeling for text, audio, images, video | Extensible with Python SDK; integrates with ML pipelines
Open Source | Programmatic data labeling, weak supervision | Label data without manual effort using weak supervision
Open Source | Data wrangling, cleaning, missing value imputation | Standard for structured data preprocessing in Python
Open Source | Data cleaning, transformation, reconciliation | Ideal for cleaning messy, irregular tabular data
Open Source | Automated feature engineering | Entity-aware feature generation from relational datasets
Open Source | Detect and correct label errors in datasets | Integrates with common ML stacks (PyTorch, TensorFlow)
Proprietary | Data labeling, QA, embedding exploration | AI-augmented human-in-the-loop annotation environment
Proprietary | Scalable data labeling with active learning | Combines machine assistance and human review
Proprietary | Data prep, visual pipelines, auto-cleaning | Supports full ML lifecycle, collaboration

10 of 31

Vector Databases

Where you store your data

A vector database is a specialized system designed to store, index, and retrieve high-dimensional vectors—numerical representations of data that capture semantic information. Unlike traditional databases that handle structured data in rows and columns, vector databases manage unstructured data such as text, images, audio, and video by converting them into vector embeddings.
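Under the hood, the core operation is nearest-neighbor search over embeddings. A toy sketch with NumPy (hand-made three-dimensional "embeddings" for illustration; real systems use model-generated vectors with hundreds or thousands of dimensions):

```python
import numpy as np

docs = ["cats are pets", "dogs are pets", "stocks fell today"]
vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
])

def top_k(query_vec, vectors, k=2):
    # Cosine similarity = dot product of L2-normalized vectors
    q = query_vec / np.linalg.norm(query_vec)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q
    return np.argsort(sims)[::-1][:k]  # indices of the k closest docs

query = np.array([0.85, 0.15, 0.05])  # imagined embedding of "household animals"
for i in top_k(query, vectors):
    print(docs[i])
```

A production vector database adds indexing structures (e.g., approximate nearest-neighbor indexes) so this search stays fast at billions of vectors.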

11 of 31

Vector Databases

Where you store your data

Type | Key Functions | Notable Features / Use Cases
Open Source | Vector similarity search, hybrid queries | Highly scalable, supports billion-scale vector indexing
Open Source | Semantic search, hybrid vector search | Built-in modules for transformers, RESTful and GraphQL APIs
Open Source | Nearest neighbor search, vector storage | High-performance, filtering, persistent storage engine
Open Source | Efficient similarity search | Optimized for large-scale datasets, developed by Meta
Open Source | In-memory vector DB, embeddings management | Lightweight, integrates easily with LangChain and LLM pipelines
Proprietary | Fully managed vector database | Real-time indexing and filtering with high availability
Proprietary | Vector search integration with document store | Combines full-text, metadata, and vector search in one platform

12 of 31

Machine Learning

Teaching computers to learn from data and improve over time

Machine Learning (ML): ML enables computers to learn patterns from data and improve their performance on tasks without explicit programming. It relies on statistical techniques to refine predictions or decisions over time.

13 of 31

Deep Learning

Neural networks powering advanced pattern recognition and automation

A specialized form of ML that uses multi-layered neural networks to process and analyze vast amounts of data, excelling in tasks like image recognition and natural language processing.

14 of 31

Machine Learning Models

Algorithms that learn from data

AI uses algorithms to identify patterns in data.

These models fall into three categories:

  • Supervised Learning: Uses labeled data for predictions.
  • Unsupervised Learning: Finds hidden structures in unlabeled data.
  • Reinforcement Learning: Learns through trial and error, optimizing based on feedback.
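As a toy illustration of supervised learning, a one-nearest-neighbor classifier copies the label of the closest labeled example (the data here is made up):

```python
import numpy as np

# Labeled training data (supervised learning): feature -> label
X_train = np.array([[1.0], [1.2], [4.8], [5.0]])   # e.g., a measured length
y_train = np.array(["small", "small", "large", "large"])

def predict(x):
    # 1-nearest-neighbor: return the label of the closest training example
    distances = np.abs(X_train[:, 0] - x)
    return y_train[np.argmin(distances)]

print(predict(1.1))
print(predict(4.9))
```

Unsupervised learning would drop the labels and find the two clusters on its own; reinforcement learning would instead learn from reward signals over many trials.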

15 of 31

Machine Learning Models

Algorithms that learn from data

Type | Key Functions | Notable Features / Use Cases
Open Source | Deep learning model training and deployment | Dynamic computation graph, strong support for research and production
Open Source | End-to-end machine learning platform | Graph-based execution, TensorBoard visualization, TF Lite for mobile
Open Source | High-performance numerical computing and autodiff | Composability, XLA compilation, popular in research and experimentation
Open Source | Pretrained transformer models for NLP, vision, and multimodal tasks | Easy fine-tuning, large model zoo, integrates with PyTorch and TensorFlow
Open Source | High-level wrapper for PyTorch | Simplifies training, great for fast prototyping and education
Open Source | Modular deep learning framework | Decouples engineering from research, scales PyTorch to production
Proprietary | Managed service for training and deploying ML models | Distributed training, Autopilot for AutoML, integration with other AWS services
Proprietary | Unified ML platform for training and serving | Supports custom training, AutoML, model monitoring and pipelines
Proprietary | Enterprise platform for ML lifecycle | Automated training, MLOps tools, integrates with Azure ecosystem

16 of 31

Generative AI Models

Creating information with AI

Generative AI models are a class of artificial intelligence designed to create new data, including text, images, audio, and even code. Unlike traditional AI models that classify or predict based on existing patterns, generative AI learns underlying structures from large datasets and produces original outputs that mimic human creativity.

These models power applications such as chatbots, AI-generated art, synthetic voices, and automated content creation.

17 of 31

DeepSeek Hype

The open source model that set the world on fire

  • Multi-Head Latent Attention: Imagine traditional AI attention as looking directly at words in a sentence. DeepSeek's approach is more like understanding the underlying concepts and connections between ideas rather than just the words themselves—like reading between the lines. This helps it grasp meaning more effectively.

  • Better Expert Management: DeepSeek uses a "committee of experts" approach (called Mixture-of-Experts or MoE) where different neural networks specialize in different tasks. Their innovation makes these experts work together more efficiently without needing complicated rules to balance their workload—like a self-organizing team that naturally distributes tasks without a manager.

  • Multi-Token Prediction: Instead of predicting one word at a time, DeepSeek can predict multiple words simultaneously. Think of it as the difference between a person who needs to complete each thought before starting the next versus someone who can see several steps ahead in a conversation.

  • FP8 Precision Training: This is about doing more with less. By using a more efficient way to store numbers in the model (8-bit instead of 16 or 32-bit), DeepSeek dramatically reduces memory usage and speeds up processing while maintaining quality—like compressing a high-resolution image without losing important details.
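The low-precision idea behind FP8 training can be sketched with simple 8-bit integer quantization — not the actual FP8 number format, but the same "fewer bits per number" trade-off:

```python
import numpy as np

weights = np.array([-0.82, 0.13, 0.55, -0.07, 0.91], dtype=np.float32)

# Map 32-bit floats onto 256 integer levels (1 byte each instead of 4)
scale = np.abs(weights).max() / 127
quantized = np.round(weights / scale).astype(np.int8)

# Recover approximate originals: the error is bounded by half a quantization step
restored = quantized.astype(np.float32) * scale

print(quantized)
print(np.max(np.abs(weights - restored)))  # small rounding error
```

The memory drops 4x while the values stay close enough for the model to keep working — the "compressed image" analogy in code.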

18 of 31

AI Models that are “Open Source”

Popular open source AI LLM Models

Model | Developer | Parameters | License | Notable Features | Performance Benchmarks | Deployment Options
LLaMA 3 | Meta | 8B – 70B | Custom (research-focused) | High efficiency; research-oriented | High performance in multilingual tasks | Research platforms
Mistral | Mistral AI | 7B – 13B | Apache 2.0 | Optimized for efficiency; modular design | Competitive performance with reduced resource usage | Cloud, edge devices
Falcon | Technology Innovation Institute | 1B – 40B | Apache 2.0 | Emphasis on Arabic language support; versatile | Strong performance in multilingual benchmarks | Cloud, on-premises
Bloom | BigScience | 176B | RAIL (Responsible AI License) | Multilingual capabilities; collaborative development | High performance in diverse languages | Cloud, research
Granite | IBM | Varies | Custom (enterprise-friendly) | Enterprise-grade; focus on compliance and security | Excels in enterprise NLP tasks | On-premises, hybrid cloud
DeepSeek V3 | DeepSeek | 671B | MIT License | Advanced reasoning; cost-effective training | Outperforms LLaMA 3.1 and Qwen 2.5; approaches GPT-4o and Claude 3.5 Sonnet | On-premises, cloud

19 of 31

Neural Networks

Mimicking the human brain

Neural networks are layered structures of interconnected nodes that simulate how the human brain processes information. They enable AI to handle complex tasks like image recognition, natural language understanding, and decision-making.

Though ironically, we don’t really know how the human brain works.
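A minimal sketch of what a "layer" means in practice: each layer is a matrix multiplication followed by a nonlinearity. The weights here are random and untrained, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny 2-layer network: 3 inputs -> 4 hidden units -> 1 output
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def forward(x):
    h = np.maximum(0, x @ W1 + b1)   # hidden layer with ReLU activation
    return h @ W2 + b2               # output layer (no activation)

x = np.array([0.5, -1.0, 2.0])
print(forward(x))
```

Training (next slide) is the process of adjusting W1, b1, W2, b2 so the output becomes useful.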

20 of 31

Training

Teaching the AI Model

Training involves feeding data into the model and adjusting its internal parameters to reduce errors. Using techniques like gradient descent, the model learns from the data to improve its predictions.

  • Pretraining: Builds a general-purpose model with broad capabilities. This phase usually involves unsupervised or self-supervised learning with large datasets.
  • Training (Fine-Tuning): Adapts the pretrained model to specific use cases using smaller, task-specific datasets. This often employs supervised learning.
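The gradient-descent loop can be sketched on a one-parameter model: fit the slope of a line (the learning rate and step count here are illustrative):

```python
import numpy as np

# Training data for the line y = 2x — the "answer" the model must learn
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x

w = 0.0     # model parameter, starts wrong
lr = 0.01   # learning rate

for step in range(200):
    pred = w * x
    grad = 2 * np.mean((pred - y) * x)   # gradient of the mean squared error
    w -= lr * grad                       # gradient descent: step downhill

print(round(w, 3))  # close to 2.0
```

An LLM does the same thing with billions of parameters instead of one, and the "error" is how badly it predicts the next token.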

21 of 31

Distillation

One model training another model, e.g., OpenAI and DeepSeek

Distillation is a technique used to transfer knowledge from a larger, more complex machine learning model to a smaller, more efficient version. By leveraging the predictions of a larger "teacher" model, the smaller "student" model is trained to mimic its behavior, maintaining high performance while reducing the computational cost. This approach is essential for deploying AI models in resource-constrained environments, enabling faster inference speeds and lower memory requirements without sacrificing accuracy. Distillation is particularly beneficial in real-time applications and mobile devices, where computational resources are limited but model efficiency remains critical.
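A common form of distillation trains the student to match the teacher's temperature-softened output distribution. A minimal sketch of that loss with made-up logits:

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature T > 1 softens the distribution, exposing the teacher's
    # relative confidence across wrong answers ("dark knowledge")
    z = np.exp((logits - np.max(logits)) / T)
    return z / z.sum()

teacher_logits = np.array([8.0, 2.0, 1.0])   # large "teacher" model's outputs
student_logits = np.array([6.0, 3.0, 1.5])   # small "student" model's outputs

T = 4.0
teacher_soft = softmax(teacher_logits, T)
student_soft = softmax(student_logits, T)

# Distillation loss: cross-entropy between soft teacher and student outputs;
# training minimizes this so the student mimics the teacher
loss = -np.sum(teacher_soft * np.log(student_soft))
print(teacher_soft.round(3), round(loss, 3))
```

In practice this soft-target loss is combined with the ordinary loss on the true labels.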

22 of 31

Inference

Applying What AI Has Learned

After training, the model is ready to make predictions or decisions on new, unseen data.

This stage is where AI moves from the lab to solving real-world problems.

This is when the model “thinks,” or calculates, what the response should be.

Right now, longer thinking cycles at inference time are being used to improve AI capability.

23 of 31

Test Time Compute

How Long a Model “Thinks” About a Query

Test-time compute refers to the computational resources and processes required to run a trained AI model during inference—i.e., when the model is used to make predictions or generate outputs after training is complete. This includes CPU/GPU usage, memory requirements, and energy consumption during deployment.

24 of 31

Tokens

How ChatGPT Converts Language to Digestible Units

ChatGPT doesn’t read language directly; it converts it into tokens. These tokens are converted into numerical representations (embeddings) for processing by the model.

Every interaction with ChatGPT consumes tokens, and a maximum number of tokens can be used in one conversation.

  • GPT-4 (8K): Has a token limit of 8,192 tokens
  • GPT-4 (32K): Has a token limit of 32,768 tokens
  • ChatGPT Enterprise: Can process up to 128,000 tokens

1 token ≈ 4 characters in English ≈ ¾ of a word; 100 tokens ≈ 75 words.
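The rule of thumb above is easy to turn into a rough estimator (for exact counts, OpenAI's tiktoken library tokenizes text the same way the models do):

```python
def estimate_tokens(text: str) -> int:
    # Rule of thumb: roughly 4 characters per token in English
    return max(1, round(len(text) / 4))

prompt = "Summarize this chat"
print(estimate_tokens(prompt))
```

Estimates like this are handy for budgeting a conversation against the limits listed above before sending anything to the API.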

25 of 31

Context Windows

How much text the model can consider at once

A context window is the amount of text, measured in tokens, that a language model like ChatGPT can process at once, including both input and output. For example, a 4,096-token window limits the total size of the conversation and response. Larger context windows, such as GPT-4’s 32,768 tokens, allow for handling longer documents or conversations without losing coherence. Managing the context window ensures efficient and accurate interactions.
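One common way to stay inside the window is to keep only the most recent messages that fit the token budget. A sketch using the ~4-characters-per-token rule of thumb (the message history and budget are made up):

```python
def fit_context(messages, max_tokens, count_tokens=lambda m: len(m) // 4 + 1):
    # Walk the history newest-first, keeping messages until the budget is spent
    kept, total = [], 0
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order

history = ["first question", "first answer", "follow-up", "latest question"]
print(fit_context(history, max_tokens=10))
```

Real chat frontends do a version of this silently, which is why very long conversations start "forgetting" their earliest turns.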

(Diagram: the context window spanning the previous tokens and the last tokens of the conversation.)

26 of 31

Feedback Loops

A feedback loop is the process of feeding AI system outputs back into the model for evaluation, correction, and improvement. It allows models to adapt to real-world changes, learn from mistakes, and optimize their performance over time.

  • Explicit Feedback: Users directly correct or validate outputs, such as marking an email as spam or rating a recommendation.

  • Implicit Feedback: Observing user behavior, such as clicks, scrolls, or time spent on content, provides indirect feedback for model adjustments.

  • System Metrics: Automated monitoring of model performance metrics (e.g., error rates, accuracy) helps detect weaknesses without user input.

  • Human-in-the-Loop (HITL): Human reviewers actively validate or correct model decisions, ensuring quality and accuracy, especially in critical applications like content moderation.

27 of 31

AI Agents

Autonomous systems that sense, decide, and act in dynamic environments.

AI agents perceive their environment, make decisions, and take actions autonomously, often adapting to new information to optimize outcomes in dynamic settings.

Today they really are more like smart workflows with the intelligence of a 4-year-old.
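The sense-decide-act loop can be sketched with a toy environment — here the "world" is just a number the agent must drive to a target. A real agent would ask an LLM to choose the action and could call external tools:

```python
def run_agent(state, target, max_steps=20):
    for _ in range(max_steps):
        if state == target:                    # sense: goal reached?
            return state
        action = 1 if state < target else -1   # decide: pick the next action
        state += action                        # act: change the environment
    return state

print(run_agent(state=3, target=7))
```

Everything interesting about agent frameworks lives in the "decide" step: planning, tool selection, memory, and coordination between multiple agents.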

28 of 31

AI Agent Frameworks

Type | Key Functions | Notable Features / Use Cases
Open Source | Collaborative AI agent orchestration | Multi-agent workflows, Python-native, supports LLM chaining
Open Source | Build and manage AI agents integrating LLMs, APIs, tools | Customizable agent behavior, workflow automation
Open Source | Agent-based workflow management | Enterprise-ready, supports IBM Granite, Llama 3.x integration
Open Source | Multi-agent conversation and task execution | Built by Microsoft, easy to chain agents with tools and memory
Open Source | Graph-based framework for LLM agent state management | Integrates with LangChain, supports parallel and dynamic routing
Open Source | Serverless AI agent framework | Cloud-native, plug-and-play tools, persistent memory support


29 of 31

Enterprise GPT

AI infrastructure that provides complete control by the enterprise user.

Enterprise GPT refers to a customized large language model (LLM) deployment tailored for business use, offering secure, private, and role-specific generative AI capabilities. It integrates with enterprise data, tools, and workflows to support tasks like content generation, summarization, customer support, and analytics. Unlike public GPT, it prioritizes compliance, data governance, and scalability across organizational environments.
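Stripped to its essentials, the retrieval-augmented pattern behind such a deployment is: embed the query, retrieve the most similar documents, and pack them into the prompt. A toy sketch with a deliberately crude, hypothetical embed() — a real stack would use an embedding model like BGE and a vector database like Milvus:

```python
import numpy as np

# Toy enterprise corpus: document title -> text
corpus = {
    "vacation policy": "Employees receive 20 days of paid vacation.",
    "expense policy": "Expenses over $50 require a receipt.",
}

def embed(text):
    # Hypothetical embedding: normalized letter-frequency vector,
    # for illustration only — not a real embedding model
    vec = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1
    return vec / (np.linalg.norm(vec) or 1)

def retrieve(query, k=1):
    # Rank documents by cosine similarity to the query embedding
    scores = {title: embed(doc) @ embed(query) for title, doc in corpus.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

query = "How many vacation days do I get?"
context = " ".join(corpus[t] for t in retrieve(query))
prompt = f"Answer using only this context: {context}\n\nQuestion: {query}"
print(prompt)
```

The final prompt is what gets sent to the LLM, which is why the model can answer from private enterprise data it was never trained on.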

30 of 31

Enterprise GPT

A ChatGPT-like experience, built on an index of your own data

Component Category | Opinionated Choice | Role in Stack | Why This Choice?
1. RAG Framework | LlamaIndex | Orchestrates data loading, indexing, retrieval from the vector DB, and interaction with the LLM for generation. | Highly focused specifically on RAG; robust data handling/indexing features; strong community; integrates well with the Python ecosystem.
2. Vector Database | Milvus | Stores and indexes vector embeddings of enterprise data for efficient similarity search and retrieval. | Scalable, performant, cloud-native, dedicated open source vector DB; widely adopted for RAG; integrates smoothly with LlamaIndex.
3. LLM (Base Model) | IBM Granite 3.2 8B Instruct (ibm-granite/granite-3.2-8b-instruct) | Understands user queries and generates accurate, context-aware responses based on retrieved data. | Open source (Apache 2.0); designed for enterprise; strong RAG/instruction-following performance for its size; supports multiple languages; large context window.
4. Chatbot UI (Frontend) | Chatbot UI | Provides the web-based user interface for interacting with the RAG chatbot. | Popular open source ChatGPT-like UI; customizable; familiar interface; connects to various LLM backends via API.
5. Embedding Model | BGE-Large-EN-v1.5 (or similar) | Converts text data (documents, queries) into vector embeddings for storage and search in Milvus. | Top-performing open source text embedding model (check the MTEB leaderboard); runs locally for privacy; integrates easily.
6. Vector DB Management | Attu | Provides a GUI for managing, monitoring, and interacting directly with the Milvus database instance. | Official GUI tool for Milvus; essential for administration and debugging.
7. Deployment | Docker / Kubernetes (with LlamaIndex backend and LLM/embedding inference via Ollama/vLLM) | Containerizes services; Kubernetes manages deployment and scaling; the backend service exposes an API for Chatbot UI; inference servers like Ollama/vLLM serve models efficiently. | Standard enterprise deployment; a backend is needed to connect the UI to the RAG logic; Ollama/vLLM support open source Granite models.

31 of 31

Have Questions?

Let’s Talk!

Mark Hinkle

CEO and Founder

Phone: 919.522.3520

Email: mrhinkle@peripety.com