AI-Agent-Engineering
Introduction
Introduction
Showcase (I): Automatic generation of teaching material through the collaboration of different AI agents. (For the digital edition DERLA: https://erinnerungslandschaft.at/ )
Showcase (II): Automatic generation of teaching material through the collaboration of different AI agents. (For the digital edition DERLA: https://erinnerungslandschaft.at/ )
Showcase (III): Automatic generation of teaching material through the collaboration of different AI agents. (For the digital edition DERLA: https://erinnerungslandschaft.at/ )
Aim(s)
Recommendations / How to get started
Workshop structure
Limitations of the workshop
Main section
Subdomains (AI-Agent-Engineering)
Large Language Models (LLM)
LLM (established commercial) products
LLMs: Everyday-user exposed concepts
“Concepts that are required for using basic LLM products (chats, coding assistants)”
LLM: Basic concepts
LLM Training
LLM training: phases
Stanford Alpaca: Instruction-following training dataset: https://github.com/tatsu-lab/stanford_alpaca (14.02.2025)
(Frontier) model inference
… via REST-APIs
(Frontier-)LLM inference: (REST-)APIs
(Frontier-)LLM inference via the Anthropic Console (https://console.anthropic.com | 28.02.2025)
(Frontier-)LLM inference: OpenAI platform (https://platform.openai.com/ )
(Frontier-)LLM inference: Google cloud https://ai.google.dev/ (28.02.2025)
LLM-inference: Pricing per used model (I)
Pricing per used model (II)
https://www.vellum.ai/llm-leaderboard
DeepSeek platform: https://platform.deepseek.com/usage (19.02.2025)
DeepSeek REST-API: Inference = chat https://api-docs.deepseek.com/api/create-chat-completion (19.02.2025)
Speed, pricing, “intelligence” comparison of DeepSeek: https://artificialanalysis.ai/models/deepseek-v3 (19.02.2025)
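Providers quote inference prices per million tokens, usually with separate rates for input and output tokens. A small helper makes the comparison concrete; the prices in the example are placeholders, not any provider's current list prices.

```python
# Sketch of a per-request cost estimate based on per-million-token prices.
# The example prices are placeholders -- always check the provider's
# pricing page for current rates.

def request_cost(input_tokens: int, output_tokens: int,
                 usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Cost in USD for one request, given per-million-token prices."""
    return (input_tokens / 1_000_000) * usd_per_m_input \
         + (output_tokens / 1_000_000) * usd_per_m_output

# Example: 2,000 prompt tokens and 500 completion tokens at
# (hypothetical) $0.50 / $1.50 per million tokens:
cost = request_cost(2_000, 500, 0.50, 1.50)
print(f"${cost:.6f}")  # → $0.001750
```

Comparing two models for a given workload is then just two calls to `request_cost` with their respective rates.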
Domain Model “Generative AI”
…about REST-API designs
Frontier REST-API design
Frontier REST-API design (II): 1. Chat completions
/v1/chat/completions – the “messages” field of the JSON request body
Basically: “tokens in → tokens back”
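The “tokens in → tokens back” contract can be sketched as a plain HTTP POST, without any SDK. The payload shape follows OpenAI's chat completions API; the model name is just an example, and other providers accept the same shape under a different base URL.

```python
import json
import os
import urllib.request

# Sketch of the chat-completions contract: a JSON body with a list of
# role/content "messages" goes in, a completion message comes back.
payload = {
    "model": "gpt-4o-mini",  # example model name
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What is a token?"},
    ],
}

def chat_completion(payload: dict, api_key: str) -> dict:
    """POST the payload to /v1/chat/completions and return the parsed reply."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# With a real key configured:
# reply = chat_completion(payload, os.environ["OPENAI_API_KEY"])
# print(reply["choices"][0]["message"]["content"])
```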
Frontier REST-API design (II) - 1a. Tools
/v1/chat/completions – the “tools” field of the JSON request body
Allows the LLM to call functions
Example workflow:
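The tool-call loop above can be sketched end to end. The tool schema follows the common “function” shape used by chat-completions-style APIs; the model's reply is simulated here, since no real API is called.

```python
import json

# Sketch of the tool-calling loop: declare a tool, (pretend to) send it
# with the chat request, then dispatch the tool call the model returns.
tools = [{
    "type": "function",
    "function": {
        "name": "get_price",
        "description": "Look up the current price of a product.",
        "parameters": {
            "type": "object",
            "properties": {"product": {"type": "string"}},
            "required": ["product"],
        },
    },
}]

def get_price(product: str) -> str:     # our local implementation
    prices = {"laptop": "899 EUR"}      # toy data
    return prices.get(product, "unknown")

# 1. Send messages + tools to the LLM (omitted here).
# 2. Simulated answer: the model asks us to call the tool.
tool_call = {"name": "get_price",
             "arguments": json.dumps({"product": "laptop"})}

# 3. Dispatch the call locally and feed the result back as a "tool" message.
available = {"get_price": get_price}
args = json.loads(tool_call["arguments"])
result = available[tool_call["name"]](**args)
tool_message = {"role": "tool", "content": result}
print(tool_message)  # → {'role': 'tool', 'content': '899 EUR'}
```

Step 4 (not shown) sends `tool_message` back so the model can formulate its final answer.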
AI Domain Model: LLM usage differentiation
Both concepts are established in the AI domain models (REST-API designs)
Common tools: Domain APIs
“Everyday tools used in context of AI-Agent-Engineering”
Google Colab
Different runtimes:
Google Colab (II): Change runtime
Google Colab (III): Pricing
Google Colab (IV): Secrets
Google Colab (V): Setup API keys
Hugging Face (I)
Hugging Face (II): Pipelines
Provide easy-to-use abstractions that simplify everyday AI-related use cases:
https://huggingface.co/docs/transformers/en/main_classes/pipelines
Hugging Face (III): Pipelines
Hugging Face (IV): Pipelines
Hugging Face (V): Pipelines
Hugging Face (VI): Pipelines
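The pipeline abstraction reduces a task to one constructor call with sensible default models. A minimal sketch, with the import guarded so the snippet also loads where `transformers` is not installed; calling the function downloads the default model on first use.

```python
# Sketch of the transformers pipeline abstraction: one call per task.
try:
    from transformers import pipeline
except ImportError:          # transformers not installed in this environment
    pipeline = None

# A few of the everyday tasks the pipeline API covers:
PIPELINE_TASKS = ["sentiment-analysis", "summarization",
                  "translation_en_to_de", "text-generation"]

def quick_sentiment(text: str):
    """Run the default sentiment-analysis pipeline on a piece of text."""
    clf = pipeline("sentiment-analysis")   # downloads a small default model
    return clf(text)

# Example (requires transformers + a model download):
# quick_sentiment("Hugging Face pipelines make this easy!")
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```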
LLM Open Source Inference
Cloud computing fundamentals
Cloud Computing (II): Wikipedia, 31.01.2025, URL: https://en.wikipedia.org/wiki/Software_as_a_service#/media/File:Comparison_of_on-premise,_IaaS,_PaaS,_and_SaaS.png
About inference costs: https://a16z.com/llmflation-llm-inference-cost/ (28.02.2025)
LLM cloud inference options
Open source inference: PaaS solutions
E.g. using managed REST APIs and paying per million tokens for instruct-model endpoints, subject to request rate limits.
Perplexity search (04.02.2024): https://www.perplexity.ai/search/hi-i-want-to-access-llm-infere-B71_ELVeTuymxLqlO02_gQ#3
Perplexity search (04.02.2024): https://www.perplexity.ai/search/hi-i-want-to-access-llm-infere-B71_ELVeTuymxLqlO02_gQ#3
PaaS via Together AI (I): https://www.together.ai/
PaaS: Together AI (II)
PaaS: Together AI (III)
| Pros | Cons |
| --- | --- |
| Access to open source models | Might not be as “strong” as frontier models |
| Fast and easy setup | Rate limit for requests |
| $1 free trial | |
| No credit card required for the trial phase | |
| Playground and documentation | |
PaaS: Together AI (IV)
PaaS via Together AI (IV): https://www.together.ai/playground
PaaS: Fireworks AI
PaaS: Fireworks AI
PaaS: Fireworks AI - Pricing https://fireworks.ai/pricing
PaaS: Fireworks AI - Pricing https://fireworks.ai/pricing
| Pros | Cons |
| --- | --- |
| Access to open source models | Might not be as “strong” as frontier models |
| Fast and easy setup | Rate limit for requests |
| $1 free trial | |
| No credit card required for the trial phase | |
| Playground and documentation | |
PaaS: Mistral
Mistral (https://mistral.ai/en )
Mistral API (https://docs.mistral.ai/api/ )
PaaS: Hugging Face Inference Endpoints
Hugging Face Inference Endpoints catalogue: https://endpoints.huggingface.co/catalog
Hugging Face Inference Endpoints catalogue (pricing): https://endpoints.huggingface.co/catalog
Deploy open source model as PaaS via HuggingFace: https://huggingface.co/meta-llama/Llama-3.1-8B
Open Source inference: “Climbing down the API(s)”
… about lower level APIs
Tokenizer (Hugging Face lib)
Maps between text and tokens for a particular model
Tokenizers for key models
You need to run the same tokenizer at inference time as at training time! (otherwise the output will be gibberish)
Llama Tokenizer
Llama Tokenizer
Llama Tokenizer
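What a tokenizer does can be shown with a toy word-level vocabulary. Real tokenizers (e.g. Llama's) use subword units, but the key property is the same: encode and decode form a matched pair, which is why inference must use the training tokenizer.

```python
# Toy illustration of the text <-> token mapping. The vocabulary is made up;
# real tokenizers have tens of thousands of subword entries.
VOCAB = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3}
INVERSE = {i: t for t, i in VOCAB.items()}

def encode(text: str) -> list[int]:
    """Map whitespace-separated words to token ids (unknown words -> 0)."""
    return [VOCAB.get(word, 0) for word in text.split()]

def decode(ids: list[int]) -> str:
    """Map token ids back to text using the SAME vocabulary."""
    return " ".join(INVERSE[i] for i in ids)

ids = encode("the cat sat")
print(ids)          # → [1, 2, 3]
print(decode(ids))  # → the cat sat

# Decoding with a different vocabulary (a mismatched tokenizer) yields
# gibberish. With transformers, load the matching tokenizer from the
# model repo, e.g.:
#   from transformers import AutoTokenizer
#   tok = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")
```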
Open source inference: instruct models
Models:
Open source inference (instruct models): key aspects
Quantization in Google Colab using Hugging Face’s transformers library
Tokenization in Google Colab using Hugging Face’s transformers library
Load model (for inference) in Google Colab using Hugging Face’s transformers library
Print model details in Google Colab using Hugging Face’s transformers library
“Doing the inference”: Via Google Colab using Hugging Face’s transformers library
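The Colab steps above (quantize, tokenize, load, generate) can be condensed into one sketch. It assumes a GPU runtime with `transformers` and `bitsandbytes` installed; the model name is an example, and imports sit inside the function so the snippet loads even without those libraries.

```python
# Typical 4-bit quantization settings (NF4 with double quantization),
# as commonly used for memory-efficient inference in Colab.
QUANT_SETTINGS = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
}

def load_quantized(model_name: str):
    """Load a causal LM in 4-bit (requires GPU, transformers, bitsandbytes)."""
    import torch
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              BitsAndBytesConfig)
    quant = BitsAndBytesConfig(**QUANT_SETTINGS,
                               bnb_4bit_compute_dtype=torch.bfloat16)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, quantization_config=quant, device_map="auto")
    return tokenizer, model

# Example usage (example model name -- pick one you have access to):
# tokenizer, model = load_quantized("meta-llama/Llama-3.1-8B-Instruct")
# inputs = tokenizer("The capital of Austria is", return_tensors="pt").to(model.device)
# out = model.generate(**inputs, max_new_tokens=20)
# print(tokenizer.decode(out[0], skip_special_tokens=True))
```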
Comparing LLMs
“Choosing the right LLM for my use case(s) “
LLM comparison (for inference)
There is no simple answer! → It is all about picking the right LLM for your task
In general: LLMs need to be evaluated for any given task
E.g. costs to consider when deciding to use a different model for your use case:
LLM Comparison - 2. Looking at the results: benchmarks
7 common benchmarks

| Benchmark | Category | What it evaluates |
| --- | --- | --- |
| ARC | Reasoning | Scientific reasoning via multiple-choice questions |
| DROP | Language comprehension | Distill details from text, then add, count or sort |
| HellaSwag | Common sense | “Harder Endings, Long Contexts and Low Shot Activities” |
| MMLU | Understanding | Factual recall, reasoning and problem solving across 57 subjects |
| TruthfulQA | Accuracy | Robustness in providing truthful replies under adversarial conditions |
| Winogrande | Context | Tests whether the LLM understands context and resolves ambiguity |
| GSM8K | Math | Math and word problems taught in elementary and middle school |
Comparing Open and Closed Source models
Agentic tool use benchmark: https://scale.com/leaderboard/tool_use_chat
Instruction following benchmark: https://scale.com/leaderboard/instruction_following
Chatbot Arena: https://lmarena.ai/
Chatbot Arena: https://lmarena.ai/
Chatbot Arena: https://lmarena.ai/
Evaluating generative AI: About metrics
RAG in agentic systems
Retrieval Augmented Generation
RAG: Basic idea
(Some) Techniques to improve (prompt) results:
RAG = follow-up improvement?
RAG: Basic idea (II)
Bigger Picture:
Vector Embedding:
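The vector-embedding idea behind RAG can be illustrated with cosine similarity: texts become vectors, and semantically similar texts score higher. The 3-dimensional vectors below are made up for the sketch; real embedding models produce hundreds to thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" -- in practice these come from an embedding model.
embeddings = {
    "The cat sleeps.":   [0.9, 0.1, 0.0],
    "A cat is resting.": [0.8, 0.2, 0.1],
    "Stock prices fell.": [0.0, 0.1, 0.9],
}

query = embeddings["The cat sleeps."]
for text, vec in embeddings.items():
    print(f"{cosine_similarity(query, vec):.2f}  {text}")
```

Retrieval then simply returns the chunks whose vectors are most similar to the query vector.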
LangChain’s description of RAG: Data ingestion - https://blog.langchain.dev/tutorial-chatgpt-over-your-data/ (10.02.2025)
LangChain’s description of RAG: Data querying - https://blog.langchain.dev/tutorial-chatgpt-over-your-data/ (10.02.2025)
RAG with LangChain: Key abstractions
Creating a RAG chain via LangChain: Key abstractions
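The chain LangChain assembles (retriever → prompt template → LLM) can be sketched framework-free. Retrieval here is simple keyword overlap standing in for a real vector store, and `llm()` is a stub standing in for an actual model call; all names are my own for the sketch.

```python
# Framework-free sketch of a RAG chain: retrieve the most relevant chunks,
# stuff them into the prompt, call the LLM.

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank chunks by word overlap with the query (vector-store stand-in)."""
    q = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Stuff the retrieved context into a grounded prompt."""
    joined = "\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

def llm(prompt: str) -> str:   # stub for a real inference call
    return "(model answer based on: " + prompt.splitlines()[1] + ")"

chunks = ["DERLA documents memorial sites in Austria.",
          "Tokens are subword units.",
          "Fine-tuning adapts a pretrained model."]
question = "What does DERLA document?"
answer = llm(build_prompt(question, retrieve(question, chunks)))
print(answer)
```

LangChain's abstractions (document loaders, splitters, vector stores, retrievers, prompt templates) replace each of these hand-written pieces with a configurable component.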
Model Fine-Tuning (frontier models)
… about fine-tuning existing frontier models
How to improve model accuracy?
Model training: Dataset types
Types of datasets:
Model training: Understanding the data
Model training as part of optimizing LLM results
Frontier model fine-tuning: OpenAI
Three steps:
Example for JSONL: https://jsonlines.org/examples/ (14.02.2025)
OpenAI Fine-tuning: Training API
Simplified workflow:
OpenAI fine-tuning using OpenAI’s python client (14.02.2025)
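The training data for OpenAI fine-tuning is a JSONL file with one chat-format example per line. A minimal sketch of preparing and validating such a file; the example content is invented, and the commented client calls show the subsequent upload/job steps from the OpenAI Python client.

```python
import json

# One fine-tuning example per line: a complete conversation in the
# chat "messages" format.
examples = [
    {"messages": [
        {"role": "system", "content": "You write museum labels."},
        {"role": "user", "content": "Label for a 1920s tram sign."},
        {"role": "assistant", "content": "Enamel tram sign, Vienna, c. 1925."},
    ]},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

# Validate: every line must parse back to a dict with a "messages" list.
with open("train.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]
assert all(isinstance(r["messages"], list) for r in rows)

# The file is then uploaded and a job started via the OpenAI client, e.g.:
#   client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
#   client.fine_tuning.jobs.create(training_file=file_id, model=BASE_MODEL)
```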
Fine-tuning frontier models: Objectives and challenges
Challenges:
“Fine-tune models for better results and efficiency” - https://platform.openai.com/docs/guides/fine-tuning (14.02.2025)
Fine-tuning open source models
LoRA
QLoRA
QLoRA: Hyperparameters
Demo Python based setup for QLoRA fine-tuning
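The QLoRA hyperparameters typically tuned in such a setup can be captured in one place. The values below are common starting points from the LoRA/QLoRA literature, not tuned recommendations; the commented lines show how they map onto the `peft` library's `LoraConfig`.

```python
# Typical QLoRA hyperparameters -- common starting points, to be tuned
# per task and model.
LORA_HYPERPARAMS = {
    "r": 16,                  # rank of the low-rank update matrices
    "lora_alpha": 32,         # scaling factor (often 2 * r)
    "lora_dropout": 0.05,     # dropout on the adapter path
    "target_modules": ["q_proj", "v_proj"],  # attention projections to adapt
}

# With the peft library these map onto LoraConfig, e.g.:
#   from peft import LoraConfig
#   config = LoraConfig(task_type="CAUSAL_LM", **LORA_HYPERPARAMS)
# combined with 4-bit quantized model loading (bitsandbytes) for QLoRA.
```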
2. Agents
Definition(s)
AI-Agent definition (I)
First consider:
Most common understanding in context of “AI-Agent-Engineering”:
AI-Agent definition (II)
Common characteristics (“Autonomously performing tasks”)
AI-Agent definition (III)
AI agents as part of an agent framework
LLM software interacting with traditional software and other LLMs
Some key aspects:
Demo agent architecture: Good deal spotter
AI-Agents (I): Course approach
Perspective of software design / software architecture / object oriented programming:
AI-Agents (II): Course’s definition
Code abstractions around LLMs based on (human) roles in a domain, such as “data scientist”, “accountant” or “data steward”. The abstractions should mirror the domain’s division of labor, simplifying the implementation of semi-automated workflows.
Explicitly excluded:
Important consideration:
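The course's definition above can be sketched as a small class: an LLM wrapped in a domain role, with the role and its instructions shaping every prompt. All names are my own, and the `llm` callable is a stub standing in for a real inference call.

```python
from dataclasses import dataclass
from typing import Callable

# Minimal sketch of the course's agent notion: a code abstraction around
# an LLM, named after a human role in the domain.
@dataclass
class Agent:
    role: str
    instructions: str
    llm: Callable[[str], str]   # injected inference function

    def perform(self, task: str) -> str:
        prompt = f"You are a {self.role}. {self.instructions}\nTask: {task}"
        return self.llm(prompt)

def fake_llm(prompt: str) -> str:   # stub model for the sketch
    return f"[reply to: {prompt.splitlines()[-1]}]"

steward = Agent("data steward", "Check metadata completeness.", fake_llm)
print(steward.perform("Review dataset X."))  # → [reply to: Task: Review dataset X.]
```

Swapping `fake_llm` for a real client call turns the same abstraction into a working agent, and several such role objects can hand tasks to each other in a workflow.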
AI-Agents (III): LLM usage
Reminder:
Established AI agent frameworks and products
About libraries, frameworks, tools and web interfaces.
Overview
CrewAI (https://www.crewai.com/ )
CrewAI GitHub example: https://github.com/crewAIInc/crewAI | Defining agents via YAML file
CrewAI GitHub example: https://github.com/crewAIInc/crewAI | Defining tasks via YAML file
CrewAI GitHub example: https://github.com/crewAIInc/crewAI | Declaring AI agents and tasks via Python decorators
CrewAI GitHub example: https://github.com/crewAIInc/crewAI | “Starting” the AI crew with API keys to LLM inference providers
CrewAI on GitHub: https://github.com/crewAIInc/crewAI | CrewAI telemetry
AutoGen example chat using AI agents: https://microsoft.github.io/autogen/stable/
Zapier: https://zapier.com/
Zapier: https://zapier.com/
Zapier: https://zapier.com/ | Example agent workflows
Zapier: https://zapier.com/ | Example agent workflows
Voiceflow: https://www.voiceflow.com/
AI copilot with voiceflow example: https://www.voiceflow.com/
Voiceflow console (https://www.voiceflow.com/ )
StackAI: https://www.stack-ai.com/
StackAI templates: https://www.stack-ai.com/templates
Relevance AI: https://relevanceai.com/
MindStudio: https://www.mindstudio.ai/
MindStudio console: https://www.mindstudio.ai/
n8n: https://n8n.io/
n8n: Creating an AI agent workflow: https://n8n.io/
Current developments
…
Key aspects: Industry focus
Information logistics = central aspect of agentic systems? (contextual awareness of AI agents):
3. (Research) Software Engineering
Cloud Software Engineering
Cloud computing fundamentals
Cloud Computing (II): Wikipedia, 31.01.2025, URL: https://en.wikipedia.org/wiki/Software_as_a_service#/media/File:Comparison_of_on-premise,_IaaS,_PaaS,_and_SaaS.png
Software development
… required key aspects
Distributed software engineering
Dependency Management
Version control
Software Lifecycle
Programming
… required key aspects
Programming language basics
Dependencies
Reproducible, portable development environments
Resources