1 of 169

AI-Agent-Engineering

Introduction

2 of 169

Introduction

3 of 169

Showcase (I): Automatic generation of teaching material through collaboration between different AI agents. (For the digital edition DERLA: https://erinnerungslandschaft.at/ )

4 of 169

Showcase (II): Automatic generation of teaching material through collaboration between different AI agents. (For the digital edition DERLA: https://erinnerungslandschaft.at/ )

5 of 169

Showcase (III): Automatic generation of teaching material through collaboration between different AI agents. (For the digital edition DERLA: https://erinnerungslandschaft.at/ )

6 of 169

Aim(s)

  • Basic understanding of AI-Agent-Engineering
    • Based on Python using (primarily) commercial LLM providers.
    • Modeling AI agents in (object-oriented) Python code.
    • Solving domain problems using different specialised AI agents.

  • Build semi- to fully-automated systems based on AI agents (using LLMs).

7 of 169

Recommendations / How to get started

8 of 169

Workshop structure

  1. Introduction
    1. Showcase
    2. Aim(s)
    3. Recommendations / How to get started
  2. Subdomains (AI - Agent - Engineering)
    • AI: Generative AI basics
    • Agent: Software design, abstractions and reuse
    • Engineering: (Research) Software Engineering
  3. Conclusion
    • Resources / references

9 of 169

Limitations of the workshop

  • Focus on certain frontier-models (OpenAI, Anthropic, etc.)
  • Focus on Python

10 of 169

Main section

11 of 169

Subdomains (AI-Agent-Engineering)

  1. AI: Generative AI
  2. Agent: Software design, abstractions and reuse
  3. Engineering: (Research) Software Engineering

12 of 169

  1. Generative AI

13 of 169

Large Language Models (LLM)

14 of 169

LLM (established commercial) products

  1. AI coding assistants
    1. Examples: Tabnine, GitHub Copilot, IntelliJ Coding Assistant, Codeium
  2. Domain specific AI “copilots”
    • Examples: Integration in Photoshop, Microsoft Office, etc.
  3. Chatbots
    • Examples: ChatGPT, Claude, Mistral, Meta AI, Google Gemini, Deepseek, etc.
  4. (Web REST-APIs)
    • Paying per request for model inference.

15 of 169

LLMs: Everyday-user exposed concepts

“Concepts that are required for using basic LLM products (chats, coding assistants)”

  1. Prompting
    1. Prompt engineering techniques to improve results.
    2. Chain of thought (done manually OR automatically — let another LLM break down prompt into smaller tasks)
  2. Context
    • Providing relevant context(s) will improve LLM results
    • Might be static or dynamic
    • “Chat history”
  3. Evaluation
    • “does the result make sense” → metrics
    • Human evaluation
    • LLM evaluation

16 of 169

LLM: Basic concepts

  • Training time (not covered)
  • Inference time

17 of 169

LLM Training

18 of 169

LLM training: phases

  1. Pre-Training
    1. LLMs are trained with a large corpus of text → predicting next tokens.
    2. Pre-trained models are not easy to use → need for post-training.
  2. Post-Training
    • Instruction-following-training:
      • LLM adapted to follow specific instructions and commands
      • Also called supervised-fine-tuning
      • Makes models easier to use
      • Trains model to answer in a specific style
    • RLHF (Reinforcement Learning from Human Feedback)
      • Model is being fine-tuned using human preference → better alignment of output with human values and intentions.

19 of 169

Stanford Alpaca: Instruction following training dataset: https://github.com/tatsu-lab/stanford_alpaca (14.02.2025)

20 of 169

(Frontier) model inference

… via REST-APIs

21 of 169

(Frontier-)LLM inference: (REST-)APIs

  • Difference between pricing models for chatbots and access to REST-API
  • Developer access is usually separated from “everyday user” access
    • chatgpt.com vs. platform.openai.com
    • claude.ai vs. console.anthropic.com
    • gemini.google.com vs. ai.google.dev

  1. Chatbots: Fixed monthly subscription (e.g. $25 per month; pricing depends on included features and rate limits)
  2. REST-APIs: Pay per use (per million tokens of input AND output); consider the limited context length.

22 of 169

(Frontier-)LLM inference via anthropic console (https://console.anthropic.com | 28.02.2025)

23 of 169

(Frontier-)LLM inference: OpenAI platform (https://platform.openai.com/ )

24 of 169

(Frontier-)LLM inference: Google cloud https://ai.google.dev/ (28.02.2025)

25 of 169

LLM-inference: Pricing per used model (I)

  • Pricing is based on millions of tokens per input AND output.
  • Typically the pricing per token varies by provider and model.
    • E.g. “Amazon has cheaper mini models than Anthropic”, …
    • Larger models cost more than smaller models
    • Newer models cost more than older models
    • New products cost more than established products
    • General LLM capabilities cost more than specialised capabilities (“This model has limited capabilities in terms of reasoning, but for my use case it performs as well as gpt-4o-mini for a fraction of the price.”)

26 of 169

Pricing per used model (II)

https://www.vellum.ai/llm-leaderboard

27 of 169

DeepSeek platform: https://platform.deepseek.com/usage (19.02.2025)

28 of 169

DeepSeek REST-API: Inference = chat https://api-docs.deepseek.com/api/create-chat-completion (19.02.2025)

29 of 169

Speed, pricing, “intelligence” comparison of DeepSeek: https://artificialanalysis.ai/models/deepseek-v3 (19.02.2025)

30 of 169

Domain Model “Generative AI”

…about REST-API designs

31 of 169

Frontier REST-API design

  1. Chat - /v1/chat
    1. Completions - /v1/chat/completions (request body parameter “messages”)
    2. Tools (request body parameter “tools”)
    3. Temperature (request body parameter “temperature”)
  2. Images - /v1/images/generations
    • Generate images
    • DALL-E
  3. Audio - /v1/audio/speech
  4. Embeddings /v1/embeddings
    • “Get a vector representation of a given input that can be easily consumed by machine learning models and algorithms”

https://platform.openai.com/docs/api-reference/introduction

32 of 169

Frontier REST-API design (II): 1. Chat completions

/v1/chat/completions - JSON request body field “messages”

Basically: “tokens in → tokens back”
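The “tokens in → tokens back” exchange can be sketched with only the standard library; the model name and the OPENAI_API_KEY environment variable are assumptions for illustration, not prescribed by the slides:

```python
import json
import os
import urllib.request

def build_chat_payload(messages, model="gpt-4o-mini", temperature=0.7):
    """Build the JSON request body for /v1/chat/completions."""
    return {"model": model, "messages": messages, "temperature": temperature}

def chat_completion(payload, api_key):
    """POST the payload and return the assistant's reply
    ("tokens in → tokens back")."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

payload = build_chat_payload(
    [{"role": "system", "content": "You are a concise assistant."},
     {"role": "user", "content": "What is a token?"}]
)
# chat_completion(payload, os.environ["OPENAI_API_KEY"])  # needs a real key
```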

33 of 169

Frontier REST-API design (II) - 1a. Tools

/v1/chat/completions – JSON request body field “tools”

Allow LLM to call functions

Example workflow:

  1. Describe (python) function (e.g. fuzzy_match_person_name) via json-schema (LLM is trained on those tool definitions!)
  2. Initiate chat (“What is the full name of Sebi Schillerstoff?”)
  3. REST-API responds with a request to call the tool (“Call fuzzy_match_person_name with Sebi Schillerstoff”)
  4. Call tool locally → send result to REST-API: fuzzy_match_person_name(“Sebi Schillerstoff”)
  5. REST-API responds with finished chat:
    1. “The full name of Sebi Schillerstoff is Sebastian David Schiller-Stoff”
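Steps 1 and 4 of the workflow can be sketched in Python; fuzzy_match_person_name is the hypothetical function from the slide, and the toy lookup table inside it is invented for illustration:

```python
import json

# Step 1: the local (python) function the LLM may ask us to call ...
def fuzzy_match_person_name(name: str) -> str:
    # toy lookup table standing in for a real fuzzy matcher
    known = {"sebi schillerstoff": "Sebastian David Schiller-Stoff"}
    return known.get(name.lower(), name)

# ... and its json-schema description, sent with the chat request as
# the "tools" parameter (LLMs are trained on this format!).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "fuzzy_match_person_name",
        "description": "Resolve a colloquial or partial person name "
                       "to the full canonical name.",
        "parameters": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        },
    },
}]

# Step 4: when the API responds with a tool call, execute it locally
# and send the result back to the REST-API.
def handle_tool_call(name: str, arguments_json: str) -> str:
    args = json.loads(arguments_json)
    if name == "fuzzy_match_person_name":
        return fuzzy_match_person_name(**args)
    raise ValueError(f"unknown tool: {name}")

result = handle_tool_call("fuzzy_match_person_name",
                          '{"name": "Sebi Schillerstoff"}')
```

Steps 2, 3 and 5 happen over the chat-completions API: the TOOLS list is sent along with the messages, and the tool-call arguments from the API's response are fed into handle_tool_call.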

34 of 169

AI Domain Model: LLM usage differentiation

  1. LLM as interface: Using LLM as natural language based interfaces to automatically access computer systems
    1. “How much is the train ticket from Graz to Vienna” → translate to system call → lookup_placeids_in_database(Graz,Vienna) → calculate_date_ranges() → retrieve_prices() → LLM builds answer → “The prices vary between 50€ and 200€ depending on whether you want to go tomorrow or in the next few days.”
  2. LLM as knowledge base:
    • Using LLMs for their “Weltwissen” (world knowledge).
    • “How much is the train ticket from Graz to Vienna” → respond directly based on trained data → “...”

Both concepts are established in the AI domain models (REST-API designs)

35 of 169

Common tools: Domain APIs

“Everyday tools used in context of AI-Agent-Engineering”

36 of 169

Google Colab

  • https://colab.google.com
  • Jupyter Notebook in the cloud
  • Collaboration
  • Integration with other Google services (e.g. google drive)

Different runtimes:

  • CPU boxes
  • GPU (free GPUs are available but limited):
    • Lower spec -> $
    • Higher spec -> $$$

37 of 169

Google Colab (II): Change runtime

38 of 169

Google Colab (III): Pricing

39 of 169

Google Colab (IV): Secrets

  • Allows securely sharing secrets between Colab notebooks
  • Environment variable workflow
    • Use for API keys:
      • Hugging face API key
      • Github API key
      • OpenAI platform
      • Gemini
  • Shared notebooks don’t share secrets!
    • They need to be set up by each user!
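A minimal sketch of the environment-variable workflow: inside Colab the key comes from the secret store (google.colab.userdata), elsewhere from an ordinary environment variable. The secret name OPENAI_API_KEY is just an example:

```python
import os

def get_secret(name: str):
    """Read an API key from Colab's secret store when running in
    Colab, falling back to an environment variable elsewhere."""
    try:
        from google.colab import userdata  # only available inside Colab
        return userdata.get(name)
    except ImportError:
        return os.environ.get(name)

openai_key = get_secret("OPENAI_API_KEY")
```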

40 of 169

Google Colab (V): Setup API keys

41 of 169

Hugging Face (I)

  • https://huggingface.co/

  • Hugging Face Platform:
    • Models: ~800,000 open source models
    • Datasets: ~200,000 datasets
    • Spaces: Apps
  • Hugging Face libraries:
    • hub
    • datasets
    • transformers
    • peft
    • trl

42 of 169

Hugging Face (II): Pipelines

Pipelines provide easy-to-use abstractions that simplify everyday AI-related use cases:

https://huggingface.co/docs/transformers/en/main_classes/pipelines

  • Sentiment Analysis
  • Classifier
  • Named Entity Recognition (NER)
  • Question Answering
  • Summarizing
  • Translation
  • Generate content:
    • Text
    • Image
    • Audio
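A minimal sentiment-analysis pipeline, guarded so the sketch degrades gracefully where transformers (or a network connection to download the default model) is unavailable:

```python
try:
    from transformers import pipeline

    # pipeline() hides tokenization, model loading and post-processing
    # behind one call; with no model given it downloads a default one.
    classifier = pipeline("sentiment-analysis")
    result = classifier("Hugging Face pipelines are easy to use.")
    # result is a list like [{"label": "POSITIVE", "score": 0.99...}]
except Exception:
    result = None  # transformers missing or model download failed
```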

43 of 169

Hugging Face (III): Pipelines

44 of 169

Hugging Face (IV): Pipelines

45 of 169

Hugging Face (V): Pipelines

46 of 169

Hugging Face (VI): Pipelines

47 of 169

LLM Open Source Inference

48 of 169

Cloud computing fundamentals

  1. Software as a Service (SaaS)
    1. Chat interfaces via web-browser.
    2. Example: ChatGPT, Claude, Meta AI web based chat.
  2. Platform as a Service (PaaS)
    • REST-APIs for inference against instruct models (chat completions).
    • Example: platform.openai.com, Anthropic console, NVIDIA models, Gemini REST-APIs, together.ai, groq
  3. Infrastructure as a Service (IaaS)
    • Running lower level api code on cloud machines.
    • Example: Loading model into memory and doing inference on a Google Colab machine.
  4. On-Premise
    • Hardware, networking, system administration → doing everything yourself.

49 of 169

50 of 169

About inference costs: https://a16z.com/llmflation-llm-inference-cost/ (28.02.2025)

51 of 169

LLM cloud inference options

52 of 169

Open source inference: PaaS solutions

E.g. using managed REST-APIs, paying per million tokens against instruct model endpoints, with request limits.

53 of 169

54 of 169

55 of 169

PaaS via Together AI (I): https://www.together.ai/

56 of 169

PaaS: Together AI (II)

  • https://www.together.ai/
  • Provides open source model inference
  • Deepseek R1, Llama, Mistral …
  • Pay per million tokens

57 of 169

PaaS: Together AI (III)

Pros:

  • Access to open source models
  • Fast and easy setup
  • 1$ free trial (no credit card required for the trial phase)
  • Playground and documentation

Cons:

  • Might not be as “strong” as frontier models
  • Rate limits for requests

58 of 169

PaaS: Together AI (IV)

PaaS via Together AI (IV): https://www.together.ai/playground

59 of 169

PaaS: Fireworks AI

  • https://fireworks.ai/
  • “The fastest and most efficient inference engine to build production-ready, compound AI systems.” (03/02/2025)

60 of 169

PaaS: Fireworks AI

61 of 169

PaaS: Fireworks AI - Pricing https://fireworks.ai/pricing

62 of 169

PaaS: Fireworks AI - Pricing https://fireworks.ai/pricing

Pros:

  • Access to open source models
  • Fast and easy setup
  • 1$ free trial (no credit card required for the trial phase)
  • Playground and documentation

Cons:

  • Might not be as “strong” as frontier models
  • Rate limits for requests

63 of 169

PaaS: Mistral

  • https://mistral.ai/
  • “We release open-weight models for everyone to customize and deploy where they want it. Our super-efficient model Mistral Nemo is available under Apache 2.0, while Mistral Large 2 is available through both a free non-commercial license, and a commercial license. “
  • https://console.mistral.ai

64 of 169

65 of 169

66 of 169

PaaS: Hugging Face Inference Endpoints

  • Deploy any model from the Hugging Face Hub via Hugging Face Inference Endpoints for model inference via HTTP (REST-APIs). Includes:
    • Custom endpoints
    • Custom model selection
    • Hugging Face Python library integration.
  • Payment
    • Different pricing per hour uptime (GPU) per model (type, size, quantization etc.)

67 of 169

Hugging Face Inference Endpoints catalogue: https://endpoints.huggingface.co/catalog

68 of 169

Hugging Face Inference Endpoints catalogue (pricing): https://endpoints.huggingface.co/catalog

69 of 169

Deploy open source model as PaaS via HuggingFace: https://huggingface.co/meta-llama/Llama-3.1-8B

70 of 169

Open Source inference: “Climbing down the API(s)”

… about lower level APIs

71 of 169

Tokenizer (Hugging Face lib)

Maps between text and tokens for a particular model

  • Translates between text and tokens with encode() and decode()
  • Contains a vocab that can include special tokens to signal information to the LLM, like the start of a prompt. (special-token mechanism → training!)
  • Can include a chat template that knows how to format a chat message for this model.
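A hedged sketch of the encode()/decode() round trip; gpt2 is chosen only because it is a small, openly downloadable model, and the call is guarded for environments without transformers or network access:

```python
try:
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")  # downloads the vocab
    ids = tok.encode("Hello AI agents")          # text → token ids
    text = tok.decode(ids)                       # token ids → text
    # On instruct models, chat formatting goes through the tokenizer's
    # chat template, e.g. tok.apply_chat_template(messages).
except Exception:
    ids, text = None, None
```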

72 of 169

Tokenizers for key models

You need to run the same tokenizer at inference time as was used at training time! (otherwise results will be gibberish)

  • Llama 3.1 → Meta
  • Phi3 → Microsoft
  • Qwen2 → Alibaba Cloud
  • Starcoder2 → Model for generating code (own tokenizer)

73 of 169

Llama Tokenizer

74 of 169

Llama Tokenizer

75 of 169

Open source inference: instruct models

  • instruct models = “chat models”
  • using Hugging Face’s lower level APIs
  • Using Google Colab’s cloud GPUs to run the inference (loading models, apply tokenizers etc.)

Models:

  • Llama (Meta)
  • Mixtral (Mistral)
  • Phi3 (Microsoft)
  • Qwen2 (Alibaba Cloud)
  • Gemma (Google)

76 of 169

Open source inference (instruct models): key aspects

  1. Quantization:
    1. Basic idea: Limiting the memory footprint of a model (so it runs on lower-tier machines)
    2. Mechanism: Reducing the numerical precision of the weights in a model.
    3. Example: Reducing 32-bit weights down to 8 or 4 bits; instead of the float 2.1233232412, store (roughly) 2.12
    4. Drawback: Accuracy goes down (but the decrease is limited)
  2. Model Internals:
    • PyTorch under the hood of Hugging Face’s transformers library
    • Deep neural network(s) with lower level concepts: Activation function, perceptron layer, attention layers …
  3. Streaming: Streaming back results
    • Building interactive chats via text streaming
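A sketch of a 4-bit quantization config with Hugging Face's transformers; the model name is only an example (and gated on the Hub), and the actual load is commented out because it needs a GPU runtime:

```python
try:
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # 4-bit quantization config: weights are stored in 4 bits instead
    # of 32, shrinking the memory footprint at a limited accuracy cost.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    # The actual load needs a GPU runtime (e.g. on Google Colab) and,
    # for this example model, access rights on the Hugging Face Hub:
    # model = AutoModelForCausalLM.from_pretrained(
    #     "meta-llama/Llama-3.1-8B-Instruct",
    #     quantization_config=quant_config,
    #     device_map="auto",
    # )
    ok = True
except Exception:
    ok = False  # torch / transformers not available in this environment
```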

77 of 169

Quantization in Google Colab using Hugging Face’s transformers library

78 of 169

Tokenization in Google Colab using Hugging Face’s transformers library

79 of 169

Load model (for inference) in Google Colab using Hugging Face’s transformers library

80 of 169

Print model details in Google Colab using Hugging Face’s transformers library

81 of 169

“Doing the inference”: via Google Colab using Hugging Face’s transformers library

82 of 169

Comparing LLMs

“Choosing the right LLM for my use case(s) “

83 of 169

LLM comparison (for inference)

There is no simple answer! → It is all about picking the right LLM for your task.

In general: LLMs need to be evaluated for any given task

  1. Basic comparison
    1. Open source / closed source
    2. Release date / knowledge cut-off
    3. Parameters
    4. Training
    5. Context length
    6. Pricing
  2. Looking at the results
    • Benchmarks
    • Leaderboards
    • Arenas

84 of 169

  1. Basic comparison: Primary questions
  • Open source/closed source model
    • Rights, licenses, etc.
    • Transparency
    • E.g. Llama is not allowed to be used commercially in the EU
  • Release date and knowledge cut-off
    • Model has knowledge of current events?
  • Parameters
    • Indicates the “strength” of the model / costs of the model
    • How much training data is needed for fine-tuning
  • Training
    • Size of the training dataset, size of training parameters
    • Depth of expertise
  • Context length?
    • Size of the context window - the total amount of tokens a model is able to keep in its memory at once.

85 of 169

  1. Basic comparison (II): Involved costs

E.g. costs to consider when deciding to use a different model for your use case:

  • Inference
    • API charges (e.g. requesting against the Anthropic console),
    • monthly subscription fees (e.g. using Claude / ChatGPT),
    • computing costs (e.g. running models on your own via compute units on Google Colab)
  • Knowledge requirements
    • effort to acquire the necessary skills involved in using certain LLMs / certain LLM providers: technical, jurisdictional, scientific expertise, etc.
  • Development
    • Effort needed to actually create the solution

86 of 169

  1. Basic comparison (III): Involved costs

  • Time to Market
    • One of the core arguments for using frontier models and providers, like OpenAI or Anthropic.
    • Quickly setup a powerful solution via OpenAI platform.
    • Fine-tuning your own open source model would be much harder and slower
  • Rate limits
    • Networking requirements
    • Reliability: Up-time of APIs? REST-APIs might be overloaded.
  • Speed
    • How quickly can a response / new tokens be generated?
  • Latency
    • Does the user have to wait for a response?

87 of 169

  1. Basic comparison (IV): Involved costs

  • License:
    • Most open source models have a fairly open license
    • User limits are very common: “Allowed to use free of charge up to 1 billion users”
    • Llama: restrictions for commercial use in the EU

88 of 169

LLM Comparison - 2. Looking at the results: benchmarks

7 common benchmarks:

  • ARC (Reasoning): Evaluates scientific reasoning via multiple-choice questions
  • DROP (Language comprehension): Distill details from text, then add, count or sort
  • HellaSwag (Common sense): “Harder Endings, Long Contexts and Low Shot Activities”

89 of 169

  • MMLU (Understanding): Factual recall, reasoning and problem solving across 57 subjects
  • TruthfulQA (Accuracy): Robustness in providing truthful replies in adversarial conditions
  • Winogrande (Context): Tests if the LLM understands context and resolves ambiguity
  • GSM8K (Math): Math and word problems taught in elementary and middle schools

90 of 169

Comparing Open and Closed Source models

91 of 169

Agentic tool use benchmark: https://scale.com/leaderboard/tool_use_chat

92 of 169

Instruction following benchmark: https://scale.com/leaderboard/instruction_following

93 of 169

Chatbot Arena: https://lmarena.ai/

94 of 169

Chatbot Arena: https://lmarena.ai/

95 of 169

Chatbot Arena: https://lmarena.ai/

96 of 169

Evaluating generative AI: About metrics

  1. Model centric metrics
    • Tend to measure the direct performance of a model and can be used for model optimization straight away.
    • Examples:
      • Loss
      • Perplexity
      • Accuracy
      • Precision, Recall, F1
      • AUC-ROC
  2. Outcome or domain centric metrics
    • Ability to solve domain problems.
    • Not obviously related to LLM performance. (Lots of unknowns in-between, like the prompting techniques used, understanding of the domain problem, usage of certain LLMs, etc.)
    • Examples:
      • Improvement in time, cost or resources
      • User satisfaction

97 of 169

RAG in agentic systems

Retrieval Augmented Generation

98 of 169

RAG: Basic idea

(Some) Techniques to improve (prompt) results:

  • Multi-shot prompting
  • Usage of tools (LLM tool concept)
  • Additional context

RAG = follow-up improvement?

  • Knowledge base: Expert information
  • Retrieve relevant information from knowledge base
  • Add relevant information to prompt (context)
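The three steps above can be sketched end-to-end; the word-overlap scorer is a toy stand-in for embedding-based similarity search, and the knowledge-base sentences are invented for illustration:

```python
KNOWLEDGE_BASE = [
    "DERLA is a digital edition documenting memorial landscapes in Austria.",
    "Instruct models are chat-tuned LLMs exposed via REST-APIs.",
    "QLoRA fine-tunes quantized models with low-rank adapters.",
]

def retrieve(question: str, k: int = 1) -> list:
    """Toy retrieval: rank documents by word overlap with the question
    (a stand-in for embedding-based similarity search)."""
    q = set(question.lower().split())
    scored = sorted(KNOWLEDGE_BASE,
                    key=lambda doc: len(q & set(doc.lower().split())),
                    reverse=True)
    return scored[:k]

def build_rag_prompt(question: str) -> str:
    """Add the retrieved expert information to the prompt as context."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"

prompt = build_rag_prompt("What is DERLA?")
# The prompt is then sent to an LLM, e.g. via a chat-completions API.
```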

99 of 169

RAG: Basic idea (II)

Bigger Picture:

  1. Auto-Regressive LLMs
    1. Input token → output token
    2. Examples: GPT series,
  2. Auto-Encoding LLMs
    • Whole input → whole output
    • Sentiment Analysis, classification (hugging face pipelines)
    • Examples: BERT from Google, OpenAIEmbeddings (REST-API)

Vector Embedding:

  • Represent an understanding of the text (similarity search)
  • Central use-case for vector databases: “Return similar documents to add useful context to my prompt.”
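Similarity search over vector embeddings usually means cosine similarity; a minimal sketch with toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings": doc_a and doc_b point in a similar direction,
# doc_c in a different one.
doc_a = [0.9, 0.1, 0.0]
doc_b = [0.8, 0.2, 0.1]
doc_c = [0.0, 0.1, 0.9]
sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
```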

100 of 169

LangChain’s description of RAG: Data ingestion - https://blog.langchain.dev/tutorial-chatgpt-over-your-data/ (10.02.2025)

101 of 169

LangChain’s description of RAG: Data querying - https://blog.langchain.dev/tutorial-chatgpt-over-your-data/ (10.02.2025)

102 of 169

RAG with LangChain: Key abstractions

  1. LLM = Abstraction around different Large Language Models
  2. Retriever = Interface on something like a vector store
  3. Memory = Represents the history of a conversation, e.g. a chat history.

103 of 169

Creating a RAG chain via LangChain: Key abstractions

104 of 169

Model Fine-Tuning (frontier models)

… about fine-tuning existing frontier models

105 of 169

How to improve model accuracy?

  1. Model inference
    1. Prompt Engineering (Multi-Shot prompting, Prompt chaining, …)
    2. Tools / function calling
    3. RAG (knowledge base)
  2. Model training
    • Train a new model
    • Fine-Tuning (Transfer Learning)

106 of 169

Model training: Dataset types

Types of datasets:

  1. Proprietary (own) data
  2. Open data
    1. Kaggle
    2. Hugging Face
    3. Zenodo?
  3. Synthetic data
    • “Frontier model generates synthetic data for a cheaper model”
  4. On demand data (commercial)
    • E.g. companies crafting datasets for your project (Scale.com)

107 of 169

Model training: Understanding the data

  1. Investigate
  2. Parse
  3. Visualize
  4. Assess data quality
  5. Curate
  6. Save and publish

108 of 169

Model training as part of optimizing LLM results

  1. Requirement Engineering
    1. Decide on metrics: measure success → how should success be measured for given task?
    2. Non-functional requirements: Scalability, latency, etc.
  2. Preparation (=architecture?)
    • Overview: Complete products - standard software - custom development
      1. Existing products: Problem already solved? “Pay somebody”? Delegate parts to standard software?
    • Methodology
      • Non-LLM solution?: Traditional data-science / machine learning methods (feature engineering, linear regression, …)?
    • LLM comparison (context-length, pricing, license, …)
      • Benchmarks, leaderboards, arenas
      • Specialist scores
    • Data curation
  3. LLM selection
    • Choose LLMs
    • Experiment with LLMs and different tasks
    • Train and validate (using curated data)
  4. Customization
    • Optimization of LLM results
      • Inference time (Prompt Engineering, RAG)
      • Training time (fine-tuning, decide to train own model?)
  5. Production
    • Architecture suitable for production? (Security, reportability, etc.)
    • API design etc.

109 of 169

Frontier model fine-tuning: OpenAI

Three steps:

  1. Dataset needs to be provided as JSONL (JSON Lines)
  2. Run training
    1. Recommendation by OpenAI: (use between 50 and 100 datapoints)
  3. Evaluate results (tweak and repeat)

110 of 169

Example for JSONL: https://jsonlines.org/examples/ (14.02.2025)

111 of 169

OpenAI Fine-tuning: Training API

Simplified workflow:

  1. Create training and validation dataset in JSONL format.
  2. Send both to OpenAI → Starts an asynchronous training process (Job).
    1. Additional parameters: Model name (to be created), training options etc.
  3. Wait for the process to finish (automatic email to your account)
  4. Use model name to start inference.

https://platform.openai.com/docs/guides/fine-tuning
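Step 1 of the workflow, sketched in Python; the example datapoint is invented, and the commented-out client calls for steps 2-4 follow OpenAI's documented files / fine-tuning endpoints (the model name is an assumption):

```python
import json

def to_jsonl_lines(examples):
    """Serialize chat examples into the JSONL format the fine-tuning
    API expects: one JSON object per line."""
    lines = []
    for user_text, assistant_text in examples:
        record = {"messages": [
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": assistant_text},
        ]}
        lines.append(json.dumps(record))
    return lines

lines = to_jsonl_lines([
    ("Estimate the price of this product: vintage camera", "Around 120 USD."),
])

# Steps 2-4, hedged sketch with the official openai client
# (requires an API key and an uploaded train.jsonl file):
#
#   from openai import OpenAI
#   client = OpenAI()
#   train = client.files.create(file=open("train.jsonl", "rb"),
#                               purpose="fine-tune")
#   job = client.fine_tuning.jobs.create(training_file=train.id,
#                                        model="gpt-4o-mini-2024-07-18")
```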

112 of 169

OpenAI fine-tuning using OpenAI’s python client (14.02.2025)

113 of 169

Fine-tuning frontier models: Objectives and challenges

  1. Setting response style
  2. Improve reliability (produce expected / similar outputs)
  3. Correcting failures with complex prompts
  4. Better performance with edge cases
  5. Perform skill that is hard to describe within a prompt

Challenges:

  • Fine-tuning datasets are not going to have a large impact compared to the huge datasets used for model training.
  • Fine-tuning might erode model performance.

114 of 169

“Fine-tune models for better results and efficiency” - https://platform.openai.com/docs/guides/fine-tuning (14.02.2025)

115 of 169

Fine-tuning open source models

116 of 169

LoRA

  • Low-Rank Adaptation
  • Basic idea:
    • Gather training data (“input token → prediction token”)
    • Freeze all layers in neural network
    • Target modules: Select a few of the layers
    • Create lower-dimensional layers: low-rank adapters are applied to the target modules to shift the weights towards predicting the expected training tokens.

117 of 169

QLoRA

  • Quantized variant of LoRA

118 of 169

QLoRA: Hyperparameters

  • There are three essential hyperparameters for QLoRA fine-tuning:

  1. R
    • Number of dimensions in the low-rank matrices
  2. Alpha
    • Scaling factor
    • Multiplies the low-rank matrices → a bigger Alpha means a larger weight shift.
  3. Target Modules
    • Which layers of the neural network are targeted by QLoRA.
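The three hyperparameters map directly onto peft's LoraConfig; the values and target module names below are illustrative and model-specific, and the import is guarded for environments without peft:

```python
try:
    from peft import LoraConfig

    # The three essential QLoRA hyperparameters from the slide:
    lora_config = LoraConfig(
        r=32,                 # dimensions of the low-rank matrices
        lora_alpha=64,        # scaling factor for the weight shift
        target_modules=["q_proj", "v_proj"],  # layers to adapt (model specific)
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    r_value = lora_config.r
except Exception:
    r_value = None  # peft not available in this environment
```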

119 of 169

Demo Python based setup for QLoRA fine-tuning

120 of 169

2. Agents

121 of 169

Definition(s)

122 of 169

AI-Agent definition (I)

First consider:

  • Umbrella term
  • Different meaning in different contexts (term “agent”)

Most common understanding in context of “AI-Agent-Engineering”:

  1. Software entities that can autonomously perform tasks
  2. AI agents as part of an agent framework to solve complex problems with limited human involvement.

123 of 169

AI-Agent definition (II)

Common characteristics (“autonomously performing tasks”):

  • Autonomous
  • Goal-oriented: a “thing to do”
  • Task-specific: “Specialised to be good at one thing”

124 of 169

AI-Agent definition (III)

AI agents as part of an agent framework

LLM software interacting with traditional software and other LLMs

Some key aspects:

  • Memory / Persistence
    • E.g. traditional software handling writing output to file
  • Decision-Making / orchestration
    • E.g. LLM assigning task to specific agent (traditional software or LLM)
  • Planning capabilities
    • E.g. LLM breaking problems down for other models or software
  • Use of tools; potentially connecting to databases or the internet.
    • E.g. LLM calls tool calculator

125 of 169

Demo agent architecture: Good deal spotter

  • Planning agent (Coordinating activities)
  • Scanner agent (Identify promising deals in the web via web scraping from given URLs)
  • Price analyser agent(s) (Different agents trying to judge whether a given price is a good deal for a given product description)
  • Ensemble agent (Compares different price estimations)
  • Messaging agent (Sends push notifications)

126 of 169

AI-Agents (I): Course approach

Perspective of software design / software architecture / object oriented programming:

  • OOP → model the “real world problem” into code.
  • Real world problem? → automate human workflows?

127 of 169

AI-Agents (II): Course’s definition

Code abstractions around LLMs based on (human) roles in a domain, like “data scientist”, “accountant” or “data steward”. The abstractions should mirror a domain’s pattern of division of labor (simplifying the implementation of semi-automated workflows).

Explicitly excluded:

  • No need for an agentic framework
  • Degree of autonomy might vary a lot

Important consideration:

  • The agents (based on roles) involved in solving a specific task might be assigned / created completely autonomously.
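A minimal sketch of this definition in (object-oriented) Python: an agent is a thin abstraction around any text-in/text-out inference call, parameterised by a domain role; echo_llm is a placeholder for a real REST-API call:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    """A code abstraction around an LLM, based on a (human) role."""
    role: str                   # e.g. "data scientist", "data steward"
    instructions: str           # role-specific system prompt
    llm: Callable[[str], str]   # any text-in/text-out inference call

    def run(self, task: str) -> str:
        return self.llm(f"You are a {self.role}. {self.instructions}\n"
                        f"Task: {task}")

# Placeholder inference function (a real one would call a REST-API).
def echo_llm(prompt: str) -> str:
    return f"[LLM answer to: {prompt.splitlines()[-1]}]"

steward = Agent(role="data steward",
                instructions="Check metadata completeness.",
                llm=echo_llm)
answer = steward.run("Validate the DERLA dataset description.")
```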

128 of 169

AI-Agents (III): LLM usage

Reminder:

  1. LLMs used for their “Weltwissen” (world knowledge)
  2. LLMs used as natural language interfaces to computer systems
    1. LLM tool concept

129 of 169

Established AI agent frameworks and products

About libraries, frameworks, tools and web interfaces.

130 of 169

Overview

  • CrewAI (https://www.crewai.com/ )
    • Coding (Python) skills required = Python framework
  • AutoGen (https://microsoft.github.io/autogen/stable/ )
    • Coding (Python) skills required = Python framework
  • Zapier (https://zapier.com/ )
    • No code
    • Graphical user interface to create agents
  • Voiceflow (https://www.voiceflow.com/ )
    • No code
    • Perhaps the most popular platform to build chatbots
    • Chatbots → enhanced to agents
  • StackAI (https://www.stack-ai.com/ )
    • No code
    • Platform to build chatbots
    • Chatbots → agents
  • Relevance AI (https://relevanceai.com/ )
    • No code
  • MindStudio (https://www.mindstudio.ai/ )
    • No code
  • n8n (https://n8n.io/ )
    • No code

131 of 169

132 of 169

CrewAI GitHub example: https://github.com/crewAIInc/crewAI | Defining agents via yaml file

133 of 169

CrewAI GitHub example: https://github.com/crewAIInc/crewAI | Defining tasks via yaml file

134 of 169

CrewAI GitHub example: https://github.com/crewAIInc/crewAI | Using AI agent and task declaration via python decorators

135 of 169

CrewAI GitHub example: https://github.com/crewAIInc/crewAI | “Starting” the AI crew with API keys to LLM inference providers

136 of 169

CrewAI on GitHub: https://github.com/crewAIInc/crewAI | CrewAI telemetry

137 of 169

138 of 169

AutoGen example chat using AI agents: https://microsoft.github.io/autogen/stable/

139 of 169

140 of 169

141 of 169

Zapier: https://zapier.com/ | Example agent workflows

142 of 169

Zapier: https://zapier.com/ | Example agent workflows

143 of 169

144 of 169

AI copilot with voiceflow example: https://www.voiceflow.com/

145 of 169

Voiceflow console (https://www.voiceflow.com/ )

146 of 169

147 of 169

148 of 169

149 of 169

150 of 169

MindStudio console: https://www.mindstudio.ai/

151 of 169

152 of 169

N8n: Creating an AI agent workflow: https://n8n.io/

153 of 169

Current developments

154 of 169

Key aspects: Industry focus

  • Building sophisticated no-code AI agent frameworks.
  • Enhancing existing coding frameworks

Information logistics as a central aspect of agentic systems? (contextual awareness of AI agents):

  • The required data needs to be at the required location at the required time.
  • Speed, correctness, completeness etc. of data

155 of 169

3. (Research) Software Engineering

156 of 169

Cloud Software Engineering

157 of 169

Cloud computing fundamentals

  • Software as a Service (SaaS)
    • Chat interfaces via web-browser.
    • Example: ChatGPT, Claude, Meta AI web based chat.
  • Platform as a Service (PaaS)
    • REST-APIs for inference against instruct models (chat completions).
    • Example: platform.openai.com, Anthropic console, NVIDIA models, Gemini REST-APIs, together.ai, groq
  • Infrastructure as a Service (IaaS)
    • Running lower level api code on cloud machines.
    • Example: Loading model into memory and doing inference on a Google Colab machine.
  • On-Premise
    • Hardware, networking, system administration → doing everything yourself.

158 of 169

159 of 169

Software development

… required key aspects

160 of 169

Distributed software engineering

  • HTTP
  • REST-APIs
  • Networking (latency, availability, …)
  • Security (basics of OAuth2)
  • Monitoring (pricing per request)
  • Software environments (local / remote environment variables)

161 of 169

Dependency Management

  • Semantic versioning
  • Risk assessment
  • Software metadata (requirements.txt, pyproject.toml)
  • Build systems

162 of 169

Version control

  • Git fundamentals
  • Git-based platforms (GitLab, GitHub, Hugging Face, etc.)

163 of 169

Software Lifecycle

164 of 169

Programming

… required key aspects

165 of 169

Programming language basics

  • Reacting to network errors
  • Using REST-APIs
  • Abstractions (OOP / functional programming)

166 of 169

Dependencies

  • Common libraries and tools for the programming language of choice
  • Available tools in context of AI-Agent-Engineering

167 of 169

Reproducible, portable development environments

  • IDEs
  • Apache Maven for Java
  • venv, Anaconda, uv, rye (Python)

168 of 169

Resources

169 of 169