1 of 169

AI-Agent-Engineering

Introduction

2 of 169

Introduction

3 of 169

Showcase (I): Automatic generation of teaching material through collaboration between different AI agents. (For the digital edition DERLA: https://erinnerungslandschaft.at/ )

4 of 169

Showcase (II): Automatic generation of teaching material through collaboration between different AI agents. (For the digital edition DERLA: https://erinnerungslandschaft.at/ )

5 of 169

Showcase (III): Automatic generation of teaching material through collaboration between different AI agents. (For the digital edition DERLA: https://erinnerungslandschaft.at/ )

6 of 169

Aim(s)

  • Basic understanding of AI-Agent-Engineering
    • Based on Python using (primarily) commercial LLM providers.
    • Modeling AI agents in (object-oriented) Python code.
    • Solving domain problems using different specialised AI agents.

  • Build semi- to fully-automated systems based on AI agents (using LLMs).

7 of 169

Recommendations / How to get started

8 of 169

Workshop structure

  1. Introduction
    1. Showcase
    2. Aim(s)
    3. Recommendations / How to get started
  2. Subdomains (AI - Agent - Engineering)
    • AI: Generative AI basics
    • Agent: Software design, abstractions and reuse
    • Engineering: (Research) Software Engineering
  3. Conclusion
    • Resources / references

9 of 169

Limitations of the workshop

  • Focus on certain frontier-models (OpenAI, Anthropic, etc.)
  • Focus on Python

10 of 169

Main section

11 of 169

Subdomains (AI-Agent-Engineering)

  1. AI: Generative AI
  2. Agent: Software design, abstractions and reuse
  3. Engineering: (Research) Software Engineering

12 of 169

  1. Generative AI

13 of 169

Large Language Models (LLM)

14 of 169

LLM (established commercial) products

  1. AI coding assistants
    1. Examples: Tabnine, GitHub Copilot, IntelliJ Coding Assistant, Codeium
  2. Domain specific AI “copilots”
    • Examples: Integration in Photoshop, Microsoft Office, etc.
  3. Chatbots
    • Examples: ChatGPT, Claude, Mistral, Meta AI, Google Gemini, Deepseek, etc.
  4. (Web REST-APIs)
    • Paying per request for model inference.

15 of 169

LLMs: Everyday-user exposed concepts

“Concepts that are required for using basic LLM products (chats, coding assistants)”

  1. Prompting
    1. Prompt engineering techniques to improve results.
    2. Chain of thought (done manually OR automatically — let another LLM break down prompt into smaller tasks)
  2. Context
    • Providing relevant context(s) will improve LLM results
    • Might be static or dynamic
    • “Chat history”
  3. Evaluation
    • “does the result make sense” → metrics
    • Human evaluation
    • LLM evaluation

16 of 169

LLM: Basic concepts

  • Training time (not covered)
  • Inference time

17 of 169

LLM Training

18 of 169

LLM training: phases

  1. Pre-Training
    1. LLMs are trained with a large corpus of text → predicting next tokens.
    2. Pre-trained models are not easy to use → need for post-training.
  2. Post-Training
    • Instruction-following-training:
      • LLM adapted to follow specific instructions and commands
      • Also called supervised-fine-tuning
      • Makes models easier to use
      • Trains model to answer in a specific style
    • RLHF (Reinforcement Learning from Human Feedback)
      • Model is being fine-tuned using human preference → better alignment of output with human values and intentions.

19 of 169

Stanford Alpaca: Instruction following training dataset: https://github.com/tatsu-lab/stanford_alpaca (14.02.2025)

20 of 169

(Frontier) model inference

… via REST-APIs

21 of 169

(Frontier-)LLM inference: (REST-)APIs

  • Difference between pricing models for chatbots and access to REST-API
  • Developer access is usually separated from “everyday user” access
    • chatgpt.com vs. platform.openai.com
    • claude.ai vs. console.anthropic.com
    • gemini.google.com vs. ai.google.dev

  1. Chatbots: Fixed monthly subscription (e.g. $25 per month; pricing depends on included features and rate limits)
  2. REST-APIs: Pay per use (per million tokens of input AND output); consider the limited context length.

22 of 169

(Frontier-)LLM inference via anthropic console (https://console.anthropic.com | 28.02.2025)

23 of 169

(Frontier-)LLM inference: OpenAI platform (https://platform.openai.com/ )

24 of 169

(Frontier-)LLM inference: Google cloud https://ai.google.dev/ (28.02.2025)

25 of 169

LLM-inference: Pricing per used model (I)

  • Pricing is based on millions of tokens per input AND output.
  • Typically the pricing per token varies by provider and model.
    • E.g. “Amazon has cheaper mini models than Anthropic”, …
    • Larger models cost more than smaller models
    • Newer models cost more than older models
    • New products cost more than established products
    • General LLM capabilities cost more than specialised capabilities (“This model has limited capabilities in terms of reasoning, but for my use case it performs as well as gpt-4o-mini for a fraction of the price.”)

26 of 169

Pricing per used model (II)

https://www.vellum.ai/llm-leaderboard

27 of 169

DeepSeek platform: https://platform.deepseek.com/usage (19.02.2025)

28 of 169

DeepSeek REST-API: Inference = chat https://api-docs.deepseek.com/api/create-chat-completion (19.02.2025)

29 of 169

Speed, pricing, “intelligence” comparison of DeepSeek: https://artificialanalysis.ai/models/deepseek-v3 (19.02.2025)

30 of 169

Domain Model “Generative AI”

…about REST-API designs

31 of 169

Frontier REST-API design

  1. Chat - /v1/chat
    1. Completions - /v1/chat/completions (request body parameter “messages”)
    2. Tools (request body parameter “tools”)
    3. Temperature (request body parameter “temperature”)
  2. Images - /v1/images/generations
    • Generate images
    • DALL-E
  3. Audio - /v1/audio/speech
  4. Embeddings /v1/embeddings
    • “Get a vector representation of a given input that can be easily consumed by machine learning models and algorithms”

https://platform.openai.com/docs/api-reference/introduction

32 of 169

Frontier REST-API design (II): 1. Chat completions

/v1/chat/completions - JSON request body field “messages”

Basically: “tokens in → tokens back”
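The “tokens in → tokens back” exchange can be sketched with only the standard library; the model name and the OPENAI_API_KEY environment variable are assumptions for illustration, not prescribed by the slides:

```python
import json
import os
import urllib.request

def build_chat_payload(messages, model="gpt-4o-mini", temperature=0.7):
    """Build the JSON request body for /v1/chat/completions."""
    return {"model": model, "messages": messages, "temperature": temperature}

def chat_completion(payload, api_key):
    """POST the payload and return the assistant's reply
    ("tokens in → tokens back")."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

payload = build_chat_payload(
    [{"role": "system", "content": "You are a concise assistant."},
     {"role": "user", "content": "What is a token?"}]
)
# chat_completion(payload, os.environ["OPENAI_API_KEY"])  # needs a real key
```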

33 of 169

Frontier REST-API design (II) - 1a. Tools

/v1/chat/completions – JSON request body field “tools”

Allow LLM to call functions

Example workflow:

  1. Describe (python) function (e.g. fuzzy_match_person_name) via json-schema (LLM is trained on those tool definitions!)
  2. Initiate chat (“What is the full name of Sebi Schillerstoff?”)
  3. REST-API responds with a request to call the tool (“Call fuzzy_match_person_name with Sebi Schillerstoff”)
  4. Call tool locally → send result to REST-API: fuzzy_match_person_name(“Sebi Schillerstoff”)
  5. REST-API responds with finished chat:
    1. “The full name of Sebi Schillerstoff is Sebastian David Schiller-Stoff”
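Steps 1 and 4 of the workflow can be sketched in Python; fuzzy_match_person_name is the hypothetical function from the slide, and the toy lookup table inside it is invented for illustration:

```python
import json

# Step 1: the local (python) function the LLM may ask us to call ...
def fuzzy_match_person_name(name: str) -> str:
    # toy lookup table standing in for a real fuzzy matcher
    known = {"sebi schillerstoff": "Sebastian David Schiller-Stoff"}
    return known.get(name.lower(), name)

# ... and its json-schema description, sent with the chat request as
# the "tools" parameter (LLMs are trained on this format!).
TOOLS = [{
    "type": "function",
    "function": {
        "name": "fuzzy_match_person_name",
        "description": "Resolve a colloquial or partial person name "
                       "to the full canonical name.",
        "parameters": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        },
    },
}]

# Step 4: when the API responds with a tool call, execute it locally
# and send the result back to the REST-API.
def handle_tool_call(name: str, arguments_json: str) -> str:
    args = json.loads(arguments_json)
    if name == "fuzzy_match_person_name":
        return fuzzy_match_person_name(**args)
    raise ValueError(f"unknown tool: {name}")

result = handle_tool_call("fuzzy_match_person_name",
                          '{"name": "Sebi Schillerstoff"}')
```

Steps 2, 3 and 5 happen over the chat-completions API: the TOOLS list is sent along with the messages, and the tool-call arguments from the API's response are fed into handle_tool_call.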

34 of 169

AI Domain Model: LLM usage differentiation

  1. LLM as interface: Using LLM as natural language based interfaces to automatically access computer systems
    1. “How much is the train ticket from Graz to Vienna” → translate to system call → lookup_placeids_in_database(Graz,Vienna) → calculate_date_ranges() → retrieve_prices() → LLM builds answer → “The prices vary between 50€ and 200€ depending on whether you want to go tomorrow or in the next few days.”
  2. LLM as knowledge base:
    • Using LLMs for their “Weltwissen” (world knowledge).
    • “How much is the train ticket from Graz to Vienna” → respond directly based on trained data → “...”

Both concepts are established in the AI domain models (REST-API designs)

35 of 169

Common tools: Domain APIs

“Everyday tools used in context of AI-Agent-Engineering”

36 of 169

Google Colab

  • https://colab.google.com
  • Jupyter Notebook in the cloud
  • Collaboration
  • Integration with other Google services (e.g. google drive)

Different runtimes:

  • CPU boxes
  • GPU (free GPUs are available but limited):
    • Lower spec -> $
    • Higher spec -> $$$

37 of 169

Google Colab (II): Change runtime

38 of 169

Google Colab (III): Pricing

39 of 169

Google Colab (IV): Secrets

  • Allows securely sharing secrets between Colab notebooks
  • Environment variable workflow
    • Use for API keys:
      • Hugging face API key
      • Github API key
      • OpenAI platform
      • Gemini
  • Shared notebooks don’t share secrets!
    • They need to be set up by each user!
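A minimal sketch of the environment-variable workflow: inside Colab the key comes from the secret store (google.colab.userdata), elsewhere from an ordinary environment variable. The secret name OPENAI_API_KEY is just an example:

```python
import os

def get_secret(name: str):
    """Read an API key from Colab's secret store when running in
    Colab, falling back to an environment variable elsewhere."""
    try:
        from google.colab import userdata  # only available inside Colab
        return userdata.get(name)
    except ImportError:
        return os.environ.get(name)

openai_key = get_secret("OPENAI_API_KEY")
```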

40 of 169

Google Colab (V): Setup API keys

41 of 169

Hugging Face (I)

  • https://huggingface.co/

  • Hugging Face Platform:
    • Models: ~800,000 open source models
    • Datasets: ~200,000 datasets
    • Spaces: Apps
  • Hugging Face libraries:
    • hub
    • datasets
    • transformers
    • peft
    • trl

42 of 169

Hugging Face (II): Pipelines

Pipelines provide easy-to-use abstractions that simplify everyday AI-related use cases:

https://huggingface.co/docs/transformers/en/main_classes/pipelines

  • Sentiment Analysis
  • Classifier
  • Named Entity Recognition (NER)
  • Question Answering
  • Summarizing
  • Translation
  • Generate content:
    • Text
    • Image
    • Audio
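A minimal sentiment-analysis pipeline, guarded so the sketch degrades gracefully where transformers (or a network connection to download the default model) is unavailable:

```python
try:
    from transformers import pipeline

    # pipeline() hides tokenization, model loading and post-processing
    # behind one call; with no model given it downloads a default one.
    classifier = pipeline("sentiment-analysis")
    result = classifier("Hugging Face pipelines are easy to use.")
    # result is a list like [{"label": "POSITIVE", "score": 0.99...}]
except Exception:
    result = None  # transformers missing or model download failed
```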

43 of 169

Hugging Face (III): Pipelines

44 of 169

Hugging Face (IV): Pipelines

45 of 169

Hugging Face (V): Pipelines

46 of 169

Hugging Face (VI): Pipelines

47 of 169

LLM Open Source Inference

48 of 169

Cloud computing fundamentals

  1. Software as a Service (SaaS)
    1. Chat interfaces via web-browser.
    2. Example: ChatGPT, Claude, Meta AI web based chat.
  2. Platform as a Service (PaaS)
    • REST-APIs for inference against instruct models (chat completions).
    • Example: platform.openai.com, Anthropic console, NVIDIA models, Gemini REST-APIs, together.ai, groq
  3. Infrastructure as a Service (IaaS)
    • Running lower level api code on cloud machines.
    • Example: Loading model into memory and doing inference on a Google Colab machine.
  4. On-Premise
    • Hardware, networking, system administration → doing everything yourself.

49 of 169

50 of 169

About inference costs: https://a16z.com/llmflation-llm-inference-cost/ (28.02.2025)

51 of 169

LLM cloud inference options

52 of 169

Open source inference: PaaS solutions

E.g. using managed REST-APIs, paying per million tokens against instruct model endpoints, with request limits.

53 of 169

54 of 169

55 of 169

PaaS via Together AI (I): https://www.together.ai/

56 of 169

PaaS: Together AI (II)

  • https://www.together.ai/
  • Provides open source model inference
  • Deepseek R1, Llama, Mistral …
  • Pay per million tokens

57 of 169

PaaS: Together AI (III)

Pros:

  • Access to open source models
  • Fast and easy setup
  • 1$ free trial (no credit card required for the trial phase)
  • Playground and documentation

Cons:

  • Might not be as “strong” as frontier models
  • Rate limits for requests

58 of 169

PaaS: Together AI (IV)

PaaS via Together AI (IV): https://www.together.ai/playground

59 of 169

PaaS: Fireworks AI

  • https://fireworks.ai/
  • “The fastest and most efficient inference engine to build production-ready, compound AI systems.” (03/02/2025)

60 of 169

PaaS: Fireworks AI

61 of 169

PaaS: Fireworks AI - Pricing https://fireworks.ai/pricing

62 of 169

PaaS: Fireworks AI - Pricing https://fireworks.ai/pricing

Pros:

  • Access to open source models
  • Fast and easy setup
  • 1$ free trial (no credit card required for the trial phase)
  • Playground and documentation

Cons:

  • Might not be as “strong” as frontier models
  • Rate limits for requests

63 of 169

PaaS: Mistral

  • https://mistral.ai/
  • “We release open-weight models for everyone to customize and deploy where they want it. Our super-efficient model Mistral Nemo is available under Apache 2.0, while Mistral Large 2 is available through both a free non-commercial license, and a commercial license. “
  • https://console.mistral.ai

64 of 169

65 of 169

66 of 169

PaaS: Hugging Face Inference Endpoints

  • Deploy any model from the Hugging Face Hub via Hugging Face Inference Endpoints for model inference via HTTP (REST-APIs). Includes:
    • Custom endpoints
    • Custom model selection
    • Hugging Face Python library integration.
  • Payment
    • Different pricing per hour uptime (GPU) per model (type, size, quantization etc.)

67 of 169

Hugging Face Inference Endpoints catalogue: https://endpoints.huggingface.co/catalog

68 of 169

Hugging Face Inference Endpoints catalogue (pricing): https://endpoints.huggingface.co/catalog

69 of 169

Deploy open source model as PaaS via HuggingFace: https://huggingface.co/meta-llama/Llama-3.1-8B

70 of 169

Open Source inference: “Climbing down the API(s)”

… about lower level APIs

71 of 169

Tokenizer (Hugging Face lib)

Maps between text and tokens for a particular model

  • Translates between text and tokens with encode() and decode()
  • Contains a vocab that can include special tokens to signal information to the LLM, like the start of a prompt. (special-token mechanism → training!)
  • Can include a chat template that knows how to format a chat message for this model.
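A hedged sketch of the encode()/decode() round trip; gpt2 is chosen only because it is a small, openly downloadable model, and the call is guarded for environments without transformers or network access:

```python
try:
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")  # downloads the vocab
    ids = tok.encode("Hello AI agents")          # text → token ids
    text = tok.decode(ids)                       # token ids → text
    # On instruct models, chat formatting goes through the tokenizer's
    # chat template, e.g. tok.apply_chat_template(messages).
except Exception:
    ids, text = None, None
```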

72 of 169

Tokenizers for key models

You need to run the same tokenizer at inference time as was used at training time! (otherwise results will be gibberish)

  • Llama 3.1 → Meta
  • Phi3 → Microsoft
  • Qwen2 → Alibaba Cloud
  • Starcoder2 → Model for generating code (own tokenizer)

73 of 169

Llama Tokenizer

74 of 169

Llama Tokenizer

75 of 169

Open source inference: instruct models

  • instruct models = “chat models”
  • using Hugging Face’s lower level APIs
  • Using Google Colab’s cloud GPUs to run the inference (loading models, apply tokenizers etc.)

Models:

  • Llama (Meta)
  • Mixtral (Mistral)
  • Phi3 (Microsoft)
  • Qwen2 (Alibaba Cloud)
  • Gemma (Google)

76 of 169

Open source inference (instruct models): key aspects

  1. Quantization:
    1. Basic idea: Limiting the memory footprint of a model (so it runs on lower-tier machines)
    2. Mechanism: Reducing the numerical precision of the weights in a model.
    3. Example: Reducing 32-bit weights down to 8 or 4 bits; instead of the float 2.1233232412, store (roughly) 2.12
    4. Drawback: Accuracy goes down (but the decrease is limited)
  2. Model Internals:
    • PyTorch under the hood of Hugging Face’s transformers library
    • Deep neural network(s) with lower level concepts: Activation function, perceptron layer, attention layers …
  3. Streaming: Streaming back results
    • Building interactive chats via text streaming
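A sketch of a 4-bit quantization config with Hugging Face's transformers; the model name is only an example (and gated on the Hub), and the actual load is commented out because it needs a GPU runtime:

```python
try:
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    # 4-bit quantization config: weights are stored in 4 bits instead
    # of 32, shrinking the memory footprint at a limited accuracy cost.
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    # The actual load needs a GPU runtime (e.g. on Google Colab) and,
    # for this example model, access rights on the Hugging Face Hub:
    # model = AutoModelForCausalLM.from_pretrained(
    #     "meta-llama/Llama-3.1-8B-Instruct",
    #     quantization_config=quant_config,
    #     device_map="auto",
    # )
    ok = True
except Exception:
    ok = False  # torch / transformers not available in this environment
```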

77 of 169

Quantization in Google Colab using Hugging Face’s transformers library

78 of 169

Tokenization in Google Colab using Hugging Face’s transformers library

79 of 169

Load model (for inference) in Google Colab using Hugging Face’s transformers library

80 of 169

Print model details in Google Colab using Hugging Face’s transformers library

81 of 169

“Doing the inference”: via Google Colab using Hugging Face’s transformers library

82 of 169

Comparing LLMs

“Choosing the right LLM for my use case(s) “

83 of 169

LLM comparison (for inference)

There is no simple answer! → It is all about picking the right LLM for your task.

In general: LLMs need to be evaluated for any given task

  1. Basic comparison
    1. Open source / closed source
    2. Release date / knowledge cut-off
    3. Parameters
    4. Training
    5. Context length
    6. Pricing
  2. Looking at the results
    • Benchmarks
    • Leaderboards
    • Arenas

84 of 169

  1. Basic comparison: Primary questions
  • Open source/closed source model
    • Rights, licenses, etc.
    • Transparency
    • E.g. Llama is not allowed to be used commercially in the EU
  • Release date and knowledge cut-off
    • Model has knowledge of current events?
  • Parameters
    • Indicates the “strength” of the model / costs of the model
    • How much training data is needed for fine-tuning
  • Training
    • Size of the training dataset, size of training parameters
    • Depth of expertise
  • Context length?
    • Size of the context window - the total amount of tokens a model is able to keep in its memory at once.

85 of 169

  1. Basic comparison (II): Involved costs

E.g. costs to consider when deciding to use a different model for your use case:

  • Inference
    • API charges (e.g. requesting against the Anthropic console),
    • monthly subscription fees (e.g. using Claude / ChatGPT),
    • computing costs (e.g. running models on your own via compute units on Google Colab)
  • Knowledge requirements
    • effort to acquire the necessary skills involved in using certain LLMs / certain LLM providers: technical, jurisdictional, scientific expertise, etc.
  • Development
    • Effort needed to actually create the solution

86 of 169

  1. Basic comparison (III): Involved costs

  • Time to Market
    • One of the core arguments for using frontier models and providers, like OpenAI or Anthropic.
    • Quickly setup a powerful solution via OpenAI platform.
    • Fine-tuning your own open source model would be much harder and slower
  • Rate limits
    • Networking requirements
    • Reliability: Up-time of APIs? REST-APIs might be overloaded.
  • Speed
    • How quickly can a response / new tokens be generated?
  • Latency
    • Does the user have to wait for a response?

87 of 169

  1. Basic comparison (IV): Involved costs

  • License:
    • Most open source models have a fairly open license
    • User limits are very common: “Allowed to use free of charge up to 1 billion users”
    • Llama: restrictions for commercial use in the EU

88 of 169

LLM Comparison - 2. Looking at the results: benchmarks

7 common benchmarks:

  • ARC (Reasoning): Evaluates scientific reasoning via multiple-choice questions
  • DROP (Language comprehension): Distill details from text, then add, count or sort
  • HellaSwag (Common sense): “Harder Endings, Long Contexts and Low Shot Activities”

89 of 169

  • MMLU (Understanding): Factual recall, reasoning and problem solving across 57 subjects
  • TruthfulQA (Accuracy): Robustness in providing truthful replies in adversarial conditions
  • Winogrande (Context): Tests if the LLM understands context and resolves ambiguity
  • GSM8K (Math): Math and word problems taught in elementary and middle schools

90 of 169

Comparing Open and Closed Source models

91 of 169

Agentic tool use benchmark: https://scale.com/leaderboard/tool_use_chat

92 of 169

Instruction following benchmark: https://scale.com/leaderboard/instruction_following

93 of 169

Chatbot Arena: https://lmarena.ai/

94 of 169

Chatbot Arena: https://lmarena.ai/

95 of 169

Chatbot Arena: https://lmarena.ai/

96 of 169

Evaluating generative AI: About metrics

  1. Model centric metrics
    • Tend to measure the direct performance of a model and can be used for model optimization straight away.
    • Examples:
      • Loss
      • Perplexity
      • Accuracy
      • Precision, Recall, F1
      • AUC-ROC
  2. Outcome or domain centric metrics
    • Ability to solve domain problems.
    • Not obviously related to LLM performance. (Lots of unknowns in-between, like the prompting techniques used, understanding of the domain problem, usage of certain LLMs, etc.)
    • Examples:
      • Improvement in time, cost or resources
      • User satisfaction

97 of 169

RAG in agentic systems

Retrieval Augmented Generation

98 of 169

RAG: Basic idea

(Some) Techniques to improve (prompt) results:

  • Multi-shot prompting
  • Usage of tools (LLM tool concept)
  • Additional context

RAG = follow-up improvement?

  • Knowledge base: Expert information
  • Retrieve relevant information from knowledge base
  • Add relevant information to prompt (context)
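The three steps above can be sketched end-to-end; the word-overlap scorer is a toy stand-in for embedding-based similarity search, and the knowledge-base sentences are invented for illustration:

```python
KNOWLEDGE_BASE = [
    "DERLA is a digital edition documenting memorial landscapes in Austria.",
    "Instruct models are chat-tuned LLMs exposed via REST-APIs.",
    "QLoRA fine-tunes quantized models with low-rank adapters.",
]

def retrieve(question: str, k: int = 1) -> list:
    """Toy retrieval: rank documents by word overlap with the question
    (a stand-in for embedding-based similarity search)."""
    q = set(question.lower().split())
    scored = sorted(KNOWLEDGE_BASE,
                    key=lambda doc: len(q & set(doc.lower().split())),
                    reverse=True)
    return scored[:k]

def build_rag_prompt(question: str) -> str:
    """Add the retrieved expert information to the prompt as context."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}"

prompt = build_rag_prompt("What is DERLA?")
# The prompt is then sent to an LLM, e.g. via a chat-completions API.
```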

99 of 169

RAG: Basic idea (II)

Bigger Picture:

  1. Auto-Regressive LLMs
    1. Input token → output token
    2. Examples: GPT series,
  2. Auto-Encoding LLMs
    • Whole input → whole output
    • Sentiment Analysis, classification (hugging face pipelines)
    • Examples: BERT from Google, OpenAIEmbeddings (REST-API)

Vector Embedding:

  • Represent an understanding of the text (similarity search)
  • Central use-case for vector databases: “Return similar documents to add useful context to my prompt.”
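Similarity search over vector embeddings usually means cosine similarity; a minimal sketch with toy 3-dimensional vectors (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """Similarity of two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy "embeddings": doc_a and doc_b point in a similar direction,
# doc_c in a different one.
doc_a = [0.9, 0.1, 0.0]
doc_b = [0.8, 0.2, 0.1]
doc_c = [0.0, 0.1, 0.9]
sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
```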

100 of 169

LangChain’s description of RAG: Data ingestion - https://blog.langchain.dev/tutorial-chatgpt-over-your-data/ (10.02.2025)

101 of 169

LangChain’s description of RAG: Data querying - https://blog.langchain.dev/tutorial-chatgpt-over-your-data/ (10.02.2025)

102 of 169

RAG with LangChain: Key abstractions

  1. LLM = Abstraction around different Large Language Models
  2. Retriever = Interface on something like a vector store
  3. Memory = Represents the history of a conversation, e.g. a chat history.

103 of 169

Creating a RAG chain via LangChain: Key abstractions

104 of 169

Model Fine-Tuning (frontier models)

… about fine-tuning existing frontier models

105 of 169

How to improve model accuracy?

  1. Model inference
    1. Prompt Engineering (Multi-Shot prompting, Prompt chaining, …)
    2. Tools / function calling
    3. RAG (knowledge base)
  2. Model training
    • Train a new model
    • Fine-Tuning (Transfer Learning)

106 of 169

Model training: Dataset types

Types of datasets:

  1. Proprietary (own) data
  2. Open data
    1. Kaggle
    2. Hugging Face
    3. Zenodo?
  3. Synthetic data
    • “Frontier model generates synthetic data for a cheaper model”
  4. On demand data (commercial)
    • E.g. companies crafting datasets for your project (Scale.com)

107 of 169

Model training: Understanding the data

  1. Investigate
  2. Parse
  3. Visualize
  4. Assess data quality
  5. Curate
  6. Save and publish

108 of 169

Model training as part of optimizing LLM results

  1. Requirement Engineering
    1. Decide on metrics: measure success → how should success be measured for given task?
    2. Non-functional requirements: Scalability, latency, etc.
  2. Preparation (=architecture?)
    • Overview: Complete products - standard software - custom development
      1. Existing products: Problem already solved? “Pay somebody”? Delegate parts to standard software?
    • Methodology
      • Non-LLM solution?: Traditional data-science / machine learning methods (feature engineering, linear regression, …)?
    • LLM comparison (context-length, pricing, license, …)
      • Benchmarks, leaderboards, arenas
      • Specialist scores
    • Data curation
  3. LLM selection
    • Choose LLMs
    • Experiment with LLMs and different tasks
    • Train and validate (using curated data)
  4. Customization
    • Optimization of LLM results
      • Inference time (Prompt Engineering, RAG)
      • Training time (fine-tuning, decide to train own model?)
  5. Production
    • Architecture suitable for production? (Security, reportability, etc.)
    • API design etc.

109 of 169

Frontier model fine-tuning: OpenAI

Three steps:

  1. Dataset needs to be provided as JSONL (JSON Lines)
  2. Run training
    1. Recommendation by OpenAI: (use between 50 and 100 datapoints)
  3. Evaluate results (tweak and repeat)

110 of 169

Example for JSONL: https://jsonlines.org/examples/ (14.02.2025)

111 of 169

OpenAI Fine-tuning: Training API

Simplified workflow:

  1. Create training and validation dataset in JSONL format.
  2. Send both to OpenAI → Starts an asynchronous training process (Job).
    1. Additional parameters: Model name (to be created), training options etc.
  3. Wait for the process to finish (automatic email to your account)
  4. Use model name to start inference.

https://platform.openai.com/docs/guides/fine-tuning
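Step 1 of the workflow, sketched in Python; the example datapoint is invented, and the commented-out client calls for steps 2-4 follow OpenAI's documented files / fine-tuning endpoints (the model name is an assumption):

```python
import json

def to_jsonl_lines(examples):
    """Serialize chat examples into the JSONL format the fine-tuning
    API expects: one JSON object per line."""
    lines = []
    for user_text, assistant_text in examples:
        record = {"messages": [
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": assistant_text},
        ]}
        lines.append(json.dumps(record))
    return lines

lines = to_jsonl_lines([
    ("Estimate the price of this product: vintage camera", "Around 120 USD."),
])

# Steps 2-4, hedged sketch with the official openai client
# (requires an API key and an uploaded train.jsonl file):
#
#   from openai import OpenAI
#   client = OpenAI()
#   train = client.files.create(file=open("train.jsonl", "rb"),
#                               purpose="fine-tune")
#   job = client.fine_tuning.jobs.create(training_file=train.id,
#                                        model="gpt-4o-mini-2024-07-18")
```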

112 of 169

OpenAI fine-tuning using OpenAI’s python client (14.02.2025)

113 of 169

Fine-tuning frontier models: Objectives and challenges

  1. Setting response style
  2. Improve reliability (produce expected / similar outputs)
  3. Correcting failures with complex prompts
  4. Better performance with edge cases
  5. Perform skill that is hard to describe within a prompt

Challenges:

  • Fine-tuning datasets are not going to have a large impact compared to the huge datasets used for model training.
  • Fine-tuning might erode model performance.

114 of 169

“Fine-tune models for better results and efficiency” - https://platform.openai.com/docs/guides/fine-tuning (14.02.2025)

115 of 169

Fine-tuning open source models

116 of 169

LoRA

  • Low-Rank Adaptation
  • Basic idea:
    • Gather training data (“input token → prediction token”)
    • Freeze all layers in neural network
    • Target modules: Select a few of the layers
    • Create lower-dimensional layers: low-rank adapters are applied to the target modules to shift the weights towards predicting the expected training tokens.

117 of 169

QLoRA

  • Quantized variant of LoRA

118 of 169

QLoRA: Hyperparameters

  • There are three essential hyperparameters for QLoRA fine-tuning:

  1. R
    • Number of dimensions in the low-rank matrices
  2. Alpha
    • Scaling factor
    • Multiplies the low-rank matrices → a bigger Alpha means a larger weight shift.
  3. Target Modules
    • Which layers of the neural network are targeted by QLoRA.
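The three hyperparameters map directly onto peft's LoraConfig; the values and target module names below are illustrative and model-specific, and the import is guarded for environments without peft:

```python
try:
    from peft import LoraConfig

    # The three essential QLoRA hyperparameters from the slide:
    lora_config = LoraConfig(
        r=32,                 # dimensions of the low-rank matrices
        lora_alpha=64,        # scaling factor for the weight shift
        target_modules=["q_proj", "v_proj"],  # layers to adapt (model specific)
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    r_value = lora_config.r
except Exception:
    r_value = None  # peft not available in this environment
```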

119 of 169

Demo Python based setup for QLoRA fine-tuning

120 of 169

2. Agents

121 of 169

Definition(s)

122 of 169

AI-Agent definition (I)

First consider:

  • Umbrella term
  • Different meaning in different contexts (term “agent”)

Most common understanding in context of “AI-Agent-Engineering”:

  1. Software entities that can autonomously perform tasks
  2. AI agents as part of an agent framework to solve complex problems with limited human involvement.

123 of 169

AI-Agent definition (II)

Common characteristics (“autonomously performing tasks”):

  • Autonomous
  • Goal-oriented: a “thing to do”
  • Task-specific: “Specialised to be good at one thing”

124 of 169

AI-Agent definition (III)

AI agents as part of an agent framework

LLM software interacting with traditional software and other LLMs

Some key aspects:

  • Memory / Persistence
    • E.g. traditional software handling writing output to file
  • Decision-Making / orchestration
    • E.g. LLM assigning task to specific agent (traditional software or LLM)
  • Planning capabilities
    • E.g. LLM breaking problems down for other models or software
  • Use of tools; potentially connecting to databases or the internet.
    • E.g. LLM calls tool calculator

125 of 169

Demo agent architecture: Good deal spotter

  • Planning agent (Coordinating activities)
  • Scanner agent (Identify promising deals in the web via web scraping from given URLs)
  • Price analyser agent(s) (Different agents trying to judge whether a given price is a good deal for a given product description)
  • Ensemble agent (Compares different price estimations)
  • Messaging agent (Sends push notifications)

126 of 169

AI-Agents (I): Course approach

Perspective of software design / software architecture / object oriented programming:

  • OOP → model the “real world problem” into code.
  • Real world problem? → automate human workflows?

127 of 169

AI-Agents (II): Course’s definition

Code abstractions around LLMs based on (human) roles in a domain, like “data scientist”, “accountant” or “data steward”. The abstractions should mirror a domain’s pattern of division of labor (simplifying the implementation of semi-automated workflows).

Explicitly excluded:

  • No need for an agentic framework
  • Degree of autonomy might vary a lot

Important consideration:

  • The agents (based on roles) involved in solving a specific task might be assigned / created completely autonomously.
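A minimal sketch of this definition in (object-oriented) Python: an agent is a thin abstraction around any text-in/text-out inference call, parameterised by a domain role; echo_llm is a placeholder for a real REST-API call:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    """A code abstraction around an LLM, based on a (human) role."""
    role: str                   # e.g. "data scientist", "data steward"
    instructions: str           # role-specific system prompt
    llm: Callable[[str], str]   # any text-in/text-out inference call

    def run(self, task: str) -> str:
        return self.llm(f"You are a {self.role}. {self.instructions}\n"
                        f"Task: {task}")

# Placeholder inference function (a real one would call a REST-API).
def echo_llm(prompt: str) -> str:
    return f"[LLM answer to: {prompt.splitlines()[-1]}]"

steward = Agent(role="data steward",
                instructions="Check metadata completeness.",
                llm=echo_llm)
answer = steward.run("Validate the DERLA dataset description.")
```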

128 of 169

AI-Agents (III): LLM usage

Reminder:

  1. LLMs used for their “Weltwissen” (world knowledge)
  2. LLMs used as natural language interfaces to computer systems
    1. LLM tool concept

129 of 169

Established AI agent frameworks and products

About libraries, frameworks, tools and web interfaces.

130 of 169

Overview

  • CrewAI (https://www.crewai.com/ )
    • Coding (Python) skills required = Python framework
  • AutoGen (https://microsoft.github.io/autogen/stable/ )
    • Coding (Python) skills required = Python framework
  • Zapier (https://zapier.com/ )
    • No code
    • Graphical user interface to create agents
  • Voiceflow (https://www.voiceflow.com/ )
    • No code
    • Perhaps the most popular platform to build chatbots
    • Chatbots → enhanced to agents
  • StackAI (https://www.stack-ai.com/ )
    • No code
    • Platform to build chatbots
    • Chatbots → agents
  • Relevance AI (https://relevanceai.com/ )
    • No code
  • MindStudio (https://www.mindstudio.ai/ )
    • No code
  • n8n (https://n8n.io/ )
    • No code

131 of 169

132 of 169

CrewAI GitHub example: https://github.com/crewAIInc/crewAI | Defining agents via yaml file

133 of 169

CrewAI GitHub example: https://github.com/crewAIInc/crewAI | Defining tasks via yaml file

134 of 169

CrewAI GitHub example: https://github.com/crewAIInc/crewAI | Using AI agent and task declaration via python decorators

135 of 169

CrewAI GitHub example: https://github.com/crewAIInc/crewAI | “Starting” the AI crew with API keys to LLM inference providers

136 of 169

CrewAI on GitHub: https://github.com/crewAIInc/crewAI | CrewAI telemetry

137 of 169

138 of 169

AutoGen example chat using AI agents: https://microsoft.github.io/autogen/stable/

139 of 169

140 of 169

141 of 169

Zapier: https://zapier.com/ | Example agent workflows

142 of 169

Zapier: https://zapier.com/ | Example agent workflows

143 of 169

144 of 169

AI copilot with voiceflow example: https://www.voiceflow.com/

145 of 169

Voiceflow console (https://www.voiceflow.com/ )

146 of 169

147 of 169

148 of 169

149 of 169

150 of 169

MindStudio console: https://www.mindstudio.ai/

151 of 169

152 of 169

N8n: Creating an AI agent workflow: https://n8n.io/

153 of 169

Current developments

154 of 169

Key aspects: Industry focus

  • Building sophisticated no-code AI agent frameworks.
  • Enhancing existing coding frameworks

Information logistics as a central aspect of agentic systems? (contextual awareness of AI agents):

  • The required data needs to be at the required location at the required time.
  • Speed, correctness, completeness etc. of data

155 of 169

3. (Research) Software Engineering

156 of 169

Cloud Software Engineering

157 of 169

Cloud computing fundamentals

  • Software as a Service (SaaS)
    • Chat interfaces via web-browser.
    • Example: ChatGPT, Claude, Meta AI web based chat.
  • Platform as a Service (PaaS)
    • REST-APIs for inference against instruct models (chat completions).
    • Example: platform.openai.com, Anthropic console, NVIDIA models, Gemini REST-APIs, together.ai, groq
  • Infrastructure as a Service (IaaS)
    • Running lower level api code on cloud machines.
    • Example: Loading model into memory and doing inference on a Google Colab machine.
  • On-Premise
    • Hardware, networking, system administration → doing everything yourself.

158 of 169

159 of 169

Software development

… required key aspects

160 of 169

Distributed software engineering

  • HTTP
  • REST-APIs
  • Networking (latency, availability, …)
  • Security (basics of OAuth2)
  • Monitoring (pricing per request)
  • Software environments (local / remote environment variables)

161 of 169

Dependency Management

  • Semantic versioning
  • Risk assessment
  • Software metadata (requirements.txt, pyproject.toml)
  • Build systems

162 of 169

Version control

  • Git fundamentals
  • Git-based platforms (GitLab, GitHub, Hugging Face, etc.)

163 of 169

Software Lifecycle

164 of 169

Programming

… required key aspects

165 of 169

Programming language basics

  • Reacting to network errors
  • Using REST-APIs
  • Abstractions (OOP / functional programming)

166 of 169

Dependencies

  • Common libraries and tools for the programming language of choice
  • Available tools in context of AI-Agent-Engineering

167 of 169

Reproducible, portable development environments

  • IDEs
  • Apache Maven for Java
  • venv, Anaconda, uv, rye (Python)

168 of 169

Resources

169 of 169