1
2
The slides are meant as visual support for the lecture.
They are neither documentation nor a script.
Please do not print the slides.
Comments and feedback at n.meseth@hs-osnabrueck.de
3
ORGANIZATION
4
ILIAS
Microsoft Teams
5
sessions
6
group work
7
examination
8
working environment
9
visual studio code
python
tinkerforge
git
10
DIGITAL TECHNOLOGIES
11
a model for solving problems
[diagram: problem → input → model → output → solution]
12
cyber physical systems
artificial intelligence
software prototyping
13
cyber physical systems
sensors
actuators
temperature
humidity
co2
uv light
ambient light
sound pressure
thermal image
camera
...
led
speaker
display
motor
…
artificial intelligence
software prototyping
14
artificial intelligence
computer vision
generative ai
natural language processing
cyber physical systems
software prototyping
15
artificial intelligence
computer vision
generative ai
natural language processing
cyber physical systems
software prototyping
image classification
image segmentation
object recognition
object tracking
face recognition
face identification
emotion recognition
pose estimation
text recognition
16
artificial intelligence
computer vision
generative ai
natural language processing
cyber physical systems
software prototyping
text generation
text summarization
text analysis
image generation
image description
video generation
music generation
17
artificial intelligence
computer vision
generative ai
natural language processing
cyber physical systems
software prototyping
speech-to-text
text-to-speech
translation
18
artificial intelligence
user interfaces
cloud services
databases
cyber physical systems
software prototyping
19
introductory example
20
visual studio code
programs
python
21
LEDs
22
large language models
23
speech-to-text
24
user interface
25
SENSORS
27
temperature / humidity
28
th = BrickletHumidityV2(UID, ipcon)…
29
th.get_humidity()
th.get_temperature()
30
th.register_callback(th.CALLBACK_HUMIDITY, cb_humidity)
th.register_callback(th.CALLBACK_TEMPERATURE, …)
31
th.set_humidity_callback_configuration(250, False, "x", 0, 0)
th.set_temperature_callback_configuration(...)
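Putting the snippets together, a minimal sketch of a complete program (assuming the Brick Daemon runs on localhost:4223; "XYZ" is a placeholder for your bricklet's UID from the Brick Viewer):
from tinkerforge.ip_connection import IPConnection
from tinkerforge.bricklet_humidity_v2 import BrickletHumidityV2

def cb_humidity(humidity):
    # Values are delivered with a factor of 100 (%RH * 100)
    print(f"Humidity: {humidity / 100} %RH")

ipcon = IPConnection()
th = BrickletHumidityV2("XYZ", ipcon)  # placeholder UID
ipcon.connect("localhost", 4223)
th.register_callback(th.CALLBACK_HUMIDITY, cb_humidity)
# Report every 250 ms, even if the value has not changed
th.set_humidity_callback_configuration(250, False, "x", 0, 0)
input("Press Enter to exit...")
ipcon.disconnect()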
32
rgb led button
33
btn = BrickletRGBLEDButton(UID, ipcon)…
34
btn.set_color(255, 0, 0)
35
btn.get_button_state()
36
btn.register_callback(...)
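A minimal end-to-end sketch for the button (UID again a placeholder; callback and state constants as named in the Tinkerforge Python bindings):
from tinkerforge.ip_connection import IPConnection
from tinkerforge.bricklet_rgb_led_button import BrickletRGBLEDButton

def cb_button_state_changed(state):
    if state == btn.BUTTON_STATE_PRESSED:
        print("Button pressed!")

ipcon = IPConnection()
btn = BrickletRGBLEDButton("XYZ", ipcon)  # placeholder UID
ipcon.connect("localhost", 4223)
btn.set_color(255, 0, 0)  # light the button red
btn.register_callback(btn.CALLBACK_BUTTON_STATE_CHANGED, cb_button_state_changed)
input("Press Enter to exit...")
ipcon.disconnect()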
37
camera
38
OpenCV
import cv2
39
# Get video capture device (webcam)
webcam = cv2.VideoCapture(0)
40
# Read a frame
success, frame = webcam.read()
41
# Show the image from the frame
cv2.imshow("Webcam", frame)
42
# Save the frame as .png
cv2.imwrite("screenshot.png", frame)
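Combined into a small live-preview loop, a sketch that shows frames until 'q' is pressed:
import cv2

webcam = cv2.VideoCapture(0)
while True:
    success, frame = webcam.read()
    if not success:
        break
    cv2.imshow("Webcam", frame)
    # Stop when 'q' is pressed
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
webcam.release()
cv2.destroyAllWindows()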
43
thermal imaging camera
44
OpenCV
Tinkerforge
45
ti = BrickletThermalImaging(UID, ipcon)
ti.set_image_transfer_config(...)
img = ti.get_high_contrast_image()
46
ti.register_callback(...)
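Filled in, a sketch using the manual high-contrast transfer mode from the Tinkerforge bindings; the bricklet delivers an 80 x 60 list of 8-bit grayscale values that numpy can reshape for OpenCV (UID is a placeholder):
import cv2
import numpy as np
from tinkerforge.ip_connection import IPConnection
from tinkerforge.bricklet_thermal_imaging import BrickletThermalImaging

ipcon = IPConnection()
ti = BrickletThermalImaging("XYZ", ipcon)  # placeholder UID
ipcon.connect("localhost", 4223)
# Enable manual transfer of the high-contrast image
ti.set_image_transfer_config(ti.IMAGE_TRANSFER_MANUAL_HIGH_CONTRAST_IMAGE)
# Fetch one 80 x 60 frame and scale it up for display
img = ti.get_high_contrast_image()
frame = np.array(img, dtype=np.uint8).reshape(60, 80)
cv2.imshow("Thermal", cv2.resize(frame, (320, 240), interpolation=cv2.INTER_NEAREST))
cv2.waitKey(0)
ipcon.disconnect()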
47
microphone
48
import pyaudio
49
# Define recording parameters
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
CHUNK = 1024
50
# Get access to the microphone
audio = pyaudio.PyAudio()
51
# Start listening
stream = audio.open(...)
52
# Read a chunk of frames
stream.read(CHUNK)
53
# Stop and close stream
stream.stop_stream()
stream.close()
54
# Terminate access to microphone
audio.terminate()
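The pieces combined into a short record-and-save sketch (5 seconds into record.wav; parameters as defined above):
import pyaudio
import wave

FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
CHUNK = 1024
SECONDS = 5

audio = pyaudio.PyAudio()
stream = audio.open(format=FORMAT, channels=CHANNELS, rate=RATE,
                    input=True, frames_per_buffer=CHUNK)
# Collect chunks for the requested duration
frames = [stream.read(CHUNK) for _ in range(int(RATE / CHUNK * SECONDS))]
sample_width = audio.get_sample_size(FORMAT)
stream.stop_stream()
stream.close()
audio.terminate()
# Write the recording as a .wav file
with wave.open("record.wav", "wb") as wav:
    wav.setnchannels(CHANNELS)
    wav.setsampwidth(sample_width)
    wav.setframerate(RATE)
    wav.writeframes(b"".join(frames))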
55
keyboard
56
import keyboard
57
# Define a callback function for a key
def record_audio():
    print("Recording audio…")
58
# Add key listener
keyboard.add_hotkey("r", record_audio)
59
# Wait until a specific key is pressed
keyboard.wait("esc")
60
ACTUATORS
61
62
rgb led
63
led = BrickletRGBLEDV2(UID, ipcon)
led.set_rgb_value(255, 0, 0)
64
OLED display
65
oled = BrickletOLED128x64V2(UID, ipcon)
oled.clear_display()
oled.write_line(0, 0, "Welcome!")
66
speaker
67
import simpleaudio as sa
68
# Create a wave object from .wav-file and play it
wav = sa.WaveObject.from_wave_file("sound.wav")
wav.play().wait_done()
69
COMPUTER VISION
70
finding oranges in images
[diagram: image → ? → output]
71
Image source: Wikimedia
73
what set of rules can solve this?
74
machine learning algorithms
75
[diagram: rules + data → rule-based program → answer]
76
[diagram: rules + data → rule-based program → answer]
[diagram: data + answers → machine learning → rules]
77
images in a computer
78
79
80
81
82
83
?
84
R
G
B
85
R = 172, G = 137, B = 9
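In code, reading a single pixel's channel values with OpenCV; a small sketch (the file name is a placeholder, and note that OpenCV orders channels as BGR, not RGB):
import cv2

img = cv2.imread("photo.png")  # placeholder file name
b, g, r = img[50, 100]         # pixel at row 50, column 100
print(f"R={r}, G={g}, B={b}")  # e.g. R=172, G=137, B=9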
86
image classification
87
Q: Does an image belong to one class or another from a fixed set of classes?
88
Cat or Dog?
model
"cat"
89
Cat or Dog?
model
"cat"
model
"dog"
90
Google's Teachable Machine
91
pip install keras
pip install tensorflow==2.12.0
92
# Load the classifier and class names
from keras.models import load_model
import numpy as np
model = load_model("my_model.h5")
class_names = open("labels.txt", "r").readlines()
93
# Resize the image to 224 x 224
image = cv2.resize(image, (224, 224), interpolation=cv2.INTER_AREA)
# Turn into a list of pixels
image = np.asarray(image, dtype=np.float32).reshape(1, 224, 224, 3)
# Normalize each pixel's color value to the range -1 to 1
image = (image / 127.5) - 1
94
# Make a prediction for the class
prediction = model.predict(image)
# Get the class with the highest confidence value
index = np.argmax(prediction)
class_name = class_names[index]
# Get the confidence score for the predicted class
confidence_score = prediction[0][index]
95
96
YOLO v8 Image Classification
97
pip install ultralytics
98
# Load the classifier
from ultralytics import YOLO
model = YOLO("yolov8n-cls.pt")
99
# Make a prediction
results = model('cat.jpg')
100
# Show result
results[0].show()
101
# Get the top result
top = results[0].probs.top1
class_name = results[0].names[top]
print(class_name)
102
zero-shot image classification
103
Q: Which classes do you train your model on?
104
GPT-4 Vision
105
pip install openai
106
# Import the OpenAI API and set the API key
from openai import OpenAI
import os
os.environ["OPENAI_API_KEY"] = "..."
client = OpenAI()
107
# define a suitable prompt for the task
prompt = "Classify the image into 'dog' or 'cat'. Return only the word for the class of the image."
108
# This function is needed to encode an image to base64 for OpenAI's API
import base64

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

image_path = "cat.webp"
image = encode_image(image_path)
109
response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        { "role": "user", "content": [
            { "type": "text", "text": prompt },
            { "type": "image_url", "image_url": { "url": f"data:image/jpeg;base64,{image}" } }
        ] }
    ],
    max_tokens=300,
)
110
# Show the answer of the classification
print(response.choices[0].message.content)
111
object detection
112
Q: Which objects are in the image and where?
113
AI
dog
bee
114
AI
cat
frog
115
YOLO v8 Object Detection
116
# Load the detector
from ultralytics import YOLO
model = YOLO("yolov8n.pt")
117
# Make a prediction on each frame
results = model(frame)
# Annotate frame
annotated_frame = results[0].plot()
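Combined with the webcam code from earlier, a minimal live-detection sketch:
import cv2
from ultralytics import YOLO

model = YOLO("yolov8n.pt")
webcam = cv2.VideoCapture(0)
while True:
    success, frame = webcam.read()
    if not success:
        break
    # Run the detector on the frame and draw the boxes
    results = model(frame)
    cv2.imshow("Detection", results[0].plot())
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
webcam.release()
cv2.destroyAllWindows()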
118
119
Q: Which objects do you teach your model to recognize?
120
zero-shot object detection
121
"Simple Open-Vocabulary Object Detection with Vision Transformers"��https://arxiv.org/abs/2205.06230
122
# Load the open world detector
from ultralytics import YOLO
model = YOLO("yolov8s-world.pt")
123
# Define custom objects to look for
model.set_classes(["person with glasses"])
124
# Make a prediction on each frame
results = model(frame)
# Annotate frame
annotated_frame = results[0].plot()
125
optical character recognition (OCR)
126
tesseract
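A minimal sketch with the pytesseract wrapper (pip install pytesseract; the Tesseract binary and its German language data must be installed separately, and the file name is a placeholder):
import pytesseract
from PIL import Image

# Extract the text from an image of a German receipt
text = pytesseract.image_to_string(Image.open("receipt.png"), lang="deu")
print(text)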
127
GPT-4 Vision
128
# define a suitable prompt for the task
prompt = "Extract all food and beverage items with their quantity and price from this receipt into a JSON list. The receipt is in German."
129
response = client.chat.completions.create(
    model="gpt-4o",
    response_format={ "type": "json_object" },
    messages=[
        { "role": "user", "content": [
            { "type": "text", "text": prompt },
            { "type": "image_url", "image_url": { "url": f"data:image/jpeg;base64,{image}" } }
        ] }
    ],
    max_tokens=300,
)
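Because the model was asked for JSON, the answer can be parsed directly into a Python data structure:
import json

items = json.loads(response.choices[0].message.content)
print(items)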
130
GENERATIVE AI
131
LARGE LANGUAGE MODELS
132
what has been said so far?
(history + prompt)
prediction of next token based on learnt probability distribution
133
what has been said so far?
(history + prompt)
prediction of next token based on learnt probability distribution
(randomness)
134
what has been said so far?
(history + prompt)
prediction of next token based on learnt probability distribution
(randomness)
(filter)
(discriminating, insulting content)
135
what has been said so far?
(history + prompt)
prediction of next token based on learnt probability distribution
(randomness)
next word (token)
(filter)
(discriminating, insulting content)
136
what has been said so far?
(history + prompt)
prediction of next token based on learnt probability distribution
(randomness)
next word (token)
(filter)
(discriminating, insulting content)
137
PROMPTING
138
[diagram: Prompt → Language Model → Answer]
139
elements of a prompt
<instruction>
<context>
<input data>
<output indicator>
140
example prompt
Explain the binary number system.
elements of a prompt
<instruction>
<context>
<input data>
<output indicator>
141
example prompt
Explain the binary number system.
elements of a prompt
<instruction>
<context>
<input data>
<output indicator>
start simple
142
example prompt
You are a friendly tutor and your task is to explain complex concepts as simply as possible.
Explain the binary number system.
elements of a prompt
<instruction>
<context>
<input data>
<output indicator>
143
example prompt
You are a friendly tutor and your task is to explain complex concepts as simply as possible.
Your answers are never longer than 10 sentences.
Explain the binary number system.
elements of a prompt
<instruction>
<context>
<input data>
<output indicator>
144
ZERO-SHOT PROMPTING
145
example prompt
Classify the text into neutral, negative or positive.
Text: "What a great dinner!"
Sentiment:
elements of a prompt
<instruction>
<context>
<input data>
<output indicator>
146
example prompt
Classify the text into neutral, negative or positive.
Text: "What a great dinner!"
Sentiment:
elements of a prompt
<instruction>
<context>
<input data>
<output indicator>
this will be replaced with data later…
147
FEW-SHOT PROMPTING
IN-CONTEXT LEARNING
148
examples in the context to learn from
Extract all references to countries and their continent in the following text using the format from the examples below.
Example 1: "They played the team called 'Die Mannschaft' in the world cup final"
Correct answer: Germany, Europe
Example 2: "The Three Lions once again lost to Germany in a semi final"
Correct answer: England, Europe, Germany, Europe
Text: "The Selecao was destroyed 1:7 by the DFB selection in their home stadium."
Answer:
149
examples in the context to learn from
Extract all references to countries and their continent in the following text using the format from the examples below.
Example 1: "They played the team called 'Die Mannschaft' in the world cup final"
Correct answer: Germany, Europe
Example 2: "The Three Lions once again lost to Germany in a semi final"
Correct answer: England, Europe, Germany, Europe
Text: "The Selecao was destroyed 1:7 by the DFB selection in their home stadium."
Answer:
150
more prompting strategies
chain-of-thought (CoT)
self-consistency
generated knowledge prompting
prompt chaining (subtasks)
tree-of-thoughts (ToT)
retrieval-augmented generation (RAG)
…
151
OpenAI
152
pip install openai
153
from openai import OpenAI
import os
os.environ["OPENAI_API_KEY"] = "<YOUR_API_KEY>"
client = OpenAI()
154
# define a system message
system_message = """You are a world-famous 5-star chef. Based on ingredients the user has at home,
you suggest easy-to-cook recipes."""
155
# define a prompt for the task
prompt = """Suggest a recipe for lunch.
List of ingredients:
- butter
- eggs
- flour
- salt
- milk
Recipe: """
156
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        { "role": "system", "content": system_message },
        { "role": "user", "content": prompt }
    ],
    max_tokens=2000
)
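As before, the suggested recipe is read from the response object:
print(response.choices[0].message.content)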
157
USER INTERFACES
158
streamlit
https://docs.streamlit.io/ # official documentation
https://streamlit.io/components # third-party extensions
159
pip install streamlit
160
- Home.py
- pages/
  - 1_Speech.py
  - 2_Webcam.py
  - 3_Microphone.py
- lib/
  - speech_to_text.py
  - text_to_speech.py
  - vision.py
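Streamlit discovers the pages/ folder automatically; the app is started from the project root:
streamlit run Home.py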
161
entry point to our UI
- Home.py
- pages/
  - 1_Speech.py
  - 2_Webcam.py
  - 3_Microphone.py
- lib/
  - speech_to_text.py
  - text_to_speech.py
  - vision.py
162
- Home.py
- pages/
  - 1_Speech.py
  - 2_Webcam.py
  - 3_Microphone.py
- lib/
  - speech_to_text.py
  - text_to_speech.py
  - vision.py
more pages in our app
entry point to our UI
163
- Home.py
- pages/
  - 1_Speech.py
  - 2_Webcam.py
  - 3_Microphone.py
- lib/
  - speech_to_text.py
  - text_to_speech.py
  - vision.py
entry point to our UI
our custom functions we'd like to use from our UI
more pages in our app
164
Home.py
import streamlit as st
st.title("My first UI")
st.write("This is a simple UI for prototyping our application.")
name = st.text_input("Enter your name")
if st.button("Greet me"):
    st.write(f"Hello {name} 🤞")
165
Home.py
import streamlit as st
st.title("My first UI")
st.write("This is a simple UI for prototyping our application.")
name = st.text_input("Enter your name")
if st.button("Greet me"):
    st.write(f"Hello {name} 🤞")
166
1_Speech.py
import streamlit as st
from lib.text_to_speech import text_to_speech

st.title("Speech demo")
st.write("Enter a text and it will be converted to speech.")
text = st.text_input("Enter some text")
voice = st.selectbox("Select a voice", ["alloy", … "shimmer"])
if st.button("Turn to speech"):
    audio_file = text_to_speech(text, voice=voice)
    st.audio(audio_file.as_posix(), format="audio/mpeg")
167
1_Speech.py
import streamlit as st
from lib.text_to_speech import text_to_speech

st.title("Speech demo")
st.write("Enter a text and it will be converted to speech.")
text = st.text_input("Enter some text")
voice = st.selectbox("Select a voice", ["alloy", … "shimmer"])
if st.button("Turn to speech"):
    audio_file = text_to_speech(text, voice=voice)
    st.audio(audio_file.as_posix(), format="audio/mpeg")
from lib/text_to_speech.py
168
from openai import OpenAI
from pathlib import Path
import os

os.environ["OPENAI_API_KEY"] = "..."
client = OpenAI()

def text_to_speech(text, voice="alloy"):
    speech_file_path = Path(__file__).parent / "speech.mp3"
    response = client.audio.speech.create(
        model="tts-1",
        voice=voice,
        input=text
    )
    response.write_to_file(speech_file_path)
    return speech_file_path
lib/text_to_speech.py
169
from openai import OpenAI
from pathlib import Path
import os

os.environ["OPENAI_API_KEY"] = "..."
client = OpenAI()

def text_to_speech(text, voice="alloy"):
    speech_file_path = Path(__file__).parent / "speech.mp3"
    response = client.audio.speech.create(
        model="tts-1",
        voice=voice,
        input=text
    )
    response.write_to_file(speech_file_path)
    return speech_file_path
lib/text_to_speech.py
setup OpenAI API
170
lib/text_to_speech.py
from openai import OpenAI
from pathlib import Path
import os

os.environ["OPENAI_API_KEY"] = "..."
client = OpenAI()

def text_to_speech(text, voice="alloy"):
    speech_file_path = Path(__file__).parent / "speech.mp3"
    response = client.audio.speech.create(
        model="tts-1",
        voice=voice,
        input=text
    )
    response.write_to_file(speech_file_path)
    return speech_file_path
setup OpenAI API
define custom function
171
2_Webcam.py
import streamlit as st
from lib.vision import ask_gpt4o

st.title("Video camera test")
picture = st.camera_input("Take a picture")
if picture:
    st.image(picture)
    answer = ask_gpt4o("What is in this picture?", picture)
    st.write(answer)
172
2_Webcam.py
import streamlit as st
from lib.vision import ask_gpt4o

st.title("Video camera test")
picture = st.camera_input("Take a picture")
if picture:
    st.image(picture)
    answer = ask_gpt4o("What is in this picture?", picture)
    st.write(answer)
from lib/vision.py
173
lib/vision.py
from openai import OpenAI
import base64
import os

os.environ["OPENAI_API_KEY"] = "..."
client = OpenAI()

def encode_image(image_buffer):
    # Encode the image buffer as base64 for the API
    return base64.b64encode(image_buffer.getvalue()).decode('utf-8')

def ask_gpt4o(prompt, image_buffer):
    image = encode_image(image_buffer)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            { "role": "user", "content": [
                { "type": "text", "text": prompt },
                { "type": "image_url", "image_url": { "url": f"data:image/jpeg;base64,{image}" } }
            ] }
        ]
    )
    return response.choices[0].message.content
174
lib/vision.py
from openai import OpenAI
import base64
import os

os.environ["OPENAI_API_KEY"] = "..."
client = OpenAI()

def encode_image(image_buffer):
    # Encode the image buffer as base64 for the API
    return base64.b64encode(image_buffer.getvalue()).decode('utf-8')

def ask_gpt4o(prompt, image_buffer):
    image = encode_image(image_buffer)
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            { "role": "user", "content": [
                { "type": "text", "text": prompt },
                { "type": "image_url", "image_url": { "url": f"data:image/jpeg;base64,{image}" } }
            ] }
        ]
    )
    return response.choices[0].message.content
setup OpenAI API
define custom function
175
3_Microphone.py
import streamlit as st
from streamlit_mic_recorder import mic_recorder
from lib.speech_to_text import speech_to_text

st.title("Microphone test")

def callback():
    if st.session_state.my_recorder_output:
        audio = st.session_state.my_recorder_output
        text = speech_to_text(audio)
        st.success(text)

audio = mic_recorder(key='my_recorder', callback=callback)
176
3_Microphone.py
import streamlit as st
from streamlit_mic_recorder import mic_recorder
from lib.speech_to_text import speech_to_text

st.title("Microphone test")

def callback():
    if st.session_state.my_recorder_output:
        audio = st.session_state.my_recorder_output
        text = speech_to_text(audio)
        st.success(text)

audio = mic_recorder(key='my_recorder', callback=callback)
pip install streamlit-mic-recorder
177
3_Microphone.py
import streamlit as st
from streamlit_mic_recorder import mic_recorder
from lib.speech_to_text import speech_to_text

st.title("Microphone test")

def callback():
    if st.session_state.my_recorder_output:
        audio = st.session_state.my_recorder_output
        text = speech_to_text(audio)
        st.success(text)

audio = mic_recorder(key='my_recorder', callback=callback)
from lib/speech_to_text.py
pip install streamlit-mic-recorder
178
lib/speech_to_text.py
from openai import OpenAI
import os
import io

os.environ["OPENAI_API_KEY"] = "..."
client = OpenAI()

def speech_to_text(audio):
    audio_bio = io.BytesIO(audio['bytes'])
    audio_bio.name = 'audio.mp3'
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_bio
    )
    return transcription.text
179
lib/speech_to_text.py
from openai import OpenAI
import os
import io

os.environ["OPENAI_API_KEY"] = "..."
client = OpenAI()

def speech_to_text(audio):
    audio_bio = io.BytesIO(audio['bytes'])
    audio_bio.name = 'audio.mp3'
    transcription = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_bio
    )
    return transcription.text
setup OpenAI API
define custom function