2 of 42

Content

Kafka basics recap
Using Kafka's Producer/Consumer API and its limitations
What is Kafka Streams API
Basic feature and architectural overview
Case Study: Trafsys

Problem description and why Kafka Streams was selected
Object status updates

3 of 42

Basic recap

Kafka broker
Topic
Partition
Producer
Consumer
Consumer group

4 of 42

Simple use-case

5 of 42

Simple use-case

6 of 42

Simple use-case

7 of 42

Simple use-case: Inside the application

8 of 42

Extending the use-case

What if we want to:

Count received messages by specific parameter?
Process additional input and calculate outcome based on both?
Calculate statistics for specific time period? (e.g., messages per hour)
Make sure that no message is lost after application crashes?
Make sure no message is processed twice?
Scale the application out/up?

9 of 42

Enter: Kafka Streams API

Application library for stream processing on top of Kafka clusters.

Features:

Streams DSL (stateless/stateful transformations) / Processor API
Fault-tolerant local state
Exactly-once delivery semantics
One-record-at-a-time processing

11 of 42

What is Stream Processing?

Event

12 of 42

Basic Kafka Streams Application Architecture

Foo	Axx	Add	Bar

Foo	Bar

Source processor

Sink processor

Stream

processors

Topology

13 of 42

Topology example

Foo	Axx	Add	Bar

Foo	Bar

Sink processor

Stream

processors

Chart made with: https://zz85.github.io/kafka-streams-viz/

14 of 42

DSL Stateful/Stateless transformations

Stateless

don’t require state
map, flatMap, filter, peek,...

Stateful

use fault-tolerant state stores
aggregation (aggregate, reduce, count), join, windowing

15 of 42

Aggregation 101

Key	Value

16 of 42

Aggregation 101

F

1 \| 1

Key	Value
1	1

17 of 42

Aggregation 101

F	Ba

1 \| 1	2 \| 1

Key	Value
1	1
2	1

18 of 42

Aggregation 101

F	Ba	Foo

1 \| 1	2 \| 1	3 \| 1

Key	Value
1	1
2	1
3	1

19 of 42

Aggregation 101

F	Ba	Foo	Bar

1 \| 1	2 \| 1	3 \| 1	3 \| 2

Key	Value
1	1
2	1
3	2

20 of 42

Aggregation 101

F	Ba	Foo	Bar	Ok

1 \| 1	2 \| 1	3 \| 1	3 \| 2	2 \| 2

Key	Value
1	1
2	2
3	2

21 of 42

Aggregation 101

F	Ba	Foo	Bar	Ok

1 \| 1	2 \| 1	3 \| 1	3 \| 2	2 \| 2

Key	Value
1	1
2	2
3	2

Stream as Table

Changelog of table events

Table as Stream

Snapshot of stream in time

22 of 42

count in detailed view

Create source for input topic
Key selection (groupBy)
Filter wrong keys/values (null)
Create sink for intermediate results
Push data to repartitioned topic (topic with new key-values)
Create source for repartitioned topic
Perform aggregation (count) and save it to count-by-word-length store
Create stream from KTable
Create sink for output topic

23 of 42

State store

Queryable
Per application instance

In-memory
Persistent (RocksDB)

May be backed by changelog topic

Internal compacted topic with unlimited retention period
Is read when application crashes and restarts

24 of 42

Exactly-once delivery semantics

At-least-once

Records are never lost but may be redelivered.
Offsets are committed directly after processing message
Default for every Kafka Streams application

Exactly-once

All processing happens exactly once, including the processing and the materialized state created by the processing job that is written back to Kafka.
With minimum performance impact
Guaranteed only within Kafka processing ;-)

25 of 42

Trafsys:�Graphics Engine

Case study

27 of 42

Old Architecture

OPC Client

Processing Engine

Web Server

Database is the main storage and communication hub

28 of 42

New Architecture

OPC Client

Processing Engine

Web Server

Database still essential and accessed by every component, but not used for messaging anymore

Graphics Engine

29 of 42

How to calculate update for an object?

ObjectType "Emergency phone area":

- TELEPHONE – at its place

- FIRE EXTINGUISHER – at its place

- DOOR – closed --> open

{� "objectId": 1,� "variable": "DOOR",� "state": "CLOSED",� "timestamp": 12341231321�}

{� "objectId": 1,� "variable": "DOOR",� "state": "OPEN",� "timestamp": 12341231322�}

Single state update:

30 of 42

Calculating object coloring

Receive state update message (change of variable on an object)
Replace current variable state for an object
Recalculate current style based on the object's total state

We need to remember state of all variables within the object

31 of 42

Remembering state

Database?

Same problem all over again

In-memory?

How to share locale state between instances?

Kafka!

Use local state stores

32 of 42

Storing calculated state

Already grouped!

33 of 42

Storing calculated state

Key	Value

34 of 42

Storing calculated state

Key	Value
1	{DOOR: OPEN}

{� "objectId": 1,� "variable": "DOOR",� "state": "OPEN"�}

35 of 42

Storing calculated state

Key	Value
1	{DOOR: OPEN, PHONE:OK}

{� "objectId": 1,� "variable": "PHONE",� "state": "OK"�}

36 of 42

Storing calculated state

Key	Value
1	{DOOR: OPEN, PHONE:OK}
2	{DOOR: CLOSED}

{� "objectId": 2,� "variable": "DOOR",� "state": "CLOSED"�}

37 of 42

Storing calculated state

Key	Value
1	{DOOR: CLOSED, PHONE:OK}
2	{DOOR: CLOSED}

{� "objectId": 1,� "variable": "DOOR",� "state": "CLOSED"�}

38 of 42

Querying state

Async updates are just one part of scenario
We need to be able to query state of screen (all objects)

39 of 42

Querying state

Local state is local to application instance
Solution: GlobalKTable

Backed by single-partitioned topic
Updated via separate background process
State is replicated across all instances

Lower performance / higher network usage
Higher memory consumption

40 of 42

Performance

Parallelization

Topic with multiple partitions
Multiple application instances
Multiple threads per instance

Consumer lag monitoring

Number of offsets between produced and consumed data

41 of 42

Summary

Leaky abstraction (Hibernate of stream processing world)
Powerful but complex API
Best choice for writing complex stream processing applications
Scalability out of box
Well maintained

Bugfixes
Improvements

1 of 42

2 of 42

3 of 42

4 of 42

5 of 42

6 of 42

7 of 42

8 of 42

9 of 42

10 of 42

11 of 42

12 of 42

13 of 42

14 of 42

15 of 42

16 of 42

17 of 42

18 of 42

19 of 42

20 of 42

21 of 42

22 of 42

23 of 42

24 of 42

25 of 42

26 of 42

27 of 42

28 of 42

29 of 42

30 of 42

31 of 42

32 of 42

33 of 42

34 of 42

35 of 42

36 of 42

37 of 42

38 of 42

39 of 42

40 of 42

41 of 42

42 of 42