Kafka, from Producer to Consumer

Index

Producer
Broker
Consumer
Q&A

Producer

Producer Code

Creating a KafkaProducer

Accumulator: collects the Records the user passes to send() in memory, stacking them into RecordBatches

Network Thread: sends the RecordBatches built by the Accumulator to the Broker
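
The split between these two roles can be shown with a toy model. It assumes a single in-memory queue in place of the real RecordAccumulator and a plain list in place of the Broker. `ToyProducer` and its methods are hypothetical names, not the kafka-clients API. The only point is that send() hands the record off and returns, while a separate thread does the delivery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical toy model, not the real KafkaProducer:
// send() only buffers, a background "network thread" delivers to a fake broker.
class ToyProducer implements AutoCloseable {
    private static final String CLOSE_MARKER = "__close__";   // hypothetical shutdown signal
    private final BlockingQueue<String> accumulator = new LinkedBlockingQueue<>();
    private final List<String> broker = new ArrayList<>();     // stands in for the Broker
    private final Thread networkThread;

    ToyProducer() {
        networkThread = new Thread(() -> {
            try {
                while (true) {
                    String record = accumulator.take();         // wait until something is buffered
                    if (record.equals(CLOSE_MARKER)) return;
                    synchronized (broker) {
                        broker.add(record);                     // "send" it to the broker
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "toy-network-thread");
        networkThread.start();
    }

    // Returns as soon as the record is buffered; delivery happens on the network thread.
    void send(String record) {
        accumulator.add(record);
    }

    List<String> delivered() {
        synchronized (broker) {
            return new ArrayList<>(broker);
        }
    }

    // Records queued before the marker are delivered first, then the thread exits.
    @Override
    public void close() {
        accumulator.add(CLOSE_MARKER);
        try {
            networkThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Calling send("a"), send("b") and then close() leaves ["a", "b"] on the fake broker. The caller never waits on the delivery itself.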

Accumulator

A staging area (of sorts) where records are gathered per topic + partition

Memory size used by the Accumulator (buffer.memory)
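
A minimal sketch of the Accumulator idea follows. It keeps one queue of batches per (topic, partition). It opens a new batch once the current one would exceed a batch size, and it refuses records once a total memory budget is used up. The two limits mirror the real producer settings batch.size (default 16384 bytes) and buffer.memory (default 33554432 bytes). The class and its methods are hypothetical, not the kafka-clients RecordAccumulator.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy accumulator: one queue of batches per (topic, partition).
class ToyAccumulator {
    record TopicPartition(String topic, int partition) {}

    static final class Batch {
        final List<byte[]> records = new ArrayList<>();
        int bytes = 0;
    }

    private final int batchSize;      // plays the role of batch.size
    private final long bufferMemory;  // plays the role of buffer.memory
    private long usedBytes = 0;
    private final Map<TopicPartition, Deque<Batch>> queues = new HashMap<>();

    ToyAccumulator(int batchSize, long bufferMemory) {
        this.batchSize = batchSize;
        this.bufferMemory = bufferMemory;
    }

    // Appends to the newest batch of the partition, or opens a new batch if it would overflow.
    // Returns false when the memory budget is exhausted (the real producer waits up to max.block.ms instead).
    boolean append(String topic, int partition, byte[] value) {
        if (usedBytes + value.length > bufferMemory) return false;
        Deque<Batch> queue = queues.computeIfAbsent(new TopicPartition(topic, partition), tp -> new ArrayDeque<>());
        Batch last = queue.peekLast();
        if (last == null || last.bytes + value.length > batchSize) {
            last = new Batch();
            queue.addLast(last);
        }
        last.records.add(value);
        last.bytes += value.length;
        usedBytes += value.length;
        return true;
    }

    int batchCount(String topic, int partition) {
        Deque<Batch> queue = queues.get(new TopicPartition(topic, partition));
        return queue == null ? 0 : queue.size();
    }

    // What the network thread would take: the oldest batch, whose memory is then released.
    List<byte[]> drainOldest(String topic, int partition) {
        Deque<Batch> queue = queues.get(new TopicPartition(topic, partition));
        if (queue == null || queue.isEmpty()) return List.of();
        Batch batch = queue.pollFirst();
        usedBytes -= batch.bytes;
        return batch.records;
    }
}
```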

Network Thread

max.in.flight.requests.per.connection = 3

max.in.flight.requests.per.connection = 1

In each diagram, the left side is the Network Thread and the right side is the Broker.
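
With retries and more than one request in flight, a failed batch can be overtaken by later batches. The simulation below is a simplified model with three assumptions: every in-flight request is answered as one round, a failed batch is retried ahead of the others, and the idempotent producer (enable.idempotence) is not in play. With idempotence enabled, Kafka keeps ordering with up to 5 in-flight requests. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation: in which order do batches land in the broker's log?
class InFlightSimulation {
    static List<Integer> brokerLog(int maxInFlight, int batchCount, Set<Integer> failOnce) {
        Deque<Integer> pending = new ArrayDeque<>();
        for (int b = 1; b <= batchCount; b++) pending.addLast(b);
        Set<Integer> alreadyFailed = new HashSet<>();
        List<Integer> log = new ArrayList<>();

        while (!pending.isEmpty()) {
            // The network thread sends up to maxInFlight requests before seeing any response.
            List<Integer> inFlight = new ArrayList<>();
            while (inFlight.size() < maxInFlight && !pending.isEmpty()) inFlight.add(pending.pollFirst());

            // The broker handles them in order; a batch in failOnce fails the first time only.
            List<Integer> retries = new ArrayList<>();
            for (int b : inFlight) {
                if (failOnce.contains(b) && alreadyFailed.add(b)) retries.add(b);
                else log.add(b);
            }
            // Failed batches go back to the front of the queue, keeping their relative order.
            for (int i = retries.size() - 1; i >= 0; i--) pending.addFirst(retries.get(i));
        }
        return log;
    }
}
```

Suppose 3 requests are in flight and batch 1 fails once. The log then ends up as 2, 3, 1. With 1 in flight it stays 1, 2, 3, at the cost of waiting for each response before sending the next batch.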

Broker

How data is stored

Directory structure: one folder per [Topic name]-[partition]

Files are stored in units of Segments

*.index, *.log, *.timeindex
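
The naming scheme can be sketched with a few helpers. The assumptions: a partition directory is named "<topic>-<partition>", and each segment's files are named after the segment's base offset (its first offset), zero-padded to 20 digits. A new segment is rolled once the active one reaches log.segment.bytes (default 1 GB). The helper names are hypothetical.

```java
import java.util.TreeSet;

// Hypothetical helpers illustrating the on-disk naming of a partition's log.
class SegmentLayout {
    // One directory per partition: "<topic>-<partition>", e.g. my-topic-0.
    static String partitionDir(String topic, int partition) {
        return topic + "-" + partition;
    }

    // A segment's files share its base offset, zero-padded to 20 digits,
    // e.g. 00000000000000000000.log / .index / .timeindex.
    static String segmentFile(long baseOffset, String suffix) {
        return String.format("%020d%s", baseOffset, suffix);
    }

    // The segment holding `offset` is the one with the largest base offset <= offset.
    static long segmentFor(TreeSet<Long> baseOffsets, long offset) {
        Long base = baseOffsets.floor(offset);
        if (base == null) throw new IllegalArgumentException("offset " + offset + " is before the first segment");
        return base;
    }
}
```

For example, if a partition has segments starting at offsets 0, 170410 and 239430, offset 200000 lives in the segment whose files begin with 00000000000000170410.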

Topic > Partition > Segment

Source: https://thehoard.blog/how-kafkas-storage-internals-work-3a29b02e026
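
The .index file is sparse. Each 8-byte entry maps an offset (stored relative to the segment's base offset, 4 bytes) to a byte position in the .log file (4 bytes). Roughly one entry is written per index.interval.bytes (default 4096) of log data. A read therefore binary-searches the index and then scans the .log forward. The class below is a hypothetical in-memory stand-in, and the entries in the usage are made-up values.

```java
// Hypothetical in-memory stand-in for a segment's .index file.
class SparseOffsetIndex {
    private final long baseOffset;          // first offset of the segment (also its file name)
    private final int[] relativeOffsets;    // ascending; offset - baseOffset, only for indexed messages
    private final int[] positions;          // byte position of that message in the .log file

    SparseOffsetIndex(long baseOffset, int[] relativeOffsets, int[] positions) {
        if (relativeOffsets.length != positions.length) throw new IllegalArgumentException("length mismatch");
        this.baseOffset = baseOffset;
        this.relativeOffsets = relativeOffsets;
        this.positions = positions;
    }

    // Binary search for the largest indexed offset <= target; the reader then scans the .log
    // forward from the returned position until it reaches the target offset.
    int startPosition(long targetOffset) {
        long rel = targetOffset - baseOffset;
        int lo = 0, hi = relativeOffsets.length - 1, found = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (relativeOffsets[mid] <= rel) {
                found = mid;
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return found < 0 ? 0 : positions[found];   // no entry yet: start at the beginning of the segment
    }
}
```

Take a segment based at offset 100 with entries (0 → 0), (5 → 4096) and (12 → 8300). Looking up offset 108 returns position 4096, and the reader scans forward from there to offset 108.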

Consumer

Consumer Code

Creating a Kafka Consumer

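Fetcher: when poll() is called, returns an appropriately sized set of records. If it has no records buffered internally, it requests records from the Broker, stores them, and then returns an appropriately sized set.
Coordinator: communicates with the Broker's group coordinator to determine which topics and partitions to consume. It also handles heartbeats, offset commits, and joining the consumer group.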
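
The Fetcher behaviour described above can be sketched as a buffer sitting in front of the Broker. The sketch assumes that the "appropriate size" is bounded by max.poll.records (a real consumer setting, default 500). It replaces the FETCH request with a hypothetical `Supplier`. The real consumer also sends the next fetch in the background before the buffer runs dry, and this sketch omits that.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical toy Fetcher: serves poll() from a local buffer, going to the broker only when it is empty.
class ToyFetcher {
    private final Deque<String> buffer = new ArrayDeque<>();
    private final Supplier<List<String>> broker;  // stands in for a FETCH request to the Broker
    private final int maxPollRecords;             // plays the role of max.poll.records
    int fetchRequests = 0;                        // how many times the broker was asked

    ToyFetcher(Supplier<List<String>> broker, int maxPollRecords) {
        this.broker = broker;
        this.maxPollRecords = maxPollRecords;
    }

    List<String> poll() {
        if (buffer.isEmpty()) {           // nothing buffered: fetch from the broker and keep it all
            fetchRequests++;
            buffer.addAll(broker.get());
        }
        List<String> out = new ArrayList<>();
        while (out.size() < maxPollRecords && !buffer.isEmpty()) {
            out.add(buffer.pollFirst());  // hand back at most maxPollRecords per poll()
        }
        return out;
    }
}
```

Say the broker returns 5 records per fetch and the limit is 2. Then three poll() calls ([r1, r2], [r3, r4], [r5]) cost a single fetch, and only the fourth call goes back to the broker.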

Fetching data

Fetching data (when the Fetcher has records buffered)

Fetching data (when the Fetcher has no records buffered)

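max.partition.fetch.bytes: maximum amount of data fetched from a single topic + partition, default 1MB

fetch.min.bytes: minimum amount of data a single Broker should return, default 1 byte (i.e., return as soon as any data is available)

fetch.max.wait.ms: maximum time the Broker waits for fetch.min.bytes of data to accumulate, default 500ms. Doesn't this look a lot like the producer's linger.ms?

fetch.max.bytes: maximum amount of data fetched from a single Broker, default 50MB. Because data is fetched in record-batch units, this is an approximate limit.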
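
The interplay of fetch.min.bytes and fetch.max.wait.ms amounts to a long poll. The broker answers once enough bytes have accumulated, or when the wait expires, whichever comes first. This is the same trade as the producer's linger.ms, where a bounded delay buys a fuller request. A minimal sketch, assuming data arrives at known times after the request; the class and method names are hypothetical.

```java
// Hypothetical model of the broker's long poll for one fetch request.
class FetchWait {
    // arrivalMs[i]: when sizes[i] bytes become available (ascending), measured from the request.
    // Returns when the broker answers: once fetchMinBytes have accumulated,
    // or when fetchMaxWaitMs expires, whichever comes first.
    static long responseTimeMs(long[] arrivalMs, int[] sizes, int fetchMinBytes, long fetchMaxWaitMs) {
        long accumulated = 0;
        for (int i = 0; i < arrivalMs.length; i++) {
            if (arrivalMs[i] > fetchMaxWaitMs) break;   // arrives after the deadline
            accumulated += sizes[i];
            if (accumulated >= fetchMinBytes) return arrivalMs[i];
        }
        return fetchMaxWaitMs;
    }
}
```

With 10 bytes arriving at 0, 100 and 300 ms, fetch.min.bytes = 1 answers at once, 15 answers at 100 ms, and 100 falls back to the 500 ms timeout.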

Example: assume the consumer fetches three partitions (TopicA_0, TopicA_2, TopicB_1) from one Broker. The Broker then behaves roughly as follows:

readPartitionInfos = List((TopicA, 0), (TopicA, 2), (TopicB, 1))
maxBytes = fetch.max.bytes
readPartitionInfos.foreach { case (topic, partition) =>
  size = math.min(max.partition.fetch.bytes, maxBytes)
  records = read(topic, partition, size)
  result += records
  maxBytes -= records.sizeInBytes   // subtract the bytes actually read, not the limit
}
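
The loop above can be made runnable under simplifying assumptions. Each partition just has some number of bytes available, and the remaining budget shrinks by the bytes actually read. Real brokers read whole record batches and always return at least one batch even if it exceeds the limits, which is why fetch.max.bytes is only approximate. `BrokerFetchBudget` and `plan` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of how one fetch request's byte budget is split across partitions.
class BrokerFetchBudget {
    record Partition(String topic, int partition) {}

    // available: bytes waiting in each partition, in the order the request lists them.
    static Map<Partition, Integer> plan(LinkedHashMap<Partition, Integer> available,
                                        int maxPartitionFetchBytes, int fetchMaxBytes) {
        Map<Partition, Integer> result = new LinkedHashMap<>();
        int remaining = fetchMaxBytes;                              // like maxBytes = fetch.max.bytes
        for (Map.Entry<Partition, Integer> e : available.entrySet()) {
            int limit = Math.min(maxPartitionFetchBytes, remaining);  // per-partition cap
            int read = Math.min(limit, e.getValue());                 // cannot read more than exists
            result.put(e.getKey(), read);
            remaining -= read;                                      // budget shrinks by what was read
        }
        return result;
    }
}
```

Take max.partition.fetch.bytes = 1000 and fetch.max.bytes = 1500, with 800, 2000 and 2000 bytes waiting. The reads come out as 800, 700 and 0: the last partition gets nothing because the request's budget is already spent.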

Appendix

Consumer Rebalance

Consumer Group offset

Before Ver. 0.9, consumer offsets were stored in ZooKeeper.

Reference: https://elang2.github.io/myblog/posts/2017-09-20-Kafak-And-Zookeeper-Offsets.html

From Ver. 0.9 on, offsets are stored in the __consumer_offsets topic.

Group name	Topic name	Partition	Offset	Commit time (Unix epoch seconds)
test-01	my-topic	1	0	1551191950
test-02	my-topic	1	0	1551193842
test-01	my-topic	1	10	1551203421
test-01	my-topic	1	19	1551243229

OLD

NEW
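
__consumer_offsets is a compacted topic keyed by (group, topic, partition), so only the latest commit per key matters. Replaying the example rows above and keeping the last value per key gives each group's current position. This is also what log compaction eventually leaves on disk. The record types below are hypothetical, not Kafka's internal schema classes.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of replaying the offsets topic: the last commit per key wins.
class OffsetsTopic {
    record Key(String group, String topic, int partition) {}
    record Commit(Key key, long offset, long commitTime) {}

    static Map<Key, Long> latestOffsets(List<Commit> log) {
        Map<Key, Long> latest = new LinkedHashMap<>();
        for (Commit c : log) {
            latest.put(c.key(), c.offset());   // a later commit overwrites the earlier one for the same key
        }
        return latest;
    }
}
```

For the four rows in the table, test-01 ends at offset 19 and test-02 at offset 0 on my-topic partition 1.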

Topics not covered

Exactly-Once(Transaction)
LogCleaner and Log Compaction
Purgatory
Controller in Broker and Leader Election
Metrics
Kafka Connect

Thank you.