1 of 48

Design Topics in

Instant Messenger

Shijie Zhang

2 of 48

A bit about myself

  • Infrastructure engineer, ~5y
    • Enterprise instant messenger project
    • Auth & resilience: ~10 types of tokens per day
  • Motivations
    • Opportunity to learn big picture

3 of 48

Approach

  • A talk between SD interview and real sys. e.g.
    • Mini scenario: Group chat
    • Most general solution: e.g. storage - relational DB
    • Logical designs, not choosing tech stack
    • No common cross-cutting concerns e.g. sharding, monitoring
    • Q&A anytime

4 of 48

History

Ref 1995-2020

5 of 48

Now

  • Why a folder of chat apps?

6 of 48

Problem def

7 of 48

Group chat - Scenario

8 of 48

Problem def: Comparison on group chat scenario

Snapchat

WhatsApp

Broadcast chatroom

Slack

What we want to design

Max chat group size

256

No membership concept. 11.2M for Royal wedding)

10K level (e.g. Microsoft Teams limitations 25K)

Fixed group membership.

500

Send multi-media

Voice, video, image (filters)

Yes (most typical files)

Chat only and reactions

All file types

No

Anything security / Privacy

Disappearing msgs

E2E encryption

No

No consumer functions. But enterprise compliance

No

History msg on cloud

No

No. All on client devices

No. Even fine to drop msgs. No search func

Yes. Complex query demands

1.Stores on cloud.

2.Search around a time range

9 of 48

Typical IM system architecture

  • Additional connection layer
    • Avoid client to reconnect
    • Separation of concerns

10 of 48

Outline

11 of 48

Group chat - Scenario

12 of 48

Group chat - Basic components

13 of 48

Group chat - Scenario

14 of 48

Group chat - User2 as online receivers

  • Online users
    • Right on chat conversation
  • Push or Pull?
  • Need storage?

15 of 48

Group chat - User2 as online receivers

  • Basic flow

16 of 48

Group chat - User2 as online receivers

  • How do we save group messages?
    • Read amplification
      • Write to each group
    • Write amplification
      • Write to each user
    • Prefer write amplification
      • 1 write: 100 reads, try to make read easier
      • Group size not too big

17 of 48

Group chat - Scenario

18 of 48

Group chat - User3 as offline receivers

  • Offline users
    • On other chats
    • App in background
    • App closed
  • Push or Pull

19 of 48

Group chat - Scenario

20 of 48

Group chat - Message roaming

Online/Offline message buffer

Message roaming

Diff

Sync triggers

  • User opens the app
  • After network jittery
  • User installs a new IM app and scrolls up a group chat

Freq

High

Low

Retention limit

  • Several weeks to months.
  • Upper num of msgs (e.g. 10K)
  • Years or forever
  • No limitation

Solution

Storage

pattern

  • Write amplification
  • Stored by user
  • Read amplification
  • Stored by chat group

Storage

engine

Write throughput, timeout, range query

E.g. Redis SortedSet Ref 1, Ref2

Traditional SQL

Persistence, range query, large vol

E.g.HBase Ref1, Timeline model Ref2

Traditional SQL,

21 of 48

Group chat - Message roaming

  • Message roaming:
    • Get history msgs from group chat on scroll up
  • Get 2 on log in and 3 on history

22 of 48

Overall naive flow

23 of 48

Outline

24 of 48

Group chat - Storage

Online/Offline message buffer

Message roaming

Diff

Sync triggers

  • User opens the app
  • After network jittery
  • User installs a new IM app and scrolls up a group chat

Freq

High

Low

Retention limit

  • Several weeks to months.
  • Upper num of msgs (e.g. 10K)
  • Years or forever
  • No limitation

Solution

Storage

pattern

  • Write amplification
  • Stored by user
  • Read amplification
  • Stored by chat group

Storage

engine

Write throughput, timeout, range query

E.g. Redis SortedSet Ref 1, Ref2

Traditional SQL

Persistence, large vol, range query

E.g.HBase Ref1, Timeline model Ref2

Traditional SQL,

25 of 48

Group chat - Simplified storage requirements

  • Redis vs SQL

  • On user login to a chat app,

  • Requirement1:

For a given user, load all active group chat after last sync

  • Requirement2:

For each active group chat, load msgs

after last sync

26 of 48

Group chat - Initial scheme with write amplification

27 of 48

Group chat - Only store one copy of msg_content

28 of 48

Group chat - Storing only the latest msg id per user/group

Only store last msg id?

29 of 48

Group chat - Storing only the latest msg id per user/group

Could we do

better?

30 of 48

NonFunc

  • Approach:
    • Encapsulated problem
    • Tradeoffs

31 of 48

NonFunc - Realtime

  • Goal:
    • For online receivers, P99 ~ 300ms
  • Three ideas

32 of 48

NonFunc - Realtime - Protocol

Short polling

Long polling

Websockets

MQTT

SSE

Webhook

Streaming protocol

Throughput

Low

Low

Moderate

Low

Moderate

Low

High

Birectional

No

No

Yes

Yes

No

Yes

No

Initiator

Client

Client

Client

Client

Client

Server

Server

Example

use case

REST

REST

Chat/location apps, HTML5

IOT

Live events

update

Email notification

Kafka based logs

  • No messaging specific protocols (XMPP, AMQP, etc)
  • Real time needs to be event driven APIs

33 of 48

NonFunc - Realtime - DB latency

  • Biggest delays usually in storage
    • Asynchronously write to DB
      • Failure?
    • Replace RDBMS with Redis (Or Add) Consistency?
    • ……

34 of 48

NonFunc - Realtime - App layer heartbeat

  • Scenario: Weak network,
    • Online clients’ conn to server broken
    • App in background, not get notifications
  • TCP keep-alive not enough
  • Need app layer heartbeat

35 of 48

NonFunc - Ordering

  • Goals:
    • Consistent order

36 of 48

NonFunc - Ordering

  • Source of misorder:
    • Network / multi-threading / multi-sender/receiver

37 of 48

NonFunc - Ordering

  • Possible approaches
    • No client/sender ts
    • Special clock
    • NTP clock sync (ms level)
    • Message sequence id

38 of 48

NonFunc - Ordering - MessageID

UUID

Meituan Leaf

snowflake

QQ

Unique dim

Global

Global

Global

Per user

Per chat

Increasing

No

Yes

Yes

Yes

Yes

Continuous

No

Yes

No(for security)

No (for perf)

No

Internal

NA

DB step

Snowflake

Customized

Customized

  • Requirements:
    • Ordered per chat
    • Only (time) range query, No complex query like eComm
    • High Perf, reliability

39 of 48

NonFunc - Ordering

  • Buffer queue
  • Received inorder msg

40 of 48

NonFunc - Ordering

  • Buffer queue
  • Msg out of order

41 of 48

NonFunc - Reliability

  • Goal: No miss, No dupe
  • Two ideas

42 of 48

NonFunc - Reliability - Resend and dedup

  • Love good msgId
    • Easy if global unique
    • A bit complicated if per conversation
  • Scenario:
    • Network jittery

43 of 48

NonFunc - Reliability - Local sequence counter

  • Scenario:
    • Server crash

44 of 48

NonFunc - Scalability

  • Large group? 10K size
  • not like telegram 200K
  • Two ideas

45 of 48

NonFunc - Scalability - Batch operations

  • Batch operations

46 of 48

NonFunc - Scalability - Batch operations

  • Database is still bottleneck
    • Delayed msg queue tasks for writing to DB
    • Different speed queue
    • Change DB?

47 of 48

Future topics

  • Anything related to Multimedia: video/audio
  • Popular features in IM products
    • Unread count
    • Typing indicators
    • Sharing real time location
    • Sharing files
    • Message reactions
    • 1:1 chat
  • Popular 2B messenger features
    • Recall message in longer time period
    • Org chart
    • Read receipt
  • Platform level optimization

48 of 48

Summary

References: