1 of 50

WhatsApp

2 of 50

Motivation

  • Curiosity, been using WhatsApp for years
  • Building a bot for fun and profit
  • No one uses Telegram

3 of 50

Research

Something something Art of War

4 of 50

Agenda

  • Understand how the app communicates with the server
  • Understand how messaging works (DMs, groups, media)

  • We’re not going to cover WhatsApp Web

5 of 50

Architecture

[Diagram labels: Client 1 control stream; Client 2 control stream; two message streams; WWW. Requests shown: leave_group, set_profile_picture. Message types shown: video, recording, text. E2E.]

6 of 50

Rules

  • We have more than one device
  • At least one of them is rooted
  • We don’t have access to the app’s source code
  • We don’t have access to the server or its source code
  • Just make it work (use any possible shortcut)

12 of 50

Getting started

  • Official website
  • Download the apk
    • Open zip
    • Prepare jadx
  • Blog posts and old projects
    • Yowsup
    • Chat-API
  • Record the app’s internet traffic (sniffing) using this app and analyze it with Wireshark

13 of 50

Sniffing

14 of 50

Sniffing

?

Length = 255 (0xff) bytes

15 of 50

Sniffing

?

16 of 50

Protobuf

  • Serialization protocol
  • Predetermined message layout
  • Serialization code is generated for a specific message layout
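
To make the "predetermined message layout" idea concrete, here is a minimal sketch of how Protobuf puts a single integer field on the wire: a key byte built from the field number and wire type, followed by a base-128 varint. The function names are hypothetical and this is not the generated code the slides refer to, only a hand-written illustration of the wire format.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F          # lowest 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_varint_field(field_number: int, value: int) -> bytes:
    """Encode one varint field: key = (field_number << 3) | wire type 0."""
    key = (field_number << 3) | 0
    return encode_varint(key) + encode_varint(value)


if __name__ == "__main__":
    # Field 1 holding the value 150.
    print(encode_varint_field(1, 150).hex())
```

Because both sides agree on the layout in advance, the bytes carry only field numbers and values, which is why a capture of such traffic looks like opaque binary until the layout is known.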

17 of 50

Decoding messages

  • Constant round size, maybe some headers or encryption keys
  • Content looks random, and after one round trip all communication looks random
  • Probably encryption keys

18 of 50

Encryption

20 of 50

Noise

  • Noise spec

  • Key exchange
  • Too complicated
  • Found Noise client/server code written in Python
  • Wrote a client against that server
  • Wrote a server against our own client
  • A quick way to learn a protocol

21 of 50

Key exchange

Global scheme: n (field), g (generator)

Alice:
  • generates random x (Alice’s private key)
  • computes g^x mod n (Alice’s public key)
  • sends g^x to Bob
  • using x and g^y, computes g^(xy) (shared secret)

Bob:
  • generates random y (Bob’s private key)
  • computes g^y mod n (Bob’s public key)
  • sends g^y to Alice
  • using y and g^x, computes g^(xy) (shared secret)

22 of 50

Key exchange

Global scheme: n (field), g (generator)

Eve:
  • gets g^x (Alice’s public key)
  • gets g^y (Bob’s public key)
  • goal: g^(xy) (shared secret)
  • cannot compute g^(xy) from g^x and g^y alone

23 of 50

Key exchange (recap)

  • Each of the two participants generates a private key and a public key
  • The public keys are exchanged
  • Each side derives the same secret from its own private key and the other side’s public key
  • The secret cannot be inferred without one of the private keys
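
The recap can be checked with a toy exchange over plain integers. This is a minimal sketch, assuming illustrative parameters `N` and `G` chosen here for readability (they are not from the talk and are far too weak for real use); the protocol discussed later uses Curve25519 rather than modular exponentiation.

```python
import secrets

# Assumed toy parameters, for illustration only.
N = 2**127 - 1  # modulus (the "field" n on the slide)
G = 3           # generator g


def keypair() -> tuple[int, int]:
    """Return (private, public) where public = g^private mod n."""
    private = secrets.randbelow(N - 2) + 1
    public = pow(G, private, N)
    return private, public


def shared_secret(own_private: int, peer_public: int) -> int:
    """Compute (g^peer)^own mod n, i.e. g^(xy) mod n."""
    return pow(peer_public, own_private, N)


if __name__ == "__main__":
    x, gx = keypair()  # Alice
    y, gy = keypair()  # Bob
    # Only gx and gy cross the wire; both sides reach the same value.
    print(shared_secret(x, gy) == shared_secret(y, gx))
```

Eve sees only `gx` and `gy`, and neither call to `shared_secret` can be made without one of the private values.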

24 of 50

Noise Summary

  • Uses Curve25519
  • Two key pairs for each party
    • Static for identity verification (like certificates in HTTPS)
    • Ephemeral for secrecy (e.g. encryption)
  • Lightweight (unlike TLS/SSL)
  • Always encrypt with ephemeral, to ensure secrecy
  • Two modes:
    • Client doesn’t know the server’s static key
      • The server must provide some sort of proof of authenticity
    • Client already knows the server’s static key
      • Authenticity is ensured
  • All four keys are mixed together to generate the shared secret
  • Encryption using AES-GCM with a chain key
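
"Mixed together" can be sketched as a chaining key that absorbs each Diffie-Hellman output in turn, in the style of the HKDF step described in the Noise spec. This is a rough sketch under assumptions: SHA-256 as the hash is assumed (the slides only name Curve25519 and AES GCM), the DH outputs are placeholder bytes rather than real Curve25519 results, and `hkdf2` and `mix_all` are hypothetical names.

```python
import hashlib
import hmac


def hkdf2(chaining_key: bytes, input_key_material: bytes) -> tuple[bytes, bytes]:
    """Two-output HKDF step: returns (new chaining key, cipher key)."""
    temp_key = hmac.new(chaining_key, input_key_material, hashlib.sha256).digest()
    out1 = hmac.new(temp_key, b"\x01", hashlib.sha256).digest()
    out2 = hmac.new(temp_key, out1 + b"\x02", hashlib.sha256).digest()
    return out1, out2


def mix_all(initial_ck: bytes, dh_outputs: list[bytes]) -> tuple[bytes, bytes]:
    """Fold every DH output into the chaining key, one after another."""
    ck, key = initial_ck, b""
    for dh in dh_outputs:
        ck, key = hkdf2(ck, dh)
    return ck, key


if __name__ == "__main__":
    # Placeholder DH results (assumption): ephemeral/static combinations.
    dh_results = [b"dh-ee", b"dh-se", b"dh-es"]
    ck, key = mix_all(b"\x00" * 32, dh_results)
    print(len(ck), len(key))
```

The final `key` would feed the AES-GCM cipher; since every DH output passes through the chain, an attacker missing any one of them cannot reproduce it.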

25 of 50

Encryption library

?

26 of 50

Patching

  • Everything is data, even code
  • If we can change any file, we can also change any code
  • Because code is usually already compiled and assembled, it might be hard to do so in practice
  • Small changes are relatively easy to do
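
A "small change" usually means overwriting a few bytes in place without shifting anything around them. Here is a minimal sketch of that idea on an in-memory buffer; `patch_bytes` is a hypothetical helper, and the offsets and byte values in the usage are made up for illustration, not taken from the app.

```python
def patch_bytes(data: bytes, offset: int, expected: bytes, replacement: bytes) -> bytes:
    """Replace `expected` with `replacement` at `offset`, keeping the length."""
    if len(expected) != len(replacement):
        raise ValueError("replacement must keep the same length")
    if data[offset:offset + len(expected)] != expected:
        raise ValueError("unexpected bytes at offset; refusing to patch")
    patched = bytearray(data)
    patched[offset:offset + len(replacement)] = replacement
    return bytes(patched)


if __name__ == "__main__":
    blob = bytes.fromhex("00010203")
    print(patch_bytes(blob, 1, bytes.fromhex("0102"), bytes.fromhex("ffee")).hex())
```

Checking the expected bytes first guards against patching the wrong build of a file, and keeping the length constant avoids invalidating offsets elsewhere in the binary.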

27 of 50

Patching

28 of 50

Applying

29 of 50

Encoding

30 of 50

Encoding

  • Same as older versions, wrapped in Noise

Indicates a successful login, even though we didn’t provide any credentials…

Where are they?

31 of 50

Requests and events

32 of 50

End-to-End encrypted messaging (E2E)

33 of 50

Signal Protocol

Alice:
  • requests Bob’s public keys from the server
  • generates random x
  • computes shared secret #1 and sends the encrypted message (“What’s up?”) along with her generated public key

Bob:
  • computes shared secret #1
  • generates a new key y
  • computes shared secret #2 using the newly generated key and Alice’s most recent key, combines it with the old secret and discards the old one
  • sends the encrypted message (“Nothing much”) along with the new public key

Alice:
  • computes shared secret #2
  • generates a new key z
  • computes shared secret #3

...

34 of 50

Signal Protocol

35 of 50

Signal Protocol (recap)

  • X3DH
  • 3 key pairs for each party
    • Identity key, generated at install time, kept forever
    • Signed pre-key, generated every 30 days, signed using the identity key
    • One-time keys, public halves are stored on the server, regenerated when the server runs out
  • Sender asks the server for the recipient’s identity key, signed pre-key and a single one-time key
  • New public keys are generated and mixed into the encryption key every round trip
  • Old encryption keys are discarded
  • Backward secrecy
  • Forward secrecy
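
The "mix in a new secret, discard the old key" step can be sketched as a one-way update of a running key. This is a deliberately simplified sketch: the `Ratchet` class is hypothetical, a single HMAC-SHA256 stands in for the key derivation, and the Diffie-Hellman secrets are placeholder bytes; the real protocol keeps separate root and chain keys.

```python
import hashlib
import hmac


class Ratchet:
    """Running key that absorbs one fresh shared secret per round trip."""

    def __init__(self, initial_secret: bytes) -> None:
        self._key = initial_secret

    def step(self, new_dh_secret: bytes) -> bytes:
        # Combine the old key with the new secret; the old key is
        # overwritten, so it cannot be recovered from the new one.
        self._key = hmac.new(self._key, new_dh_secret, hashlib.sha256).digest()
        return self._key


if __name__ == "__main__":
    alice = Ratchet(b"shared secret #1")
    bob = Ratchet(b"shared secret #1")
    # Placeholder value standing in for the DH result of a round trip.
    print(alice.step(b"dh #2") == bob.step(b"dh #2"))
```

Both sides stay in sync as long as they feed in the same secrets in the same order, and a key leaked at one step reveals neither earlier keys (the update is one-way) nor later ones (each needs a secret the attacker has not seen).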

36 of 50

Group messaging

  • We don’t want to encrypt the same message with many different keys (E2E to each member)
  • Each participant generates a ‘sender encryption key’ and shares it with each member of the group using E2E messaging

[Diagram: a group with Alice, Bob, Carol and Dave; the message ‘Hi’ appears four times, once per participant.]

37 of 50

Group messaging

  • We don’t want to encrypt the same message with many different keys (E2E to each member)
  • Each participant generates a ‘sender encryption key’ and shares it with each member of the group using E2E messaging
  • Encrypt using ‘sender encryption key’ (chain key)
  • Just make sure to replace your key when someone is removed from the group
  • Took it too literally, got a few numbers blocked
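
A chain key lets the sender encrypt once for the whole group: every message takes a fresh message key from the chain, and the chain moves forward so old keys cannot be re-derived. Here is a minimal sketch, assuming HMAC-SHA256 and the constants `0x01`/`0x02` as derivation labels (both are assumptions here, and the function names are hypothetical).

```python
import hashlib
import hmac


def next_keys(chain_key: bytes) -> tuple[bytes, bytes]:
    """Derive (message key, next chain key) from the current chain key."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key


def message_keys(chain_key: bytes, count: int) -> list[bytes]:
    """Walk the chain and collect the first `count` message keys."""
    keys = []
    for _ in range(count):
        message_key, chain_key = next_keys(chain_key)
        keys.append(message_key)
    return keys


if __name__ == "__main__":
    sender_key = b"\x11" * 32  # placeholder for a randomly generated key
    for key in message_keys(sender_key, 3):
        print(key.hex()[:16])
```

Anyone holding the chain key can derive all later message keys, which is why the sender has to generate and share a new one when a member is removed.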

38 of 50

Media

  • Media is encrypted and uploaded to the server
  • Key is shared using Private/Group messaging

39 of 50

Live Location

  • Normally, many messages are sent but none are received
  • More efficient chaining using a grid
  • <image of 1d chain, with count>
  • <image of 2d chain, highlighted path to multiple results, with count>

41 of 50

Registration and Login

  • Done over HTTPS (same as older versions)
  • Harder to sniff (self-signed certificate)
  • Static analysis
  • https://v.whatsapp.net/v2/code?ENC=B1JA59mdz_mVEJjS8B1E...

42 of 50

ENC

  • Generating a key pair and encrypting the query string with it
  • Base64
  • cc=972&in=521231234&lg=en&lc=US&hasav=1&token=CZy7UfX...scbE%3d&method=voice&mistyped=6&authkey=lNOCX%2dv...0IQ1GI
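
The plaintext inside ENC is an ordinary URL query string, so the standard library can build and parse it. A minimal sketch using only the fields that appear in full on the slide (the truncated `token` and `authkey` values are left out rather than guessed); `build_query` and `parse_query` are hypothetical helper names.

```python
from urllib.parse import parse_qsl, urlencode


def build_query(fields: list[tuple[str, str]]) -> str:
    """Serialize fields in order as key=value pairs joined by '&'."""
    return urlencode(fields)


def parse_query(query: str) -> dict[str, str]:
    """Parse a query string, undoing percent-encoding such as %3d."""
    return dict(parse_qsl(query))


if __name__ == "__main__":
    fields = [
        ("cc", "972"), ("in", "521231234"), ("lg", "en"), ("lc", "US"),
        ("hasav", "1"), ("method", "voice"), ("mistyped", "6"),
    ]
    query = build_query(fields)
    print(query)
    print(parse_query(query)["method"])
```

Percent-escapes such as `%3d` and `%2d` in the captured string decode to `=` and `-`, which is consistent with Base64-style values being carried inside the query.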

43 of 50

token

  • Must be on every HTTPS request for some reason

PkTwKSZqUfAUyR0rPQ8hYJ0wNsQQ3dW1+3SCnyTXIfEAxxS75FwkDf47wNv/c8pP3p0GXKR6OOQmhyERwx74fw1RYSU10I4r1gyBVDbRJ40pidjM41G1I1oN

b

md5(classes.dex)
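
The `md5(classes.dex)` ingredient is just a digest over the file's bytes. A minimal sketch of computing it follows; how that digest is combined with the other inputs into the final token is not shown on the slide, so nothing beyond the hash itself is attempted here. The helper names and the file path are hypothetical.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as a hex string."""
    return hashlib.md5(data).hexdigest()


def md5_of_file(path: str) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Digest of an empty input, as a quick sanity check.
    print(md5_hex(b""))
    # Assumed usage: md5_of_file("classes.dex") on a file extracted from the apk.
```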

44 of 50

Login

  • Using the ‘exist’ action
  • e_ident is the client identity key
  • e_skey is the signed pre key used in messaging
  • Now the Noise handshake itself is the login
  • The server associates our static key with our phone number
  • authkey is the static client key in the Noise handshake

https://v.whatsapp.net/v2/exist?ENC=A8q2fyeB2h...

cc=972&in=521231234&lg=en&lc=US&hasav=1&token=CZy7...cbE%3d

&e_regid=Qf35kA&e_keytype=BQ&e_ident=T93...f4Hc

&e_skey_id=AAAA&e_skey_val=qS3lH...a7nw&e_skey_sig=NPrW...CBjQ&authkey=76qJ...8DGj0

45 of 50

Recap

  • We sniffed
  • Implemented the Signal protocol
  • Found Protobuf
  • Decoded binary XML (and PHP code)
  • Duped the keys
  • Made Noise
  • Hashed the WhatsApp logo
  • Did some static analysis to uncover the registration process
  • Explored how group messaging and media works

46 of 50

Actual Applications

  • This is where the fun begins
  • What would you do?
    • WhatsApp over TCP
    • Faceoff

49 of 50

Insights

  • Recon (white papers, ChatAPI)
  • Think before you choose (static analysis of obfuscated code vs. sniffing)
  • Learning via implementation (doing is better than reading)
  • You can patch binaries for research even if you’re not going to use it in the final product
  • Even small/leading/innovative companies have a lot of legacy code

50 of 50

Questions?