1 of 6

Some Thoughts on Client Side Scanning for CSAM

Nicholas Weaver

2 of 6

Scanning for known CSAM

  • The status quo: Server side (semi) voluntary bulk scanning using secret & proprietary image hashing algorithms
    • Algorithms are secret to prevent evasion
    • Not all companies do this: based on congressional testimony, Apple clearly doesn’t
  • Detects the “bottom feeders” rather than the sophisticated attackers
    • Sophisticated attackers have Tor hidden services...
    • But attackers need to become savvy: this probably detects a lot early on
    • The hypothesis is that this is valuable, but we need more data from NCMEC or DOJ to verify it
  • Even the most token encryption, even with “exceptional” access, eliminates this surveillance
    • If we want to continue this practice, the scanning must be built into the applications or operating systems

3 of 6

Concept of Client-Side Scanning

  • Client software computes a hash of each image
    • Checks for a probabilistic match in a Bloom filter, which has single-sided error and does not actually store the hashes
  • If local database match…
    • Send complete hash to central authority to check for exact match
  • If remote hash matches…
    • Send complete image to central authority in an automatic report
  • When database is updated…
    • Include in the existing OS/app update channel
    • Check the already computed hashes to see if they are now in the updated database
  • Can queue up any online checks until a network connection is available
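The flow above can be sketched as follows. This is a minimal illustration, not the actual protocol: the SHA-3 hash stands in for whatever proprietary image hash is used, and `check_remote` and `report` are hypothetical placeholders for the central-authority round trips.

```python
import hashlib

class BloomFilter:
    """Probabilistic set with single-sided error: it may report a false
    positive, but never a false negative, and it never stores the hashes."""
    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: bytes):
        # Derive k bit positions by hashing the item with k different prefixes.
        for i in range(self.num_hashes):
            h = hashlib.sha3_256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def scan_image(image_bytes: bytes, local_filter: BloomFilter,
               check_remote, report):
    """Hypothetical client-side flow: hash locally, escalate only on match."""
    h = hashlib.sha3_256(image_bytes).digest()  # stand-in for the image hash
    if not local_filter.might_contain(h):
        return              # the vast majority of images stop here; nothing is sent
    if check_remote(h):     # send the complete hash for an exact server-side match
        report(image_bytes) # only a confirmed remote match uploads the image
```

Note the privacy property this structure gives: a non-matching image never leaves the device, and a Bloom-filter false positive leaks only a hash, never the image.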

4 of 6

Is Evasion A Realistic Problem?

  • Current view is that it is hard to make client-side scanning resist evasion
    • Resulting in straw-man arguments that you need two-party computation or similar...
    • But sophisticated adversaries should already avoid any platform which does scanning
  • Probably only need to deal with “inadvertent” evasion
    • Images transcoded to a different compression factor, a smaller size, or (perhaps) cropped
    • Plus only a couple of compression algorithms need to be considered: can probably take advantage of the block structure of common JPEG encoding routines
  • Again, more data necessary
    • It may be that there is no need to be robust even to inadvertent evasion: perhaps we could just use SHA-3?
    • Also useful for the conceptual analysis of the actual privacy harm: if it is just SHA-3, or re-JPEG at low quality & then SHA-3 the blocks, there are no final false positives (assuming SHA-3 isn’t broken)
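The "SHA-3 the blocks" idea can be sketched as below. This is only a toy: a real version would re-encode the image as a low-quality JPEG and hash the quantized coefficients of each 8×8 block, whereas here fixed-size byte chunks stand in for blocks, just to show the exact-match logic and why it produces no false positives.

```python
import hashlib

BLOCK = 64  # stand-in for one 8x8 JPEG block's worth of data (assumption)

def block_hashes(data: bytes) -> set:
    """SHA-3 each fixed-size chunk. A real implementation would hash the
    quantized DCT coefficients of each 8x8 block after a canonical
    low-quality JPEG re-encode, so minor transcoding normalizes away."""
    return {hashlib.sha3_256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data) - BLOCK + 1, BLOCK)}

def fraction_matching(candidate: bytes, known: set) -> float:
    """Fraction of the candidate's blocks that exactly match a known image's
    blocks. Because the match is an exact cryptographic hash, there are no
    per-block false positives (assuming SHA-3 isn't broken)."""
    h = block_hashes(candidate)
    if not h:
        return 0.0
    return len(h & known) / len(h)
```

A crop that preserves block alignment would still match a large fraction of blocks, while an unrelated image matches none.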

5 of 6

Are These Reasonable Searches? (Warning: Troll)

  • Current argument is that they are “voluntary” by the service providers…
    • Needed for the current fiction that these aren’t government searches.
  • Routine searches where the government interest is substantial and the burden otherwise minimal are already accepted in Fourth Amendment law and by society, and regularly survive court challenge
    • Airport security, DUI checkpoints, inland “border” checkpoints, all of which are far more invasive
  • Burden really is minimal
    • A fraction of a CPU-second per image to generate the hash: the impact is less than a watt-second (one joule) of energy
    • 4 MB of Bloom filter can match against 1M images with a false-positive rate of 1 in 1,000,000 without making the hashes actually available to the client
      • A Bloom filter false positive only sends the whole hash to the central authority
    • Can limit the slippery slope and improve auditability by including notification to the user!
    • And certainly far less oppressive a global surveillance system than Google...

6 of 6

Questions We Need NCMEC To Answer

  1. How many entries are in the database? (With various versions of “uniqueness”, not just SHA-256/SHA-3 hash)
  2. How are the reported images modified in practice?
    • What percentage are completely unchanged?
    • What percentage are transcoded? Rescaled?
    • What percentage are cropped or otherwise changed?
  3. How many individuals are reported (rather than just images)?
  4. Does the reporting work?
    • Take a significant (~100-1000) random sample of reports involving US persons that are two years old
    • Evaluate what the outcomes were:
      1. Number of offenders who are arrested, indicted, and convicted
      2. Number of offenders arrested/indicted/convicted for actual abuse, not just CSAM possession