1 of 6

Some Thoughts on Client Side Scanning for CSAM

Nicholas Weaver

2 of 6

Scanning for known CSAM

  • The status quo: Server side (semi) voluntary bulk scanning using secret & proprietary image hashing algorithms
    • Algorithms are secret to prevent evasion
    • Not all companies do this: based on congressional testimony, Apple clearly doesn’t
  • Detects the “bottom feeders” rather than the sophisticated attackers
    • Sophisticated attackers have Tor hidden services...
    • But attackers need to become savvy: this probably detects a lot early on
    • The hypothesis is that this is valuable, but we need more data from NCMEC or DOJ to verify it
  • Even the most token encryption, even with “exceptional” access, eliminates this surveillance
    • If we want to continue this practice, the scanning must be built into the applications or operating systems

3 of 6

Concept of Client-Side Scanning

  • Client software computes a hash of each image
    • Checks for a probabilistic match in a Bloom filter, which has single-sided error and does not actually store the hashes
  • If local database match…
    • Send complete hash to central authority to check for exact match
  • If remote hash matches…
    • Send complete image to central authority in an automatic report
  • When database is updated…
    • Include in the existing OS/app update channel
    • Check the already computed hashes to see if they are now in the updated database
  • Can queue up any online checks until a network connection is available
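The flow above can be sketched as follows. This is a minimal illustration, not the actual protocol: the SHA-3 hash stands in for whatever proprietary image hash is used, and `check_remote` and `report` are hypothetical placeholders for the central-authority round trips.

```python
import hashlib

class BloomFilter:
    """Probabilistic set with single-sided error: it may report a false
    positive, but never a false negative, and it never stores the hashes."""
    def __init__(self, num_bits: int, num_hashes: int):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item: bytes):
        # Derive k bit positions by hashing the item with k different prefixes.
        for i in range(self.num_hashes):
            h = hashlib.sha3_256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

def scan_image(image_bytes: bytes, local_filter: BloomFilter,
               check_remote, report):
    """Hypothetical client-side flow: hash locally, escalate only on match."""
    h = hashlib.sha3_256(image_bytes).digest()  # stand-in for the image hash
    if not local_filter.might_contain(h):
        return              # the vast majority of images stop here; nothing is sent
    if check_remote(h):     # send the complete hash for an exact server-side match
        report(image_bytes) # only a confirmed remote match uploads the image
```

Note the privacy property this structure gives: a non-matching image never leaves the device, and a Bloom-filter false positive leaks only a hash, never the image.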

4 of 6

Is Evasion A Realistic Problem?

  • Current view is that it is hard to make client-side scanning resist evasion
    • Resulting in straw-man arguments that you need two-party computation or similar...
    • But sophisticated adversaries should already avoid any platform which does scanning
  • Probably only need to deal with “inadvertent” evasion
    • Images transcoded to a different compression factor, a smaller size, or (perhaps) cropped
    • Plus only a couple of compression algorithms need to be considered: can probably take advantage of the block structure of common JPEG encoding routines
  • Again, more data necessary
    • It may be that there is no need to be robust even to inadvertent evasion: perhaps we could just use SHA-3?
    • Also useful for the conceptual analysis of the actual privacy harm: if it is just SHA-3, or re-JPEG at low quality & then SHA-3 the blocks, there are no final false positives (assuming SHA-3 isn’t broken)
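The "SHA-3 the blocks" idea can be sketched as below. This is only a toy: a real version would re-encode the image as a low-quality JPEG and hash the quantized coefficients of each 8×8 block, whereas here fixed-size byte chunks stand in for blocks, just to show the exact-match logic and why it produces no false positives.

```python
import hashlib

BLOCK = 64  # stand-in for one 8x8 JPEG block's worth of data (assumption)

def block_hashes(data: bytes) -> set:
    """SHA-3 each fixed-size chunk. A real implementation would hash the
    quantized DCT coefficients of each 8x8 block after a canonical
    low-quality JPEG re-encode, so minor transcoding normalizes away."""
    return {hashlib.sha3_256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data) - BLOCK + 1, BLOCK)}

def fraction_matching(candidate: bytes, known: set) -> float:
    """Fraction of the candidate's blocks that exactly match a known image's
    blocks. Because the match is an exact cryptographic hash, there are no
    per-block false positives (assuming SHA-3 isn't broken)."""
    h = block_hashes(candidate)
    if not h:
        return 0.0
    return len(h & known) / len(h)
```

A crop that preserves block alignment would still match a large fraction of blocks, while an unrelated image matches none.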

5 of 6

Are These Reasonable Searches? (Warning: Troll)

  • Current argument is that they are “voluntary” by the service providers…
    • Needed for the current fiction that these aren’t government searches.
  • Routine searches where the government interest is substantial and the burden otherwise minimal are already accepted in Fourth Amendment law and by society, and regularly survive court challenge
    • Airport security, DUI checkpoints, inland “border” checkpoints, all of which are far more invasive
  • Burden really is minimal
    • A fraction of a CPU-second per image to generate the hash: the impact is less than a watt-second (one joule) of energy
    • 4 MB of Bloom filter can match against 1M images with a false-positive rate of 1 in 1,000,000 without making the hashes actually available to the client
      • A Bloom filter false positive only sends the whole hash to the central authority
    • Can limit the slippery slope and improve auditability by including notification to the user!
    • And certainly far less oppressive a global surveillance system than Google...

6 of 6

Questions We Need NCMEC To Answer

  1. How many entries are in the database? (With various versions of “uniqueness”, not just SHA-256/SHA-3 hash)
  2. How are the reported images modified in practice?
    • What percentage are completely unchanged?
    • What percentage are transcoded? Rescaled?
    • What percentage are cropped or otherwise changed?
  3. How many individuals are reported (rather than just images)?
  4. Does the reporting work?
    • Take a significant (~100-1000) random sample of reports involving US persons that are two years old
    • Evaluate what the outcomes were:
      1. Number of offenders who are arrested, indicted, and convicted
      2. Number of offenders arrested/indicted/convicted for actual abuse, not just CSAM possession