Last updated: 2026-03-22 · Research notes · ~15 min read

Research: How Surveillance Systems Build Durable Identities

Technical and structural analysis of fingerprinting, identity graphs, metadata, and the economics of modern tracking infrastructure.

Fingerprinting Is a Correlation Problem, Not a Single Data Point

Browser fingerprinting is often described as though it extracts one unique identifier. The reality is more subtle — and harder to defend against. Fingerprinting works by combining many semi-stable parameters, each individually common, into a combination that is rare or unique.

A 2020 survey by Laperdrix et al. in ACM Computing Surveys analyzed 37 fingerprinting attributes and found that combinations of screen resolution, installed fonts, canvas rendering, and timezone alone achieve identification rates above 90% on desktop browsers. The key insight: you don't need high entropy in any single attribute — you need low correlation between attributes.
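
The additive nature of fingerprint entropy can be sketched numerically. The attribute frequencies below are invented for illustration (they are not figures from the survey); the point is that each attribute's self-information in bits adds up when attributes are independent:

```python
import math

# Illustrative marginal frequencies (assumed, not from Laperdrix et al.):
# the share of browsers sharing each attribute value.
attribute_freq = {
    "screen_resolution": 0.20,   # e.g. 1920x1080 is very common
    "timezone": 0.15,
    "installed_fonts": 0.01,
    "canvas_hash": 0.002,
}

def surprisal_bits(p: float) -> float:
    """Self-information of an attribute value shared by a fraction p of browsers."""
    return -math.log2(p)

# If attributes were fully independent, bits would simply add.
# Correlated attributes contribute less, so this sum is an upper bound --
# which is why low inter-attribute correlation matters more than raw entropy.
total_bits = sum(surprisal_bits(p) for p in attribute_freq.values())

for name, p in attribute_freq.items():
    print(f"{name}: {surprisal_bits(p):.1f} bits")
print(f"combined (if independent): {total_bits:.1f} bits")
```

Roughly 33 bits suffice to single out one browser among 2^33 ≈ 8.6 billion, so even a handful of common attributes gets surprisingly close.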

The five fingerprinting layers

  • Passive (HTTP headers): User-Agent, Accept-Language, Accept-Encoding — transmitted with every request, no JavaScript required.
  • Active (JavaScript API): Screen dimensions, color depth, timezone, platform, hardware concurrency (CPU core count), device memory.
  • Canvas fingerprinting: A hidden HTML5 canvas element is drawn and read back — subtle GPU/driver rendering differences create a unique hash per device.
  • WebGL fingerprinting: Queries GPU vendor strings, renderer details, and 3D rendering behavior. More stable than canvas across browser updates.
  • Behavioral biometrics: Mouse movement velocity, keystroke timing, scroll patterns, and touch pressure (mobile) — these change slowly and are very difficult to spoof.

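A minimal sketch of how a tracker combines the first four layers into one stable identifier. The field names and values are hypothetical, not any specific library's schema; the technique is simply canonical serialization plus a cryptographic hash:

```python
import hashlib
import json

# Hypothetical attribute values as a client-side collector might report them.
passive = {"user_agent": "Mozilla/5.0 ...", "accept_language": "en-US,en;q=0.9"}
active = {"screen": "1920x1080x24", "timezone": "Europe/Berlin",
          "hardware_concurrency": 8, "device_memory": 8}
canvas_hash = "9f2d..."   # hash of rendered canvas pixels (stand-in value)
webgl = {"vendor": "Google Inc. (NVIDIA)", "renderer": "ANGLE (...)"}

def fingerprint(*layers) -> str:
    """Stable hash over the canonically serialized layers."""
    blob = json.dumps(layers, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode()).hexdigest()

fp = fingerprint(passive, active, canvas_hash, webgl)
print(fp[:16])
```

Because the serialization is canonical, the hash is deterministic across visits, yet any single changed attribute (say, the timezone) produces an entirely different identifier, which is why trackers favor the slow-changing attributes listed above.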
Mitigation requires defending all five layers simultaneously. The Tor Browser does this by normalizing most API responses to fixed values and routing traffic through the Tor network. Firefox with privacy.resistFingerprinting covers most active-layer attacks but does not address behavioral biometrics.
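
The normalization idea can be sketched as follows. The fixed values here are illustrative, not Tor Browser's actual letterboxed set; the point is that every device reports the same answers, so many devices collapse into one shared fingerprint:

```python
# Illustrative fixed answers reported regardless of the real device.
NORMALIZED = {
    "screen": "1400x900x24",
    "timezone": "UTC",
    "hardware_concurrency": 2,
    "device_memory": None,   # API disabled entirely
}

def normalized_view(real_attributes: dict) -> dict:
    """Replace every fingerprintable attribute with its fixed, shared value."""
    return {k: NORMALIZED.get(k, real_attributes[k]) for k in real_attributes}

device_a = {"screen": "1920x1080x24", "timezone": "Europe/Berlin",
            "hardware_concurrency": 8, "device_memory": 8}
device_b = {"screen": "2560x1440x24", "timezone": "Asia/Tokyo",
            "hardware_concurrency": 16, "device_memory": 16}

# Both devices now present an identical surface to active fingerprinting.
print(normalized_view(device_a) == normalized_view(device_b))
```

The defense works only if the anonymity set stays large: a single leaked real value (a behavioral signal, for instance) re-partitions the shared crowd.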

Identity Graphs: Why Data Becomes More Valuable When Joined

One of the least intuitive aspects of modern data systems is that economic value increases with linkage rather than precision. A rough location pattern, a stable device fingerprint, repeated session timing, and a payment-adjacent event stream become more valuable together than any single accurately known field.

Identity graphs (also called "identity resolution" or "customer stitching" in industry language) work by probabilistically linking different data fragments to the same real-world person. Key techniques include deterministic matching on shared identifiers (hashed email addresses, login IDs, phone numbers), probabilistic matching on co-occurring signals (IP address, device fingerprint, session timing), and cross-device graphs purchased from data brokers.

"The question is not whether a person is named. The question is whether that person can be recognized, scored, influenced, or filtered with commercially meaningful confidence." — Paul Ohm, Georgetown Law, "Broken Promises of Privacy" (2010)
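
The deterministic core of this linkage can be sketched with a union-find structure: any two records sharing an identifier collapse into one identity cluster. The records and identifiers below are invented; real systems add probabilistic scoring on top of this skeleton:

```python
from collections import defaultdict

# Toy fragments from three different data sources.
records = [
    {"id": "web-1",   "email_hash": "abc", "device_fp": None},
    {"id": "app-7",   "email_hash": "abc", "device_fp": "fp42"},
    {"id": "store-3", "email_hash": None,  "device_fp": "fp42"},
]

parent = {r["id"]: r["id"] for r in records}

def find(x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path compression
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

# Index: identifier value -> record ids carrying it.
seen = defaultdict(list)
for r in records:
    for key in ("email_hash", "device_fp"):
        if r[key] is not None:
            seen[(key, r[key])].append(r["id"])

# Records sharing any identifier are merged.
for ids in seen.values():
    for other in ids[1:]:
        union(ids[0], other)

clusters = defaultdict(set)
for r in records:
    clusters[find(r["id"])].add(r["id"])
print(clusters)
```

Note what happened: no single source held all three fragments, yet the shared email hash bridges web-1 to app-7, and the shared device fingerprint bridges app-7 to store-3. Linkage, not precision, created the person-level record.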

Metadata Survives Successful Encryption

This is the most persistently misunderstood aspect of communications privacy. End-to-end encryption protects the content of a message. It does not protect the fact that a message was sent, between which parties, at what time, from what location, and of what size.
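
What an observer recovers from encrypted traffic can be made concrete. The envelope data below (sender, receiver, timestamp, size) is invented, but it is exactly the class of information end-to-end encryption does not hide:

```python
from collections import Counter
from datetime import datetime

# Hypothetical traffic log: payloads are E2E-encrypted, envelopes are not.
events = [
    ("alice", "bob",   "2026-03-22T01:12:03", 1420),  # sender, receiver, time, bytes
    ("alice", "bob",   "2026-03-22T01:12:41",  980),
    ("bob",   "alice", "2026-03-22T01:13:05", 2210),
    ("alice", "carol", "2026-03-22T09:30:00",  640),
]

# Who talks to whom, how often -- recoverable without breaking any crypto.
edges = Counter((s, r) for s, r, _, _ in events)

# When they talk: late-night traffic alone is a strong relationship signal.
night_msgs = [e for e in events
              if datetime.fromisoformat(e[2]).hour < 6]

print(edges)
print(f"{len(night_msgs)} of {len(events)} messages sent between 00:00 and 06:00")
```

From four envelopes alone, the observer has a weighted contact graph and a behavioral timing profile, and the message contents were never touched.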

The 2014 Stanford/Princeton study "MetaPhone: The Sensitivity of Telephone Metadata" analyzed call metadata from 500 volunteers and found that metadata alone exposed highly sensitive facts: calls to health providers, religious organizations, and firearms retailers let the researchers infer participants' medical conditions, faith, and gun ownership without hearing a single conversation.

Messaging metadata analysis is equally revealing. Even with Signal's sealed sender feature, network-level traffic analysis — observing when packets are sent and received at the Tor exit node level — can reconstruct conversation timing and frequency with sufficient resources.
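
The timing-correlation attack mentioned above can be sketched as matching send and receive events by time proximity. The timestamps and the latency bound are invented for illustration:

```python
# An observer sees encrypted packets leave point A and arrive at point B,
# and matches them purely by timing. Timestamps are in seconds.
sent_at_a    = [10.00, 12.50, 15.10, 40.00]
arrived_at_b = [10.21, 12.68, 15.33, 55.00]

MAX_LATENCY = 0.5  # assumed upper bound on network delay

matches = [
    (s, r) for s in sent_at_a for r in arrived_at_b
    if 0 <= r - s <= MAX_LATENCY
]

print(f"{len(matches)} correlated event pairs")
```

Each correlated pair raises the observer's confidence that the two endpoints are talking to each other; with enough traffic, the link is established statistically even though no packet was ever decrypted. Real attacks must also handle cover traffic and jitter, which this sketch ignores.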

Decentralization's Hidden Choke Points

The assumption that decentralized systems are immune to surveillance or censorship is technically incorrect. Every distributed system has dependencies that create de facto centralization:

  • Naming: most "decentralized" services are still reached through DNS, which can be seized or filtered.
  • Hosting and delivery: web front-ends and bootstrap nodes typically run on a handful of cloud providers and CDNs.
  • Distribution: mobile clients depend on app store approval, a single point of removal.
  • Funding: donations and subscriptions flow through traditional payment processors that can refuse service.

True censorship resistance requires operating at every layer simultaneously: decentralized naming (ENS, OpenNIC), decentralized hosting (IPFS, Freenet), direct P2P protocols without app store dependency, and funding that doesn't rely on traditional payment rails.