DRAFT: Browser Fingerprinting

A general overview of browser fingerprinting and resistance methods

date: NOV 2025

author: thorin

peer review: (many thanks)


CONTENTS


FOREWORD

This article's use of the word fingerprint refers to browser fingerprinting, also known as device fingerprinting, unless explicitly stated, and focuses on active (also known as client-side or browser) JavaScript fingerprinting.


PART 1: TRACKING

Before we talk about fingerprinting, we need to know what its purpose is, and how that constitutes a threat. Some of those purposes are to help detect bots and prevent fraud. Another is to facilitate web tracking. It is this latter purpose that this article is mostly interested in, although the others come into play with fingerprint resistance techniques.

Web Tracking

So what is web tracking and why is it a threat? Web tracking wikipedia is “the practice by which operators of websites and third parties collect, store and share information about visitors' activities on the World Wide Web”. This information can be analysed to build profiles based on a UUID (Universally Unique Identifier) wikipedia.

A profile could be based on a device, a browser, an IP address, to name a few, or any combination thereof. It's merely an end-point on which to assign the information and doesn't necessarily mean an individual, e.g. an IP address may be used by many people and/or on many devices, and a browser may be used by more than a single person.

Keep in mind that this article is about browser fingerprinting, and will focus on browsers and individuals.

Profile data points can quickly de-anonymize a person. A 2019 paper nature.com reports that 99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes. And any information can be used against you, from pricing to discrimination to influencing algorithms to prosecution (and what may be legal today, might be illegal in the future).

“If you give me six lines written by the hand of the most honest of men, I will find something in them which will hang him.” - (disputed) Cardinal Richelieu, prime minister for King Louis XIII wikiquote

“Show me the man and I’ll show you the crime.” - Lavrentiy Beria, Stalin’s secret police chief wikipedia

There is also usually no way to correct or remove any of the information should it be erroneous, especially if no-one knows what it is or that it even exists. One example is no-fly lists. Another is gang databases. There are examples of these being frequently abused and erroneous, even leading to incarceration techdirt.

Note that the lack of a profile can also be used to influence results: e.g. rewards, promotions, pricing algorithms. The world isn't perfect, sorry.


State Tracking

There are other web tracking techniques such as navigational, correlation/timing, network fingerprinting, logging in including with an SSO (Single Sign-On) ... to name some more. In this section we will focus on state tracking to illustrate how tracking works.

So what is state tracking? It refers to anything client-side (your browser) that is created and persisted and used by websites. It can be stored in memory for the session, or written to disk. So state basically means something created client-side that the website can ask for and use. A typical example is a "cookie".

Server-assigned UUIDs are cheap to generate and effectively guaranteed to be unique. And state tracking was, and still is, a cheap, easy and reliable way for trackers to work. Whilst commercial surveillance will always take the lowest-hanging fruit and cheapest options when it can, browser mitigations are becoming standard, old web standards are being updated for more privacy, and consumers are becoming more privacy conscious.

Let's look at a simplistic generalized example. First party refers to the website you visit, such as supercutekittens.com. Third party would be any other website that is connected to, e.g. providing content such as scripts, images, widgets, discussion boards, avatars, and adverts.

Example (state and IP address tracking)
  • You visit supercutekittens.com, which checks if you have a "cookie" with an assigned UUID, and if not it gives you one (first party). They also log your IP address and the date and time and will associate these with your UUID.
  • Whilst on supercutekittens.com, your browser also connects to tracker.com (third party) which does the same (checks for or sets a "cookie", logs your IP address and the date and time) and notes that you are at supercutekittens.com
  • Now you visit your local news site, a travel agency and a pregnancy help center: all of which also connect to tracker.com which asks for its "cookie", gets the UUID it assigned you, and takes note of what sites you visit (sometimes even what specific web page) and when, along with your IP and timestamps.
  • This is how a third party tracker can accumulate information: it gave you a UUID and, on every site where it has a presence, it recognises you and adds more information to your profile. And if a "cookie" is due to expire, they can just replace it with a new one with what is known as a prolongation attack.
  • The third party tracking is the danger here as it links your traffic. Whilst first party tracking can also be used to re-identify, this is not what we're focused on.
  • Now as you browse the web you might start seeing adverts for travel deals, new-born baby supplies Time Magazine, and of course ... super cute kittens 🐈.

State data is not just "cookies". There are other "site data" APIs including localStorage and IndexedDB. And the way to "reset" the tracker is to sanitize, i.e. you clear the tracker's "site data". In our example, the tracker wouldn't recognise you, as there is no UUID for that tracker, so the process starts over.
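To make the mechanics concrete, here is a minimal sketch of the client-side half of such a tracker. The endpoint and names are hypothetical, and a real script is more elaborate and mostly server-driven.

Example (sketch: naive state tracking)

// check for an existing ID in "site data", otherwise mint and store one
function getTrackingId() {
  const match = document.cookie.match(/(?:^|; )uid=([^;]+)/);
  if (match) return match[1];
  let uid = localStorage.getItem('uid');           // fall back to other "site data"
  if (!uid) uid = crypto.randomUUID();             // no state found: assign a new UUID
  document.cookie = `uid=${uid}; max-age=31536000; path=/`;
  localStorage.setItem('uid', uid);                // write it back everywhere
  return uid;
}

// report the visit with the UUID attached (hypothetical endpoint)
navigator.sendBeacon('https://tracker.com/hit', JSON.stringify({
  uid: getTrackingId(),
  page: location.href,
  ts: Date.now(),
}));

Sanitizing works against exactly this: clear the cookie and localStorage and getTrackingId() finds nothing, so a new UUID is minted and the link to the old profile is broken.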

However... there are a lot of state APIs (which we will cover below), and anything and everything that can be state has been shown to either have been used (or be able to be used) to create what is termed a supercookie or zombie cookie wikipedia. A zombie cookie is a piece of data placed on the device or in memory (i.e. it has state) that behaves like, but is not, a regular HTTP cookie, with the purpose of re-identifying a user even if they sanitize cookies (and site data).

An example of a zombie cookie technique would be utilizing ETags wikipedia. ETags are used to determine whether the client-side content is the same as the server-side content. They are IDs sent by the server as headers in the HTTP response, and the client sends them back in subsequent requests to the same resource. A server can generate a unique ETag when a user first visits a website and then use that ETag to identify the user in subsequent visits, even if cookies are cleared. This is because the ETag is stored separately from cookies and is automatically sent by the browser in requests to the same resource.
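Server-side, the logic is only a few lines. A minimal sketch using Node's built-in http module (hypothetical; any server stack can do the same):

Example (sketch: ETag re-identification)

const http = require('node:http');
const crypto = require('node:crypto');

http.createServer((req, res) => {
  // the browser automatically replays the cached ETag as If-None-Match
  const seen = req.headers['if-none-match'];
  if (seen) {
    // re-identified, even if cookies and "site data" were sanitized
    res.writeHead(304, { 'ETag': seen });
    return res.end();
  }
  // first visit: mint a unique ETag and force revalidation on every request
  const uid = '"' + crypto.randomUUID() + '"';
  res.writeHead(200, {
    'ETag': uid,
    'Content-Type': 'image/png',
    'Cache-Control': 'no-cache',   // cached, but always revalidated (so the ETag is sent)
  });
  res.end();                       // body of tracker.png omitted
}).listen(8080);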

Example (ETag)
  • You visit supercutekittens.com
  • Whilst on supercutekittens.com, your browser also connects to tracker.com which provides a tracker.png with a unique ETag. This resource is not in your cache, so you cache it with the unique ETag
  • Now you visit your local news site, a travel agency and a pregnancy help center: all of which also connect to tracker.com which provides the same tracker.png. This resource is in your cache, so your browser automatically responds that you have this resource, along with its ETag value.
  • You sanitize all "site data"
  • You visit supercutekittens.com, where tracker.com provides a tracker.png. This resource is in your cache, so your browser automatically responds that you have this resource, along with its ETag value.
  • You were "re-identified" even though you sanitized cookies and "site data".

The "zombie cookie" refers to the re-identification being used to help re-create a cookie with your original UUID, and tracker.com will carry on accumulating data to your existing profile. And then there's a whole world of data brokers behind the scenes working feverishly to enlarge and link shadow profiles to profiles - not to mention first party sites selling data to said brokers.



IP Addresses

IP addresses can also be used to help link traffic. An IP address doesn't have to be exact; it can simply be collected and later rendered into something more stable, e.g. a known Tor exit node, VPN company X, a proxy, or even an ISP.

Keep in mind, that whilst an IP address is part of fingerprinting, this article is focused on client-side or JavaScript fingerprinting.


Tracking Mitigations

Over the years, as users have become more privacy savvy, browsers and services have been increasingly working towards anti-tracking solutions.

So how does this work in reality? Let's revisit our example.

Example (third party and IP address tracking mitigations)
  • You're using a reputable VPN or Tor Browser.
  • In a normal window you visit supercutekittens.com, your local news site, a travel agency and a pregnancy help center (first party), where each site logs your IP address and timestamps and checks for any existing "state" or sets some to track you.
  • You open a private or incognito window and visit supercutekittens.com again.
  • Each site also connects to tracker.com (which in this example wasn't blocked by the browser's tracker blocker). On each site tracker.com also logs your IP address and timestamps and checks for any existing "state" or sets some.
  • All state is double-keyed to the requesting site's origin (our defined first party)
    • on all five sites (one site was in two different window modes), tracker.com's state was partitioned to that first party key, and it can only set and retrieve its own state that matches that key: e.g. on supercutekittens.com it found nothing, generated a UUID and set a cookie for tracking, and then on the news site it found nothing, generated another UUID and set a cookie for tracking, and so on (see the sketch after this example)
  • Now you close the browser and sanitize on close (Private Browsing and Incognito windows automatically sanitize)
  • Visit any of the four sites again, and both the first party and tracker.com will have no state, so they start over
  • Any logging of IP address will be either hidden in the noise of hundreds or thousands of other VPN users' traffic or different per first party per session
  • Now as you browse the web you probably won't start seeing adverts for travel deals, new-born baby supplies, and alas ... super cute kittens 🐈. If you do, that's likely a coincidence.
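Conceptually, double-keying replaces the single per-origin storage jar with one keyed on both the top-level site and the embedded origin. A rough sketch of the idea (a hypothetical simplification, not the actual browser internals):

Example (sketch: partitioned state)

const jars = new Map();

function getJar(topLevelSite, origin) {
  const key = `${topLevelSite}^${origin}`;        // the partition key
  if (!jars.has(key)) jars.set(key, new Map());
  return jars.get(key);
}

// tracker.com embedded on two different sites gets two different jars
getJar('supercutekittens.com', 'tracker.com').set('uid', 'aaaa-1111');
getJar('news.example', 'tracker.com').set('uid', 'bbbb-2222');

// the UUID set on one site is invisible on the other
getJar('news.example', 'tracker.com').get('uid');  // "bbbb-2222", not "aaaa-1111"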

So we won? 🎉 Unfortunately, no! And what replaces the effective death of third party state tracking is even worse, because you have no control over it.


PART 2: FINGERPRINTING

In Part 1 we talked about mitigating state tracking. Say hello to stateless tracking. Because it is stateless, sanitizing has no effect, partitioning has no effect, and to make matters worse, you have little to no control over it as a user.

In order to resist fingerprinting, we need to know what it is, how it works, and how effective it can be. It also helps to know what fingerprinters want and how they work.

“Know your enemy...” - Sun Tzu

“Think evil to defeat evil.” - thorin

What is a fingerprint?

A fingerprint is a collection of metrics (properties and values) that the browser provides automatically, when asked for or after interaction. There are two types:

Passive metrics are provided automatically, and active metrics are requested. Some active metrics require interaction; such as permission prompts, user gestures (transient user activation), or user specific activity (typing).

While passive metrics also need to be addressed, many are duplicative and automatically reflect the active metric (e.g. navigator.languages and navigator.userAgent are also provided in HTTP headers), or they are equivalency of values we cannot hide or lie about, or they hold no entropy.

To keep things simple, this article is focused on active fingerprinting.

Have you heard of "Where's Wally?" wikipedia. Think about how you search for Wally. You scan the image for a distinctive red and white bobble hat. When you find one you then check his hair is brown, then if he's wearing glasses, a red and white striped top, blue pants, has a walking cane and finally brown boots. As soon as you find one missing, you stop checking and you start over with the next red and white bobble hat wearer. Eventually you find Wally. This assumes that the correct combination of hat, hair, glasses, top, pants, cane and boots are enough to make Wally unique in this crowd.

But fingerprinters are not trying to find Wally, they're trying to find everyone. So unlike "Where's Wally" where you stop checking after a matching characteristic fails, fingerprinters treat all users the same and check all the metrics. And they check as many metrics as they can or deem enough. A fingerprinting script will typically get 50 to 100 metrics (depending on how you count them) in 100-150ms.


How does a fingerprint work?

When JavaScript is enabled, your browser provides information, consistently and on demand. It doesn't need to be in any state (sanitizing won't work), and that's why you have little to no control over it.

The combined metrics (information) create fingerprints. There isn't a standard format for a fingerprint, and a fingerprint can change over time (e.g. the userAgent changes with an update or fonts change). A fingerprinter will want to store all the details for later data analysis, metric weighting and other tooling. A fingerprint can be a subset of the metrics, and metrics altered for linkability. A fingerprint is typically hashed wikipedia for comparison.

In the example below, if a metric's value were to change, or we added or removed metrics, the hash would also change.

Example (some common metrics)

fingerprint = {
devicePixelRatio": 1,
languages: ["en-US", "en"],
maxTouchPoints: 10,
"prefers-color-scheme": "dark",
screen_height: 1440,
screen_width: 2560,
timeZoneName: "America/Phoenix",
userAgent: "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:135.0)Gecko/20100101 Firefox/135.0",
}
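Turning those metrics into a comparable hash is only a few lines. A minimal sketch using SHA-256 via SubtleCrypto (real scripts often use faster non-cryptographic hashes such as MurmurHash; the principle is identical):

Example (sketch: hashing a fingerprint)

async function hashFingerprint(fp) {
  const data = new TextEncoder().encode(JSON.stringify(fp));
  const digest = await crypto.subtle.digest('SHA-256', data);
  return [...new Uint8Array(digest)]
    .map(b => b.toString(16).padStart(2, '0'))
    .join('');                                   // e.g. "3f5a09..."
}

hashFingerprint(fingerprint).then(console.log);
// change any single value above and the hash is completely different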

If we only check timeZoneName, we would have somewhere around 70 possible values or buckets (there are almost 600 valid timeZoneNames IANA, but platforms typically limit choices to some form of metaZones github). Each bucket would most likely have many potential browser users - the number in each would vary. As we combine more metrics, the number of possible buckets grows as users splinter into smaller buckets. The aim of the fingerprinting game is to get everyone into their own bucket - i.e. unique.

Let's look at desktop browsers

Every additional metric that differentiates any browser users multiplies the potential buckets. Even a binary choice such as prefers-color-scheme (light or dark) doubles it.

Keep in mind that not all metrics are independent. This limited and conservative example is to illustrate how rapidly users can differentiate.

Example (potential desktop buckets)

Imagine desktop browsers without any fingerprint resistance: let's conservatively say

  • 150 locales and language preference combinations
  • 100 canvas and WebGL rendering hashes
  • 100 graphics card names and properties
  • 100 enumerated font lists
  • 60 timeZoneNames
  • 60 screen and available screen combinations
  • 20 userAgents
  • 10 devicePixelRatios
  • 5 default permission combinations
  • 3 autoplay policies
  • 3 touch capabilities
  • 2 prefers-color-scheme

That's 9.72e+15 (or 9.72 quadrillion) potential buckets and we've barely scratched the surface. That's over one million times more than people on the planet (assuming my math is correct)!
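For reference, the arithmetic is just the product of the per-metric bucket counts:

Example (sketch: bucket count)

const buckets = [150, 100, 100, 100, 60, 60, 20, 10, 5, 3, 3, 2]
  .reduce((total, metric) => total * metric, 1);

console.log(buckets);          // 9720000000000000, i.e. 9.72e+15
console.log(buckets / 8.2e9);  // ≈ 1.19 million potential buckets per person on the planet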

Now not all potential buckets are possible and not all potential buckets will have users. And users are not uniformly spread. That's the point. Fingerprinting works by adding metrics that multiply potential buckets, until they have enough users hopefully unique to serve their purpose.


How effective is fingerprinting?

A 2016 study NDSS shows a 99.24% unique hit rate, and that is excluding IP addresses article. If you do nothing, you're going to be uniquely identifiable given a good enough fingerprinting script. See the previous section for some conservative back-of-the-envelope math.

It's now 9 years later, and fingerprinters haven't been sleeping. Fingerprinting scripts have only gotten faster, more common, more sophisticated, and more effective.

“Be Afraid. Be Very Afraid.” - The Fly wikipedia

Reasons driving fingerprinting adoption include

Not only is fingerprinting a very effective and accurate threat, but it's a growing threat.

Stats (using Alexa top 100,000)
  • 2013: Cookieless Monster doi.org
    • 0.4% used one of three common scripts: BlueCava, Iovation, ThreatMetrix
  • 2014: The Web Never Forgets doi.org
    • 5.5% (5542 sites) performing Canvas fingerprinting
  • 2021: Fingerprinting the Fingerprinters doi.org
    • 10.18% deployed by 2,349 different domains
    • 25%+ on the top 10,000
  • 2025: Beyond the Crawl arxiv.org
    • automated crawls miss almost half (45%) of the fingerprinting websites encountered by real users (3,000 top sites)
Follow the money
  • fingerprint.com's revenue grew 3,654% in 2024 fingerprint.com
  • the market is projected to grow at a compound annual growth rate of 25% from 2024 to 2030, hitting USD 5.6 billion verifiedmarketreports

So how does this work in reality? Let's revisit our example.

Example (fingerprinting)
  • Your browser is sanitized on close (all state data is deleted), your browser uses partitioning, and you are masking your IP address
  • You visit supercutekittens.com, which also connects to tracker.com
  • Now you visit your local news site, which also connects to tracker.com
  • Now you close the browser and sanitize all state. You change your IP address
  • Now you visit a travel agency and a pregnancy help center: both of which also connect to tracker.com
  • On each website using tracker.com, a fingerprint was recorded, a hash calculated and tracker.com recorded your visit and fingerprint. Each time, the traffic was linked by your fingerprint hash
  • Now as you browse the web you might start seeing adverts for travel deals, new-born baby supplies, and of course ... super cute kittens 🐈.


What is entropy?

When people or articles talk about fingerprinting, they talk about entropy. So what is it?

Ask 10 different people (mathematician, physicist, statistician, etc) what entropy is and you'll get 10 different answers. Basically, it is information theory wikipedia. We don't really like to think of or use entropy as such - it can be useful, but it is most effective when analysing large scale real-world data sets, not measuring individual metrics, since most metrics are not independent. We prefer to think in buckets and uniformity. If a metric splits any users into more buckets, then it adds entropy.

Terms
  • probability: p(x) or P(x). The likelihood of an event, which in our case is a user having a certain metric value (x). It is always normalized between 0 and 1 and has no units.
  • information: I(x) = -log2(P(x)) = log2(1/P(x)). It tells how "surprising" it is to see a certain metric value. The lower the probability, the higher its information. Theoretically the logarithm could use any base (as long as it's higher than 1), but when you use base 2, the unit of information is the bit.
  • entropy: H(X) = E[I(X)] = -Σ p(x) · log2(p(x)), summed over all possible values x. It's the expected value (basically, the mean) of the information. It gives an overview of how much knowledge a metric will give out.

A fingerprinting script will get as much information as it thinks it needs, by testing metrics that add entropy. If a metric splits any users into smaller buckets, then it adds entropy; how much depends on the distribution of users across those buckets, and it is usually measured in bits. Enough bits (33 bits: 2^33 is 8.59 billion, which is more than the number of humans on the planet), and theoretically you are unique. This is not a perfect analogy: not everyone is online or using browsers, and we're not identifying people, we're identifying browser profiles (multiple devices, multiple browsers and profiles). But the point remains. Think of it as the fewer users in a bucket, the more they stand out because the bucket/crowd is smaller, so the higher the entropy and bits.

Here's my definition of entropy

Entropy is the average uncertainty (using probabilities) represented in the least amount of bits

Let's look at a simple example

Example (entropy)

imagine a metric that every browser has that can only result in one of two values: true or false

  • 100% true
    • the entropy is zero (i.e everyone is the same)
  • 50% true and 50% false
    • everyone would have the same probability of 0.5
    • 1 in 2 = 1 bit of information: (-log2(0.5))
    • entropy is 0.5 x 1 + 0.5 x 1 = 1 bit
    • this is the maximum entropy case: random variable with uniform probability (e.g., the same as flipping a coin or rolling a dice)
  • 25% true and 75% false
    • true: probability of 0.25
      • 1 in 4 = 2 bits of information: (-log2(0.25))
    • false: probability of 0.75
      • 3 in 4 = 0.415 bits of information: (-log2(0.75))
    • entropy is 0.25 x 2 + 0.75 x 0.415 = 0.811 bits
      • notice it is lower than the uniform case
  • 5% true and 95% false
    • true: probability of 0.05
      • 1 in 20 = 4.322 bits of information: (log2(0.05))
    • false: probability of 0.95
      • 19 in 20 = 0.074 bits of information: (-log2(0.95))
    • entropy is 0.05 x 4.322 + 0.95 x 0.074 = 0.286 bits
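These numbers can be reproduced with a few lines:

Example (sketch: information and entropy)

const info = p => -Math.log2(p);                 // bits of information for a probability
const entropy = probs =>
  probs.reduce((h, p) => h + (p > 0 ? p * info(p) : 0), 0);

console.log(info(0.5));                          // 1
console.log(entropy([0.5, 0.5]));                // 1 (maximum for two buckets)
console.log(info(0.25), info(0.75));             // 2, ~0.415
console.log(entropy([0.25, 0.75]));              // ~0.811
console.log(info(0.05), info(0.95));             // ~4.322, ~0.074
console.log(entropy([0.05, 0.95]));              // ~0.286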

Fingerprinters want to add or raise entropy. Fingerprint resistance would then logically want to reduce or lower entropy - and it can absolutely do that by reducing buckets. Now compare the above two-bucket examples. The 50/50 uniform probability has an entropy of 1 which is higher than the 25/75 probability's 0.815. But the higher entropy here actually offers the best protection.

The lowest entropy isn't always the best entropy.

This is because smaller buckets are quickly (log2) disadvantaged in what I like to call the long thin tail - the lower the probability, the higher the bits of information, so when we have smaller and smaller buckets, they rapidly gain higher bits of information. Take the above two-bucket example with a 5/95 probability split. Those in the 95% bucket have great protection with only 0.074 bits of information, but the 5% bucket is in bad shape at 4.322. So whilst the 5/95 average entropy of 0.286 looks great, again a maximum entropy of 1 would be better.

In other words, average entropy isn't all that helpful. Ideally we want uniform or maximum entropy. Unfortunately, we can't always control uniformity, but it is something we should strive for.


What is a surprisal?

Our earlier definition of information started with “It tells how "surprising" it is to see a certain metric value.” And not all metrics are independent.

When a metric value is observed, the amount of information it adds can be calculated. This is called self-information wikipedia or a surprisal, because it is a measure of how "surprising" or unexpected the new information is.

For example, if your timeZoneName is America/Los_Angeles and your language is en-US, that would be a higher probability and less surprising than returning fr-FR (French). Also see mutual information wikipedia: the mutual dependence between two random variables.

Example (self-information / surprisal)
  • random variable: JavaScript is disabled
  • event: all information normally gained via JavaScript is now replaced with a single piece of information: the random variable, i.e "JavaScript is disabled"
  • probability of the event: 100%
  • this is not surprising to say the least, in fact it's 100% expected

In the case of disabling JavaScript, so much information and entropy is avoided (vs allowing JavaScript) that little remains to differentiate users - i.e. the potential buckets of users shrink to a very few, and therefore it takes fewer users to create a crowd.

Pattern Recognition and Machine Learning by C. Bishop (2006)

“The amount of information can be viewed as the ‘degree of surprise’ on learning the value of x. If we are told that a highly improbable event has just occurred, we will have received more information than if we were told that some very likely event has just occurred, and if we knew that the event was certain to happen we would receive no information.”

In other words, when JavaScript is disabled, a thousand JavaScript metrics are certain to be equivalency of "JavaScript is disabled", so no further entropy is gained beyond the fact that JavaScript is disabled. In our preferred language example, America/Los_Angeles timeZoneName users having an en-US language locale is likely but not certain.


What is equivalency?

If a metric doesn't add any new information, it's equivalency. Or in terms of surprisal, it's 100% unsurprising.

Example (dependent metric)
  • dependency: platform
    • already known from e.g. userAgent and/or feature detection
  • new metric: -apple-system font
  • new information: none
    • It is 100% certain* (probability 1) that if you are on an apple device you have the font, and if you are not you don't - therefore the entropy of having this font, or not having it, is zero
    • * Assuming no-one is silly enough to alias -apple-system on a non-apple device
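In practice a script doesn't need an enumeration API to confirm this - it can measure. A rough sketch of the standard width-measurement trick (assuming canvas text metrics are not themselves protected):

Example (sketch: font detection by measurement)

function fontResolves(family) {
  const ctx = document.createElement('canvas').getContext('2d');
  const text = 'mmmmmmmmmmlli';                  // width-sensitive test string
  ctx.font = '72px monospace';
  const baseline = ctx.measureText(text).width;
  ctx.font = `72px ${family}, monospace`;        // falls back to monospace if missing
  return ctx.measureText(text).width !== baseline;
}

fontResolves('-apple-system');  // expected: true on Apple platforms, false elsewhere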

When resisting fingerprinting there are metrics that cannot be hidden or lied about, or may not be feasible to protect

These "base" metrics are a known limitation. When a metric only reinforces that, i.e it doesn't add any new information, then it's equivalency of something already known. Here are some more examples.

Example (timezoneOffset)
  • assume every browser provides a timeZoneName (usually from the platform) and a timezoneOffset
  • the timeZoneName options may vary with platform
  • the timezoneOffset information (we can check a range of dates) is determined by the timeZoneName and may vary based on the browser and browser version: it's dependent on the implementation and the bundled tzdata IANA
  • this makes a timezoneOffset(s) metric equivalency of already known information (timeZoneName, platform, browser and version)
Example (audio)
  • Math has entropy caused by
    • floating point differences: the math library can differ between platforms and architecture
  • offlineAudioContext has entropy caused by
    • floating point differences in Math
    • the ac-sampleRate: e.g. 44100 or 48000
  • audio uses a subset of Math functions and a limited range
  • the floating point entropy in audio is therefore equivalency, or a subset if you like, of Math

However, that's not to say including these metrics isn't useful. Additional entropy can be dependent on another metric value - e.g. checking touch on Macs might be pointless, but not on other devices. Additionally, using multiple methods and equivalency can catch outliers and paradoxical edge cases, such as extensions messing with values, or users setting environment variables. But the effectiveness of these is a case of diminishing returns vs performance.


What do fingerprinters want?

Part 2: Intro: In order to resist fingerprinting... [it] helps to know what fingerprinters want and how they work.

Fingerprinting scripts require websites to host or allow them, so what websites want here is important. Websites want fingerprinting scripts to be:

Fingerprinters also want to attract websites to spread their reach and effectiveness, so they also want their scripts to achieve certain attributes in order to sell themselves. Fingerprinting scripts want their metrics to be:

Fingerprinters also want their scripts to be universal - the more widely spread a script is, the more traffic it can easily link.

A fingerprinting script will try to extract consistent results with enough entropy to meet its target (e.g. marketing claims of identifying 95% of users) in order to sell itself to clients, while staying performant and unobtrusive, so it can become widespread.

Now that we know what fingerprinters want, we can work out how to resist them.


PART 3: FINGERPRINTING RESISTANCE

This part is a supplement to Tor Browser Design Doc > 3.3 Adversary Attacks > 2. Fingerprint browser properties

All resistance and discussions and terms are in the context of what websites and servers can see - i.e. in the relevant security contexts Mozilla Blog: Principals as defined by Mozilla

Now that we know what entropy and equivalency are, the importance of surprisals, and what fingerprinting scripts want and how they work, we can form a fingerprint resistance strategy.


Resistance Basics

Fingerprint resistance relies on a "default crowd" (so users can have shared fingerprints). A crowd is a set of users being protected by an overall fingerprint resistance strategy. Notably both Tor Browser and Mullvad Browser with RFP (resistFingerprinting), and Brave's Shields fingerprinting protections are enabled by default. Firefox's fledgling FPP (fingerprintingProtection) is also default enabled in Private Browsing windows and with ETP (Enhanced Tracking Protection) in Strict mode.

Opt-in crowds do not work

There is no such thing as "no fingerprint". There is always fingerprint data even without JavaScript. And the fingerprint protections and techniques per metric in each crowd are themselves fingerprintable.

There is no such thing as "a single fingerprint" for all users in a crowd, because there are "base" metrics with differences that we cannot hide or lie about or are currently not feasible to protect.

There is no such thing as "defeating fingerprinting". This is why RFP is called resist fingerprinting and not defeat fingerprinting. Resistance is an ongoing process and not a zero sum game.

There are five steps fingerprint resistance can take

Resistance must come from built-in browser solutions

Resistance must come from built-in browser solutions. Extensions are not suitable as they often lack APIs to properly protect the real values and they have no default crowd. They can also provide additional bits of information (increased entropy) with prototype and proxy and other detectable tampering, can create performance issues and cause website breakage, and are likely to trigger anti-fraud and bot-detection scripts due to adversarial fingerprints (see "Resistance Methods" below).

An extension bundled for a crowd is an exception (e.g. NoScript for Tor Browser users), but care should be taken to ensure any measurable information configurable by the user is restricted (e.g. filter lists in uBlock Origin) or not persistent (e.g. per site exceptions should be session only). If the extension itself (e.g. NoScript) or its assets (e.g. filter lists) are fingerprintable and that fingerprint changes over time, then it is important that users are always up-to-date.


Randomizing vs Static Methods

In pure resistance terms, it does not matter if a metric's fingerprint resistance is random or static, as long as the real value is protected. But it does matter as a choice of technique and has consequences.

Randomizing is said to "raise entropy". A metric that is different on each site per browser session makes that metric unstable and fingerprinting scripts want metrics to be stable (see "What do fingerprinters want?" above). A script that doesn't detect lies (called a naive script) collects an unstable fingerprint if that lie is randomized. The more metrics randomized, the greater the chance that a script is naive or "fooled" or collects "poisoned" data.

However, protected metrics can always be detected in a crowd - and tampering of some metrics can be detected individually. If the randomization is persistent per site (protected by a seed), a script can only infer that it's actually randomized, but given the inference examples below this is not difficult. A script that does this is called an advanced script. When this happens and the metric was random, the value then becomes stable - i.e. the script or backend tooling would record the metric's value as e.g. "random" or "untrustworthy". Any script or backend tooling that can do this renders the purpose of fooling scripts moot.

Given any script could detect all protections, for all intents and purposes, raising entropy can be treated the same as lowering entropy.

Scripts can determine protections in a number of ways: here are a few

Examples (detecting protections)
  • mathematical proofs
    • e.g. known pixel tests in canvas and WebGL (sketched after this list)
    • e.g. known domrect measurements
  • inference
    • you can't hide your browser version or platform
    • you can't hide that you belong to a specific crowd
    • e.g. knowing you are desktop Tor Browser and because you are open source we know exactly what you do for that platform
  • adversarial results
    • deviations from expected or collected results
    • e.g. there are a finite number of benign offlineAudioContext results for web audio per platform and engine, so altering it would stand out as being tampered with
  • third parties
    • e.g. EFF's Cover Your Tracks redirects to firstpartysimulator.net and back
    • e.g. a "gateway" such as Cloudflare could share data with customers (the websites)

Randomization is usually and best done per-site (eTLD+1), per-session, and per-window-type (normal vs private/incognito etc). However, this leads to yet another potential problem - this time with state ("site data and cookies").

Antoine Vastel castle.io blog

“ For example, if your session cookie stays the same but your fingerprint suddenly reflects a different OS or rendering engine, it violates basic assumptions about how real devices behave. This is exactly the kind of inconsistency that risk-based authentication systems are trained to catch, and it often results in:
  • More frequent CAPTCHAs
  • Secondary verification challenges
  • Session invalidation or outright blocks”

I'm not saying that built-in browser solutions would randomize the platform or engine - these are trivial, core, "base" metrics that can't be hidden or lied about (but I have seen plenty of non-browser solutions do this). The point is the inconsistency. Even sanitizing here is not a panacea, as re-identification through logins would still provide a history of inconsistency.

Besides fooling naive scripts, randomizing can make sense depending on usability. For example in canvas, subtle randomizing can render a usable result for the end user versus rendering an unusable result to prevent averaging. Whereas randomizing the userAgent wouldn't provide any extra benefit for the user (and may cause compatibility issues) versus restricting it to equivalency of platform, engine and version (all of which can't be hidden).

Another valid use case for randomizing is when it is currently the only feasible solution. So many variables contribute to canvas entropy (platform, architecture, graphics card and driver version, firmware, math libraries, available and default fonts, font versions, font smoothing or anti-aliasing ... to name but a few) that breaking the web standard completely and lying is the only solution that makes sense.

Unless there is a net benefit, such as usability, or it's the only current feasible solution, randomizing is best left unused.

All built-in browser solutions already randomize canvas, so the "fool" naive scripts trick (for browsers that care about that thing) is mostly covered 1. Otherwise randomizing is problematic:

Because lowering entropy is ultimately the end result (assuming the worst - i.e. an advanced script), as a consequence it is felt that randomizing (unless needed) is a risk with too many costs. This is why, for example, Tor Browser doesn't bother beyond its per-execution, non-seeded, non-subtle randomized canvas.

1 Detecting tampered canvas is trivial and scripts are getting smarter. So depending on the threat model, it might make sense to randomize some extra metrics. If robust engineering for per-site (seed) randomization is already in place, then a few additional select metrics would help - ideally metrics that can randomize a lot of non-adversarial values and are not as easily detected as tampered with.


Resistance Methods

This article will focus on reducing buckets

Outside of blocking known scripts, engineering solutions, influencing web standards, and growing the crowd, we are left with manipulating metrics to reduce buckets in our crowd. Technically this is "breaking" web standards by not honoring the original or expected (real) values from a user perspective. But that is not (always) the case in the context of what websites and servers can "see".

The function of fingerprint resistance is to ultimately lower entropy (or more precisely, to get the best entropy) for its crowd. It does this by reducing buckets. Raising entropy by randomizing results can always be detected in a crowd (see "Randomizing vs Static Methods" above), and is treated as lowering entropy, given any script could detect it.

Reducing buckets works on the simple premise of limitation

Fingerprint resistance reduces buckets per metric by limiting

Resources

Disabling an API or web standard should be a last resort, as it can lead to:

  • website breakage
    • scripts may expect a web standard to exist and don't "feature check" first or provide error handling
  • less functionality
    • e.g. blocking (or overly limiting) fonts can lead to tofu wikipedia
    • tofu: 􏿮 􏿮 􏿮

As more metrics become resistant, this makes it harder and more costly for fingerprinters to gain enough buckets - they will need more methods, more tests, more metrics - and they will take a performance hit. When implementing resistance, to help maintain and grow the crowd, it is important to minimize user friction through information and managing expectations - this is best done in-browser with UX.

Another key component to fingerprint resistance, is to protect the real value, but this is not always fully possible (or the limitation has a bug that leaks). This leads to two possible outcomes (in the context of what websites can "see") in our limited values

Not all fingerprinting scripts (or threat models) are the same. An advanced script can expose the consequences or outcomes of not only when the real value isn't fully protected, but also when poor decisions are made as to what the limitations are.

Adversarial vs Benign
  • result: a metric/value or a set of metrics/values
  • benign: a real-world result
  • adversarial: not benign
    • e.g. there are a finite number of benign offlineAudioContext results per browser engine and platform, which fingerprinters know - so randomizing that metric stands out
    • e.g. touch capabilities on a Mac (there is no such thing yet)
  • paradoxical: conflicting results (also adversarial)
    • e.g. userAgent says "Windows", but platform has "-apple-system" font
    • e.g. a spoofed devicePixelRatio doesn't approximate to a real value calculated from the DOM

Limited values are not just returned, they are usually also used. A metric is not just a looked-up value (e.g. locale); that value also manifests in other metrics that use it (e.g. dates), and the user experience is subjected to those values. Restricted values offer the best experience and resistance - limiting input to control output as designed and expected by the engine creates compatible, expected, deterministic, and benign results (with the correct limitation).

Examples (used values)
  • languages: can be used on multi-language websites to determine what language to display web-content in
  • locale: used in Intl MDN which is used for formatting currencies, dates and times, numbers and more
  • timeZoneName: used in Intl
  • font enumeration: a font either exists (or is aliased to one that exists), or it doesn't exist (or is blocked)
  • inner window sizes: used by web-content and layout

Therefore, limiting values comes down to two options (always truthful and robust or a calculated risk with potential lies), and the solution chosen can differ based on resource constraints (e.g. bundling fonts), threat model (e.g. canvas averaging), feasibility (e.g. canvas), and complexity. When we limit we need to keep in mind the consequences of the values we return. Adversarial fingerprints can trigger anti-fraud and bot-detection scripts, so we should always use benign values where possible.

Using benign values does not guarantee a non-adversarial outcome, as it may not be possible to fully protect the metric, and if a script also detects the real value, it also detects a paradox. In these cases it becomes a risk assessment of how likely the lie is to be detected.

Let's look at some geolocation examples. This metric consists of "is the Geolocation API enabled", "what is the default permission" (from the Permissions API) and if permission is granted (and therefore a user choice per site) this provides location data (even coarse location data is high entropy). In these examples, we want to prevent the user from ever granting permission.
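A rough sketch of what a script can observe for this metric:

Example (sketch: probing geolocation)

const probe = { geolocationEnabled: 'geolocation' in navigator };

navigator.permissions.query({ name: 'geolocation' }).then(status => {
  probe.defaultPermission = status.state;        // "granted" | "prompt" | "denied"
  console.log(probe);
});

// only if the permission is (or becomes) "granted" can a script go further:
// navigator.geolocation.getCurrentPosition(pos => { /* high-entropy coordinates */ });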

Example (restricted): geolocation
  • disabled API (poor)
    • This is Tor Browser's current implementation and the default permission (exposed in Settings) is not protected
    • all users are (hopefully) the same: geolocation disabled, the default permission, no location possible
  • default deny (simpler)
    • enable the API, ignore default and site exception permissions and return "deny"
    • all users are the same: geolocation enabled, default deny, no location possible
  • default prompt (more complex)
    • enable the API, ignore default and site exception permissions and return "prompt", suppress the prompt if location requested and deny with a random human reaction time. Optionally indicate to the user that geolocation was requested
    • all users are the same: geolocation enabled, default prompt, location denied looking like a human

In the above examples, disabling the API causes website breakage. Additionally, the permission needs protection. All major browsers ship with geolocation enabled with a default prompt. We can be smarter these days and alternatively could enable the API and instead restrict (input) what the context sees, calculates, uses or returns with benign values - i.e. let the browser engine behave as expected to output web standard compliant, compatible, plausible and non-adversarial results.

Here are a few more examples: each metric will have its own possible solutions

Example (restricted): ac-SampleRate
  • all users are the same: return and use 44100 Hz
Example (restricted resource): font enumeration

Note: You can't lie about fonts: they either exist (or are aliased to one that does) and are used - or they don't exist (or are blocked) and are not used

  • limit the fonts available to web-content per platform (i.e Windows, Linux, Mac, Android) to some set of system fonts (provided by the platform)
  • limit as much as possible to those expected across each platform (e.g. for Windows cover Windows 10 and 11) but try and cover modern writing systems wikipedia
  • bundle fonts with the browser (to cover writing system gaps)
    • this is also "engineering a solution", similar to bundling and using the same math libraries across platform architectures
  • e.g. in desktop Tor Browser most fonts are bundled to provide comprehensive coverage for writing systems. Some system fonts are also allowed on Windows and Mac, especially for CJK (Chinese, Japanese, Korean), for platform consistency ("look and feel", e.g. widgets) and to save on package sizes
    • this is an example of both a resource constraint as file sizes are not trivial (CJK font files for example are large and may be cost prohibitive to ship), and threat model, where Tor Browser deems font fingerprinting entropy to be too high without much tighter control
  • all users are (hopefully) the same per desktop platform, which is equivalency of something we cannot resist

Sometimes, however, it is not possible to fully protect a metric yet - usually due to physical device constraints (e.g. devicePixelRatio, screen resolution) or because the web standard is broken - i.e. not truthful (e.g. canvas). Here are some examples

Examples (spoofed: lies and a potential risk)
  • hardwareConcurrency
    • All Tor Browser users currently return 8 for Mac and 4 for all other platforms
    • This does not prevent the browser from using all available cores, which can be estimated with workers and timing attacks. Notably Tor Browser also has timing protections
  • screen dimensions
    • Screen spoofs are typically based on a combination of screen and window dimensions for plausibility
    • But the real screen resolution can be exposed (or inferred) if a user goes fullscreen (F11) or uses a fullscreenElement (e.g. clicking to view a video in fullscreen)
  • canvas (reads)
    • Canvas is typically only protected (lies) when read, not drawn - drawn canvases are always truthful (to what it was told to draw)
    • But when canvas data reads are used (e.g. to create more canvases or for a fingerprint), they break web standards and are detectable lies

It is not imperative that limitations make all users the "same" (allowing for equivalency), as that may not be feasible or even desirable. The goal of fingerprint resistance is to protect the real value (restricted robustly, or spoofed as a calculated risk) through limitation, in order to reduce buckets (values), which in turn makes it harder for fingerprinting scripts.

Examples (partial)
  • restricted
    • languages and locale: not all users speak the same language, so the restriction needs to cover a range
    • font enumeration
      • Bundling enough fonts may not be an option
      • Protection can be reliant on the system's default fonts which can vary per platform version (e.g. Windows 10 vs 11) or edition (e.g. Windows LTSC, SE, Enterprise, Home, Professional etc)
      • Supplemental fonts are often provided by default based on the platform locale/language
      • Restrict too tightly and you get broken writing systems; restrict too loosely and you get more buckets (entropy). Regardless, either way it eliminates high entropy items such as fonts installed with applications (e.g. Microsoft Office, LibreOffice, or Adobe)
  • spoofed
    • fingerprinting scripts have to work harder in order to expose the spoof - e.g. run known pixel tests, averaging attempts, touch the DOM


Resistance Considerations

Fingerprinting resistance can lead to undesirable side-effects or outcomes. Solutions should provide a good user experience and as much as possible aim for:

In other words

Try not to give scripts any reason to "break" things with disabled APIs.

Try not to "break" anything by always using benign values, preferably with restriction covering all methods and sources to always be truthful (just like a real value). Risk spoofing if required or unavoidable, knowing that it could be detected (adversarial).

Be mindful of usability and accessibility. Educate and inform users where this isn't possible.

The ideal end result is for users to enjoy a seamless experience - and happy users maintain and grow the crowd (see "Resistance Basics" above).


Tor Project Methodology

Tor Project always assumes advanced scripts and worst outcomes (high entropy, targeted attacks). Whether it is assessing a PoC (Proof of Concept) or new/changed web standards and APIs, the steps are to investigate, analyse the causes, evaluate resistance options, and prioritize any solutions based on resources, buckets, and the difficulty/ability to resist. Each metric will have its own unique set of circumstances.

INVESTIGATE

Example (investigate): device orientation
  • This can be one of four values
    • "landscape-primary", "portrait-primary", "portrait" (flipped), "landscape" (upside down)
  • Looking at desktop, let's guess/estimate
    • probability: 95%, 4.5%, 0.45%, 0.05%
    • information bits: 0.074, 4.474, 7.796, 10.966
    • entropy: 0.95 x 0.074 + 0.045 x 4.474 + 0.0045 x 7.796 + 0.0005 x 10.966 = 0.3122
  • Since all users are not the same, the metric illustrates entropy across all users. Those not in the vast majority provide disproportionally more bits the more uncommon their orientation
  • On Android this is not a stable fingerprinting metric, as users normally (but not guaranteed) have auto-rotate enabled and phones and tablets always have an obvious default

ANALYSE

Example (analyse): device orientation
  • Provided by screen.orientation* values and legacy mozOrientation
  • Deterministic from screen measurements and css
  • Since all methods are ultimately derived from (or equivalency of) the screen resolution (the platform provides the flipped values) and since we can't restrict the screen (i.e we can't change your real resolution), we have to spoof the screen, and all methods would need to use the spoofed screen values. Additionally we can limit the flipped information.
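In code terms, every way a script reads or infers orientation reduces to the same screen values, which is why the spoofed screen has to drive the spoofed orientation. A rough sketch of a consistency check a fingerprinter might run:

Example (sketch: orientation consistency)

const type = screen.orientation.type;            // e.g. "landscape-primary"
const angle = screen.orientation.angle;          // e.g. 0
const inferred = screen.width >= screen.height ? 'landscape' : 'portrait';

// the reported type must match what the (possibly spoofed) screen implies
const paradox = !type.startsWith(inferred);
console.log({ type, angle, inferred, paradox });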

If resistance is added, there are additional steps: implementation, UX and testing.

IMPLEMENT

Due to Tor Browser's unique relationship with Mozilla, and by virtue of using Gecko, there are a number of possible options

Example (implement): device orientation
  • Screen metrics on desktop are already spoofed based on the limited inner window
  • The spoofed screen measurements should determine the spoofed device orientation
    • e.g. returning landscape* with screen measurements that are portrait* would be a paradox (breaking the spec)
  • The spoofed device orientation should be limited to primary
    • web-content doesn't care if you're right-side-up or upside-down
  • The orientation.angle should match the spoofed device orientation per platform (it differs on Android)
  • This would need to be coded behind RFP

UX (if needed)

Anything that can be done to strengthen protections (which arguably should all be done in code), or to inform users and manage expectations, or to reduce user friction and confusion: such as:

Example (UX): languages and locale
  • all languages supported are bundled (not downloaded remotely which is blocked), and the application language is used to tightly control requested languages and locale, with the ability to use English for web pages (regardless of the application language)
  • strengthen protection by removing the setting to "choose your preferred language for displaying pages" as the dialog allows the user to directly alter the protected fingerprint
  • replace the dropdown selection options for both "Choose the languages used to display menus, messages, and notifications" and "Set Alternatives" with the list of bundled languages and remove the "search" feature for adding more remotely
  • remove "Use your operating system settings [for your OS locale] to format dates, times, numbers, and measurements" as this preference is ignored in code (reduce user confusion)
  • add an option for the choice to "Request English versions of web pages for enhanced privacy"

TEST

Tor Browser can only control what happens in their "crowd". If we cover and test all possible methods and sources, we know the crowd is protected as designed. Ideally this should include built-in test code, specialized test pages and sites, and automated testing for builds - not just to ensure the resistance is working, but to catch regressions and adversarial results.

Example (test): device orientation
  • All users are expected to be one of two results (based on screen spoofing which is based on inner) with the correct matching angle per platform (equivalency)
    • Using our earlier desktop estimates
      • probability: 95.05% landscape*, 4.95% portrait*
      • information bits: 0.07324, 4.336
      • entropy: 0.284 (reduced from 0.3122)
    • Whilst the overall metric entropy wasn't greatly reduced, it all counts. More importantly, we have reduced buckets and conversely protected users with unusual orientations
  • Tests were built into the patch and manually verified with all possible device orientations on each platform
  • In the future it is expected that Tor Browser screen spoofing in desktop will always return landscape dimensions (still based on inner), meaning device orientation entropy will be 0 (within the Tor Browser crowd)


PART 4: TESTING

Tests provide data, and collective data can be analysed to help formulate fingerprint resistance methods, confirm protections are as expected and haven't regressed, and to find metrics that require some or more protections.

“... and know yourself” - Sun Tzu

“Knowledge is power.” - thorin

Data can also come from telemetry if available (e.g. Tor Browser has none). Note that telemetry data is clean, non-tainted data since it can bypass fingerprint protections and extension tampering. This part focuses on tests and test data.


Test Basics

Fingerprint resistance can only be achieved in a crowd, e.g. Tor Browser. Therefore, to test values, it only makes sense to test within the crowd - e.g. Tor Browser users. Any other browser data is irrelevant. Similarly, any analysis of data collected should be restricted to crowd users. Non-crowd data just creates unwanted noise - but can of course be useful to help determine benign values and to see what other browsers do - e.g. we can ask Firefox for per-platform summary data on common screen sizes or processor counts.

Crowd: (from "Resistance Basics")

... a set of users being protected by an overall fingerprint resistance strategy [such as] Tor Browser [or] Mullvad Browser ...

When resistance is added for a metric, testing should have been created (see "Tor Project Methodology" above). These tests should be robust, i.e. all methods and sources for the crowd (e.g. Tor Browser would check EXSLT for date/time protections since it is based on Gecko).

methods and sources: (from "Resistance Methods")
  • methods: all the ways to determine or infer the value
  • sources: all origins: document, iframe, workers, service workers, first party, third party etc

Collectively these tests, and additional ones covering metrics not yet fully protected or understood, can be used to create an overall fingerprint test optimized for the crowd in question.


Test Data

In testing there are two dataset outcomes that are useful

A dataset can be for a single metric, a subset of metrics, an overall fingerprint, or any data "slicing and dicing" (multidimensional filtering) that is meaningful. A dataset is not guaranteed to get all possible buckets as it is a random sample. The larger the dataset, the better the chance more values, or the maximum buckets, are detected.

BUCKETS (values)

Examples: (Tor Browser buckets)
  • languages + locale
    • navigator.languages (plural) is tightly controlled which in turn tightly controls locale.
    • Tor Browser currently has 41 language options
    • therefore there should only be 41 possible languages + locale combined values
    • any combined value that isn't one of the 41 expected values would indicate a protection failure
  • geolocation's default permission
    • the default (from Firefox) is "prompt"
    • the default is not currently limited by Tor Browser's fingerprint resistance.
    • users can change the permission in the settings, even though it cannot be used (geolocation is currently disabled)
    • a value or bucket for "deny" or "granted" would show this metric needs protection

ENTROPY (the number of buckets and their uniformity)

While buckets/values can tell us where there are differences of interest, only entropy (from large, real-world, and untainted, crowd-datasets) can tell us how bad it is.

In other words, not only is what is tested, how it's tested, and how robustly it's tested important for quality - i.e. all the things that make a good test suite (e.g. accurate, relevant, comprehensive, robust...) - but how it's collected and how much is collected is also critical (i.e. fit the needs of the survey).


No Data

What if you had no data? How can a resistance strategy, either overall or for an individual metric, know if its resistance is working without large scale datasets of its users and without any telemetry? It depends on the metric and resistance method, but for some the effectiveness can be known or estimated with some certainty. Here are some Tor Browser examples.

If it is "known" there is only one bucket, then the entropy in the crowd "is" zero (assuming no leaks or bugs and the tests are robust), and the protection cannot resist more

Example (zero entropy): timeZoneName
  • everyone is timezone Atlantic/Reykjavik
  • tested for, hardcoded, can't be overridden by prefs or the system, is "restricted" meaning it's always "truthful" everywhere, including deterministic results in Date and Intl etc

If it is "known" that the buckets (plural) are at the minimum possible (equivalency), then the protection cannot resist more, and entropy can only sometimes be estimated with any degree of certainty

Example (equivalency): userAgent
  • always one of four possible results based on equivalency of platform: i.e. Windows, Mac, Linux or Android
  • tested for, hardcoded, can't be overridden by prefs etc
  • entropy can only be estimated e.g. based on downloads per platform (a bucket check is sketched below)
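
A minimal sketch of the bucket check - the patterns are placeholders, not the exact strings a resistance strategy ships:

  // Illustrative: map the userAgent onto one of the four equivalency buckets
  function userAgentBucket(): string {
    const ua = navigator.userAgent;
    if (/Windows/.test(ua)) return "Windows";
    if (/Macintosh/.test(ua)) return "Mac";
    if (/Android/.test(ua)) return "Android";   // test before Linux: some Android UAs mention Linux
    if (/Linux/.test(ua)) return "Linux";
    return "unexpected";   // anything else would indicate a protection gap
  }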

If it is "hopeful" that the buckets are at the minimum possible

Example (hopeful): font enumeration
  • on desktop, most or all fonts are bundled - Windows and Mac also allow some expected system fonts
  • all users should have the same fonts per platform
  • but there are protection gaps, as the browser cannot control the system's fonts or stop users removing fonts (a font-availability probe is sketched below)
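
A minimal sketch of a font-availability probe via width measurement - approximate only, and the font name is just an example; a real test would iterate the full expected set per platform:

  // Illustrative: is a named font actually used, or does rendering fall back to the generic family?
  function fontAvailable(name: string): boolean {
    const ctx = document.createElement("canvas").getContext("2d")!;
    const sample = "mmmMMMwwwlli0O";
    ctx.font = "72px monospace";
    const baseline = ctx.measureText(sample).width;
    ctx.font = `72px "${name}", monospace`;
    return ctx.measureText(sample).width !== baseline;   // different width => the named font was used
  }

  fontAvailable("Arial");   // every user on the same platform should get the same answers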

If there is a lack of knowledge

Example (unknown): font measurements
  • each language has its own default fonts, style, sizes, and direction etc
  • devicePixelRatio (system scaling) affects sizes
  • zoom affects sizes
  • font settings affect sizes: such as antialiasing, clearType or fontSmoothing, etc
  • users can change font settings such as default sizes and font used
  • users can change the default font zoom
  • the unicode characters used in the font string being measured can fallback to different fonts based on more preferences
  • there is so much uncertainty and a lack of information that tests and datasets are required to move forward (a measurement sketch follows)
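
A minimal sketch of a measurement that folds many of these variables together - default fonts, sizes, smoothing, zoom, devicePixelRatio, and fallback all influence the numbers, which is exactly why datasets are needed:

  // Illustrative: how a mixed-script string actually renders on this device/browser/settings
  function measureString(sample: string): { width: number; height: number; dpr: number } {
    const span = document.createElement("span");
    span.style.cssText = "position:absolute; visibility:hidden; white-space:nowrap; font-size:16px";
    span.textContent = sample;
    document.body.appendChild(span);
    const rect = span.getBoundingClientRect();
    span.remove();
    return { width: rect.width, height: rect.height, dpr: window.devicePixelRatio };
  }

  measureString("mmmMMMwww, 😀 \u0641\u064A");   // mixed scripts exercise font fallback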

The only way to know with high confidence or certainty how effective resistance protections are is with large real-world datasets of crowd users, based on robust and sophisticated testing - to check protections are working, to indicate where differences exist for analysis and improvements, and, via surveys, to check entropy.


Test Sites

To reiterate from above

Test sites are an example of what NOT to use.

BUCKETS (values)

For example, one site may report canvas as randomized, while another reports a hash (and claims it's unique). Who do you believe? What did they test (OffscreenCanvas, toDataURL, or a hundred other parameters)?
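
As an illustration of why this matters, here is a sketch of just two of the many ways a site might read back a canvas - different readbacks (and different drawing parameters) produce results that simply aren't comparable between sites:

  // Illustrative only: two different canvas readbacks of the same drawing
  function canvasReadbacks(): { dataURL: string; pixelSum: number } {
    const c = document.createElement("canvas");
    c.width = 200; c.height = 50;
    const ctx = c.getContext("2d")!;
    ctx.font = "20px sans-serif";
    ctx.fillText("canvas sample 😀", 5, 30);
    const dataURL = c.toDataURL();                                        // readback method 1
    const pixels = ctx.getImageData(0, 0, c.width, c.height).data;
    const pixelSum = pixels.reduce((a, b) => a + b, 0);                   // readback method 2
    return { dataURL, pixelSum };
  }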

IF YOU DON'T KNOW WHAT IS TESTED,
HOW IT'S TESTED,
OR HOW ROBUST IT'S TESTED,
THEN YOU CAN'T TRUST ANY VALUES

ENTROPY (uniformity)

ANY PROBABILITIES OR ENTROPY
FROM TEST SITES ARE COMPLETE NONSENSE

Let's look at some examples (at the time of writing):

Example: (test site)
  • coveryourtracks
    • timeZoneName: Atlantic/Reykjavik = 1 in 6.17 (16.2%)
    • language: en-US = 1 in 1.79 (56%)
  • amiunique (last 90 days)
    • Firefox: 29.4%

Now let's look at reality. Note that whilst population and/or demographics and/or other estimation methods are not perfect comparisons to internet traffic, in these examples they are sufficient to prove the point.

Example: (reality)
  • timeZoneName
    • world population 2025: ~8.2 billion
    • Iceland (population ~400 thousand), Tor Browser users (~5 million) and RFP users (?) would be lucky to number 6 million, but let's be generous and call it 8.2 million
    • 1 in 1000 (0.1%)
  • language
    • en-US: approx 1.5 billion (18.3%) people speak English either natively or as a second language, and not all those will use English by default, let alone English (United States)
  • Firefox
    • April 2025: 2.55% market share (gs.statcounter.com), but let's be generous and call it 2.94%

Now let's compare the test site to reality. Consider by just how much each metric's probability (and entropy) is skewed, and be aware that each metric can affect many other metrics (how the bit values below are derived is sketched after the list).

Example: (test site vs reality)
  • language: 56% (0.84) vs 18% (2.48)
    • test site probability is at least 3x higher
    • entropy difference 1.64
    • note: 18% is being extremely generous
  • Firefox: 29.4% (1.77) vs 2.94% (5.09)
    • test site probability is approx 10x higher
    • entropy difference 3.32
  • timeZoneName: 16.2% (2.63) vs 0.1% (9.97)
    • test site probability is over 160x higher
    • entropy difference 7.34
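
For reference, the bit values above are just the surprisal of each probability, -log2(p), and the "entropy difference" is the gap between the test-site figure and the real-world estimate:

  const bits = (p: number): number => -Math.log2(p);

  (bits(0.18)   - bits(0.56)).toFixed(2);    // 1.64  (language)
  (bits(0.0294) - bits(0.294)).toFixed(2);   // 3.32  (Firefox)
  (bits(0.001)  - bits(0.162)).toFixed(2);   // 7.34  (timeZoneName)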

The language metric demonstrates how demographics matter, and how something so simple can easily and inadvertently (call it 3x) taint datasets away from reality - something surveys and studies need to address and/or acknowledge. Many test sites are also language-centric.

Firefox is marketed and known as a privacy-focused browser and has ties with Tor Browser and RFP - so it makes sense that this demographic is a bit more interested (call it 10x) in testing than the average.

Now consider timeZoneName - outside of real world use, only Tor Browser, Mullvad Browser and Firefox's RFP use Atlantic/Reykjavik in any meaningful numbers. This set of users is highly interested (call it 600x) in testing (and re-testing and tweaking and re-testing).

Now take into account that most metrics are not independent and they will skew and taint other metrics. For example: language choices affect default fonts and sizes as well as locale. Locale affects date and time formatting. RFP users will literally taint hundreds of other data points or metrics due to the wide resistance built into it. Firefox has thousands of differences to other engines.

Not only are these test site probabilities inaccurate, they produce large and very misleading entropy differences (log2). The Firefox metric alone underestimates entropy by approximately 3.32 bits, which in turn affects metrics where Firefox has major differences. timeZoneName is a staggering 7.34 bits off. Considering that 33 bits is supposedly the magical mark, just one metric being this far out shows how futile test sites are for this task - it requires large-scale, real-world, untainted surveys.

ANY PROBABILITIES OR ENTROPY
FROM TEST SITES ARE COMPLETE NONSENSE


The Future

I plan to write about this in more depth in a future article, but in general, fingerprint resistance overall has been the result of a decade-long haphazard, oft-times conflicting and contradictory, piecemeal mess of accumulated knowledge and application - and quite frankly, everyone can do better!

This is not an indictment of any previous work, but a reflection that we have gaps, and also that over that time we have experienced and learned and should now collectively be a lot smarter. I know we have gaps with built-in RFP tests - not a criticism, that's just the result of resources and piecemeal efforts over years of work. I know we have knowledge gaps. Some consensus would also be nice - imagine a Fingerprint Resistance API web standard that all engines adhered to.

“Knowledge is power.” - thorin

Moving forward, we need better test suites. If all metric protections are deterministic (and in my opinion they should be), imagine a collaborative test suite all engines (and the public) could use - just like a Speedometer or a JetStream or an Interop. And for those without telemetry it could ultimately be used for surveys to check actual progress.

We need to be more proactive (I've been working on this problem for a few years) and hopefully get browser-space buy-in (I attended the 2025 Web Engine Hackfest and there was lots of interest in the fingerprinting space). It's not for want, but a lack of time and resources. But it also seems to me that, where it matters, discussions are starting to turn to these problems, and I hope to be part of them.



the end