(comments)

Original link: https://news.ycombinator.com/item?id=39653431

To summarize, "audio fingerprinting" involves websites rendering programmatically generated audio, collecting and comparing the output, and deriving an identifier from the tiny variations introduced by each implementation of the digital processing pipeline. No audio input is required; the process relies on floating-point signal processing, where the order sensitivity (non-associativity) of floating-point arithmetic leads to slightly varying outputs. The approach can reveal subtle implementation differences that affect the output signal, distinguish browser versions, and offer an opportunity for "private" tracking for legitimate purposes, including fraud detection. Some commenters suggest implementing an alternative API that generates a unique hash for each user of a website, surviving cleared cookies and storage, which could remove the need for cross-site tracking, albeit at the cost of user privacy. Ultimately, however, users do not want to be fingerprinted, whatever the potential applications.


Original text
Hacker News
Bypassing Safari 17's advanced audio fingerprinting protection (fingerprint.com)
245 points by valventin 1 day ago | 220 comments

So they say this is for fraud prevention and that all other uses need consent.

On their front page they tell me how often I have visited and that my incognito mode does not prevent their tracking.

Isn’t that “other use”?

> Does Fingerprint Pro require consent?

> Our technology is intended to be used for fraud detection only; for this case, no user consent is required. However, any use outside of fraud detection must comply with GDPR user consent rules.



Another interesting technique to fingerprint users online is called GPU Fingerprinting [1] (2022).

Codenamed 'DrawnApart', the technique relies on WebGL to count the number and speed of the execution units in the GPU, measure the time needed to complete vertex renders, handle stall functions, and more stuff

________________

1. https://www.bleepingcomputer.com/news/security/researchers-u...



browsers should come with a default software renderer, and behave like the mic and camera where the site will require user permission to release the hardware GPU render path.


but nobody wants to use software rendering, that's the whole reason WebGL and WebGPU exist.


Alternatively: vanishingly few pages use WebGL and WebGPU so software rendering would work well as a default.

Safari still has show stopping perf bugs in WebGL 2 (a 2017 finalized spec): https://forums.developer.apple.com/forums/thread/696821 https://forum.unity.com/threads/unity-webgl-poor-rendering-p... so Mac/iOS users wouldn't notice a difference probably.



They should have thought of that before abusing these things to fingerprint us. Use of the GPU is a privilege and it can be revoked.


I sometimes unironically say that JavaScript is a privilege that should only be granted to websites that actually need it. Most of the web is text and images. No Turing-complete client-side runtime environment is required to display that.

But I would also accept all those multimedia APIs (canvas, WebGL, WebGPU, everything audio and video, including the <video> element) and some others (e.g. service workers and everything else app-like) requiring a permission. Again, most websites don't need them, so given the abuse potential, there's no reason why they should be openly available.



You have NoScript to block all that, but it breaks even the simplest sites these days. Part of it is legitimate, like responsive design (though most of that can be done with CSS these days).

But most of it is bullshit tracking, anti-scraping and similar stuff.



You could just fingerprint the CPU then; every CPU behaves differently. Buy any number of the same CPU and you'll see slight differences in every one of them.


This is why high-resolution timers are bad, but a website doesn't have the same level of access to the CPU as it does with something like webGPU.


there isn't a 'they' and an 'us' in this situation


"They" refers to web developers. "Us" refers to users. We are the owners of the machines where their code will run.

They have complete freedom on their servers. On my computer, I make the rules. They are lucky if I allow their code to run at all.



The point is that the "they" who abuse this and the "they" who use it for legitimate reasons usually aren't the same people, and so the "they" who abuse this have no incentive not to out of some concern about their legitimate uses being curtailed.


If "they" run ads which unfortunately most websites do then "they" are part of those who abuse the browser capabilities.


this is of course the ideal, but it is somewhat not to the point; as vidar says, the 'they' who are fingerprinting you have only a limited intersection with the 'they' who are doing awesome things with webgl like shadertoy or https://mitxela.com/projects/model-viewer, which doesn't even have google analytics

a somewhat bigger problem is that to a very significant extent the actual owners of the machines are microsoft, google, and apple, not the users; they make the rules, and the users are lucky if the owners allow their code to run at all. under those circumstances, blocking fingerprinting is practically quite difficult, because the 'they' who want to fingerprint you and the 'they' who make the rules about what code run on your machine are the same people, not two opposing groups

an additional problem is that an increasing part of the web is run by criminal elements like harvey weinstein and the rest of the mpaa, who will block you if they can detect you attempting to protect your privacy from them by blocking fingerprinting, even if apple decides it would be a good idea; cloudflare and google are perhaps the most prominent enforcers here, perhaps somewhat reluctantly



LibreWolf (a privacy focused modification of Firefox) disables WebGL by default for this reason.


Do you have any concept of how many gigawatts per day that would waste?


Very pedantic but I’d want to know. A watt is a unit of power, which means gigawatts per day is a rate of change of power.

If you want a unit of energy you need power multiplied by time not divided, so “gigawatt days” not “gigawatts per day”.



Maybe they meant "gigawatt-hours per day"


Gigawatt-hours per day is just gigawatt-hours per 24 hours, i.e. gigawatts divided by 24.


Which gives you power again, not work. "gigawatt-hours per day" = gigawatt/24


If you consider that watt hours is just a convenience unit for (3600) joules, then “1 gigawatt hour each day” correctly should be “3600 GJ/day” which works.


Which is a reasonable metric.


I’m not following. Why would disabling the GPU use more power? If anything, I would think it would reduce power consumption.


Hardware implementations of things like graphics routines can be hundreds of times more efficient than software implementations running on general purpose CPUs.

Try to decode mpeg video at HD resolution in software sometime.



Native GPU video decoding (and CSS and canvas GPU acceleration) can keep working without problems if only WebGL gets deactivated, which is what this was about here.

But you could probably also use those to fingerprint, though probably not as precisely.



> Try to decode mpeg video at HD resolution in software sometime.

Firefox only started to ship hardware-accelerated video decode a few versions ago. Until very recently, all my video playback was software decoded.



surely, the user will be taught to enable the hardware for video if they start seeing stutter. Or the browser can prompt the user to switch to "high-end graphics" if it detects prolonged video decoding.

If a website has no obvious case for using the GPU but is instead using it to fingerprint, then the user won't experience any slowdowns from a software renderer (as the fingerprinting work is usually done relatively quickly).

If a website needs the GPU for their videos/graphics, but also incidentally wants to fingerprint you, you're shit out of luck in that case. But this is no worse than what we have current day.



Oh, you expect users to understand this? The same people who have spent the past 40 years getting confused and worked up about cookies?


> Do you have any concept of how many gigawatts per day that would waste?

It would still be way less than what large companies are burning on training proprietary LLMs. Do you think the ChatGPT model you use daily was a success at first go? And in that same world, consumers should not even try to protect themselves from GPU fingerprinting?

> The same people who have spent the past 40 years getting confused and worked up about cookies?

Stop with the condescension. It's not about being confused; it's about mitigating genuine privacy concerns. We're not idiots, and dismissing genuine worries won't make the issues disappear.



Maybe reflect on the irony presented by the two halves of your comment?


If the user experiences stuttering while decoding a video, they won't learn to enable special permissions for the website but instead switch to a different browser that hasn't yet implemented this "feature".

And most websites most users visit will need the GPU to be remotely usable. For them enabling specific permissions for every website they visit is very inconvenient.



YouTube disabling h264 because of licensing wastes about the same amount of energy as a very small country uses.


Is that, in your opinion, a reasonable or unreasonable amount of power to spend on video decoding?


I did an actual test on a MacBook and IIRC it was an extra 5-10 watts. The rest you can multiply by hours spent on YouTube worldwide. IIRC the country was Haiti.


Perhaps a middle ground: default to software rendering when in Private Browsing mode, because obviously you want to be private, and default to hardware acceleration otherwise.


Is it anywhere close to the amount wasted on mining shadycoins?


Just a guess, 1.21 GW?


LibreWolf does this, actually: it initially blocks websites from using WebGPU (and canvas) by default and then gives you a popup to grant them permission.


It’s almost like there should be a registry for naughty and nice websites and their capabilities


I feel like the constraints could open up an interesting demoscene too.


What's the point? The capabilities of browsers are so vast they'll just find other ways to fingerprint.

Privacy in browsers is a lost cause. It's a 30+ year old technology that has become ridiculously bloated in scope, with privacy and security only considered as an afterthought.



It's still possible to thwart it by injecting random data into the fingerprint. Browsers should ideally be capable of doing that without plugins.

A bit like AdNauseam, which loads ads in an invisible sandbox and clicks on all of them to mess up their reporting.



Can anyone explain why the results are different to begin with? E.g. why is this audio fingerprinting even possible in the first place?


The essence seems to be that the web audio API has a lot of algorithms that do a lot of math, and every browser has a slightly different implementation, and the exact results depend on the operating system and cpu too. So if you use the web audio API to generate a small signal all browsers will generate something that's really close, but the tiny differences can be used to help tell them apart.
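
A minimal sketch of that idea, assuming nothing beyond the standard Web Audio API (illustrative only, not any vendor's actual code): render a short signal offline and collapse the samples into one number. On a given machine the number is stable across runs, but it differs slightly across browsers and OSes.

    // Hypothetical sketch: render a 1-second triangle wave offline and
    // reduce it to a single number that varies subtly per implementation.
    async function audioFingerprint() {
      const ctx = new OfflineAudioContext(1, 44100, 44100) // 1 channel, 1 s, 44.1 kHz
      const osc = ctx.createOscillator()
      osc.type = "triangle"
      osc.frequency.value = 1000
      const comp = ctx.createDynamicsCompressor() // extra DSP magnifies implementation differences
      osc.connect(comp)
      comp.connect(ctx.destination)
      osc.start()
      const buf = await ctx.startRendering()
      let sum = 0
      for (const s of buf.getChannelData(0)) sum += Math.abs(s)
      return sum // stable per machine+browser, subtly different elsewhere
    }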


But why would it vary in ways that are consistent run to run on one machine, but not consistent with the same process executed on another similar machine?


Every datapoint reduces the number of people it could belong to. CPU + browser + browser version + OS + major OS version can narrow it down by a lot.

Then add resolution, IP address location (which VPN they use is also a datapoint), which time they are active at, etc. and you can get a good almost-unique identifier.
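
As an illustration of how cheaply such signals combine (the particular signal list here is hypothetical):

    // Illustrative only: fold several weak signals into one identifier.
    async function combinedId() {
      const signals = [
        navigator.userAgent,
        `${screen.width}x${screen.height}`,
        Intl.DateTimeFormat().resolvedOptions().timeZone,
      ]
      const bytes = new TextEncoder().encode(signals.join("|"))
      const hash = await crypto.subtle.digest("SHA-256", bytes)
      return [...new Uint8Array(hash)].map(b => b.toString(16).padStart(2, "0")).join("")
    }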



Maybe we should make the browser implementations consistent to the point they can't be told apart. Alternatively, we can reduce the precision of the results so that the tiny differences are deleted.


Browser vendors already copy the Audio API code from each other, you can see it in their public GitHub repositories. It doesn't help.


that wouldn't help. you already know the browser and os through easier means


Like what? The voluntarily provided User-Agent? The browser is in control of that.


The browser in this adversarial scenario is also in control of the audio context too


Yet it is incredibly difficult to hide the underlying hardware or low level library differences. Not without slowing things down significantly.


Do you, as an end user, know how to change these settings compared to changing your user agent?


From the average user perspective those settings are equally impossible to change, as they neither know nor care that they even exist.


i think it comes from similar tricks that are played with webgl where there is a lot of entropy that comes from pc videocard drivers and the hardware itself.

it's a shame that browser people have to add noise to audio buffer handling to try and thwart it.



This was my first thought too, and they cover it in more detail here https://fingerprint.com/blog/audio-fingerprinting/#why-the-a...

TL;DR: different codepaths even within the same codebase (e.g. SIMD variants) can result in subtly different floating point results (IIUC, likely related to the fact that floating point math is unexpectedly sensitive to order of operations etc.)
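
A concrete example of that order sensitivity, runnable in any JS console:

    // IEEE 754 addition is not associative, so summation order matters:
    console.log((0.1 + 0.2) + 0.3) // 0.6000000000000001
    console.log(0.1 + (0.2 + 0.3)) // 0.6
    // A scalar loop and a SIMD (reordered) sum can therefore disagree in
    // the last bits, even when implementing the "same" algorithm.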



Funny how when I posted more or less exactly this comment in another context recently a lot of people refused to take me seriously!

https://news.ycombinator.com/context?id=39633730



Floats are deterministic, though (if they weren't, this wouldn't be a workable fingerprinting vector). Reordering of operations (etc.) in a way that would actually change the results needs to be done by human edits, or with compiler options like ffast-math that explicitly allow the compiler to "break the rules" and make such changes. In either case, the concrete instructions emitted by the compiler will have deterministic behavior (and if they don't, that's a hardware bug)


What word would you use if sin(x) returns a different value on different platforms, or even different OS or library versions? Sure smells like a function that depends on external state rather than simply its input.


The existence of platforms that do not conform to their own specifications does not make an operator non-deterministic.

Edit: I found this pretty great article on the subject https://randomascii.wordpress.com/2013/07/16/floating-point-...

We're both right, depending on how you frame the question.



sin(x) is platform-dependent.


Probably implementation details and compiler optimizations; float addition is not associative, for example. Implementing the same algorithm with the same formulas correctly can still lead to slightly different results.


Floating point addition is not associative, but it is still consistent. Getting different results is usually the result of using alternative algorithms or relaxing standards (that may, for example, reorder terms).


I don’t think the web spec implemented by the browser specifies the order of every operation, only the algorithm. If safari and chrome devs implement the audio api based on the spec, there can still be minor floating point differences because of the way they implemented the same calculations. That’s why they can fingerprint your browser versions with this.


I would have thought it might have yielded a machine and OS identifier - but more user specific than that?


That's what it does; users on the same browser and same hardware should have identical fingerprints. It's just one of multiple ways to narrow down your fingerprint.


I feel like these days (especially given the recent focus on side channel attacks) it is basically a given that adding uniform noise to something that leaks data does not work, because you can always take more samples and remove the noise. Why did Safari add this? I understand that needing more samples is definitely an annoyance to fingerprinting efforts, but as this post shows it's basically always surmountable in some form or the other.


> Why did Safari add this?

A lot of Apple's "privacy" features nowadays are marketing. It's privacy theater. What matters is whether they can tell a plausible story to the public, not whether it is technically effective.



Is iCloud Private Relay theatre? 3rd party cookie blocking? What specific features do you allege exist just to mislead the general public?


> Is iCloud Private Relay theatre?

https://fingerprint.com/blog/ios15-icloud-private-relay-vuln...

> 3rd party cookie blocking?

It's very funny that you should ask this question in response to an article about fingerprinting without cookies.

But yes, there are various workaround to use 1st party cookies or other storage to take the place of 3rd party cookies.

Perhaps the worst is the Safari "Privacy Report", which has always been misleading: https://www.simoahava.com/privacy/intelligent-tracking-preve...



The term “security theater” has a specific meaning which is not a bug or less than perfect protection: per its creator, “Security theater refers to security measures that make people feel more secure without doing anything to actually improve their security.”

https://www.schneier.com/blog/archives/2009/11/beyond_securi...

Private Relay is obviously not accurately described by that term and any rule which classifies it as such would be useless because it would classify all browser security as theater because everyone has had bugs, and everyone has had to adopt more sophisticated defenses to counter more sophisticated attackers.



> Your emotional reaction

Please refrain from violating the HN Guidelines. And I'm not sure which emotion you're talking about, other than my use of the word "funny".

> doesn’t make something security theater. The term has a specific meaning

I didn't use the term "security theater". I said "privacy theater".



Yes, and privacy theater is clearly an attempt to apply the same concept to a closely related topic. I edited my comment to focus on the problem here: you started with this absurdly sweeping claim which you’ve been unable to meaningfully substantiate throughout the thread. Trying to dismiss something as theater based on a bug fixed in the beta period is not only self-contradictory (you’re tacitly admitting that it’s not theater now) but also almost useless as a heuristic because very few products never have bugs.

Now if we want to talk about guidelines, consider that the broad claim you originally made would have to be widely accepted in the industry not to need supporting evidence, at which point it wouldn’t be contributing anything; since the opposite is true, the guidelines about flame bait cover it. It could have gone in a potentially useful direction if you’d been willing to define your terms and support them with evidence, and that would have helped suggest less hyperbolic terms. For example, if you said that Apple could do better at vetting and implementing their features I doubt many people would disagree with you.



Built in tracker blocking and the various ways Safari makes it hard to share user sessions with 3rd parties absolutely has real effect. It also has real costs: part of Safari’s poor compatibility reputation comes from websites that are broken by its tracking prevention features. This is why Google claims they haven’t rolled out the same. If Apple only cared about the problem at a superficial level, why wouldn’t they do the same as Chrome and talk a big game about the problem but continuously delay changes?

I’d say the privacy report is the only real false security feature, but Apple was a laggard in that market. For all we know, they could have been trying to match features with Ghostery or Brave that teach consumers this is a feature you should expect from your browser. Users may also have been needed education about that behavior in order to justify the compatibility regressions cookie blocking incurs. It’s impossible to know from the outside, but your body of evidence to support a really strong accusation is quite weak.



> If Apple only cared about the problem at a superficial level, why wouldn’t they do the same as Chrome and talk a big game about the problem but continuously delay changes?

If Safari behaved the same as Chrome, then Apple couldn't market Safari as more private than Chrome.



This is obviously untrue. People accuse Apple of marketing differences where none exist all the time. Thus the trope "X did it first" or "Y on Z is basically the same."


> People accuse Apple of marketing differences where none exist all the time. Thus the trope "X did it first" or "Y on Z is basically the same."

I don't know what you're talking about. What are X, Y, and Z specifically?



"Please note that this leak only occurs with iCloud Private Relay on iOS 15"

All software has bugs. I think it is more interesting to see how companies respond to reported issues. And how they improve things.

Is OpenSSL "theatre" because it had (bad!) bugs in the past?





Well I’ll add to my response that I think it is delusional to think that (security) features ship bug free. It is a bar that _nobody_ can or has met. It is not how the software world works at large.


> Well I’ll add to my response that I think it is delusional to think that (security) features ship bug free.

Have you considered that I'm not delusional, and my point may be more subtle than your straw man?

> It is not how the software world works at large.

Have you considered that I'm a software developer myself?



Do you have something more recent than a leak from over 2 years ago that has long been fixed? I'm curious why iCloud Private Relay is theatre at the moment.


> Do you have something more recent than a leak from over 2 years ago that has long been fixed?

Why are you letting Apple off the hook for shipping a feature that had a massive flaw from the start? It was advertised as being private, but it wasn't.

> I'm curious why iCloud Private Relay is theatre at the moment.

It's still true that iCloud Private Relay covers only a limited amount of traffic on your device from a limited number of apps. It's leaky by design, not a full VPN.



And a VPN leaks a lot of information about your network activity to the operator, so by your standard it is privacy theater. Do you see why you’re coming across as having inconsistent standards and thereby perhaps an axe to grind?

iCloud Private Relay is used for all network activity from Safari which does not seem like a “limited amount of activity.”



> And a VPN leaks a lot of information about your network activity to the operator, so by your standard it is privacy theater.

A VPN isn't designed to keep your IP address hidden from the operator. iCloud Private Relay doesn't hide your IP address from Apple either. That's not the point, and everyone knows this in advance. The point is to keep your IP address hidden from the request destination servers.



Your logic is that any flaw in an implementation renders it useless. In the case of VPNs, operators can and do share information about clients to destination servers, law enforcement, and more out of band. Just because it involves a spreadsheet and not a WebRTC request does not mean it can be forgiven if you're going around making absolutist claims regarding efficacy.


> Your logic is that any flaw in an implementation renders it useless.

I didn't say that. It's a straw man.



You said that iCPR is privacy theatre because of a resolved security bug from 2 years ago. Please spell out the implication of that claim for me then.


> Please spell out the implication of that claim for me then.

I'll spell out my views below, but I want to start by noting that I don't agree with the way you've characterized them. Going all the way back to your initial reply, I don't like the way this leading question was phrased:

> What specific features do you allege exist just to mislead the general public?

I think Hanlon's razor is a false dilemma. With a big company like Apple, there's typically a combination of bureaucratic incompetence and marketing exaggeration. Clearly, Apple leadership has decided to make privacy a consumer differentiator for their products, so they have a financial incentive to hype privacy features as much as possible. As a consequence, Apple management would be eager to be pitched any and all privacy features from engineering; these may even lead to bonuses and promotions, though that's purely speculation on my part. Regardless of the personal motivations of employees, the company is pursuing privacy features in earnest and isn't intending for them to be fake. Nonetheless, the company also has the unfortunate habit of shipping half-baked features and implementations. This is driven largely by the artificial, forced march of the annual release schedule, which demands that great new features be continually announced at a certain time, whether they're ready or not. The situation is not unique to privacy features either; Apple's entire software product line is suffering in quality. Engineering simply doesn't have enough time to do things right, which results in new features that are superficial and/or flawed. You could say it's marketing-driven incompetence.

Several commenters have mentioned that all software has bugs, as if that were somehow profound, or as if I were somehow ignorant of software development as a software developer. (I actually had to spend some time fixing a bug before I wrote this reply.) But not all bugs are created equal. From my perspective, a bug that's discovered relatively quickly by someone else is worse than a bug that's discovered only years later, in the sense that it suggests insufficient QA on the part of the developers, who themselves should have noticed the bug before it shipped. And a bug in the primary functionality of a product or feature is worse than a bug in a more obscure part of the software. This is why I'm not impressed by the length of time since a bug was fixed; if a feature or product was shipped with an obvious, fundamental flaw in its main functionality, that's a stain on the reputation of the developers. And if they keep making such mistakes, why should you ever trust them to be competent? No bug fix can fix the bug writers.

I don't want to focus too much on iCloud Private Relay, though. It wasn't what I had in mind when I was writing my original comment, and I don't even use iCloud Private Relay myself. I mostly don't use a VPN, except on rare occasions. I've discussed iCloud Private Relay here only because you asked me about it.

It's been a busy afternoon/evening for me, so I've kind of run out of steam now on this comment, but I promised I would reply.



> It was advertised as being private, but it wasn't

Signal had a bug once. Ergo, it's a scam?



> Signal had a bug once.

Are you referring to this?

https://www.forbes.com/sites/daveywinder/2019/10/05/signal-m...

It was a bad bug in the Android client, to be sure, but it didn't bypass Signal encryption.





That's a wild accusation to make without citations.

It doesn't even apply in this instance, since Apple's work on fingerprint resistance still results in real privacy improvements even when later shown to be imperfect. It means Apple has to improve what they've already done, not that what they've done so far is mere "marketing" or "theatre".



> That's an wild accusation to make without citations.

Shall I cite my list of CVE? Or perhaps it would be more interesting to cite my list of unfixed 0days.

> It doesn't even apply in this instance, since Apple's work on fingerprint resistance still results in real privacy improvements even when later shown to be imperfect. It means Apple has to improve what they've already done, not that what they've done so far is mere "marketing" or "theatre".

What does it say about Apple engineering that they keep shipping features with very obvious and/or predictable flaws?



I haven't seen marketing related to audio fingerprinting protection. Maybe Hanlon's applies here.

As for your point about the pattern of vulnerabilities: I'd attribute this to being closed source. They keep shipping security features with limited auditing, and only discover flaws in production.



> I haven't seen marketing related to audio fingerprinting protection.

Apple announces powerful new privacy and security features: https://www.apple.com/newsroom/2023/06/apple-announces-power...

WebKit Features in Safari 17.0: https://webkit.org/blog/14445/webkit-features-in-safari-17-0...

In general, Apple is trying to market itself as the privacy company. "What happens on iPhone stays on iPhone", yadda yadda.

> Maybe Hanlon's applies here.

I think my view is in alignment with Hanlon's razor. I don't think it's necessarily malicious deception. Rather, Apple has a habit of shipping the laziest implementations and slapping a "privacy" label on them, but the public doesn't know that these are lazy half-measures.

> As for your point about the pattern of vulnerabilities: I'd attribute this to being closed source.

WebKit is open source.

> They keep shipping security features with limited auditing, and only discover flaws in production.

I don't think this is a closed/open source issue. It's just bad engineering.



Bad engineering yet state of the art. What are Chromium’s protections against web audio fingerprinting?

In the game of tracking, minor hurdles are great at stymying many actors.

And finally, your citation in response to someone saying they haven’t seen Apple market web audio fingerprinting protections has no references to said feature. Are you saying all the privacy features in that press release are a smokescreen? It’s quite unclear.



> What are Chromium’s protections against web audio fingerprinting?

I'm not aware of any. But they aren't advertising fingerprinting resistance either.

> In the game of tracking, minor hurdles are great at stymying many actors.

That's questionable.

> And finally, your citation in response to someone saying they haven’t seen Apple market web audio fingerprinting protections has no references to said feature.

There were multiple antifingerprinting methods in Safari 17. The linked articles referred to them collectively.



>> In the game of tracking, minor hurdles are great at stymying many actors.

>

> That's questionable.

It's basically indisputable. Ask any online advertising buyer about the effectiveness of audience targeting for Safari users versus the competition. Or consider the ability of the average website operator to adopt Fingerprint.js instead of whatever half-broken tool their usual audience measurement provider offers them.

https://blog.google/products/chrome/privacy-sandbox-tracking...

> Chrome is testing Tracking Protection, a new feature that limits cross-site tracking.



“Apple’s quantity and resolution rate of security bugs undermine its privacy marketing” and “Apple’s privacy marketing is a lie” are two very different claims, and it seems like you meant to make the first. Even that claim though is unsupported since Safari users are definitely harder to track across the web in practice than Chrome users.

> Shall I cite my list of CVE? Or perhaps it would be more interesting to cite my list of unfixed 0days.

The list of vulnerabilities is not very informative for the same reason a trackers blocked statistic is not. It doesn’t give any baseline for comparison and may just be a reflection of how important and interesting to security researchers the target is.



I think you missed the point of my comment. When I said my list of CVE and my list of unfixed 0days, I meant that literally: CVE attributed by Apple to me, and unfixed 0days that I personally discovered. I wasn't making a "wild accusation".


No I understood exactly what you meant. The number of reports is not helpful data without a lot of other context, but you offered it as if it would be convincing or definitive. How many CVEs and 0-days have you filed against Audacity? Is it because that software is security bug free?


> No I understood exactly what you meant.

That's a rather bold claim, unless you're a mind-reader.

> The number of reports is not helpful data without a lot of other context, but you offered it as if it would be convincing or definitive.

I didn't give a number. I only said I have a list. It seems that you're still missing my point, which was simply that my knowledge of and experience with these specific technologies means that my original comment was not a "wild accusation". That's it, that's the whole point.

> How many CVEs and 0-days have you filed against Audacity?

I don't use Audacity, and I have no idea how it's relevant here.



> That's a rather bold claim, unless you're a mind-reader.

It seems like you have me confused with someone else in the thread who used the phrase "wild accusation" and are responding rudely. I think your original comment was needlessly exaggerated and inflammatory and defending it, instead of clarifying it, is a bad look. Clearly you have an axe to grind with Apple, and my advice to you is you should put a little more effort into hiding it if you want others to take you seriously.



> It seems like you have me confused with someone else in the thread who used the phrase "wild accusation"

No, I'm not confused. But that comment was the context for my mentioning CVE and 0days, which you decided to discuss yourself.

simondotau: "That's an wild accusation to make without citations."

me: "Shall I cite my list of CVE? Or perhaps it would be more interesting to cite my list of unfixed 0days."

you: "The list of vulnerabilities is not very informative for the same reason a trackers blocked statistic is not."

If you don't want to discuss my previous quoted comment, that's fine, but you have in fact mentioned it and continue to mention it. Thus, the context is very relevant.

> and are responding rudely.

Where exactly was I rude?

> I think your original comment was needlessly exaggerated and inflammatory and defending it, instead of clarifying it, is a bad look.

I would be happy to clarify it, but the first time you asked for clarification was here: https://news.ycombinator.com/item?id=39661492

I'll respond to that comment, though it may take some time.

> Clearly you have an axe to grind with Apple

I've been a Mac user for more than 20 years, a professional Mac developer for more than 15, and I currently sell apps in the Mac App Store and iOS App Store. Do I have critiques of Apple? Yes, of course. However, they are the critiques of an insider who has no intention to leave the ecosystem.



Someone definitely correct me if I'm wrong, but the success of the fingerprinting workarounds here seems to boil down to the following choice wrt handling oscillator anti-aliasing in the Web Audio API spec [1]:

"There are several practical approaches that an implementation may take to avoid this aliasing. Regardless of approach, the idealized discrete-time digital audio signal is well defined mathematically. The trade-off for the implementation is a matter of implementation cost (in terms of CPU usage) versus fidelity to achieving this ideal.

It is expected that an implementation will take some care in achieving this ideal, but it is reasonable to consider lower-quality, less-costly approaches on lower-end hardware."

AFAICT this means that the OscillatorNode output they are exploiting here is almost guaranteed to not be deterministic across browsers (or even in the same browser on different hardware). The non-determinism is based on whatever anti-aliasing method is chosen by the browser (or, possibly, multiple paths within the same browser which could get chosen based on the underlying hardware). This includes changes/fixes to the same anti-aliasing algos.

I don't really understand this choice of relegating anti-aliasing to the browser given that:

a) any high-quality audio app/library will want full control over how the signals they generate avoid aliasing and will not use these stock oscillators anyway, or

b) the kinds of web applications that would accept arbitrary anti-aliasing algos (and the consequent browser-dependent discrepancies therein) probably wouldn't care whether the aliasing algo is hardcoded SIMD instructions or some 20MB javascript web audio helper framework

1: https://webaudio.github.io/web-audio-api/#OscillatorNode

Edit: clarification

Edit 2: more clarifications. :)

Edit 3: I wonder if the same kind of solution could be used here as was used by Hixie to standardize the HTML5 parser. Namely, just have some domain expert specify an exact, deterministic algo for anti-aliasing that works well enough, then have all the browsers use that going forward. I'd bet the only measurable perf hit would be to tutorials that show how to use the web audio api to generate signals from the stock anti-aliased oscillators. :)



Quality anti-aliasing is expensive.

So you want to allow the implementation to decide how much to spend on it depending on available compute, battery and so on.



Putting a node graph audio API in the browser was silly. It should have been just audio worklets.


Wasn't Mozilla's proposed audio API simpler? AFAIK it was beaten out by Google's because people wanted a richer API and lower latencies.

https://web.archive.org/web/20120505042746/https://developer...



IIRC it turned out that way in large part because realtime audio is very sensitive to performance hitches, and idiomatic JS is hitchy by nature due to relying on garbage collection, so they wanted to hoist as much as possible up into native code provided by the browser. If WASM had existed at the time it would have been easier to make the case for just exposing a simple raw audio interface instead.


Well... Mozilla had ASM.js at the time. In part to showcase their superior performance with certain portions of JS compared to V8 - at the time I remember things like console emulators preferring Mozilla's JS engine due to it offering more reliable performance than V8 on tight loops and large switches. Mozilla was also demonstrating how their engine could offer comparable performance to Google NaCl in an image processing demo which was conceived to show how NaCl could cover limitations in V8 at the time.

I wonder if we might well have had more traction with Mozilla's approach and ASM.js if V8 had had similar features.

Oh well. Is what it is, and Mozilla (and Microsoft and Apple) did at least manage to get WASM which has been super useful even outside of browsers.



(That is, I'm pretty sure ASM.js uses the same trick WASM does, given it was the predecessor - just preallocate a ton of memory in an array, and work with primitive types, and no GC to worry about most of the time)


Did people want lower latencies though? It seems a bit absurd given the other compounding factors that play into it.


Why's that?


This is gross.


Exactly my thoughts. Interesting, but gross.

I wonder why audio API's are even available without giving a website permission? It feels like this could easily be fixed with a simple "This site would like to use your sound devices"-dialog.



It raises the question of whether the current networking stack is the one we want to have for the next 100 years. The internet in its current form has ruined a lot of the dream of personal computing because companies (and the state) are so asymmetrically powerful versus individuals. Should it be possible for my technology to send data to a server without my explicit approval?


> Should it be possible for my technology to send data to a server without my explicit approval?

Of course not. We should be able to intercept every request, filter them, even modify them to send fake data instead if we wish it.



Yeah. Can't believe these guys are proud of it!

On the other hand, I did clear my browser cache and switched on the VPN, and they mis-identified me as a new visitor.

Still, despicable business model.



I assumed a level of irony here, from fingerprint.com. It’s like if a website popped up popularising loopholes to get around tax burdens as an attempt to disgust the world into closing those loopholes.

Even if that’s wishful thinking, there’s still immense virtue in publishing this research and getting it out in the open. If an article gets published explaining how a particular brand of green backpack helps with shoplifting do we worry that everyone’s going to shoplift more? I’d err more on the side of knowing shops are more likely to catch on to the tactic.



Unfortunately in this case, the website does content marketing with known, easy to fix vulnerabilities presumably to put competition out of business while keeping unknown, harder to fix vulnerabilities as part of their "pro" products.


It seems like rather than adding a random amount to each sample (which lets them compute a mean by recreating the same audio and extracting out the differences), Safari could instead add randomness that is based on a key that rotates every hour. (Function of audio sample and key, so the noise would be the same in a given session, but useless for tracking an hour later).


Wouldn’t it help if the noise added were deterministic based on origin? That way it can’t be averaged out by oversampling. So something like RNG_SEED = HMAC_SHA256(PERSISTENT_SECRET,Location.origin)
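
A sketch of that suggestion using WebCrypto; persistentSecret here is a hypothetical browser-held HMAC key, not an existing API:

    // Derive a stable per-origin seed, so a site always sees the same noise
    // and cannot average it away by sampling repeatedly.
    async function originNoiseSeed(persistentSecret /* CryptoKey for HMAC-SHA256 */) {
      const data = new TextEncoder().encode(location.origin)
      const mac = await crypto.subtle.sign("HMAC", persistentSecret, data)
      return new DataView(mac).getUint32(0) // feed this into a deterministic PRNG
    }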


If you averaged together ten such samples, you'd get something that approaches the true values from the device. The more samples you have, the closer it would get.

Fixing this would require removing the information leak entirely, not just masking it under a layer of random deviations.
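
Concretely, the averaging attack looks something like this (renderOnce is a hypothetical helper returning one noisy render as a Float32Array):

    // Zero-mean noise cancels under averaging; the device-specific signal
    // underneath does not. The noise shrinks roughly as 1/sqrt(n).
    async function denoise(renderOnce, n = 100) {
      let acc = null
      for (let i = 0; i < n; i++) {
        const samples = await renderOnce() // fresh noise each render
        if (acc === null) acc = new Float64Array(samples.length)
        for (let j = 0; j < samples.length; j++) acc[j] += samples[j] / n
      }
      return acc // approaches the true, fingerprintable output
    }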



The grandparent post accounted for exactly that criticism. By having the source of randomness fixed for a limited time period, a fingerprinting algorithm wouldn't be able to gather enough unique samples for averaging to be useful. And given the extremely fine differences in the floating point numbers, any injected noise would so overwhelm the data that you'd need hundreds, perhaps thousands of samples in order for averaging to be useful.


I'm really ready to just be "that guy" that browses with JS disabled.


The problem is that by being "that guy" you're probably giving them 10 bits or more of identification. If they can just scrape a few more bits from somewhere they'll have you uniquely identified.

But, yeah, these guys can get on Golgafrinchan Ark B with the rest of the adtech industry as far as I am concerned.



I need to be more offline then.


It won't save you from fingerprinting: https://fingerprint.com/blog/disabling-javascript-wont-stop-.... Though I don't think it's used in practice, because disabled JS is a red flag on its own in situations where fingerprinting is used for security.


Good luck. It's amazing how little of today's web is good old HTML. A while ago I visited a website that used Markdown - but that wasn't compiled into HTML and then statically served, oh no - it was rendered in JS client side. WTF.


Join me, and do it! There is a great Firefox extension called uMatrix, which makes it easy to disable JavaScript not just on a site-by-site basis, but also by subdomain (and easy to re-enable for sites that break without js).


Good luck, I recently gave up that fight after needing to enable JavaScript to view the content of nearly every single website I visited.

It's not even just cloudflare and similar DDOS checks, but now even things that should just be in the HTML of the page are loaded with JS.



It's exactly for such reasons that the Tor Browser would have JS disabled.

As the internet gets more and more hostile, this will become more and more correct.



they compute the deviceid on the server using signals from the browser. i think having js disabled is just another signal.


Couldn't you just replace the prototype of the Audio API to return whatever you wanted? The difficulty would be in getting enough fingerprints for your desired imitation, but the article itself seems to have that information.


Does this technique fingerprint based on hardware/driver/OS differences with audio processing, or just the browser software?

I believe there are (or were, hopefully) similar techniques that exposed differences between the underlying graphics devices.



This is similar. Audio algorithms often call OS functions and make use of CPU optimizations. One example they mentioned is the fast Fourier transform (FFT). All OSes include a version of that function, but it tends to be optimized over time, and tends to behave differently on different CPUs depending on what SIMD instructions are available.
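
For instance, the FFT that the Web Audio API exposes through AnalyserNode runs on whatever optimized routine the browser provides, so its output can differ in the last bits across platforms. A rough, illustrative sketch:

    // Read FFT bins after an offline render; exact values depend on the
    // platform's FFT implementation and available SIMD paths.
    const actx = new OfflineAudioContext(1, 44100, 44100)
    const src = actx.createOscillator()
    const analyser = actx.createAnalyser() // uses the browser's built-in FFT
    src.connect(analyser)
    analyser.connect(actx.destination)
    src.start()
    actx.startRendering().then(() => {
      const bins = new Float32Array(analyser.frequencyBinCount)
      analyser.getFloatFrequencyData(bins)
      console.log(bins.slice(0, 5)) // may differ in the last ulps across platforms
    })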


Does disabling web audio beat this fingerprinting?


I always assume that any difficult/annoying anti-fingerprint techniques make you more identifiable instead of less, since very few people do them.


This is why the Tor Browser attempts its best to only have one fingerprint. The more people use it, the more this argument loses its edge.

Or so they say.



This. That's why I feel we are all doomed regarding privacy. The only way we could maybe protect ourselves would be for everyone to send manipulated data that looks like plausible average data.


Disabling it would be weird and therefore make you more identifiable.


I really don’t see how this can come up with more than a few thousand unique combinations. Browser type x browser version x os version x accelerator version x … what else? That doesn’t seem like enough variation to create anything remotely unique. I don’t get it.


Combinatorics is a harsh mistress


I expected this article to be published by some hackers or defenders of privacy like EFF, not by a company whose goal is to fingerprint people. Such dystopian times.


Why am I supposed to want any website I visit to be able to render audio offline anyway?


There’s a push to make every single last thing a normal application can do, available to web apps through some half-standardized JavaScript API or another. Generally google comes up with use cases, implements it in chrome, and tries to call it a standard. Then everyone complains when Apple doesn’t implement these standards fast enough, and that Safari is “holding back the web” or “the new IE” because it’s not keeping up with every last feature Chrome implements.

I would prefer websites just be websites and that we don’t have every single damned API available to whatever trashy site I accidentally click on, but I guess you and me are outliers here. Most people on HN seem to welcome every single JS API because web development is the only platform anyone seems to care about any more.



Things like this seem niche enough to safely put behind a permission dialog. 99.9% of sites/web apps have no legitimate need for this functionality.


That’s how location services and notifications work today, and all it means is that websites just constantly nag me to enable them.

Things like this make for a more annoying web all around, because now it’s just one more tool sites can use to track me and increase engagement. (Edit: sibling poster chuckles said it way better than I can.)

If I had my way, JavaScript on the web would be limited to XMLHttpRequest and basic DOM manipulation and couldn’t do anything else. A totally separate “rich” JavaScript engine could be opted into by the user for any website that presents itself as an “application” like ones that legitimately want audio API’s like these. All these half-baked web app “standards” that google is forcing down our throats can be confined to that leper colony.

Then the most important bit: browsers could let me completely disable the “rich” engine, and I can go back to having a sane web experience again.



> That’s how location services and notifications work today, and all it means is that websites just constantly nag me to enable them.

It also means you can tell the browser to outright deny every request, thus avoiding even getting prompted. If a website detects the request was denied and still prompts you any other way, that’s an undeniable signal to close the tab and never return.



Right, I think the fact that these features exist at all means sites are gonna ask for them… even if your browser denies it, the site can easily pop up a dialog saying “hey you should give us notification access!”.

The result is that the web just keeps getting incrementally worse and worse. It’s all good intentions in creating these API’s but the result is that everything just gets more terrible.



I’m kinda surprised that no fork of Firefox has added both global and domain-scoped toggles for web feature support. I know there’s flags in about:config but that only covers some things and isn’t very user friendly.

That’d let users turn support for all the fancy bits off by default and enable them in the tiny handful of cases that they’re actually desired. This way as far as sites are concerned your browser simply doesn’t support those features and thus can’t nag you.



It’s not as extensive as you’re asking for, but check out the Firefox extension uMatrix.


Permissions dialogs solve for a problem product creators have and create more problems for users.


When a surveillance company (in this case Google) is leading the push, security against surveillance isn't on their list of requirements. In fact it's more of an anti-requirement, which escapes human judgement via design by committee or other anti-scrutiny technique. So then we end up with yet another insecure API that we've got to suffer for years as the browser makers who actually care about security painstakingly figure out how to mitigate the vulnerabilities in the original standard.

And I'm all for focusing on technical security, but it's worth mentioning that the biggest most concentrated win would be making commercial digital surveillance illegal (ie the path the GDPR tries to head in). Imagine if large public companies had to make their revenue by honest means instead of working as advanced persistent threats.



I think fewer people would be in favor of this if apple just let you download native apps and run them on your iPhone like any other computing device.


Apple lets you run native apps on the iPhone. You get them through the App store.




Audacity's an awesome piece of software that I've used many times. Never once have I thought "by golly this thing should be a website, and my web browser should be made to expose an audio graph API to every website I visit so that it can be so!"


I'm the opposite. I think website = sandboxed, native = pwnage, so whenever I can use a website version I often prefer it over a native app.

I use photopea all the time now. it's available on every machine, even machines I don't have permission to install software on



A "sandbox" that can freely send anything it wants to any computer connected to the internet


You can sandbox native apps too. Hell, even run them in an airgapped virtual machine.

I wouldn't trust a browser sandbox all that much given the high interest in subverting it.



What’s an easy way to sandbox apps on Windows?

Sorry, I’d prefer to stick with my operating system, not install QubesOS.



sandboxie was neat last time i tried it.


I firejail browsers specifically to stop them from playing audio; it's much easier than playing whack-a-mole with the browser settings.


And they absolutely love that you, and every other user, sends them every original image you choose to edit

So when you make a meme or whatever and repost it somewhere else, guess what, they know who did that!



Get with the times. Your privacy must be sacrificed so some random web app you have never heard of can do something no website should be able to do at all. Or maybe that’s just a pretense and not the real reason Google keeps adding all these APIs. People seem to forget that ChromeOS is literally Google trying to turn every computer into a thin client for their services.


We'll make sure we stop building software you could imagine then. /s

There's value here. Other people are allowed to want more than you want.



I look forward to the day the EU makes fingerprinting illegal.


Defining fingerprinting in legal terms is fairly difficult.

Most regulators would also likely consider fingerprinting for certain use cases as acceptable. E.g., detecting abuse, fraud, CP, etc.



How is that difficult? Anything that is stored to recognise a user on two different devices/sites is fingerprinting.


let's just make fingerprinting for advertising illegal, and then go from there


Me too. But more likely they will do the opposite. Apple’s anti-fingerprinting is anticompetitive to the market for European data trackers or some such bullshit.


Fingerprint states that this service is for fraud detection, but they are actively discussing how they are circumventing browser privacy protections.

So as a user my preference not to be fingerprinted or tracked takes a back seat in the name of fraud detection?

So we should allow police to wiretap in the name of crime prevention?



Tale as old as time. Think of all of the legislative attacks on encryption in the name of protecting people.


My charitable take is that it does take both ends of the spectrum to arrive at a solution that does not exactly satisfy everyone, but is an acceptable place to stop the impossible arms race. The unfortunate reality is that we are nowhere near the end of that race.

Admittedly, that was the first time I read about fingerprinting in this manner, and bypassing explicit privacy protections is definitely not something I would want for my future self (or that of my family).

In other words, I think you are right. Privacy probably needs to be codified. It may seem hard to do given existing entrenched interests, but you have to start somewhere. Not that long ago people thought buying people was 'just the way the world works'. Things can change. Slowly, but they do.



> So as a user my preference not to be fingerprinted or tracked takes a back seat in the name of fraud detection?

the issue is murky for certain use cases. take payments for example. fingerprinting is used at scale in that field, and for good reason. you want to be able to know the risk associated with a user (chargebacks, fraud, etc).



> they are actively discussing how they are circumventing browser privacy protections.

I'd love to see a successful prosecution as something like a US CFAA violation, setting a precedent that puts the fear of god into the widespread slimy side of our field.

But I suspect it will have to be a non-US country leading that, because a lot of the US economy and power is now tied up in widespread slimy behavior of our field.



Did I read this correctly: audio fingerprinting is mainly about identifying the browser version and OS or laptop used, but it can't identify end-users in a stable way?


Yeah, it doesn't tell a website who you are. Instead, it allows them to recognize you again when you come back to visit again, even if you clear cookies.

This is particularly a problem with big advertiser networks because they can track you across many sites you visit, even if you disable third-party cookies.

It has positive uses too, like preventing click fraud and concert ticket arbitrage.



>Instead, it allows them to recognize you again when you come back to visit again, even if you clear cookies.

I don't think that's what stockhorn said. stockhorn said it can only identify a what browser and OS and laptop model you're using. Someone else with the same browser, OS, and laptop model would have the same fingerprint. So audio fingerprinting couldn't precisely recognize you again when you come back again.



> Someone else with the same browser, OS, and laptop model would have the same fingerprint.

the collision rate of their ids is stated to be 0.05%

what they do is basically collect a lot of signals from the browser (audio processing stuff being only a part of it) and then compute an id on the server.



Browser, OS, laptop joined with IP looks like a pretty good ID


IP is a pretty good ID...


NAT really.


I see what you did there…


Not if you’re behind something NAT’d, which is especially true on mobile.


Still, parent does state a pretty big concern when looking at this from a higher vantage point.

These practices and their repercussions aren't self contained.



My phone running Firefox for Android produced the same results as the sample data for Firefox on Windows which does seem to fit with this largely being a browser identification scheme


I think that is correct, but it still seems like an amount of leakage that could be further correlated with another trick.

There was previously a site which could indicate how globally unique your environment was (some combination of screen size, user-agent, fonts?, etc). Locking down to a specific hardware+browser configuration probably does a lot to remove anonymity.





Not the one I used, but this one actually looks better.

Just being Linux + Firefox is terrible for blending into the herd. Let alone everything else that leaks (having a desktop + GPU + good monitor basically destroys all remaining hope).



Probably was EFF's panopticlick, which has evolved into https://coveryourtracks.eff.org

The about page has some history https://coveryourtracks.eff.org/about



> Fingerprinting is used to identify bad actors when they want to remain anonymous. For example, when they want to sign in to your account or use stolen credit card credentials. Fingerprinting can identify repeat bad actors, allowing you to prevent them from committing fraud. However, many people see it as a privacy violation and therefore don’t like it.

This doesn't seem to acknowledge the use of fingerprinting in intentional violation of the privacy of ordinary people, for marketing profiling and just selling them out because someone is willing to pay.

On https://demo.fingerprint.com/ , they do start to hint at non-anti-fraud purposes, but the use case seems to be full of poo. (Logins or cookies are the way to do this. Anything else is trying to circumvent privacy mechanisms. And if they don't distinguish users perfectly, they're doubly violating privacy by then leaking private information between people.)

> Personalization -- Improve user experience and boost sales by personalizing your website with Fingerprint device intelligence. Provide your visitors with their search history, interface customization, or a persistent shopping cart without having to rely on cookies or logins.

Popup warning on "https://demo.fingerprint.com/personalization":

> Heads up! -- Fingerprint Pro technology cannot be used to circumvent GDPR and other regulations and must fully comply with the laws in the jurisdiction. You should not implement personalization elements across incognito mode and normal mode because it violates the users expectations and will lead to a bad experience. -- This technical demo only uses incognito mode to demonstrate cookie expiration for non-technical folks.

Sounds a bit like a disingenuous bad actor doing CYA while demonstrating their capabilities, nudge, nudge, wink, wink.



Their tool is too expensive to be used for marketing purposes in most cases.


Tl;dr: Apple's implementation adds random, uniformly distributed noise, so by running many samples one can average the noise back out.

Kind of a naive design, but easily fixed.
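A sketch of why that's easy to defeat (`renderFingerprint` is a hypothetical stand-in for one noisy audio render):

    // Sketch: averaging repeated noisy renders converges to a stable value
    // (the true fingerprint plus the noise mean), which is itself usable
    // as a fingerprint. `renderFingerprint` is a hypothetical function
    // returning one noisy sample sum per render.
    async function stableFingerprint(renderFingerprint, runs = 50) {
      let total = 0
      for (let i = 0; i < runs; i++) total += await renderFingerprint()
      return total / runs // noise variance shrinks by roughly a factor of `runs`
    }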



These predatory practices are getting out of hand. Props to them for exposing this, even though they're the "bad actors" from my point of view.


They're not exposing anything, they're advertising their services. It was also posted by one of their employees.

Funny how no one seems to notice that, and they're all praising the article.



"Funniest" part is that this page also tries to establish a webrtc connection which i know because my firewall told me browser tried to connect via nat-stun port to some server. Webrtc is a common way to fingerprint vpn users because in some setups it leaks your real ip.


[flagged]



Based on the article, it sounds like this doesn't activate a device's microphone at all. If it did, most (all?) browsers would give a pop-up requesting permission for that.


This has nothing to do with the microphone...


Then where are these audio samples coming from?


From the article: "In a nutshell, audio fingerprinting uses the browser’s Audio API to render an audio signal with OfflineAudioContext interface." It links to a previous article with more details:

https://fingerprint.com/blog/audio-fingerprinting/

Here's an example from that article of a sound source (the context creation is added here for completeness):

    // Render offline: nothing is audibly played and no microphone is involved
    const context = new OfflineAudioContext(1, 44100, 44100)
    const oscillator = context.createOscillator()
    oscillator.type = "triangle"
    oscillator.frequency.value = 1000


This uses differences in the browser's audio processing pipeline; the input sound could just as well be taken from a file. The fingerprint is the slightly different output signal produced when applying filters to the input signal.
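For the curious, a minimal sketch along those lines (the node graph and the sum-based "hash" follow the linked article's approach but are illustrative):

    // Sketch: render a fixed signal through the browser's DSP pipeline
    // and reduce the output to a single number. The compressor's exact
    // implementation is what varies between browsers and versions.
    async function audioFingerprint() {
      const context = new OfflineAudioContext(1, 44100, 44100)
      const oscillator = context.createOscillator()
      oscillator.type = "triangle"
      oscillator.frequency.value = 1000
      const compressor = context.createDynamicsCompressor()
      oscillator.connect(compressor)
      compressor.connect(context.destination)
      oscillator.start()
      const buffer = await context.startRendering()
      // Tiny implementation differences change this sum slightly.
      return buffer.getChannelData(0).reduce((sum, s) => sum + Math.abs(s), 0)
    }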


How is it possible that this produces enough variation to be usable without sampling some sort of audio source? The entire pipeline is digital; there isn't any room for interference.


Please stop wasting everyone's time with your random assumptions as to why this does or doesn't work and just click on the link in the article to the detailed explanation of exactly how this works.

> The technique is called audio fingerprinting, and you can learn how it works in our [previous article].

https://fingerprint.com/blog/audio-fingerprinting/



It's doing signal processing with floats, which can lead to differences in the result even when the implemented algorithm is identical. Float addition is not associative, so reordering calculations, whether across different implementations or with different compiler options, can produce slightly different results. This mostly detects the browser version and maybe the OS/architecture; the same browser binary should still give the same results on different devices with the same hardware.
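You can see the non-associativity in any JavaScript console:

    // Floating-point addition is not associative:
    console.log((0.1 + 0.2) + 0.3) // 0.6000000000000001
    console.log(0.1 + (0.2 + 0.3)) // 0.6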


They just generate a sine wave and do some processing on it. The results are very similar, but because the processing depends on functions like the fast Fourier transform, plus the exact algorithms in the browser code, tiny differences emerge.


It's about variations in the implementation of the digital pipeline that show up in the output. It has nothing to do with analog processing or interference.




I missed where the microphone is used; it looks like it's only using the output pipeline?


But all iPhones of the same model have the same processor. Every iPhone 15 Pro Max, of which Apple sells hundreds of millions, has the same processor.

Why do they have different results?



They don't. If you read this post carefully, it just claims to be able to tell Intel Macs from ARM Macs. It can also distinguish older Safari versions that don't have the fingerprinting protection.


They have different performance at different heat and battery levels.
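(A crude, hypothetical timing probe like the one below could surface that, though throttling and background load make it noisy; the loop body and iteration count are arbitrary.)

    // Sketch: time a fixed workload; thermal/battery throttling shifts the result
    function timingProbe(iterations = 1e7) {
      const t0 = performance.now()
      let x = 0
      for (let i = 0; i < iterations; i++) x += Math.sin(i)
      const t1 = performance.now()
      return { elapsedMs: t1 - t0, sink: x } // sink prevents dead-code elimination
    }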


(This would of course require a different technique to uncover.)


> Every iPhone 15 Pro Max, of which Apple sells hundreds of millions

They sell ~200M iPhones per year, but I doubt most are the most expensive model.



Buy the same CPU 10 times and benchmark them; they'll all score differently.


I think web browsers should just implement an API that allows developers to track users in a "private" way, by generating a unique hash from your computer specs or something, and making it different for each website.

So, if you visit Google, your hash would be something like "h38kflak". If you visit Twitter, the API would generate something different, so you couldn't be tracked across websites.

That way, even if you clean your cookies, you can still be identified as the same user.
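A minimal sketch of how a browser might derive such a per-site ID internally (hypothetical API; `deviceSecret` would be a random key the browser generates once and never exposes to pages):

    // Hypothetical browser-internal derivation of a per-origin ID.
    // Because the secret never leaves the browser, IDs handed to
    // different sites can't be linked to each other.
    async function getSiteId(deviceSecret, origin) {
      const key = await crypto.subtle.importKey(
        "raw", deviceSecret, { name: "HMAC", hash: "SHA-256" }, false, ["sign"])
      const mac = await crypto.subtle.sign("HMAC", key, new TextEncoder().encode(origin))
      return Array.from(new Uint8Array(mac), b => b.toString(16).padStart(2, "0")).join("")
    }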

The use case? Fraud detection and that kind of stuff. For example, you might create a web game where users can play instantly without "creating" an account; an anonymous account would be created in the background to log them in. Any bad actor can just clear their cookies/storage to bypass a ban, and IP banning isn't reliable, as multiple users may share an address.

It's a shame that we have to rely on web API hacks to fingerprint users for legitimate reasons, and that it ends up as an eternal cat-and-mouse game, because anything you try today may be broken tomorrow.



Because users do not want to be tracked or fingerprinted. I don't care about fraud detection, and I am not a fraudster, so why do I have to be tracked? There is no way a feature like that would not get abused in one way or another.





