Chrome is entrenching third-party cookies that will mislead users

Original link: https://brave.com/blog/related-website-sets/

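A new Google Chrome feature called "Related Website Sets" may expose Internet users to privacy problems, despite the claim that it only enables data sharing among affiliated sites while keeping privacy protection comparable to what removing third-party cookies achieves. A study shows that users find it hard to identify relationships between websites, so they are likely to encounter unexpected data sharing because of Related Website Sets. Other major Web browsers have decided to exclude the feature because of its privacy risks, which suggests it may pose a danger to Chrome users. The report's authors, who come from universities and Brave Software, argue that Related Website Sets exists mainly to benefit advertisers rather than to protect user privacy. It has been suggested that users can anticipate site relationships, but the results show otherwise. Many Web users failed to guess the connections between sites that are declared to be related, which contradicts the premise on which Related Website Sets operates. In addition, the research shows that many websites work correctly in non-Chrome browsers, which suggests that the claimed need for Related Website Sets is unfounded. In essence, the study indicates that Google, under the guise of presenting customized content, is deliberately undermining user privacy for ad-serving purposes. The article stresses that Related Website Sets weakens the foundation of the Web's privacy model, benefiting websites and advertisers at the expense of individual users' privacy.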

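Firefox and Chrome are popular Web browsers. Historically, Chrome consumed fewer system resources than Firefox, especially when memory was limited. Today, however, Firefox offers enhanced privacy features such as HTTPS-Only Mode, encrypted DNS, and support for protocols such as SOCKS proxying and Encrypted Client Hello. Because of these additional privacy features, it needs more system resources. Users may choose to improve their computer's performance by adding more memory rather than by switching browsers. On online tracking, the user argues that browsers should prioritize user privacy and should not cooperate with advertising companies. Instead, browsers should focus on improving product quality so they stand out among competitors, and should avoid unnecessary tracking. Browser developers should also work to reduce user fingerprinting, because it can leak sensitive information about a device and its user. Measures toward this goal include restricting access to sensitive data such as canvas data, GPU names, and sound card enumeration. New APIs should either not add fingerprinting surface or keep fingerprintable data behind a permission. On third-party cookies, the user suggests that browsers allow cookie exceptions for older sites that rely heavily on third-party cookies, rather than maintaining a blocklist. Although this approach has risks, such as inconveniencing users by requiring their active participation, its aim is to balance convenience and privacy. In summary, the user wants a modern Web browser that provides strong privacy controls, reduces user fingerprinting, offers reasonable options for managing third-party cookies, and focuses on overall product quality rather than collecting excessive data for marketing purposes. Ideally, users should be able to opt out of unwanted tracking mechanisms easily. The user also wants transparency about how data is used, and wants browsers to put user privacy ahead of financial gain.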

Original text

Summary

This post presents research on the privacy harms and risks of Google’s recent Related Website Sets feature, to be presented at the 2024 Internet Measurement Conference. The research finds both that the Related Website Sets feature would reverse some of the privacy benefits of deprecating third-party cookies, and that Google’s justification for reintroducing this privacy harm (i.e., that Web users can tell when two different sites are run by the same organization) is untrue for many, potentially most, Web users. The study supports other browsers’ decision to reject the feature because of its privacy risks, and highlights the risk Related Website Sets poses to Chrome users.

The study was conducted by researchers at University of St Andrews, Imperial College London, Hong Kong University of Science & Technology (GZ), and Brave Software. This post was written by Principal Privacy Researcher Peter Snyder.

Related Website Sets (RWS) is a recent Chrome feature, proposed by Google in anticipation of the end of third-party cookies. The privacy and security harms caused by third-party cookies are well documented, and have led every major Web browser to either block third-party cookies, or announce plans to do so (even if Google has, again, pushed its planned deprecation date back).

According to Google, Related Website Sets reenables third-party-cookie-like behavior where it benefits users, without reintroducing the broader privacy harms of third-party cookies. In reality, RWS aims to allow (for example) Google to link the videos you watch on YouTube to your Google profile, even when you’re not logged into YouTube, and even after third-party cookies have been deprecated in Chrome. While the research described in this post presents and evaluates Google’s stated motivations for RWS, the core truth is that RWS exists for advertiser-serving situations like the above.

The broad idea behind RWS is that if two different sites are run by the same organization (for example, instagram.com and facebook.com are both run by Meta), then there is no need for the browser to block third-party cookies between the two sites, since the user already expects that both sites will share information with each other.
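As a rough illustration of this idea, the related-set check can be modeled as a simple membership lookup. The sketch below is a simplified model under stated assumptions, not Chrome's implementation: `RELATED_SETS`, `same_related_set`, and `third_party_cookies_allowed` are hypothetical names, and the single set shown only mirrors the Meta example above rather than reproducing the actual list.

```python
# Hypothetical, simplified model of a "related sets" lookup.
# The set contents are illustrative, not a copy of Chrome's actual list.
RELATED_SETS = [
    {"instagram.com", "facebook.com"},
]


def same_related_set(site_a: str, site_b: str) -> bool:
    """Return True if both sites appear together in one declared set."""
    return any(site_a in s and site_b in s for s in RELATED_SETS)


def third_party_cookies_allowed(top_level_site: str, embedded_site: str) -> bool:
    """In this model, cross-site cookies flow only within a declared set."""
    if top_level_site == embedded_site:
        return True  # same site: not a third-party context at all
    return same_related_set(top_level_site, embedded_site)
```

Note that the decision depends only on the list the browser ships, not on anything the user can see in the address bar.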

More casually, the motivation behind RWS is something like this: there’s no point in telling your mom a secret, and then trying to keep that secret from your dad; you should assume your parents are going to share everything with each other.

RWS is a user-hostile weakening of the Web’s privacy model, plainly designed to benefit websites and advertisers, to the detriment of user privacy. Google argues that RWS actually benefits users, either because the privacy exceptions help fix “site compatibility issues” or to keep users “signed in” across related domains. But a quick look at the actual Related Website Sets exceptions list reveals many examples unrelated to even these (even hypothetically) user-benefiting use cases, and these sites work correctly in browsers that do not implement RWS (i.e., almost all other browsers).

In reality, the primary motivation behind Related Website Sets is as frustrating as it is unsurprising: to benefit advertisers to the detriment of users (or, as Google euphemistically says, to “show you personalized content”). As with so many other user-harmful and needlessly-complex choices in Chrome’s overarching “Privacy Sandbox” proposal, RWS exists to make sure Chrome continues to serve advertisers’ needs first, even once Google has been shamed into (finally) deprecating third-party cookies.

Study Description: Users (Understandably) Do Not Anticipate Site Relations

The study considered RWS’s impact on Web privacy by testing whether the assumption underlying RWS is correct: can Web users accurately determine if two different sites are related to each other? More specifically, if a Web user is presented with two different websites, how accurately can they decide whether the two sites are related to each other, given the existing site relationships defined by Chrome’s RWS list?

In general, we found that Web users cannot accurately determine if two sites are related to each other (as determined by the Related Website Sets feature). We conducted a user study with 30 Web users, recruited over social media, and presented each of them with 20 pairs of websites. Website pairs were randomly selected from both the Related Website Sets list (i.e., sites Google designates as “related”, and so warranting reduced privacy protections), and the Tranco list of popular websites. Each user was presented with different pairs of websites, asked to view the sites, and then asked to decide if they thought the two sites were operated by the same organization. This resulted in 430 determinations of whether unique pairs of websites were related (some of the 30 users did not provide an answer for all 20 of the website pairs they were presented with).

We found that users’ expectations for which sites were related often didn’t match the Related Website Sets list, and as a result, the RWS feature re-enables third-party cookie-like behavior in many cases users could not anticipate. In our study, the large majority of users (~73%) made at least one incorrect determination of whether two sites were related to each other, and almost half (~42%) of the determinations made during the study (i.e., all determinations from all users) were incorrect. Most concerning, of the cases where both sites were related (according to the RWS feature), users guessed that the sites were unrelated ~37% of the time, meaning that users would have thought Chrome was protecting them when it was not.
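To put these percentages in absolute terms, the short calculation below converts them into approximate counts. It is only a back-of-the-envelope sketch: the counts are derived from the rounded percentages quoted in this post, not taken from the paper, and the variable names are illustrative. The ~37% figure is left out because the post does not say how many determinations involved related pairs.

```python
# Back-of-the-envelope counts from the rounded percentages reported above.
participants = 30
pairs_per_participant = 20
determinations = 430

# Upper bound if every participant had answered every pair.
max_determinations = participants * pairs_per_participant
response_rate = determinations / max_determinations

# Approximate counts implied by the rounded percentages (assumption: ~73%, ~42%).
users_with_error = round(0.73 * participants)
incorrect_answers = round(0.42 * determinations)

print(f"answered {determinations} of {max_determinations} pairs ({response_rate:.0%})")
print(f"about {users_with_error} of {participants} users made at least one error")
print(f"about {incorrect_answers} of {determinations} determinations were incorrect")
```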

We conclude from this that the premise underlying RWS is fundamentally incorrect; Web users are (understandably, predictably) not able to accurately determine whether two sites are owned by the same organization. And as a result, RWS is reintroducing exactly the kinds of privacy harms that third-party cookies cause.

Lest anyone judge the study participants for being uninformed, or not taking the study seriously, consider for yourself: which of the following pairs of sites are related?

1. hindustantimes.com and healthshots.com
2. vwo.com and wingify.com
3. economictimes.com and cricbuzz.com
4. indiatoday.in and timesofindia.com

Keep in mind, a user needs to determine whether two domains are related before clicking on a link; once a site has been loaded, any information sharing and tracking has already occurred.

In conclusion, we find that RWS will be harmful to user privacy, and reintroduce the kinds of privacy harms the Web has been moving away from by removing third-party cookies. The full paper will be presented at the 2024 Internet Measurement Conference.

(For the above quiz, if you chose “4”, then, unfortunately that is incorrect. That is in fact the only pair of the four that isn’t considered “related” to each other.)

However, beyond the findings from the user study, we note a more fundamental privacy harm with RWS. RWS rests on the idea that if two sites are related to each other, then it’s harmless (or, at least “acceptable”) for the browser to reduce privacy protections between those two sites. Or, to go back to the previous analogy, if mom already knows something, then there’s no harm in telling dad; dad is going to find out regardless.

This assumption is wrong; modern Web browsers are perfectly capable of preventing (say) Meta from knowing that your Facebook account and your Instagram account are owned by the same person if you register them with different email addresses and information. In fact, this is the default behavior of most Web browsers today, both browsers aimed at a general audience (e.g., Brave, Firefox, Safari) and browsers targeting specialized audiences (e.g., Tor Browser, Icefox). Unless you use the same credentials to register an account on two different sites, modern browsers can absolutely prevent two sites operated by the same organization from linking your behaviors across those sites. Or, in other words, modern Web browsers can absolutely prevent Mom from telling Dad your secrets.
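One way to picture the difference is a toy cookie store that keys each cookie jar by the top-level site as well as by the site that set the cookie. This is a minimal sketch of the general partitioning idea, assuming a much simpler model than any real browser uses; `CookieStore`, `RELATED_SETS`, and the `allow_related_sets` flag are hypothetical names, and the set contents only mirror the Meta example from this post.

```python
from collections import defaultdict

# Hypothetical set, mirroring the Meta example in the text.
RELATED_SETS = [{"facebook.com", "instagram.com"}]


def related(site_a: str, site_b: str) -> bool:
    """Return True if both sites appear together in one declared set."""
    return any(site_a in s and site_b in s for s in RELATED_SETS)


class CookieStore:
    """Toy cookie store with jars keyed by (top-level site, cookie site)."""

    def __init__(self, allow_related_sets: bool):
        self.allow_related_sets = allow_related_sets
        self.jars = defaultdict(dict)

    def _jar_key(self, top_level_site: str, cookie_site: str):
        first_party = top_level_site == cookie_site
        exempt = self.allow_related_sets and related(top_level_site, cookie_site)
        if first_party or exempt:
            # The site's own jar, shared with any context exempted above.
            return (cookie_site, cookie_site)
        # Otherwise a separate jar per top-level site.
        return (top_level_site, cookie_site)

    def set_cookie(self, top_level_site, cookie_site, name, value):
        self.jars[self._jar_key(top_level_site, cookie_site)][name] = value

    def get_cookie(self, top_level_site, cookie_site, name):
        return self.jars[self._jar_key(top_level_site, cookie_site)].get(name)


# A user visits facebook.com directly, then a page on instagram.com that
# embeds facebook.com content.
for allow in (False, True):
    store = CookieStore(allow_related_sets=allow)
    store.set_cookie("facebook.com", "facebook.com", "uid", "abc")
    seen = store.get_cookie("instagram.com", "facebook.com", "uid")
    print(f"allow_related_sets={allow}: embedded facebook.com sees uid={seen}")
```

With the exception turned off, the embedded context gets an empty jar and cannot link the two visits by cookie; with it turned on, the identifier set on facebook.com is readable while the user is on instagram.com.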

Finally, we acknowledge that some companies do try to circumvent the privacy protections in Web browsers so that two sites run by the same organization can link your accounts across sites. Some sites use techniques like link decoration or bounce tracking to keep tracking you. But the difference here between privacy-respecting browsers (which include link decoration and bounce tracking protections) and Chrome (which is explicitly designed to allow cross-site linkage) is damning: some browsers are experimenting with techniques to prevent organizations from tracking you across sites, and some browsers are designing features with the explicit intent of allowing such tracking.

Conclusions

In conclusion, our study finds that RWS is harmful for Web privacy in three ways:

First, RWS assumes users can anticipate which sites are related to each other, but in practice users cannot.

Second, RWS introduces privacy harm even before users have the opportunity to decide if two sites are operated by the same organization; by the time users can view a webpage to try to decide if two sites are related to each other, the privacy harm has already occurred, and sites have had the opportunity to track the user across site boundaries.

And third, RWS entrenches a privacy-harmful assumption in the Web platform, instead of working to excise it. RWS assumes that if two sites are owned by the same organization, then the organization should be allowed to track you across those two sites. In contrast, privacy-respecting browsers have gone in the opposite direction, and tried to prevent all sites from tracking you, regardless of what organization owns them.

Although Related Website Sets is being presented as a general Web proposal, the truth is that most of the Web has already considered and rejected it. Most browsers, including Brave, Firefox, and Safari, have publicly stated that they believe Related Website Sets (previously called First-Party Sets) is bad for users, and bad for the Web. The proposal has been removed from the W3C Privacy Community Group and is no longer being considered by any privacy-focused group in the W3C.

When Websites Change Hands

What happens if / when the domains in the list change hands? This is a common concern with all sorts of “pin trust to a domain” proposals across the history of the Web. Just because domains A, B, and C are operated by the same organization today does not (at all) guarantee that they’ll be owned by the same organization tomorrow.

Security and privacy attacks from exactly these kinds of assumptions have happened with browser extensions that have been sold from “trustworthy” parties to malicious parties, or when popular software libraries / dependencies have been taken over by a malicious actor.

The broader concern is that, even if these sites are meaningfully related at the time they’re included in the list, there is no mechanism that will remove them when they (often silently) change hands.

Language / Perception Concerns

As mentioned above, the underlying justification (as flimsy as it is) for RWS is that users can perceive that related sites are operated by the same organization. Our study finds that, even for English-speaking users evaluating English-language sites, users can’t anticipate which sites Google judges to be related. This problem will (of course) get much worse when people are visiting sites in languages they do not speak.

Timing

The intuition behind RWS is that users will be able to determine if site B is related to site A, and then only visit site B if that arrangement is acceptable. However, this is a catch-22. In order to determine if site B is related to site A, I need to visit site B and see the “shared branding or logo” (or similar), indicating the relationship between these sites. However, once I’ve loaded the site to view it, it’s already too late, and my information has been shared between the two sites.