The Banality of Surveillance

Original link: https://benn.substack.com/p/the-banality-of-surveillance

While working at a company that was essentially an enterprise version of Facebook, a data analyst discovered just how accessible personal data can be. A hack-day project focused on profile views (a universally tracked metric) revealed that seemingly secure data, even when protected by encryption and firewalls, was vulnerable not because of technical limitations but because accessing and analyzing it was *tedious*. The existing logging systems already recorded everything; all it took was a skilled analyst and a complicated SQL query to surface valuable information. This points to a shift in how surveillance works: it no longer depends on sophisticated technology, but on AI's ability to effortlessly sift through vast, readily available datasets. Government agencies do not necessarily *collect* this data; they *buy* it, data already gathered by countless companies. The real danger is not AI becoming superintelligent, but AI automating tedious data-analysis work, making previously impractical surveillance trivially easy. The traditional barrier protecting privacy was never strong security measures but the *inconvenience* of accessing information. AI removes that barrier, making everyone a potential subject of scrutiny, not because of what is being tracked, but because *looking* is now effortless.

The Hacker News discussion centered on the growing, and often imperceptible, reach of modern surveillance. One user shared the article, titled "The Banality of Surveillance," sparking a conversation about how easily and permanently data can be turned against individuals. One commenter recounted a border-crossing incident from decades ago that still affects them today, highlighting the long-lived consequences of small past "transgressions." Another described how AI systems like Grok can store "paralinguistic" data from voice (emotional tone, pauses, and so on), building detailed psychological profiles without users' knowledge; this data bypasses text analysis and feeds directly into speech-to-speech models. Finally, a third user worried about future employment screening, imagining employers using purchased data (shopping habits, for example) to construct invasive narratives about candidates, potentially swaying hiring decisions with misinterpreted information. The overall sentiment was a growing unease about pervasive, opaque data collection and its potential for abuse.

Original Article

For a while, I worked at a company that branded itself as the “enterprise social network,” though for all intents and purposes, it was the enterprise Facebook. Facebook was moving all of our personal communication out of emails and into a shared feed of posts and replies; our product was designed to do the same thing for our professional communication.

That meant our product was also designed to look like Facebook. There was a newsfeed; there were messages and threads; there were users; there were user profiles. There was a like button. It was Facebook, in a small corporate sandbox.

A couple months after I joined the company as a data analyst, the product and engineering department held one of its regular hack days. Everyone had 24 hours to work on anything they wanted to, and then, a strict three minutes to present their project to the entire department. It was judged; there was a stage and an emcee; there was a soundboard full of jeers; there were trophies for winners; there was an open bar. There was pride in it, and everyone wanted to put on a good show. For this particular hack day, the data team was participating for the first time, and in the days leading up to the event, we talked about our ideas. What do you want to do, we asked each other? What are you going to build?

My idea felt obvious. If you had access to data on how people were using Facebook—which is the data we had, in a bizarro bureaucratic sort of way—what would be the first thing you’d look up? If you knew someone else had that data, what would be the last thing you’d want them to look up?

Profile views. It’s clearly profile views. It’s who’s looking at your profile; it’s the profiles that you’re looking at. That was the holy grail; the third rail; the third life. If you wanted to put on a good show—if you wanted to make a drunk audience look up from their laptops—that was the data that would make them pay attention.

And it was, of course, data that we already had. Like any responsible SaaS product, our app was thoroughly “instrumented”—it recorded every click; every page view; every mobile interaction. We tracked the user who did it; the device that they did it from; their browser; their IP address; the sequence of clicks that came before; the sequence that came after. This type of logging was all generic, mundane, the “industry standard.” We used the same tracking libraries that everyone else used. We recorded the same events that everyone else did. It was mindless and mechanical—years before I joined, an engineer had stuck a few lines of code in our app’s codebase, it captured millions of events an hour, and everything was dumped into a huge table called “event properties.” Because, as the legal documents all say, some piece of it might one day be useful to “improve our Services.”

Though all this data was carefully protected in an encrypted database behind several firewalls and one very long password, that was not what made it secure. It was secure because it was a pain to use. You had to come up with interesting—or, you know, indelicate—questions to ask of it. You had to figure out how to answer that question using a sprawling array of machine-generated event logs. And you had to write 595-line SQL queries to do it all. But any employee—at our company, or at the hundreds of other SaaS startups that were functionally identical to us, and who all logged identical streams of data—could write that query, combine those logs, and answer those questions.
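To make the point concrete, here is a minimal sketch of what that hack-day project amounts to. Every name here is hypothetical (the real "event properties" table, its columns, and the surrounding warehouse were far messier, hence the 595-line queries), but the shape of the work is the same: a generic click-stream log, one aggregation, and the profile-view report falls out.

```python
import sqlite3

# In-memory stand-in for the generic event log described above.
# Schema and column names are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE event_properties (
        event_type TEXT,   -- e.g. 'page_view', 'click'
        viewer     TEXT,   -- the user who performed the action
        page       TEXT,   -- the page they landed on
        ts         TEXT    -- timestamp
    )
""")
conn.executemany(
    "INSERT INTO event_properties VALUES (?, ?, ?, ?)",
    [
        ("page_view", "alice", "/profile/bob", "2014-03-01T09:00"),
        ("page_view", "alice", "/profile/bob", "2014-03-01T09:05"),
        ("page_view", "carol", "/profile/bob", "2014-03-02T11:30"),
        ("page_view", "bob",   "/feed",        "2014-03-02T11:31"),
    ],
)

# "Who is looking at whose profile?" -- once you know where to
# look, it is one short aggregation over the same mundane logs.
rows = conn.execute("""
    SELECT substr(page, length('/profile/') + 1) AS profile_owner,
           viewer,
           COUNT(*) AS views
    FROM event_properties
    WHERE event_type = 'page_view' AND page LIKE '/profile/%'
    GROUP BY profile_owner, viewer
    ORDER BY views DESC
""").fetchall()

for owner, viewer, views in rows:
    print(f"{viewer} viewed {owner}'s profile {views} time(s)")
```

Nothing here is clever; that is the point. The barrier was never the query, only the tedium of untangling the real tables well enough to write it.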

Or, more generally: Prior to working in Silicon Valley, I assumed that data was secure because it was obfuscated by impressive cryptography and stored in buildings that were guarded by tall fences. And I assumed that what we did on the internet was private—and people’s ability to draw any inferences from what we did was difficult—because “surveillance” required complex technologies that could detect faint patterns in millions of disparate signals. Yes, Target might be able to figure out if someone is pregnant before their father could, but that took years of careful observation and sophisticated science. It took well-trained humans working with well-trained models, years in the making.

If only. On an internet where everything is tracked—and man, everything is tracked—surveillance does not require a Ph.D., or even any particularly advanced math. It just requires a junior analyst with 24 hours of free time. Because the real fences around the data we all leave behind—and the real protections of our privacy—are neither tall nor covered in barbed wire. They are simply fences that are annoying to climb. We are not hidden, on the internet; mostly, people are just too uninterested to bother looking for us.

Everyone already knows what happened: The United States Department of War wanted to use Claude. Anthropic wanted them to use Claude, but with restrictions. The two sides could not agree; the negotiations broke down; the negotiations turned into outright hostilities; the hostilities became very public. The Atlantic reports on part of what went wrong:

Anthropic learned that the Pentagon still wanted to use the company’s AI to analyze bulk data collected from Americans. That could include information such as the questions you ask your favorite chatbot, your Google search history, your GPS-tracked movements, and your credit-card transactions, all of which could be cross-referenced with other details about your life.

When we hear stories about “mass surveillance” and “artificial intelligence” and the “CIA,” it is tempting to imagine systems of unfathomable reach and sophistication. It is tempting to worry about shadowy government agencies using AI to hack into our phones and turn them into sonar transmitters. It is tempting to see the Greco—a million sensors and cameras feeding into a machine that “doesn’t think, but reasons:”

It reads every permutation in every wager in every seat in the entire casino, hand by hand. It’s wired into floor security cameras that measure pupil dilation, and determine if a win is legitimate or expected. It gathers bio feedback—players’ heart rates, body temperatures. It measures, on a second-by-second basis, whether the standard variations of gaming algorithms are holding or are being manipulated. The data is analyzed in real time, in a field of exabytes.

For better or for worse, reality is almost certainly much more mundane. Nobody wants to use AI to bug our phones, or to build a sprawling nervous system to track our vitals, because our phones are already bugged. Everything we do on them is recorded a dozen times over, by our wireless carriers, by the websites we visit and the apps we use, by the vendors and ad networks those companies are sending their data to, and in the marketplaces that sell that data. We built the eyes of the Greco decades ago.

But that data has remained relatively secure—or maybe more precisely, its potential energy has remained relatively buried—largely because it’s tedious to work with. It’s messy; it’s scattered across different sources and in different formats; combining it together is a pain, and most of us are simply not interesting enough to investigate. Data analysts who work at shadowy government agencies have lives too, and they do not want to write 595-line SQL queries either.

But AI doesn’t mind. And that’s the boring danger of what happens next: Not of AI becoming a superintelligent Sherlock Holmes finding impossible patterns in its enormous mind palace, but of it being a million monkeys at a million typewriters, doing the grunt work no person wanted to do. Because when prying questions are a prompt away—rather than 24 hours of work away—who wouldn’t get tempted to pry?

It does make you wonder though: While defense and intelligence agencies are unique in the legal and extralegal alleys in which they operate, they are not unique in their ability to warehouse massive amounts of data. In fact, as The Atlantic pointed out, these agencies aren’t collecting this data themselves; they are buying it from other people, in open markets:

The government can purchase detailed records of Americans’ movements, web browsing, and associations from public sources without obtaining a warrant, a practice the Intelligence Community has acknowledged raises privacy concerns and that has generated bipartisan opposition in Congress. Powerful AI makes it possible to assemble this scattered, individually innocuous data into a comprehensive picture of any person’s life—automatically and at massive scale.

But if those agencies can buy that data, so can other people. If they can use AI to trawl through it “at massive scale,” so can other companies—especially if those companies are already collecting those events and messages themselves.

People often talk about how AI breaks many of the foundational floorboards of our society. Our formal and informal senses of truth are built on the assumption that realistic photos and videos cannot be faked; that is breaking down. Our ambitions and careers are built on the assumption that intelligence and expertise are scarce; that is breaking down. Our sense of how the world works is often defined by what is possible for other people to do and what is worthwhile for them to do. Sure, we know it is possible for us to be monitored, but why would anyone bother watching the tapes? Everyone must have more important things to do with their time.

Banality is a sturdy armor. Or was, anyway.
