Blocking Internet Archive Won't Stop AI, but Will Erase Web's Historical Record

Original link: https://www.eff.org/deeplinks/2026/03/blocking-internet-archive-wont-stop-ai-it-will-erase-webs-historical-record

Major news publishers, including The New York Times and The Guardian, are increasingly blocking the Internet Archive (IA), a vital digital library and home of the Wayback Machine, from archiving their websites. The move threatens decades of preserved online news content that journalists, researchers, and historians rely on to track changes and access original reporting. Publishers are worried about AI companies scraping content for training purposes and are pursuing legal action against them. The Internet Archive, however, is a nonprofit dedicated to preservation, not AI development. Blocking its access risks erasing an important historical record, because articles are often edited or removed online after publication. Legal precedent supports the Archive's activity as fair use, much as search engines are permitted to build indexes. While the disputes over AI training are legitimate, sacrificing public access to historical information is a harmful and potentially irreversible consequence. This move is not just about restricting bots; it is about erasing history.

Original Article

Imagine a newspaper publisher announcing it will no longer allow libraries to keep copies of its paper. 

That’s effectively what’s begun happening online in the last few months. The Internet Archive—the world’s largest digital library—has preserved newspapers since it went online in the mid-1990s. The Archive’s mission is to preserve the web and make it accessible to the public. To that end, the organization operates the Wayback Machine, which now contains more than one trillion archived web pages and is used daily by journalists, researchers, and courts.

But in recent months The New York Times began blocking the Archive from crawling its website, using technical measures that go beyond the web’s traditional robots.txt rules. That risks cutting off a record that historians and journalists have relied on for decades. Other newspapers, including The Guardian, seem to be following suit. 
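For context, the web's traditional, voluntary opt-out mechanism is a robots.txt file that well-behaved crawlers check before fetching pages. A minimal sketch of such a rule is below; `ia_archiver` is the user-agent string historically associated with the Internet Archive's crawler, though the exact string a given site would need to target is an assumption here.

```
# robots.txt — a voluntary request that compliant crawlers honor.
# Targets the user-agent historically used by the Internet Archive's crawler.
User-agent: ia_archiver
Disallow: /
```

The "technical measures that go beyond" this convention refer to server-side enforcement, such as filtering requests by user-agent or IP address, which blocks a crawler regardless of whether it respects robots.txt.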

For nearly three decades, historians, journalists, and the public have relied on the Internet Archive to preserve news sites as they appeared online. Those archived pages are often the only reliable record of how stories were originally published. In many cases, articles get edited, changed, or removed—sometimes openly, sometimes not. The Internet Archive often becomes the only source for seeing those changes. When major publishers block the Archive’s crawlers, that historical record starts to disappear.

The Times says the move is driven by concerns about AI companies scraping news content. Publishers seek control over how their work is used, and several—including the Times—are now suing AI companies over whether training models on copyrighted material violates the law. There's a strong case that such training is fair use.

Whatever the outcome of those lawsuits, blocking nonprofit archivists is the wrong response. Organizations like the Internet Archive are not building commercial AI systems. They are preserving a record of our history. Turning off that preservation in an effort to control AI access could essentially torch decades of historical documentation over a fight that libraries like the Archive didn’t start, and didn’t ask for. 

If publishers shut the Archive out, they aren’t just limiting bots. They’re erasing the historical record. 

Archiving and Search Are Legal 

Making material searchable is a well-established fair use. Courts have long recognized it’s often impossible to build a searchable index without making copies of the underlying material. That’s why when Google copied entire books in order to make a searchable database, courts rightly recognized it as a clear fair use. The copying served a transformative purpose: enabling discovery, research, and new insights about creative works. 

The Internet Archive operates on the same principle. Just as physical libraries preserve newspapers for future readers, the Archive preserves the web’s historical record. Researchers and journalists rely on it every day. According to Archive staff, Wikipedia alone links to more than 2.6 million news articles preserved at the Archive, spanning 249 languages. And that’s only one example. Countless bloggers, researchers, and reporters depend on the Archive as a stable, authoritative record of what was published online.

The same legal principles that protect search engines must also protect archives and libraries. Even if courts place limits on AI training, the law protecting search and web archiving is already well established.

The Internet Archive has preserved the web’s historical record for nearly thirty years. If major publishers begin blocking that mission, future researchers may find that huge portions of that historical record have simply vanished. There are real disputes over AI training that must be resolved in courts. But sacrificing the public record to fight those battles would be a profound, and possibly irreversible, mistake. 
