Digital Archivists: Protecting Public Data from Erasure

Original link: https://spectrum.ieee.org/digital-archive

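The Internet Archive's Wayback Machine and other initiatives are vital for preserving government websites and datasets that are essential to science, engineering, and medicine. These resources underpin the reproducibility of experiments, the validation of models, and the integrity of scholarly work. Although previous presidential administrations all made changes to government websites, the second Trump administration has removed data on a large scale, including erasing terms such as “climate change,” which has prompted court challenges and concerns about public access to information. Organizations such as the Internet Archive and the Library Innovation Lab at Harvard Law School are actively archiving government data; the latter updates a 16-terabyte copy of Data.gov daily. These digital preservationists are essential to keeping the circuit of knowledge unbroken, so that innovation can build on past discoveries.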

Original article

In the three decades since Brewster Kahle spun up the nonprofit Internet Archive’s Wayback Machine, it has scaled up to include government websites and datasets—many of which are essential to the engineering and scientific communities. U.S. government agencies like the National Science Foundation, Department of Energy, and NASA are critical sources of research data, technical specifications, and standards documentation in pretty much every area where IEEE Spectrum’s audience works—AI & computer science, biomedical devices, power and energy, semiconductors, telecommunications…the list goes on.

Access to that governmental data directly affects the reproducibility of experiments, the validation of models, and the integrity of the scholarly record.

So what happens if an entire dataset vanishes? Among other things, it can invalidate years of research built upon that foundation.

Until recently, wholesale deletion of data has been rare. In the United States, presidential transitions typically involve some changes to government websites to reflect new policy priorities. And after 9/11, the George W. Bush administration removed “millions of bytes” of information from government sites for security reasons, as well as hundreds of Department of Defense documents and “tens of thousands” of Federal Energy Regulatory Commission files.

The Obama and Biden administrations likewise made changes to government websites but didn’t engage in large-scale removal of Web pages or datasets. Obama, in fact, expanded public access to government data in 2009 by launching Data.gov, whose stated mission is in part “to unleash the power of government open data to inform decisions by the public and policymakers.”

During President Donald J. Trump’s first term, researchers at the Environmental Data & Governance Initiative found that some government sites became inaccessible, and the phrase “climate change” was purged from several government Web pages.

The second term has been different. In February, a few weeks after Trump was sworn in for his second term, The New York Times reported that his administration took down more than 8,000 Web pages and databases. Many of those pages have since reappeared, but some of the restored pages and files have had changes, including the erasure of terms like “climate change” (again) and “clean energy,” Grist reports. These moves have faced multiple court challenges; on 11 February, for instance, a federal judge ordered that public access to Web pages and datasets belonging to the Centers for Disease Control and Prevention and the Food and Drug Administration be restored.

In our April issue, Spectrum Assistant Editor Gwendolyn Rak reports on efforts to preserve public access to information. In addition to the ongoing work at the Internet Archive, she describes how archivists at the Library Innovation Lab at Harvard Law School amassed a copy of the 16-terabyte archive of Data.gov, which includes more than 311,000 public datasets. That copied archive is being updated daily with new data hoovered up via automated queries to application programming interfaces (APIs).
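
To make the idea of a daily automated update concrete, here is a minimal Python sketch of one way such a refresh could decide what to fetch. It is not the Library Innovation Lab's actual pipeline, which the article does not describe. It assumes each dataset's metadata arrives as a dictionary with an `id` field (a hypothetical record shape), and the function names `fingerprint` and `plan_update` are invented for illustration. The sketch compares today's records with yesterday's stored fingerprints and reports what is new, what changed, and what has disappeared from the source.

```python
import hashlib
import json
from typing import Dict, Iterable, List


def fingerprint(record: dict) -> str:
    """Return a stable SHA-256 digest of a metadata record.

    Keys are sorted before hashing so that two records with the same
    content but different key order produce the same digest.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def plan_update(
    snapshot: Dict[str, str], current: Iterable[dict]
) -> Dict[str, List[str]]:
    """Compare the previous snapshot with today's metadata records.

    snapshot maps dataset id -> fingerprint from the last run.
    current is today's records; each must carry an "id" key
    (an assumed field name, not taken from the article).
    """
    seen = {rec["id"]: fingerprint(rec) for rec in current}
    return {
        # Datasets that were not in the previous snapshot.
        "added": sorted(i for i in seen if i not in snapshot),
        # Datasets whose metadata differs from the previous snapshot.
        "changed": sorted(
            i for i in seen if i in snapshot and seen[i] != snapshot[i]
        ),
        # Datasets no longer listed at the source. They are only
        # reported here; the archived copy is never deleted.
        "missing": sorted(i for i in snapshot if i not in seen),
    }
```

In this sketch a dataset that vanishes from the source is flagged rather than dropped from the archive, which is the property that matters when the concern is erasure.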

Archivists are the guardians of memory. We depend on them to help us stay in touch with our history, maintain our knowledge base, and provide context, allowing us to understand how we came to be where we are and to light the way forward. In the fields of science, engineering, and medicine, where today’s innovations stand on the shoulders of yesterday’s discoveries, these digital preservationists ensure that the circuit of human knowledge remains unbroken.

This article appears in the April 2025 print issue as “Lots of Copies Keep Stuff Safe.”

Editor’s note: This post was revised to match the print version.
