
AI is transforming peer review — and many scientists are worried
Original link: https://www.nature.com/articles/d41586-025-03506-6
A recent analysis by Pangram Labs shows that AI-generated content is on the rise in academic peer review. Prompted by concerns raised by researchers involved with the International Conference on Learning Representations (ICLR), Pangram screened nearly 19,500 studies and 75,800 peer reviews submitted to the 2026 conference. The results show that 21% of peer reviews were *fully* AI-generated, and more than half showed some degree of AI influence. Researchers had noticed telltale signs, such as verbose and vague feedback, hallucinated citations and unusual requests. Only 1% of manuscripts were fully AI-generated, but 9% contained more than 50% AI-written text. These findings confirm suspicions in the academic community and raise concerns about the integrity of the peer-review process. ICLR organizers are now deploying automated tools to detect AI misuse, the first time the conference has addressed the problem at this scale. The findings underscore the need for vigilance, and possibly for new strategies, to ensure reliable and trustworthy evaluation of research.

An AI-detection tool developed by Pangram Labs found that peer reviewers are increasingly using chatbots to draft responses to authors. Credit: breakermaximus/iStock via Getty
What can researchers do if they suspect that their manuscripts have been peer reviewed using artificial intelligence (AI)? Dozens of academics have raised concerns on social media about manuscripts and peer reviews submitted to the organizers of next year’s International Conference on Learning Representations (ICLR), an annual gathering of specialists in machine learning. Among other things, they flagged hallucinated citations and suspiciously long and vague feedback on their work.
Graham Neubig, an AI researcher at Carnegie Mellon University in Pittsburgh, Pennsylvania, was one of those who received peer reviews that seemed to have been produced using large language models (LLMs). The reports, he says, were “very verbose with lots of bullet points” and requested analyses that were not “the standard statistical analyses that reviewers ask for in typical AI or machine-learning papers.”
But Neubig needed help proving that the reports were AI-generated. So, he posted on X (formerly Twitter) and offered a reward for anyone who could scan all the conference submissions and their peer reviews for AI-generated text. The next day, he got a response from Max Spero, chief executive of Pangram Labs in New York City, which develops tools to detect AI-generated text. Pangram screened all 19,490 studies and 75,800 peer reviews submitted for ICLR 2026, which will take place in Rio de Janeiro, Brazil, in April. Neubig and more than 11,000 other AI researchers will be attending.
Pangram’s analysis revealed that around 21% of the ICLR peer reviews were fully AI-generated, and more than half contained signs of AI use. The findings were posted online by Pangram Labs. “People were suspicious, but they didn’t have any concrete proof,” says Spero. “Over the course of 12 hours, we wrote some code to parse out all of the text content from these paper submissions,” he adds.
The conference organizers say they will now use automated tools to assess whether submissions and peer reviews breached the conference's policies on AI use. This is the first time that the conference has faced this issue at scale, says Bharath Hariharan, a computer scientist at Cornell University in Ithaca, New York, and senior programme chair for ICLR 2026. “After we go through all this process … that will give us a better notion of trust.”
The Pangram team used one of its own tools, which predicts whether text has been generated or edited by LLMs. Pangram's analysis flagged 15,899 peer reviews as fully AI-generated. It also identified many manuscripts submitted to the conference with suspected AI-generated text: 199 manuscripts (1%) were found to be fully AI-generated, 61% of submissions were mostly human-written, and 9% contained more than 50% AI-generated text.
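As a quick sanity check, the headline percentages are consistent with the raw counts reported in the article; a minimal sketch of the arithmetic:

```python
# Cross-check the reported percentages against the raw counts
# given in the article (ICLR 2026 screening by Pangram Labs).

total_reviews = 75_800      # peer reviews screened
flagged_reviews = 15_899    # reviews flagged as fully AI-generated

total_papers = 19_490       # manuscripts screened
flagged_papers = 199        # manuscripts flagged as fully AI-generated

review_pct = flagged_reviews / total_reviews * 100
paper_pct = flagged_papers / total_papers * 100

print(f"Fully AI-generated reviews: {review_pct:.1f}%")  # ≈ 21.0%
print(f"Fully AI-generated papers:  {paper_pct:.1f}%")   # ≈ 1.0%
```

The 15,899 flagged reviews work out to roughly 21% of the 75,800 screened, matching the figure quoted earlier, and the 199 flagged manuscripts are about 1% of the 19,490 submissions.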
Pangram described the model in a preprint1, which it submitted to ICLR 2026. Of the four peer reviews received for the manuscript, one was flagged as fully AI-generated and another as lightly AI-edited, the team’s analysis found.
For many researchers who received peer reviews for their submissions to ICLR, the Pangram analysis confirmed what they had suspected. Desmond Elliott, a computer scientist at the University of Copenhagen, says that one of three reviews he received seemed to have missed “the point of the paper”. His PhD student who led the work suspected that the review was generated by LLMs, because it mentioned numerical results from the manuscript that were incorrect and contained odd expressions.
When Pangram released its findings, Elliott adds, “the first thing I did was I typed in the title of our paper because I wanted to know whether my student's gut instinct was correct”. The suspect peer review, which Pangram's analysis flagged as fully AI-generated, gave the manuscript the lowest rating, leaving it “on the borderline between accept and reject”, says Elliott. “It's deeply frustrating.”