
AI is transforming peer review — and many scientists are worried
Original link: https://www.nature.com/articles/d41586-025-03506-6
A recent analysis by Pangram Labs shows that AI-generated content is on the rise in academic peer review. Prompted by concerns raised by researchers involved with the International Conference on Learning Representations (ICLR), Pangram screened nearly 19,500 studies and 75,800 peer reviews submitted to the 2026 conference. The results show that 21% of peer reviews were *fully* AI-generated, and more than half showed some degree of AI influence. Researchers had noticed telltale signs, such as verbose and vague feedback, hallucinated citations and unusual requests. Only 1% of manuscripts were fully AI-generated, but 9% contained more than 50% AI-written text. These findings confirm suspicions in the academic community and raise concerns about the integrity of the peer-review process. ICLR organizers are now deploying automated tools to detect AI misuse, the first time the conference has addressed the problem at this scale. The findings underscore the need for vigilance, and possibly for new strategies, to ensure reliable and trustworthy evaluation of research.

An AI-detection tool developed by Pangram Labs found that peer reviewers are increasingly using chatbots to draft responses to authors. Credit: breakermaximus/iStock via Getty
What can researchers do if they suspect that their manuscripts have been peer reviewed using artificial intelligence (AI)? Dozens of academics have raised concerns on social media about manuscripts and peer reviews submitted to the organizers of next year’s International Conference on Learning Representations (ICLR), an annual gathering of specialists in machine learning. Among other things, they flagged hallucinated citations and suspiciously long and vague feedback on their work.
Graham Neubig, an AI researcher at Carnegie Mellon University in Pittsburgh, Pennsylvania, was one of those who received peer reviews that seemed to have been produced using large language models (LLMs). The reports, he says, were “very verbose with lots of bullet points” and requested analyses that were not “the standard statistical analyses that reviewers ask for in typical AI or machine-learning papers.”
But Neubig needed help proving that the reports were AI-generated. So, he posted on X (formerly Twitter) and offered a reward for anyone who could scan all the conference submissions and their peer reviews for AI-generated text. The next day, he got a response from Max Spero, chief executive of Pangram Labs in New York City, which develops tools to detect AI-generated text. Pangram screened all 19,490 studies and 75,800 peer reviews submitted for ICLR 2026, which will take place in Rio de Janeiro, Brazil, in April. Neubig and more than 11,000 other AI researchers will be attending.
Pangram’s analysis revealed that around 21% of the ICLR peer reviews were fully AI-generated, and more than half contained signs of AI use. The findings were posted online by Pangram Labs. “People were suspicious, but they didn’t have any concrete proof,” says Spero. “Over the course of 12 hours, we wrote some code to parse out all of the text content from these paper submissions,” he adds.
The conference organizers say they will now use automated tools to assess whether submissions and peer reviews breached the conference's policies on AI use. This is the first time that the conference has faced this issue at scale, says Bharath Hariharan, a computer scientist at Cornell University in Ithaca, New York, and senior programme chair for ICLR 2026. “After we go through all this process … that will give us a better notion of trust.”
The Pangram team used one of its own tools, which predicts whether text has been generated or edited by LLMs. Pangram's analysis flagged 15,899 peer reviews as fully AI-generated. It also identified many manuscripts submitted to the conference with suspected AI-generated text: 199 manuscripts (1%) were found to be fully AI-generated, 61% of submissions were mostly human-written, and 9% contained more than 50% AI-generated text.
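As a quick sanity check, the headline percentages are consistent with the raw counts reported in the article; a minimal sketch of the arithmetic:

```python
# Cross-check the reported percentages against the raw counts
# given in the article (ICLR 2026 screening by Pangram Labs).

total_reviews = 75_800      # peer reviews screened
flagged_reviews = 15_899    # reviews flagged as fully AI-generated

total_papers = 19_490       # manuscripts screened
flagged_papers = 199        # manuscripts flagged as fully AI-generated

review_pct = flagged_reviews / total_reviews * 100
paper_pct = flagged_papers / total_papers * 100

print(f"Fully AI-generated reviews: {review_pct:.1f}%")  # ≈ 21.0%
print(f"Fully AI-generated papers:  {paper_pct:.1f}%")   # ≈ 1.0%
```

The 15,899 flagged reviews work out to roughly 21% of the 75,800 screened, matching the figure quoted earlier, and the 199 flagged manuscripts are about 1% of the 19,490 submissions.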
Pangram described the model in a preprint1, which it submitted to ICLR 2026. Of the four peer reviews received for the manuscript, one was flagged as fully AI-generated and another as lightly AI-edited, the team’s analysis found.
For many researchers who received peer reviews for their submissions to ICLR, the Pangram analysis confirmed what they had suspected. Desmond Elliott, a computer scientist at the University of Copenhagen, says that one of three reviews he received seemed to have missed “the point of the paper”. His PhD student who led the work suspected that the review was generated by LLMs, because it mentioned numerical results from the manuscript that were incorrect and contained odd expressions.
When Pangram released its findings, Elliott adds, “the first thing I did was I typed in the title of our paper because I wanted to know whether my student's gut instinct was correct”. The suspect peer review, which Pangram's analysis flagged as fully AI-generated, gave the manuscript the lowest rating, leaving it “on the borderline between accept and reject”, says Elliott. “It's deeply frustrating.”