Students using “humanizer” programs to beat accusations of cheating with AI

原始链接: https://www.nbcnews.com/tech/internet/college-students-ai-cheating-detectors-humanizers-rcna253878

## AI and Academia: An Escalating Spiral

College campuses are locked in an escalating "arms race" with generative AI. Initial fears of widespread cheating drove the adoption of AI-detection software, but those tools have proved unreliable, frequently accusing students, especially non-native English speakers, of using AI when they had not. That has spawned a countermovement: students use "humanizer" tools to rewrite AI-generated text, or preemptively alter their own writing to avoid being falsely flagged. AI-detection companies are responding with software updates and even tools that monitor students' writing process, raising surveillance concerns. At the core is the lack of clear guidelines on acceptable AI use, and the inaccuracy of the detection tools is creating enormous stress: some students have dropped courses or faced accusations even when they could provide evidence of original work. Experts recommend a shift from punitive detection toward open conversations about *how* students use AI, and call for regulation of the AI tools themselves. Ultimately, the situation underscores how inescapable AI has become in education, even for those who never intended to use it, fueling a cycle of anxiety and escalating technical countermeasures.

## AI and Academic Integrity: Hacker News Summary

A recent Hacker News discussion centered on students using "humanizer" programs to evade AI detection on academic assignments. The core problem is the difficulty of accurately identifying AI-generated text, which leads to false accusations against students and potential academic penalties. Many commenters suggested returning to in-person, proctored exams, arguing that coursework should focus on practice and learning while exams assess genuine understanding. Others questioned the feasibility of that approach for university courses and raised the possibility that universities profit from flagging work as AI-generated. Still others argued that AI is a tool that should be integrated into learning, with students evaluated on how effectively they use it. One key point: if AI is everywhere, the value lies in skills AI cannot easily supply. Commenters also noted that AI detection may disadvantage non-native English speakers because of stylistic bias in the tools. Ultimately, the discussion pointed to the need to redefine academic assessment in the AI era, moving beyond merely detecting AI use and toward evaluating critical thinking and genuine understanding.

## Original Article

On college campuses across the United States, the introduction of generative artificial intelligence has sparked a sort of arms race.

Rapid adoption of AI by young people set off waves of anxiety that students could cheat their way through college, leading many professors to run papers through online AI detectors that inspect whether students used large language models to write their work for them. Some colleges say they’ve caught hundreds of students cheating this way.

However, since their debut a few years ago, AI detectors have repeatedly been criticized as unreliable and more likely to flag non-native English speakers on suspicion of plagiarism. And a growing number of college students also say their work has been falsely flagged as written by AI — several have filed lawsuits against universities over the emotional distress and punishments they say they faced as a result.

NBC News spoke to ten students and faculty who described being caught in the middle of an escalating war of AI tools.

Amid accusations of AI cheating, some students are turning to a new group of generative AI tools called “humanizers.” The tools scan essays and suggest ways to alter text so they aren’t read as having been created by AI. Some are free, while others cost around $20 a month.

Some users of the humanizer tools rely on them to avoid detection of cheating, while others say they don’t use AI at all in their work, but want to ensure they aren’t falsely accused of AI-use by AI-detector programs.

In response, and as chatbots continue to advance, companies such as Turnitin and GPTZero have upgraded their AI detection software, aiming to catch writing that's gone through a humanizer. They have also launched applications that students can use to track their browser activity or writing history so they can prove they wrote the material, though some humanizers can simulate typing out text a user would otherwise copy and paste, in case a student's keystrokes are being tracked.

“Students now are trying to prove that they’re human, even though they might have never touched AI ever,” said Erin Ramirez, an associate professor of education at California State University, Monterey Bay. “So where are we? We’re just in a spiral that will never end.”

The competition between AI detectors and writing assistance programs has been propelled by a heightened anxiety about cheating on college campuses. It shows how inescapable AI has become at universities, even for students who don’t want to use it and for faculty who wish they didn’t have to police it.

“If we write properly, we get accused of being AI — it’s absolutely ridiculous,” said Aldan Creo, a graduate student from Spain who studies AI detection at University of California San Diego. “Long term, I think it’s going to be a big problem.”


A teaching assistant in a data science course accused Creo of using AI to write a report in November. Creo told the TA that he has a habit of laying out, step by step, how he reasons through a problem, which ChatGPT is known to do, according to a copy of messages he exchanged with the TA.

Eventually, his grade was corrected, but to avoid another battle, Creo said, he sometimes "dumbs down" his work by leaving words misspelled or using Spanish sentence structures that aren't proper in English. And now, Creo runs all of his material through an AI detector preemptively.

“I have to do whatever I can to just show I actually write my homework myself,” he said.

‘How could AI make any of that up?’

At its worst, the stress from the accusations has driven some students to drop out of school.

Brittany Carr received failing grades on three assignments she completed as an online student at Liberty University, a private evangelical school in Virginia with one of the largest online enrollments in the U.S., because an AI detector flagged them. She showed her professors her revision history, including how she'd first written one assignment by hand in a notebook, according to screenshots of emails and messages she exchanged with them.

“How could AI make any of that up?” Carr wrote in a Dec. 5 email. “I spoke about my cancer diagnosis and being depressed and my journey and you believe that is AI?”

Her evidence wasn’t enough — the social work school still told her she needed to take a “writing with integrity” class and sign a statement apologizing for using AI, emails show.

“It’s a very weird feeling, because the school is using AI to tell us that we’re using AI,” she said.

It stressed her out. Carr worried another cheating accusation could cause the Department of Veterans Affairs to take away her financial aid. In order to avoid more false accusations, she said, she ran all of her material through Grammarly’s AI detector and changed any section that it highlighted until it concluded a human wrote the whole thing.

“But it does feel like my writing isn’t giving insight into anything — I’m writing just so that I don’t flag those AI detectors,” she said.

After the semester ended, Carr decided to leave Liberty. She’s unsure where she’ll transfer.

"I’m writing just so that I don’t flag those AI detectors."

Brittany Carr, Liberty University student

Liberty University, which was co-founded by religious broadcaster Jerry Falwell Sr., said it does not comment on individual students. It said in a statement that all academic integrity concerns are addressed “with care and discretion, providing a process that keeps the student’s best interests at the forefront. Our aim is only to see our students succeed, and every student is afforded an exhaustive process to address any concerns about unfair treatment.”

Eric Wang, vice president of research at Quillbot, which makes both a detector and a humanizer, said this kind of fear will persist unless educators move away from automatically deducting points and instead bring students in to discuss how they use AI.

Once that happens, Wang said, “it starts to not matter whether you do or don’t sound like AI and instead moves us toward a world asking how are we using this technology but not losing our sense of humanity, our sense of creativity, and our ability to create great things on our own.”

‘Where’s the line’

At the root of many conflicts about students using chatbots to cheat is a disagreement about what counts as too much AI use on homework.

“When we did our first training program for 3,000 teachers,” said Edward Tian, CEO and co-founder of GPTZero, “every teacher and every student had a different understanding of what’s acceptable — just the understanding was very fragmented, and then it’s getting even more fragmented with the number of tools growing.”

Independent analyses of AI detectors show mixed accuracy. One pre-print study last year found GPTZero is good at finding AI-generated writing, but “its reliability in distinguishing human-authored texts is limited.” However, other research pegged the company’s detector at near-perfect accuracy. Meanwhile, separate studies from 2023 and 2024 have found that Turnitin had a low false positive rate, but failed to identify more than a quarter of AI-generated or AI-rephrased texts.
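For readers unfamiliar with the terminology in these studies, here is a minimal sketch of how the two error rates fall out of a detector's confusion matrix. It is an illustration only, with invented counts, not data from any of the cited papers:

```python
# Illustrative sketch: the accuracy terms the studies above report, computed
# from a detector's confusion matrix. All counts below are made up.

def detector_error_rates(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """tp: AI texts correctly flagged; fp: human texts wrongly flagged;
    tn: human texts correctly passed; fn: AI texts the detector missed."""
    return {
        # Share of human-written texts wrongly flagged (false accusations).
        "false_positive_rate": fp / (fp + tn),
        # Share of AI-generated texts that slip through undetected.
        "false_negative_rate": fn / (fn + tp),
    }

# A detector can have a low false positive rate yet still miss more than a
# quarter of AI-generated texts, as the studies cited above found.
print(detector_error_rates(tp=72, fp=2, tn=98, fn=28))
# {'false_positive_rate': 0.02, 'false_negative_rate': 0.28}
```

The two rates move independently, which is why "low false positive rate" and "missed more than a quarter of AI texts" can both be true of the same tool.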

Both companies emphasized that research showing flaws in their detectors is outdated due to the rapid evolution of large language models and updates to their own detection software.

"It’s almost like the better the writer you are, the more AI thinks you’re AI."

Erin Ramirez, professor of education at Cal State Monterey Bay

AI detection probability scores are often misread, too, by users who fail to recognize that the detectors flag text as likely generated by AI rather than confirmed to be the work of a chatbot, or that the text needs to be at least 300 words long for some detectors to evaluate it effectively.
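As a rough illustration of those two caveats, consider the following sketch. `run_detector` is a hypothetical stand-in for whatever detection service a school uses, not a real API:

```python
MIN_WORDS = 300  # some detectors need roughly this much text to score reliably

def interpret_score(text: str, run_detector) -> str:
    """Hypothetical wrapper that applies both caveats before reporting a score."""
    if len(text.split()) < MIN_WORDS:
        return "Too short for a reliable evaluation; ignore any score."
    score = run_detector(text)  # probability the text resembles AI output, 0.0-1.0
    # A high score means the text looks statistically AI-like, not that a
    # chatbot is confirmed to have produced it.
    return f"{score:.0%} AI-likeness: grounds for a conversation, not proof of cheating."
```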

Turnitin tells schools never to use its tools as the sole basis for deciding whether a student cheated, said Annie Chechitelli, the company’s chief product officer. It should instead prompt a conversation with a student about how and why AI was used, she said.

“The most important question is not so much about detection, it’s really about where’s the line,” she said.

GPTZero also has a disclaimer on its platform advising faculty not to use its detector to punish students.

Ramirez, the Cal State Monterey Bay professor, who is studying how AI can be used in K-12 settings, said anyone who relies on a detector has never put their own work through it.

“It’s almost like the better the writer you are, the more AI thinks you’re AI,” she said. “I put my own papers into AI detectors just to check because I don’t like to hold students accountable without knowing how the tool works. And it flags me at like 98% every time, and I didn’t use AI in any capacity.”

Faculty, administrators and AI detection company leaders all agreed that professors should have conversations with students after software flags their work to ensure no one is falsely accused of academic dishonesty. But to do that properly takes time — especially when many instructors have dozens or even hundreds of students each semester, said Morgan Sanchez, an assistant professor of sociology at San José State University.

“So it is creating a slight sense of tension, but more so it’s creating extra labor — uncompensated labor — that we have to do,” Sanchez said.

A booming humanizer industry

Turnitin, which has been around for a quarter-century offering tools to help educators catch plagiarism, is trying to keep up with humanizers. The company views humanizers as a “growing threat to academic integrity,” and issued a software update last August to detect text modified by the tools.

"The most important question is not so much about detection, it’s really about where’s the line."

Annie Chechitelli, Turnitin's chief product officer

The company has a list of 150 tools that charge as much as $50 for a subscription to adjust text so that it’s not flagged by an AI detector. Chechitelli referred to them as companies whose “sole goal is to really help students cheat.”

Demand has surged for the tools. Joseph Thibault, founder of Cursive, an academic integrity software company, tracked 43 humanizers that had a combined 33.9 million website visits in October.

Thibault, who also publishes a newsletter on cheating called This Isn’t Fine, said he thinks students would be better off keeping a record of their revision history in Google Docs or Microsoft Word than using a humanizer. But ultimately, he believes, the coming shift is toward more monitoring of students as they complete assignments.

“I think we have to ask students, what level of surveillance are you willing to subject yourself to so that we can actually know that you’re learning?” he said. “There is a new agreement that needs to be made.”

Students surveilling themselves

Superhuman, the company that makes Grammarly, developed a tool it calls Authorship that’s included with basic accounts. Students can turn it on to surveil themselves in Google Docs or Microsoft Word as they write, then play the recording back later. It will show which sections were typed, pasted from another source or generated with AI.

“We’re going to keep track of when you are going to Wikipedia,” said Jenny Maxwell, Superhuman’s head of education. “We’re going to keep track of when Grammarly is making suggestions and you’re taking them, we’re going to keep track of how much time you’ve spent in this paper or how many sessions.”
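Based only on that description, the core bookkeeping behind such a tool might look like the sketch below. The event categories and logic are assumptions for illustration, not Grammarly's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class EditEvent:
    kind: str  # "typed", "pasted", or "ai_generated" (assumed categories)
    text: str  # the characters this event contributed to the document

def summarize_provenance(events: list[EditEvent]) -> dict[str, float]:
    """Return each source's share of the total characters contributed."""
    totals: dict[str, int] = {}
    for event in events:
        totals[event.kind] = totals.get(event.kind, 0) + len(event.text)
    grand_total = sum(totals.values()) or 1  # avoid division by zero
    return {kind: count / grand_total for kind, count in totals.items()}

# Hypothetical writing session: mostly typed, with one pasted quotation.
session = [
    EditEvent("typed", "The survey results point to three distinct trends."),
    EditEvent("pasted", "Enrollment rose 12% in 2024, the report found."),
]
print(summarize_provenance(session))  # e.g. {'typed': 0.52, 'pasted': 0.48}
```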

As many as 5 million Authorship reports were created in the past year alone, she said, though most of the time they aren’t submitted.

Maxwell said the tool was inspired by a viral TikTok video from Marley Stevens, who described the havoc of what she said was a false accusation of AI use that landed her on academic probation in 2023 at the University of North Georgia. In reality, Stevens said, she’d only used Grammarly’s extension to help fix spelling and punctuation.

The university declined to comment on Stevens’ case, citing federal privacy law, but said in a statement that “faculty communicate specific guidelines regarding the use of AI for various classes and those guidelines are included in the class syllabi.”

Stevens, who graduated last month, said it was difficult to keep track of each professor’s policy on AI usage, and that it became increasingly hard to find writing software that didn’t have AI embedded in it.

“Google has AI embedded into it, Microsoft has AI embedded into it — like literally everything has AI in it,” she said. “So, in a roundabout way, there’s no way to write a paper without using AI, unless you go to the library and you check books out and use encyclopedias.”

Pressure on colleges

Some students believe universities should stop using AI detectors because of false positives. In upstate New York, an online petition calling on the University at Buffalo to drop the software received more than 1,500 signatures last year.

Kelsey Auman, who graduated last spring, started the petition after she fought to prove she did not use AI on several of her assignments. She knew enough classmates with similar experiences that they had a group chat named “Academic Felons for Life.” Auman said she started to run her papers through multiple AI detectors on her own before turning them in, hoping to avoid another dispute, but it created more anxiety when they also incorrectly flagged things she wrote as generated by a chatbot.

“So it’s like, how far do you want to go down the rabbit hole? I’m making myself crazy,” she said.

"We keep turning on what the academic institutions need to do to fix problems that they didn’t create."

Tricia Bertram Gallant, director of academic integrity at UC San Diego

The University at Buffalo said it does not have an institutionwide rule on AI use, but instructors must have evidence beyond a detector score to report a student for academic dishonesty.

Tricia Bertram Gallant, director of the academic integrity office at UC San Diego, advises faculty to recognize how herculean a task it is to ensure students don’t use AI if they aren’t completing the work in front of them.

“If it’s an unsupervised assessment, don’t bother trying to ban AI,” she said. “And don’t bother trying to prove AI was used because you end up spending more time doing that.”

But Bertram Gallant, who is also president emeritus of the International Center for Academic Integrity, wishes more people would put pressure on the government to regulate AI and the academic cheating industry, and on tech companies to make it harder for students to use their products to cheat.

“We keep turning on what the academic institutions need to do to fix problems that they didn’t create,” she said.
