AI vs. Professional Authors Results


Original link: http://mark---lawrence.blogspot.com/2025/08/the-ai-vs-authors-results-part-2.html

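This experiment revisits a test from two years ago that pitted human fantasy authors against AI at writing flash fiction. The goal was not to declare a clear "winner" but to probe the AI's breaking point: where its performance starts to degrade as the work gets more complex. Longer works clearly favour humans, but flash fiction makes for a challenging test. The results are worrying: the AI *outscored* the human authors on average rating, and the highest-rated story was AI-generated. Attempts by the public, and even by ChatGPT itself, to identify which stories were AI-written were largely ineffective, with results no better than a coin toss. The one participating author who had responded at the time of posting also struggled to tell human work from AI work. The participating authors, who include Robin Hobb and Janny Wurts, have sold around 15 million books between them, which underlines the significance of the result. The author acknowledges that the test was not scientifically rigorous, but expresses dismay that AI can now produce enjoyable fiction, which raises ethical questions about art, authorship, and the possible devaluation of human creativity. He would like AI to focus on beneficial applications such as curing diseases, but worries about a future in which AI-generated content floods the market and harms human artists.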


Original text

Before you look at the results make sure to do the test!

So, before I get to the numbers and reveals I'll reiterate a few things:

i) I hate that AI can do this.

ii) These authors write books - typically long series - so flash fiction is not their forte.

iii) Flash fiction is where AI does best - it starts to fall apart as the required work gets longer.

Some have questioned "why flash fiction"...

Answer: because you test things at breaking point. If I'm interested in what it takes to smash a window I don't throw 8 anvils at 8 windows and 8 ping-pong balls at 8 windows and say, "Welp, there you have it, anvils are 100% better than ping-pong balls."

In the first blog post 2 years ago the performance of the authors and AI overlapped but the authors did better on the whole. So my guess was correct, this is where the AI performance starts to falter.

In the second blog post we revisit to see what has changed with 2 years of development. I found the results interesting.

If I could have got a meaningful number of people to read 8 twenty-thousand-word novellas (I couldn't) and I could convince busy authors to write novellas for an experiment (I couldn't) then we would clearly get a 100% result in favour of the humans ... and ... have learned very little about the state of play.

The contributing authors have sold around 15 million books between them. And they are...

Robin Hobb

Janny Wurts

Christian Cameron / Miles Cameron

& me!

In terms of ratings - when we did this 2 years ago, the scores were low. Six of the eight entries scored 3* or less. The people I can attract to vote in this like to read books. Short stories are unpopular in comparison, and the shorter they get the harder it is to tell a good tale. The ungenerous will suggest it is simply because none of the entries were great - I feel it's because book readers rate short stories lower than books.

Two years later, five of the eight entries scored 3* or above. You can consider the results below to see if that was because the humans did better, or the AI did better, or both.

I have a Ph.D student on my patreon who constantly berates me for my terrible diagrams. Here's another one, just for you, Rae!

We had 964 votes on the issue of whether story 1 was by a human or AI. This fell fairly smoothly to 474 votes on the rating of story 8.

So, when it came to choosing, on average the public got 3 wrong, 3 right, and couldn't decide on 2. I.e. they're no more effective than a coin toss!

Two of these were too close to be statistically significant, but in some cases the votes were quite certain. A sizeable majority of people thought my story was human authored and a sizeable majority thought Janny's story was AI authored. So it's not that people don't have strong opinions/instincts ... it's just that they're no more likely to be correct than tossing a coin.
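As a rough illustration of what "too close to be statistically significant" can mean here, the sketch below uses a normal approximation to a coin-toss vote. The post does not say which test (if any) was applied, so the 95% threshold and the function name `significance_margin` are assumptions; the vote counts 964 and 474 are the extremes quoted above, used only to show the scale.

```python
from math import sqrt

def significance_margin(n_votes, z=1.96):
    """Half-width of the ~95% band around 50% if every vote were a coin toss.

    Normal approximation to the binomial; z=1.96 is an assumed threshold,
    not something stated in the post.
    """
    return z * sqrt(0.25 / n_votes)

for n in (964, 474):
    margin = significance_margin(n)
    print(f"{n} votes: majority must exceed {50 + 100 * margin:.1f}% to stand out from chance")
```

With roughly 500 to 1,000 votes, a majority has to clear something like 53% to 55% before it is distinguishable from noise under these assumptions, so a near-even split on a story says nothing either way.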

I asked ChatGPT (in a new session) to guess which ones were AI and it didn't do a good job either, despite having generated them.

And the scores on the doors?

And here the bad news is that the AI scored better than us. Not only was the highest rated story an AI one, but they scored higher on average too.

I asked the authors to do the test themselves. Only one has got back to me at time of posting.

That author made five guesses, four of which were wrong, and listed as their top two stories ... two AI generated ones...

Conclusion

First off, let me repeat my disclaimer about this not being a scientifically rigorous test.

Given that:

On the short scale it seems likely that people, on average, can't tell AI from human when it comes to fantasy writing.

If you got 6 right out of 8 ... well there's a ~15% chance of getting that result (or better) by chance, so rather than 15% of us patting ourselves on the back, we really have to look to the bulk statistics for answers. And they don't look good.
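The ~15% figure can be checked with a short binomial calculation. This is a minimal sketch assuming each of the 8 guesses is an independent 50/50 coin toss; the function name `prob_at_least` is mine, not the author's.

```python
from math import comb

def prob_at_least(k, n=8, p=0.5):
    """Probability of k or more correct guesses out of n, each correct with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Chance of 6, 7 or 8 correct out of 8 by pure guessing: (28 + 8 + 1) / 256
print(f"{prob_at_least(6):.4f}")  # 0.1445
```

So about one guesser in seven would score 6 or better with no ability to tell the stories apart, which is why individual scores are weak evidence compared with the aggregate votes.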

In terms of enjoyment ... in this test the AI won.

Can AI generate a better book than Robin Hobb can write? Absolutely not. Might it one day generate a book that would do better than one of hers in terms of sales and public acclaim? A few years ago I would have said 'absolutely not', at least in my lifetime. Now, it seems like a possibility, though hopefully an unlikely one (again - in my lifetime).

Should AI generate fiction, imagery, voices etc, competing with artists in a number of fields and fooling the public? No, of course not. I hate that idea and most people do too.

Will it happen? It's already happening. Wherever anyone can circumvent skill and heart and just profiteer off a new technology, they're going to do it. People threaten people with knives in the street for a few dollars - are people going to try to sell you AI books ... of course.

I want AI to cure diseases. That's mostly it. But it looks like it was one of the belated escapees of Pandora's Box, and we're not going to be able to put it back.

Will I ever use AI to write anything (other than the bits of flash fiction in these tests)? No.

Will I ever read AI fiction for pleasure? No. To quote someone wise: If nobody could be bothered to write this, why should I bother to read it?

It's a pretty grim outlook though, especially for new and future authors.

I had always felt that to write a great book that looked at human issues and offered insights, emotion, and enjoyment, would require an actual human, and that we wouldn't reach the point where a computer could do it any time soon.

I now wonder, if (and it's still a significant if) we get there ... will that mean that the AI is intelligent, alive in some sense, worthy of respect and rights? Will we have created an intelligent lifeform in lieu of going off into space and finding one? And is that a wise and/or moral thing to do?

It's a huge shock to me that fiction which, in this test, scores higher than great authors who write wonderful stories full of soul and heart and wit and intelligence, can be generated by the multiplication of a relatively small number of not particularly large matrices. On the face of it it undercuts so many things we value about being human.

There are many ways to argue against being too disheartened by this sort of thing. I advise you to seek them out. The future feels like a scary place right now, but I hope that, as far as the creative arts are concerned, AI runs up against a wall very soon and efforts are directed into doing tasks that benefit humanity rather than undermine it.
