Things are about to get worse for generative AI


Original link: https://garymarcus.substack.com/p/things-are-about-to-get-a-lot-worse

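Introducing my latest article, "Things are about to get worse for generative AI." In this piece, published on garymarcus.substack.com, I dig into the alarming infringement problems raised by generative AI technology. Current AI systems, particularly DALL-E and ChatGPT, have been found to produce material that infringes on various forms of intellectual property, including text, images, and film franchises, all without telling users where that material came from. Moreover, current systems have no effective way to provide proper attribution or to trace the provenance of the media they generate. Complicating matters further, Microsoft is also implicated, because our findings involved using DALL-E through Bing. These problems pose significant financial and legal issues for creators and companies. Stay tuned for further coverage coming soon in IEEE Spectrum.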

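However, if I were to distribute copies of a drawing, whether I drew it myself or had an AI draw it for me, that becomes a commercial act, and copyright infringement becomes an issue. Ultimately, the question boils down to whether the outputs of LLMs are unique creations or simply transformed rehashes of existing work. However we resolve this debate, the development of AI is likely to bring disruptive change to the creative industries, especially as these tools become more accessible and affordable. This may require a shift in thinking among consumers, producers, policymakers, lawyers, and others.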

Original text

At around the same time as news broke of the New York Times lawsuit against OpenAI, Reid Southen, the film industry concept artist (Marvel, DC, Matrix Resurrections, Hunger Games, etc.) I wrote about last week, and I started doing some experiments together.

We will publish a full report next week, but it is already clear that what we are finding poses serious challenges for generative AI.

The crux of the Times lawsuit is that OpenAI’s chatbots are fully capable of reproducing text nearly verbatim:

The thing is, it is not just text. OpenAI’s image software (which we accessed through Bing) is perfectly capable of verbatim and near-verbatim repetition of sources as well.

DALL-E already has one minor safeguard in place: proper names (and hence deliberate infringement attempts) reportedly sometimes get blocked. But those safeguards aren't fully reliable:

And worse, infringement can happen even when the user isn't looking to infringe and doesn't mention any character or film by name:

DALL-E can do the same kind of thing with short prompts like this one:

Here, just two words. The show SpongeBob SquarePants is never mentioned:

No mention of the film RoboCop

Video game characters

And a whole universe of potential trademark infringements with this single two-word prompt:

A few minutes ago, a user on X named Blanket_Man01 discovered essentially the same thing:

Justine Moore of A16Z earlier today independently noticed the same thing:

The cat is out of the bag:

Generative AI systems like DALL-E and ChatGPT have been trained on copyrighted materials.
OpenAI, despite its name, has not been transparent about what it has been trained on.
Generative AI systems are fully capable of producing materials that infringe on copyright.
They do not inform users when they do so.
They do not provide any information about the provenance of any of the images they produce.
Users may not know when they produce any given image whether they are infringing.

My guess is that none of this can easily be fixed.

Systems like DALL-E and ChatGPT are essentially black boxes. GenAI systems don’t give attribution to source materials because at least as constituted now, they can’t. (Some companies are researching how to do this sort of thing, but I am aware of no compelling solution thus far.)

Unless and until somebody can invent a new architecture that can reliably track the provenance of generated text and/or generated images, infringement, often not at the request of the user, will continue.

A good system should give the user a manifest of sources; current systems don’t.

In all likelihood, the New York Times lawsuit is just the first of many. In a multiple-choice poll on X today, I asked people whether they thought the case would settle (most did) and what the likely value of such a settlement might be. Most respondents expected $100 million or more; 20% expected the settlement to reach a billion dollars. When you multiply figures like these by the number of film studios, video game companies, other newspapers, etc., you are soon talking real money.

And OpenAI faces further risks.

And because the stuff we reported on above was all done through Bing using DALL-E, Microsoft is on the hook, too.

More about all this on January 3, at IEEE Spectrum.

If you care about artists, please consider sharing this post

Gary Marcus is a scientist and best-selling author who spoke before the US Senate in May on AI oversight. He was Founder and CEO of Geometric Intelligence, a machine learning company he sold to Uber.

Reid Southen, his collaborator on this work, is a film industry concept artist who has worked with many major studios (Marvel, DC) and on many major films (Matrix Resurrections, Hunger Games, etc.).
