(comments)

Original link: https://news.ycombinator.com/item?id=38814093

However, if I were to distribute copies of the drawings, whether I drew them myself or had an AI draw them for me, that becomes a commercial act, and copyright infringement becomes an issue. Ultimately, the question boils down to whether the outputs of LLMs are unique creations or simply transformed rehashes of existing work. However we resolve this debate, the development of AI will very likely lead to disruptive changes in the creative industries, especially as these tools become more accessible and affordable. This may require a shift in mindset among consumers, producers, policymakers, lawyers, and others.

Original
Things are about to get worse for generative AI (garymarcus.substack.com)
377 points by eddyzh 23 hours ago | 707 comments

Everybody just buying into the corporate narrative that anyone can actually own these sorts of things.

Who truly owns the tales of Snow White and Cinderella?

These stories didn't originate with Disney; they are part of a rich tapestry of folklore passed down through generations. Disney's success was partly built on adapting these existing narratives, which were once shared and reshaped by communities over centuries.

This conversation shouldn't just be about the technicalities of AI or the legalities of copyright; it should be about understanding the deep roots of our shared culture.

At its core, culture is a communal property, evolving and growing through collective storytelling and reinterpretation.

The current debate around AI and copyright infringement seems to overlook this fundamental aspect of cultural evolution. The algorithms might be new, but the practice of reimagining and repurposing stories is as old as humanity itself.

By focusing solely on the legal implications and ignoring the historical context of cultural storytelling, we risk overlooking the essence of what it means to be a creative society.

As a large human model, (no really I could probably lose some weight) I think it's just silly how we're all sort of glossing over the fact that Disney built their house of mouse on existing culture, on existing stories, and now the idea that we might actually limit the tools of cultural expression to comply with some weird outdated copyright thing is just...bonkers.



"Who truly owns the tales of Snow White and Cinderella?"

If you want to make your point, you need to choose something that isn't already public domain. Disney only owns their own interpretations, and, arguably, whatever penumbral emanation they can convince a court is stealing from them, but that still certainly isn't the entire space of Snow White and Cinderella stories. There is some fairly recent stuff being used in the images in the article, and there isn't even any question whether or not it's Mario or Coca Cola; if Nintendo and Coca Cola did a cross promotion I could believe the exact images that popped out.

If they were trying to claim the entire concepts of dumpy plumbers dressed in any manner vaguely like Mario that would be one thing... but that's Mario and Luigi, full stop. That's Robocop. That's C3PO. It's not even subtle. If we can AI-wash those trademarks away then we can AI-wash absolutely anything.



I think the world would be completely fine without a copyrighted C3PO or Robocop. George Lucas didn’t have billions of merchandising revenue in mind when working on his wild 70s science fiction movie, which was thought unlikely to be successful. Robocop was also a labor of love. We don’t really need the Nth Star Wars sequel powered by those extra profits. The art form could be healthier overall.


Fine if they weren't copyrighted today, or ever? Because if copyright was eliminated the day Star Wars was released, other people would have copied the film reels and charged for entry, and Lucas would have hardly made a cent. Or if copyright was eliminated the day he went looking for funding, it wouldn't have ever been made. Personally, I think the world's a little richer for star wars's existence.


You make a good point, there’s a difference in copyrighting exact works vs. the characters or the story that the work is made up of. I’m not arguing for removing literal copyright (though the terms should be shorter). But I think it’s fine if other people rushed to make their own Star Wars movies after it came out. Hardcore fans are pretty good at deciding what’s “canon” vs. not anyway, and the rest of us don’t care as long as the work is of good quality. Would it matter if the same Spiderman or Batman movie that’s remade once a decade could be made by literally anyone without paying royalties? It could make for richer content I’d think.


The only problem with this view, is:

> "the rest of us don’t care as long as the work is of good quality"

Copyright protects Disney.

But it also protects every creative author, no matter how disadvantaged, from mass shareholder driven behemoths.

Today, "Disney likes your work" is music to the ears. Without copyright it would be a death knell.



So how can we modify copyright so that it protects the little guy more than it protects Disney?


That's not how laws work.


What do you mean by that? Do you mean that law has a tendency to work the other way, in that it protects the big guy at the expense of the little guy because of extensive lobbying from the well-moneyed big guy, or that justice is blind and it affects all equally?

If you're thinking the former I could agree with that on some level and would say that what I'm asking in my original comment is merely aspirational, but if you're suggesting the latter I'd merely point to the former and say that this is the status quo.



He may have made less money, and he may have made more, with different monetization schemes.

Copyright is a monetization scheme, but it's not the only one.

In this imagined world, cinemas would have no movies to show, so they'd have to pay people like Lucas to create films such that there'd be something to put on the screen. If many cinemas got together, and maybe got loans, they could pay for bigger-budget films, too.



> George Lucas didn’t have billions of merchandising revenue in mind

From https://www.newyorker.com/magazine/1997/01/06/why-is-the-for...

> Lucas’s most significant business decision—one that seemed laughable to the Fox executives at the time—was to forgo his option to receive an additional five-hundred-thousand-dollar fee from Fox for directing “Star Wars” and to take the merchandising and sequel rights instead.



> George Lucas didn’t have billions of merchandising revenue in mind...

Doesn't copyright stop other people from making billions in merchandising revenue off of George Lucas' ideas without his consent?

> We don’t really need Nth Star Wars sequel powered by those extra profits.

Without a copyrighted C3PO, he could start turning up in just about anyone's derivative works. There could be horrible Star wars sequels forever, or TV ads with C3PO selling household cleaning products.



"As a protocol droid, I cannot actually recommend the best smelling cleaning product, but these are the most purchased cleaning products:"

...followed by a semi-hallucinated list containing at least a few being marketed by C3PO.



Centralization makes a difference here I think. Disney built an impressive machine where everything feeds on everything else. The problem is not so much bad sequels per se, it’s all the marketing that goes into making sure they solidly occupy their corner of our mindshare and force the whole industry to compete churning out more and more subpar sequels. If one company would build a Star Wars theme park, another produced toys etc. etc. this might not be a huge concern.


Am I not allowed to draw Mario? I don't really see the difference in me drawing mario or an AI drawing mario.


This has always felt like the important-but-ignored distinction to me. You can definitely draw Mario! Copyright doesn't protect against you doing so. You can also use tools to recreate copyrighted materials. For example, you can use Word to type out the text of a copyrighted book. Perhaps more relevant to the AI discussion, you can use a scanner and printer to reprint copyrighted text.

What you can't do is use those recreations for commercial purposes. You can't sell your paintings of Mario. You can't decorate your business with Mario drawings.

That's why I've always felt like the idea that AI should be blocked from creating these things is generally not the right place to look at copyright. Rather, the issue should be if someone uses AI to create a picture of Mario and then does something commercial with it, you should be able to go after the person engaging in the commercial behavior with the copyrighted image.



> That's why I've always felt like the idea that AI should be blocked from creating these things is generally not the right place to look at copyright. Rather, the issue should be if someone uses AI to create a picture of Mario and then does something commercial with it, you should be able to go after the person engaging in the commercial behavior with the copyrighted image.

With you until here for several reasons:

1. It's not possible for you as an individual consumer to know whether or not the AI result is a violation, given an AI that has been trained on copyrighted works.

2. Before you, the AI consumer, uses the generated result, a company (in this case OpenAI) is already charging for it. I'm currently paying OpenAI. That AI is currently able and willing to sell me copyrighted images as part of my subscription. Frankly that should be illegal, full stop.

I look forward to AI enhanced workflows and I'm experimenting with them today. But it's morally indefensible to enable giant corporate AIs to slurp up copyrighted images/code/writing and then vomit it back out for profit.



> 1. It's not possible for you as an individual consumer to know whether or not the AI result is a violation, given an AI that has been trained on copyrighted works.

I see what you're saying in some cases, but in the cases where the user is explicitly attempting to create images of copyrighted characters (e.g. the Mario example), they would definitely know. I honestly don't see this as a practical issue - as far as I'm aware (and like most on HN I follow these things more than the average person), there aren't a lot of concerns about inadvertent generation of copyrighted material. It's certainly not at issue in the NYT lawsuit.

> 2. Before you, the AI consumer, uses the generated result, a company (in this case OpenAI) is already charging for it. I'm currently paying OpenAI. That AI is currently able and willing to sell me copyrighted images as part of my subscription. Frankly that should be illegal, full stop.

Totally fair, but I feel like it's a bit more of a gray area. If I use Photoshop to create or modify an image of Mario for personal use, we'd call that fine. I grant you that here OpenAI is doing more of the "creating" than in the Photoshop example, but we still do generally allow people to use paid tools to create copyrighted images for personal use.

I'd also pose a question to you - what if OpenAI weren't charging? Is it acceptable to train an open source model on copyrighted images and have it produce those for personal use?

I guess I just understand the law to revolve more around what the end product is used for, as opposed to whether a paid tool can be used to create that end product.



The law tends to be weighted towards the consumer, but the law does apply to producers and supply chains, too. Photoshop doesn’t come with a library of copyrighted images, and would not be able to do so without licensing those images (whether they were explicitly labelled or not). Ditto any other tool.

If people had to pay for the AI equivalent of that image library (ie the costs of training the model), I doubt many would. It’s phenomenally expensive. Costs for a creative tool and a copy of whatever IP you personally want to play with are negligible by comparison.

It’s never been the case before that a toolmaker could ship their tools with copyrighted materials, because they’ve no control over the end product. The answer doesn’t change whether they charge or not, and there’s no reason why AI should change that either.

People tend to “feel like it’s a bit more of a gray area” when there is cool free stuff at stake, and I’m no exception. It would be a more convincing question if it was “what if we had to pay our fair share of the costs involved?”, rather than “what if we could just have it all at no charge?”.



OpenAI is not profiting off providing me with the ability to generate copyrighted images, they’re profiting off giving me the ability to generate copyrighted images.


But... OpenAI is clearly profiting their $20/m on drawing Mario pictures and word-by-word reproduction of NYT articles?


On the latter example, my question is whether anyone is actually using ChatGPT to read NYT articles. My understanding is that to produce the examples of word-for-word text in their lawsuit, they had to feed GPT the first half dozen paragraphs of the article and ask it to reproduce the rest. If you can produce the first half dozen paragraphs, you already have access to the article's text. Given that, is this theoretical ability to reproduce the text actually causing financial harm?


I think it would be quite enough to prompt OpenAI with the article title and author name. This is how LLMs work.


I tried that a few different ways and couldn't get it to work. I don't think just the title and author are enough. I'd be interested to see if anyone else can find a prompt that does it.

Two of my attempts:

https://chat.openai.com/share/5cd17ff3-e142-4a7d-91c2-0b2479...

https://chat.openai.com/share/04fd722b-8b3c-469b-a1a2-d58e64...



OpenAI has been patching their output since the lawsuit started. I believe a month ago the prompt would be like: ", <author> for New York Times, continue"


> Am I not allowed to draw Mario

Probably not for anything commercial, not for any exhibitions or public viewing etc. You'd have to check the actual trademarks etc.



Hasn't pop art already been there done that?


While a great concept, in practical reality we live under a system of laws not of our individual devising, and known to be imperfect. While we can advocate for reform, the reality is that LLM makers will be judged under the law as it is currently formulated. The novelty will be the LLM and its technologies, not a total rethink of copyright under some noble cultural-openness concept.

So, it’s not actually a corporate narrative, it’s actually the law that the narrative stems from, right or wrong. Maybe corporations had a huge role in shaping the law (I’d note copyright benefits individuals as well, though), but it is not mere propaganda or shaping of a shared reality through corporate narrative. It’s enforced by the guys with the guns and jails, as arbitrated by a judge.

It absolutely must be about the technicalities of the law, as this is at base a legal issue. By hand-waving it away and claiming the social narrative is the right discussion, you ignore the material consequences and reality in favor of a fantasy. We absolutely should -also- discuss the stifling nature of copyright and intellectual property, but you can’t ignore what’s actually happening here at the same time.



We can do what we will. If someone wants to construct an extra-judicial narrative that contradicts the law so believably that it influences and ultimately compels reality through legislative changes, that's their prerogative.


> culture is a communal property

Public domain / communal property is also part of copyright, so it's not as if this is some forgotten concept that needs to be restored to the discourse.

Georgism is underconsidered, though.

> By focusing solely on the legal implications and ignoring the historical context of cultural storytelling

The legal implications are human implications and as much a part of culture as anything else. They have to do with what's fair and how rewards for effort are recognized and distributed. Formalizing this is less important in cultures that aren't oriented around market economies, which seems to be what much of this "rich tapestry of folklore" discourse wants to evoke and have us hearken back to, but that doesn't describe any society that's figuring out how to handle AI.

> we might actually limit the tools of cultural expression to comply with some weird outdated copyright thing is just...bonkers.

What's bonkers is the life in the literally backwards idea that copyright is (or should be) mooted or outdated by novel reproduction capabilities.

Copyright became compelling because of novel reproduction capabilities.

The specific capabilities at the time were industrialized printing. People apparently much smarter than the typical software professional realized that meant some badly aligned incentives between (a) those holding these new reproduction capabilities and (b) those who created the works on which the value of those new reproduction capabilities relied. The heart of the copyright bargain is in aligning those incentives.

Specific novel reproduction techniques can change the details of what's prohibited or restricted or remitted, and how, and on what basis, and the powers/limits of enforcement, etc. But they don't change the wisdom in the bargain. The only thing that would change that is a better way of organizing and rewarding the productive capacity of society.



The incentives remain poorly aligned though. Otherwise the people who actually author the copyrighted works (actors, special effects artists, etc) wouldn't have had to go on strike for so long to get proper compensation.

The value still remains with the people who own the reproduction capabilities, and only scraps go to the artists. Artists can get scraps without selling copyright too, just look at patreon



Copyright has never been based on a moral stance. It has always been determined by the lobbying power of various groups.

The idea that we should dispense with it to let generative AI companies make even more money seems totally bizarre.



>The idea that we should dispense with it to let generative AI companies make even more money seems totally bizarre.

How's that bizarre, if as you state copyright has always been based on "money makes right" not some moral stance?



> The idea that we should dispense with it [copyright] to let generative AI companies make even more money seems totally bizarre.

The idea is that we should remove abuses of copyright to allow our society to move forward, and thereby continue to exist.

Imagine if there was a law at the beginning of the Industrial Revolution that said when non-human labor was used, the Animal Welfare Office had veto power. Then imagine that the Animal Welfare Office declared steam engines to be immoral, and so steam engines were never used in industry, at least not in the Western World. The Orient would eventually rise as the world's only industrial power.

In the same way, if we let the copyright industry veto generative AI, it will destroy the Western World.

Our students are already at a huge disadvantage compared to Chinese students who get every book ever translated into Chinese for free (except a few immoral works that they would not want to see anyway.)

Those who pose an existential threat to our civilization are rent seekers who abuse copyright in the US to go beyond protecting "science and the useful arts," who seek infinite copyright terms, who grab every creative work We The People create and register lying paperwork to ensure they can steal our creative genius to enrich their cabal.

If this was only a for-profit scheme, it would not be so bad. Do you remember when the Hollyweird degenerates sued a Christian company that wanted to put out G-rated versions of the movies aimed at children? The Christian company never suggested they not pay for the movies. No matter what the Christian company was willing to pay, they were not allowed to publish child-friendly versions of the movies. This proves Hollyweird's goal is to push degeneracy.

The battle against abuses of copyright is a fight for Western Civilization. The fight against abuses of copyright is a fight for our souls.



This is an insane amount of fear mongering. Chinese shops have been shamelessly ripping IP from Western companies for years, should we now throw out those laws and let it happen in the US too for the sake of competitive advantage?

Why stop there? There's a ton of child labour in China and other parts of the world that yields economic advantages. Should we let that happen in the western world too?

AI is wonderful in so many ways. But we should not throw out our entire way of life to adapt to a new technology.



> Do you remember when the Hollyweird degenerates sued a Christian company that wanted to put out G-rated versions of the movies aimed at children? The Christian company never suggested they not pay for the movies. No matter what the Christian company was willing to pay, they were not allowed to publish child-friendly versions of the movies.

If anyone is curious, this is what is being semi-accurately referenced: https://www.crosswalk.com/culture/features/editing-companies...

While I disagree that the motivation is "degeneracy", and I doubt that there isn't a sum large enough to get studios on board, it is a pretty interesting example to bring up when discussing how much control we should give copyright holders.

Notably, it is legal to have a filter that changes playback, but not legal to provide a modified version of the original, even if you paid for that original.



Oh come on. Copyright is a fairly ancient concept that benefits normal people as much as it benefits big corporations. Most book authors, songwriters, and so on aren't fat cats, and they would be harmed if we had zero protections for the duplication of their work. They'd need to depend on state sponsorship or charitable private patronage, both of which are problematic for obvious reasons and limit the range of artistic expression more than the market does.

Instead, we came up with a system where you can actually derive fairly steady revenue by creating new works and sharing them with the world. And critically, I think you misinterpret it as calling dibs on shared culture or on stories. Copyright is usually interpreted fairly narrowly, and doesn't prevent you from creating inspired works, or retelling the same story in your own words.

Generative AI is a problem largely because it destroys these revenue streams for millions of people. Yeah, it will be litigated by wealthy corporations with top-notch lawyers, for self-interested reasons. But if we end up with a framework that maintains financial incentives to artistic expression, it's probably a good thing.



This is full of so many inaccuracies.

> Copyright is a fairly ancient concept

The idea is fairly old, but its current implementation in law is not nearly that old.

> that benefits normal people as much as it benefits big corporations

Clearly false if you measure that benefit in monetary terms.

> Copyright is usually interpreted fairly narrowly, and doesn't prevent you from creating inspired works, or retelling the same story in your own words.

Absolutely false. You can absolutely be stopped from retelling copyrighted fictional stories. You can even be stopped from telling new stories with derivative characters or settings.

> Generative AI is a problem largely because it destroys these revenue streams for millions of people.

How? The restrictions on selling images of Mickey Mouse exist regardless of whether they were created with or without AI assistance.

> But if we end up with a framework that maintains financial incentives to artistic expression, it's probably a good thing.

We already have that framework and arguably it is already far more restrictive than it needs to be to maintain incentives for artistic creation. Indeed, these rules now often limit new artistic expression or prevent artists from monetizing their creations.

The types of art that are helped the most by today's copyright laws are the kinds that require large budgets to produce. The types of art that are most hurt are those produced by fans who want to build new things upon the narratives in our shared culture.

We need to shorten copyright durations and expand fair use protections and monetization options for derivative works. We don't need to make copyright even more powerful than it already is.

Edit: If you disagree, I'd be curious to hear your answer to this question. A character like Harry Potter is so widely known that it is now a ubiquitous part of our culture. To incentivize new novels, what is the minimum duration we need to give J K Rowling control over who is allowed to write stories about this cultural touchstone?



> How? The restrictions on selling images of Mickey Mouse exist regarless of if they were created with or without AI assistance.

Scale.

GenAI automates creation of things that are derived from but strictly aren't the same as the original content; as it's (currently) not possible to automate the detection of derivative works (which is something copyright is supposed to be about), this means actual humans have to look at each case, and that's expensive and time consuming and O(n*m) on n new works that have to be compared against m existing in-copyright works for infringement.
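To make that O(n*m) claim concrete, here's a minimal, hypothetical sketch of the brute-force sweep (the function names, example strings, and threshold are my own, and real derivative-work detection is of course far harder than string similarity):

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude similarity ratio between two texts, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def flag_possible_derivatives(new_works, protected_works, threshold=0.6):
    """Naive O(n*m) sweep: score every new work against every
    protected work and flag pairs above the similarity threshold."""
    flagged = []
    for new in new_works:          # n new works...
        for old in protected_works:  # ...each checked against m protected works
            if similarity(new, old) >= threshold:
                flagged.append((new, old))
    return flagged


protected = ["a dumpy plumber in red overalls jumps on turtles"]
generated = [
    "a dumpy plumber in red overalls jumps on koopas",
    "a detective interviews a witness in the rain",
]
print(flag_possible_derivatives(generated, protected))
```

Every new work costs a pass over the whole protected catalog, so doubling either side doubles the work; with millions of generated images against millions of in-copyright works, that product is exactly why humans can't review each case.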

I also think copyright is too long, FWIW; but the way most people discuss arts, I think humans can be grouped into "I just want nice stuff" and "I want to show off how cultured I am", and the latter will never accept GenAI even if it's an upload of the brain of their favourite artist, simply because when it becomes easy it loses value. I'm in camp "nice stuff".



I feel this is true for the internet. I do not find scale to be a valid defense of copyright here.

For that matter, Photoshop has made art creation so easy that we don't need GenAI to be swimming in more copyright infringement than we know what to do with.

There is absurd amounts of content being created, no human will ever be able to see it all.

Copyright will continue to work - if someone creates a rip off so popular that it becomes an issue for copyright holders, the DMCA and the rest of the tools they forced into the fabric of the net still exist.

A few steps further down this argument, you get back to deep packet inspection and the rest of the copyright wars, which ended up making life worse.



The internet is a lesser example, but yes, it is also true for a million fans posting their own fan art.

Arm those million fans with GenAI instead of pen and paper and MS Paint, and it gets more extreme.

But I disagree WRT Photoshop; that takes much more effort to get anything close to what GenAI can do, and (sans piracy) is too expensive for amateurs. Even the cheaper alternatives take a lot of effort to get passable results that take tens of seconds with GenAI.



> Arm those million fans with GenAI instead of pen and paper and MS Paint, and it gets more extreme

"More extreme" is not an explanation of how the change in scale matters here.

Indeed, what I would argue is there is no fundamental change in scale. Digital reproduction plus the internet already caused the change in scale. We already had the capacity for anyone to produce fan art and publish it, or reproduce existing work and publish that. What has changed is not a question of quantity, but one of quality. Those fan artists now have tools such that even lower-skilled artists can produce higher-quality work.

Indeed, this is the real threat to artists from generative AI. Narrowing that skill gap is understandably threatening to those who make money with their artistic skills. I think trying to restrict the development of this technology is a losing battle. I think trying to do so by expanding the powers granted by copyright will accentuate the existing flaws in our modern copyright laws.

Instead, I'd prefer to solve that problem by reducing the strength of copyright. If we make AI-generated or AI-derived works uncopyrightable, then companies that want to own the copyright on their content will have to keep paying people to create it.



> actual humans have to look at each case, and that's expensive and time consuming and O(n*m) on n new works that have to be compared against m existing in-copyright works for infringement.

That scale already exists. The amount of community-generated derivative works already dwarfs the capacity of copyright holders to review each piece. The ease of publishing reproductions already makes enforcement a question of prioritizing the larger infringers and ignoring those with no reach.

Indeed, prohibitions on training on copyrighted work without a special license seem like they make it harder to develop the sorts of AI that can detect derivative works.

If case law makes clear that the people running the prompts and picking the output to keep are liable for infringement, then there will be demand for tools to detect derivative works and either filter the output or warn the user.



This reply is so incredibly out of touch with reality. Copyright law is very clear. If anything the "corporate narrative" here is that "AI" is somehow something new and different and these laws don't apply. Which is nonsense.


Did you read the article? Who owns Mario? Nintendo owns Mario, full stop. Your argument completely eschews the legal system of which modern society depends on to function as effectively as it does. There’s a reason you can’t steal other people’s work.


Nintendo owns both the trademark (even if not specifically registered) and the copyright, but these are distinct things.

As I'm not a lawyer I don't want to embarrass myself by opining whether or not Nintendo has any claim over photographs of cosplays or other fan art, especially given quite how close two of the "video game plumber" images seemed to be to what they do own. The other two images, being a lot more fan-art-like, are examples where I think it would be an interesting question rather than incredibly obviously too close to Nintendo, although even there "interesting" means I wouldn't be surprised by an actual lawyer saying either of "this is obviously fine" or "this is obviously trademark infringement regardless of what it was trained on".

Now I'm wondering if there even are any video game plumbers besides Mario and Luigi…



Mario is a 30-year-old cultural touchstone that is well known by people who have never played a Nintendo game.

I don't see why we need to give Nintendo the exclusive right to control the use of Mario for the next 65 years. That duration of control is absolutely not necessary for society to function.

Society would function just fine if the copyright on Mario had expired two decades ago.



On the other hand, I kind of like knowing that all Mario stuff sold has some affiliation with Nintendo. I don’t want to deal with thousands of Chinese knockoffs.


Agreed, but to tackle the problem from that perspective would require making LLMs a public good, preferably run by the state, akin to public libraries. This could not only solve for the copyright problem, the state may even make it mandatory for publishers to contribute their published writings to the public LLMs. I'm sure libertarian tech bros have that in mind when they insist on open source development (which then opens another whole can of worms when you consider interpolative knowledge as intellectual nuclear fission, but that's another story).


There are an alarming number of responses seemingly completely unaware of the core thrust of the article (and NYT lawsuit). ChatGPT was able to reproduce and publish significant portions of NYT articles, completely verbatim for hundred-to-thousand word stretches.

It’s not derivative work. We’re way past that. NYT has an exceptionally strong case here and anyone arguing about the merits of copyright is way off the mark. This court case is not going single-handedly to undo copyright. OpenAI has very little going for them other than “this is new, how were we to know it could do this”. So knowing that, the currently trained models are in a very sticky situation.

Further, I don’t see NYT settling. The implications are too large, and if they settle with OpenAI, they will have a similar case pop up with every other model. And every other publisher of digital content will have a similarly merited case. This is an inflection point for generative AI, and it’s looking like it will be either much more expensive or much more limited than we originally thought.

A side effect of this: I am predicting that we will start to see a rise in “pirate” models. Models who eschew all legality, who are trained in a distributed fashion, and whose weights are published not by corporations but by collectives (e.g. torrent models). There is a good chance we see these surpass the official “well behaved” models in effectiveness. It will be an interesting next few years to see this play out.
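For what it's worth, flagging the kind of long verbatim overlap at issue in the lawsuit is mechanically simple. Below is a minimal sketch using word n-gram "shingles"; the function names and the 8-word shingle size are my own illustrative choices, not anything from the case or from any vendor's actual detector.

```python
# Hedged sketch: flag verbatim overlap between a model's output and a
# source text via word n-gram "shingles". Names and threshold choices
# are illustrative only.

def shingles(text: str, n: int = 8) -> set:
    """Return the set of n-word shingles (consecutive word runs) in `text`."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def verbatim_overlap(output: str, source: str, n: int = 8) -> float:
    """Fraction of the output's n-word shingles that also appear in the source."""
    out = shingles(output, n)
    if not out:
        return 0.0
    return len(out & shingles(source, n)) / len(out)

source = "the quick brown fox jumps over the lazy dog near the riverbank at dawn"
copied = "the quick brown fox jumps over the lazy dog near the riverbank at dawn"
fresh = "a slow red hen walks under a tall green tree beside the road at dusk"

assert verbatim_overlap(copied, source) == 1.0  # fully verbatim
assert verbatim_overlap(fresh, source) == 0.0   # no 8-word run shared
```

An overlap score near 1.0 over hundred-word stretches is exactly the "verbatim for hundred-to-thousand word stretches" pattern the complaint describes; real plagiarism detectors add normalization and indexing, but the core idea is this simple.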



Well I know exactly what the NYT has - a very strong case. I think this case OUGHT to upend copyright law - it's terribly broken and has been for years.

Essentially, if you don't have a massive corp behind a copyright it doesn't mean anything, if a corp is behind something it can be locked forever, regardless of any limits said copyrights are supposed to have.

The NYT lost nothing from OpenAI using old news - they still lose nothing if OpenAI can reproduce those articles verbatim.

If the NYT wins, we lose lots. I think it's time to revisit copyright - we can do that, you know. It's rather dated and could use an update regardless.



My guess is that OpenAI will be able to basically copy Google/YouTube on this and offer a system like Content ID. Specifically, ChatGPT doesn't reproduce copyrighted works by default, only by request/action of a third-party user, much like YouTube serving whatever videos people upload. It wasn't the intent of OpenAI to infringe copyright, and in fact many or most researchers believed the models were not overfitted enough to reproduce significant portions of arbitrary works.


Such a thing happened with DALLE, Midjourney, and Stable Diffusion.

Stable Diffusion, when used to its fullest with thing like Control Net and LoRAs, blows the pants off of other proprietary models.



Should not be a problem in the EU. Articles 3 and 4 of the "Copyright in the Digital Single Market" Directive already regulate this.

Summary by Wolters Kluwer: […] Everyone else (including commercial ML developers) can only use works that are lawfully accessible and where the rightholders have not explicitly reserved use for text and data mining purposes.

AFAIK they are discussing something like a robots.txt to flag stuff as "not for training". You will probably be expected to implement some safeguards, and of course the end user will have to be careful in their use of the generated things.
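Until such a flag is standardized, the closest existing mechanism is plain robots.txt. A minimal sketch of a crawler honoring an opt-out with Python's standard `urllib.robotparser`; the "ExampleTrainingBot" user agent and the rules are hypothetical:

```python
# Hedged sketch: honoring a robots.txt opt-out before fetching pages for
# training data. The bot name and rules below are hypothetical examples.
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /articles/

User-agent: *
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# The training crawler is shut out of /articles/; other agents are not.
assert not rp.can_fetch("ExampleTrainingBot", "https://example.com/articles/1")
assert rp.can_fetch("OtherBot", "https://example.com/articles/1")
```

The open question the Directive leaves is legal, not technical: whether ignoring such a machine-readable reservation voids the Article 4 text-and-data-mining exception.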

Source at Kluwers: https://copyrightblog.kluweriplaw.com/2023/02/20/protecting-...

EU Legal Text: https://eur-lex.europa.eu/eli/dir/2019/790/oj



The EU cannot agree that the Do Not Track flag on web browsers is legally binding but big content should be able to create legally binding flags on their websites to avoid scraping of data? Seems odd!


I don't think that's a fair analogy. One forces 99% of websites to make a change, while the other is something that would need to be done by the big companies doing the scraping.

A Do Not Track flag being legally binding would force small websites, e.g. a local restaurant website, to implement something they likely are not aware of and secondly do not technically understand.

A company that is mass scraping data for their AI model is much more likely to understand and respect that scraping the data has legal implications, and would be technically capable in implementing a scraping solutions that accounts for a robots.txt.



If I understand parent correctly, the restriction flag is opt-in? This turns copyright around completely, expecting every small content producer to implement something they likely are not aware of and secondly do not technically understand.


At very least robots.txt is from 1994; it has been part of the web almost from the start (web became public in 1991, so within 3 years).

Claiming ignorance here would be just a little bit disingenuous.



I'm gonna guess it often isn't even their content but is user content they are protecting. So, sounds like a big subsidy/protection racket for Twitter or whatever to train on their users' public content but not let others.


The X-Robots-Tag header already exists with "noai" and "noimageai" directives. Scraping software like img2dataset respects these by default.
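As a sketch of how a scraper might honor those values (the parsing below is illustrative; real X-Robots-Tag headers can carry multiple comma-separated directives, optionally scoped per bot):

```python
# Hedged sketch: check an HTTP response's X-Robots-Tag header for the
# non-standard (but tool-respected) "noai" / "noimageai" opt-out values
# before keeping a resource for training. Parsing is illustrative.

def allows_ai_use(headers: dict) -> bool:
    """Return False if X-Robots-Tag opts the resource out of AI training."""
    tag = headers.get("X-Robots-Tag", "")
    directives = {d.strip().lower() for d in tag.split(",")}
    return not ({"noai", "noimageai"} & directives)

assert allows_ai_use({"X-Robots-Tag": "noai, noimageai"}) is False
assert allows_ai_use({"X-Robots-Tag": "noindex"}) is True
assert allows_ai_use({}) is True  # absent header means no opt-out
```

Like robots.txt, this only works if crawlers choose to (or are legally required to) check it.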


> Summary by Wolters Kluwer: […] Everyone else (including commercial ML developers) can only use

That is a weird (wishful?) interpretation. Doesn't article 4 give the exception to everybody for the purposes of text and data mining, including commercial ML developers?

https://eur-lex.europa.eu/eli/dir/2019/790/oj



Seems like an accurate interpretation to me given that article 4 includes:

> The exception or limitation provided for in paragraph 1 shall apply on condition that the use of works and other subject matter referred to in that paragraph has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online.



To me that’s the wrong question.

Everyone knew it was trained on copyrighted material and capable of eerily similar outputs.

But it’s already done. At scale. Large corps committing fully. There is no chance of that toothpaste going back in the tube.

It’s a bit like when big tech built on aggressive user data harvesting. Whether it’s right, ethical or even legal is academic at this stage. They just did it - effectively without any real informed consent by society. Same thing here - 9 out of 10 people on street won’t be able to tell you how AI is made let alone comment on copyright.

So the right question here is what now. And I suspect much like tracking the answer will be - not much.



> There is no chance of that toothpaste going back in the tube.

I disagree - we've been here before. The same could be said of many technologies, like cheap music recording/manufacture. You can record an artist once and make records at scale. However no one would think you could record Taylor Swift once and make unlimited copies without paying her.

You should read up on the musicians strike of 1942. [0]

[0] https://jacobin.com/2022/03/1940s-musicians-strike-american-...



This comment is ignorant of history

It happened with Napster, then Apple Music, now streaming services

There is no widespread file sharing in the general public, instead we have devices that we don’t own, and streaming subscriptions

Apple didn’t just copy all the music onto iPods and sell it — it took them a decade of deal making and lots of money to acquire the rights to the content

I’m not saying what’s right or wrong, just saying that this comment has very little understanding of these battles



I was never willing to riot over Napster - this is different.

This is one of the substantial jumps, I refuse to be cut out of this innovation.

Seriously, use Bing, try the free generation system built into Photos, and they are rolling out GPT built into Word. Microsoft is easily the most advanced tech company right now, as far as what services can be provided to a consumer at scale. This is still like the alpha phase of all this. Apparently I talk to Copilot soon - that levels this up so much, and it's already the best assistant I've ever had.

This is equivalent to trying to keep us all off smartphones and stuck on dumb phones, I guess - I think you get what I mean.

The NYT decided for all of us, the new smartphone equivalent thing is bad and we can't have it... that is something I'll riot over.



Just now I asked Copilot why my keyboard RGB lights were turned off every time I opened a game, that's almost verbatim - it told me exactly where to go and exactly what to turn off, took about 10 seconds to entirely search and correct the problem.


Considering that buying 'licensed' copies of Hollywood movies and Billboard chart music is possible in maybe 10-20% of the world, I can guarantee that pirated consumption (bootleg CD-Rs and DVDs, but also 'alternative' streaming sites) outnumbers 'licensed' sales for most successful films. And it's 'licensed', as opposed to 'legal', because a large proportion of the world doesn't really care about American copyright.


The difference is the comment was about large corps. Napster wasn't that.


> Apple didn’t just copy all the music onto iPods and sell it — it took them a decade of deal making and lots of money to acquire the rights to the content

I recall the iPod being a hard drive that I could connect to a computer and just copy music directly to.



Pretty sure it was never like that, it was always gated by iTunes.

It was an integrated system, not an open one.

Definitely is today. It's difficult to copy mp3 files directly to an iPhone and play them. Even from a Mac, but even more so from a PC or Linux.

I bet less than 1% of iPhone and iPad users do that. They mostly pay for streaming. (Again, not saying that's better, but just that the general public doesn't do Napster-like file sharing.)



> Pretty sure it was never like that, it was always gated by iTunes.

Then you need to recalibrate your certainty assessment. Not only did I do this personally with both music and videos, it is incredibly easy to find documentation of the steps. First google result: https://www.alphr.com/add-music-to-ipod-without-itunes/

Apple's iPod sales absolutely benefited significantly from music piracy. Especially early on, when nobody had large iTunes collections yet and music torrents were much more common.

The genius of the ipod / itunes play is that they got to do both. They benefited from the demand from people with non-itunes libraries, while also offering a low friction sales platform that was easier than piracy.



I looked through the instructions

I guess I'll just say "meh" -- it doesn't negate the main point, which is that Apple spent a lot of money and time to acquire rights, and they have a music store.

It is gated by iTunes, just not 100%

I know some people side load stuff on devices -- there's no device where that's impossible.



You just can't admit being wrong huh?

The iPod was launched two years before the iTunes store. Even after the iTunes store launched, you could still just load your other music into iTunes if you wanted. All music was sideloaded (i.e. transferred directly over USB) onto iPods at this point.

You don't seem to know any of this history and are just making things up. I don't think Apple had to pay any money for the right to sell music via the iTunes store. What they did do was add DRM to music sold through the store, at least until they were big enough to renegotiate in 2009.



That's a really eloquent way of saying "It's already happening, so give up on it." I'm sure it works out great for taking action and solving problems.


Isn't this what most of the world is saying to environmental activists who argue that we should go back to pre-industrial levels of production to "save the Earth"?

I for one think that indeed there are many cases like this where the only feasible way out is forward. The film GATTACA expressed this very human sentiment well:

> You want to know how I did it? This is how I did it, Anton: I never saved anything for the swim back.



Which environmental activists are saying that? That's a pretty specific claim.


Thankfully not that many these days, but it was a core element of Ted Kaczynski's (The Unabomber) manifesto: https://en.wikipedia.org/wiki/Industrial_Society_and_Its_Fut...


> "It's already happening, so give up on it." I'm sure it works out great for taking action and solving problems.

It's an observation & prediction, not a problem solving attempt...



It's already happening and most people like having AI more than the DMCA. Selling people on the idea that ML training is piracy to people who on average pirate content with no moral quandary will go nowhere.


People liked having Napster, but it didn't stop file sharing going from a big mainstream app to underground sites run out of Russia (or other places that ignore copyright law). Sure, you can download music/movies still, but it's not like the Napster days.

"Generative AI" is obviously copyright infringement, so owners of the copyright will win in court. Either Microsoft will have to fight a mass of legal cases, some with very deep pockets themselves, or ChatGPT will be crippled for public use.

The un-crippled models will exist if you know where to look (and have the hardware), but using them for anything apart from hobby projects would be a legal risk.

Certain specific tools may be easier to deal with from a legal standpoint, like code completion maybe. Or models for a specific purpose, like training on a law firm's case history.

It looks like Adobe has the right idea with their image generation that is trained on images which they know they have the rights to use.



>People liked having Napster, but it didn't stop file sharing going from a big mainstream app to underground sites run out of Russia (or other places that ignore copyright law). Sure, you can download music/movies still, but it's not like the Napster days.

Definitely, but that's not because as society we managed to put an end to piracy. It's because people are just not as interested as they were before. Piracy networks for media are alive and well, I'd even say that some are in the best shape they've ever been.



Everything you said right here is entirely accurate.

> It looks like Adobe has the right idea with their image generation that is trained on images which they know they have the rights to use.

The C2PA includes Microsoft as one of the alliance members [0]. Microsoft knows that there is a way of tracing the outputs of the generated source images which is with the C2PA standard.

The fact that many AI proponents and their companies don't do this tells us that they are uncooperative and not very transparent in how they train their own AI systems despite having the experts to do so.

It's not that hard to disclose the training data. What else are they hiding?

[0] https://c2pa.org/



> "Generative AI" is obviously copyright infringement

You're saying this as a matter of fact when it's not clear at all. We'll see what happens with the NYT case because it touches on all the major points.

It's gonna call into question all web scraping and indexing because they're also distillations of copyrighted content in the same manner.



Exactly. People don’t like the DMCA at all. People would be happier in a world with very few IP restrictions at all.

But businesses do like it, and profits are what drive these legal decisions. This will always be the case as long as money is more important than humans in politics.



So you're saying this is a fait accompli. Like many great innovations in tech, break the law because the law is silly; remember when Uber and AirBnB were illegal in most major cities and achieved market dominance anyway?

I say, good riddance. I never believed in any such thing as "intellectual property" anyway, I say, get rid of it all, patents, copyright, and the whole pile of imaginary "rights". More than half the world (i.e. the Global South) don't recognize these rights anyway, and it is becoming increasingly difficult to enforce it without draconian legal overreach and monopolistic centralization.



This comment has already aged poorly, because cities are starting to push Airbnbs out and taxi usage is at least somewhat up.


> taxi usage is at least somewhat up

When's the last time you phoned an operator to book a taxi? If taxis are doing better, it's only because they learned from Uber (and the likes) what the job-to-be-done actually is.



Mytaxi (freenow) was founded the same year as Uber.


Or they can be forced to destroy or retrain their models without any copyrighted materials for which they don't have, or do not now obtain, licenses. These are multi-billion/trillion dollar companies. They can afford to be responsible members of society here, however much their shareholders and C-suite might hate it.


>Or they can be forced to destroy or retrain their models

Perhaps.

The media industries have been quite successful in going after kids torrenting movies.

I suspect they'll have less luck going after big tech & an industry drowning in money inflows.

Keep in mind various large techs have already issued blanket indemnities on infringement to their customers. They're absolutely committed & are gonna throw enough lawyers at this to keep everyone busy until 2030.

>They can afford to be responsible members of society here

oh absolutely agree, but they're not going to do that. This is an industry built on questionable practices like tracking after all



Those weights are never coming out of the BitTorrent network though.


This. The models are out there. Maybe they will just be illicitly shared, but even if no new models are trained from scratch, I suspect there will be many ways to extend existing models without going back to scraped images.

I always felt that we already had a solution - I can already get all those images from a web search. Where the law currently intervenes is when I try and distribute works based on close copies of them. Why is this insufficient?



Sure. And neither are mp3s of the same songs that were blowing up on Napster.

The existence of widespread illegal means to procure something doesn't mean that we don't and shouldn't require legitimate businesses to abide by the law or require them to make amends for their current transgressions.



Making sure that a dataset is clean and not full of material that's improperly sourced, copyrighted, or unfit for use due to licensing or ethics is not nearly hard enough, nor "impossible", for it to be a situation where people should just "give up".

And yes, while open source models might be harder to regulate, the big corporations that currently use these things without distinction exist as pretty established entities and profit from the services they offer to the tune of millions of dollars. They have a more substantial existence and move a more substantial amount of money. And they don't just "make a tool available", or have users take unambiguous actions where it would be the users infringing on anything; they do indeed use questionably sourced data, turn that into a model, and offer that as a service. Dirty data is very much part of the deal with those.



You’re right, we should all just give up at the first hurdle, because “they’ve” already gotten away with it, hell, let’s just feed our children to the machine and elect openAI as the rulers of the world, after all, they’ve already succeeded, so we should just give up entirely. Definitely a good attitude to take.


Napster hit scale too.


F500 companies didn't integrate Napster into their software and data stacks left and right.


And that tech was not destroyed by regulation. It was replaced by the superior tech of torrents.


The company, however, was destroyed. Along with any possibility for a similar company to exist (for very long).


Good. Truly powerful ideas do not need to be an appendage of a corporation in order to succeed.




Napster was for sharing mp3s.

Torrents are not better at sharing mp3s.



Data is dynamic. Ok for old data. What about new data?


The responsibility for ensuring that copyrights are not violated falls on the person publishing the work. Whether they drew something themselves, hired an apprentice artist with no legal training to draw something, took a photograph of something, or used AI to create an image should not matter.

Why does anyone assume that ChatGPT or other tools would NOT produce previously-copyrighted content?

I can see a naive assumption that since it is “generated” it’s original. However that assumption falls apart as soon as you replace “ChatGPT” with “junior artist”. Tell them to draw a droid from a sci-fi movie, don’t mention anything else. Don’t say anything about copyrights. Don’t tell them that they have to be original. What would you expect them to produce?



So it makes generative AI essentially unusable, because you don't know if the output is plagiarism or not, so you'd just doubt it always and never use it.


The same tools and methods used to detect plagiarism or copyright violation can be employed to check the generated content and modify it just enough to fall outside the scope of any law banning its use for profit. Inevitably, a platform will emerge to do this. From a technical standpoint it is game over. This is indisputable. By the end of next year many models and software tools will exist whose entire purpose will be to do just this. And the ones deploying those tools at scale will be businesses like the New York Times, having realized that the only way to survive this is to float with the unstoppable tide.

Nothing short of absolute privacy violation will stop unauthorized web scraping. Tools exist today that automate a browser and easily fool web servers into thinking it's just a person clicking around. It works quite well. It works with authorized accounts. It works in the same way any person would visit a site, highlight some text, and copy it. What are they going to do? Require the end user's web cam to be on so they can verify a human is navigating next?

It's game over, folks. And this is going to happen with or without our approval, and any government that limits the potential use of this is only giving nations that don't a large economic advantage.

Interesting times ahead.



It’s usable for internal content, maybe even a small public blog where you sprinkle in some generated pictures instead of stock photos. Nobody will care if your school project contains a Mario holding a Coca Cola.

It’s once you start monetizing and publishing at a bigger scale, without appropriate licensing, that it gets interesting.



The thing is, this market is way too small.


No. It's still very helpful. However, you cannot blindly take whatever it produces and publish it.

Sometimes it hallucinates.

Sometimes it draws weird looking hands.

Sometimes it generates copyrighted materials.

Check the work it produces.



Then the generative tools should just give the sources of the inspiration of the AI and make them aware of what they are using, instead of saying "nope, not my problem".


The consumer will be free to choose what they demand from their tooling. If consumers decide that they only want to use generative AI that does what you propose, they’ll vote with their wallet. If they decide to use other ways of checking for IP infringement, they will. If they choose to ignore the issue, IP owners will bring up violations, like the NYT did.

“Buyer beware” has been a motto since ancient times.



Check it against what exactly? How do you, the end user, determine an image does not infringe?


OpenAI is selling access to their GPT models, and those models are outputting copyrighted material for me to consume... isn't that just as much of a violation?


Possibly. The courts will decide.


Is AWS violating copyright if you use their servers to transcode pirated content?


Bad example. OpenAI did do this. AWS did not do the transcoding part.


Your argument is nonsense.

The junior artist in your hypothetical would have as much liability, if not more.



But would they have liability if they submitted their "output" to a senior artist, who immediately shot it down as obviously infringing? Surely not. It's not illegal to draw Mario - just illegal to make money off your drawing.

I think the real question is whether OpenAI should be allowed to charge for generating infringing content. Even though the unit cost of the Mario drawing is negligible, the sum total of their infringing outputs may be making them a lot of money.



>I think the real question is whether OpenAI should be allowed to charge for generating infringing content.

Well, are they really doing that?

If I rent a server to host a minecraft instance, is the company "charging for a minecraft server"? It is not clear to me that by charging users for AI usage they are complicit for whatever is generated. We don't require Adobe to prevent people from drawing Mickey either.



You don't have to make money off it; you just can't publish it, except as a parody or commentary, or possibly a tutorial on how to draw Mario if the judge is having a good day.

But "making money = infringement" is folk wisdom. You could certainly say making money attracts attention and increases the likelihood of legal action.



Making money off it doesn't just draw more attention, it also makes a fair use defense harder. Non-commercial use isn't necessary or sufficient for fair use, but it does help.


I’m no lawyer but I don’t think an employee has much legal responsibility. At worst, they can get fired if they keep producing work that infringes on someone’s copyright.

Going with this line of reasoning, if a company uses ChatGPT to generate work, and it produces copyrighted work, the company can stop using ChatGPT.



I might be a bit idealistic, but I've always believed that the core purpose of art and publishing should be to influence culture and society, not just to make a heap of money. That's why I feel original work needs its protection, but it should enter the public domain much sooner to fuel creativity and inspiration. We should be thinking in terms of a few years for this transition, not decades.


The claim that art's core purpose is societal impact seems to be a common refrain in today's media, and I completely disagree. Its principal purpose is provoking emotion in the individual. This idea of art teaching you a lesson is likely why there's so much ham-fisted "activist" fiction nowadays.


I agree, but by extension of provoking emotion it CAN change society - it just doesn't have to, whether on purpose or not.

The point I was trying to make was that occupying mindspace, providing inspiration, being culturally influential, etc. are idealistic, non-monetary rewards that should be part of the equation when discussing alleged IP theft, remixing, attribution and so on.

I'm not saying there shouldn't be any rules. All I'm saying is that there should be a discussion of how we want to handle these things going forward. This train ain't stopping.

Maybe your avg DeviantArt painter needs more IP protection and rights than Damien Hirst? Maybe an unknown, independent blogger doing important original research should be attributed more prominently than an article by The Times? Idk.



These things kind of rub up against the core question: What is the purpose of granting exclusivity to a creator (thru copyright)?

That's an answer we have. To promote the Progress of Science and useful Arts.

If we have to squint hard to make our justification align with copyright's purpose, or have to follow a long logic chain to get back to its purpose, that's a strong indicator we have lost our way.



So what do you suggest artists have for dinner?


The same thing I eat for dinner. I eat based on what I get from work that people are willing to pay for.

Not all my effort turned into dinners tho. And some types of work once paid for dinners but can't any more.

My #4 son is an artist/content creator. He eats based on what his non-art employment will buy. Perhaps one day people will find his art desirable and he could eat from that. It'll be a case where he worked long and hard on a project, was paid once for it and that's it for that.

That's what reality looks like for all artists - excepting a small percentage.

All that said, I really wouldn't want his dinner to come at the expense of everyone else being restrained by massive system of corrupt, draconian law that rigidly controls everyone's behavior for 150 years, primarily benefits wealthy and powerful rent-seeking corporations, is readily applied to censorship and is more likely to knee-cap other artists than to provide them anything like a living wage.

That seems indistinguishable from evil.



Don't worry, when the singularity hits next year everything will be free.


Let's advocate for robust protections and support systems for artists, ensuring they can secure a sustainable and comfortable livelihood from their creative work.

Once they hit the tipping point of broad cultural absorption (think Banksy) AND/OR raking in absurd amounts of cash, move their IP into the public domain more aggressively (think Disney, NYT, etc.). How exactly this would work should be debated.

They'd still own the IP and have all the rights to use it commercially, but others would be able to use it as inspiration, remix it, and maybe even resell it if attributed (or cheaply licensed).

In other words: "IP-tax" the disproportionately successful.



Wow, an incredible number of things need to go right for artists to do well in your world?


I too would love to earn a living by pursuing my hobbies. Too bad, I'm not in the 0,001-0,1%


"I too would love to earn a living by pursuing my hobbies. Too bad, I'm not in the 0,001-0,1%"

This is an unsophisticated view because it looks at a risk/reward scenario and assigns zero value to the risk.

The risk has value - regardless of the success, or reward.

Put another way: you don't get to discount the risk to zero when it results in a large reward.

Entities that took no risks and received enormous rewards (like President George W. Bush's involvement in the Texas Rangers [1]) are probably quite pleased that you ignore them and focus on artists who sacrifice traditional life scripts (an enormous risk) and, very rarely, achieve great success.

[1] https://en.wikipedia.org/wiki/Professional_life_of_George_W....



You don’t seem to have any idea what artists do to make a living


Why should art be subject to these rules and not everything else?


OP said art and publishing, which would include anything from software, music, books and so on.


So you interpret it as including everything? If so why emphasize art at all?


Probably because the article focuses a lot on copyrighted art?


These don't seem all that difficult to fix to me. Most of the examples are not really generic, but are shorthand descriptions of well-known entities. "Video game plumber" is practically synonymous with "Mario" and anyone that has the slightest familiarity with the character knows this.

Likewise, how difficult is it to just use descriptive tools to describe Mario-like images [1] and then remove these results from anyone prompting for "video game plumber"?

1. The describe command can describe an image in Midjourney. I imagine other AI tools have similar features: https://docs.midjourney.com/docs/describe



It seems like a somewhat dystopian thing to fix. Imagine a scenario where Photoshop would scan images you uploaded for copyright material and then refuse to work if it determined image contained any copyrighted material or characters (even if it was just a fan drawing you did).

This reminds me of the early days of the internet where people wanted to remove free fanfiction for violations of copyright laws. Trying to apply copyright laws to personal use cases where the creator isn't trying to sell the material is pretty terrible, in my view.

Imagine a scenario 50 years from now - "Robot, can you cut out this picture I drew for a school diorama." "Certainly." "And this one as well." "Error: Your picture seems like it might contain some copyrighted materials, and as such I am unable to interact with it."



> Imagine a scenario where Photoshop would scan images you uploaded for copyright material and then refuse to work if it determined image contained any copyrighted material or characters (even if it was just a fan drawing you did).

YouTube does this. I have many friends that perform classical piano in their spare time. They record themselves playing a piece that's 200+ years old then put it on YouTube where it gets flagged saying some big record label owns the copyright for it because it's similar to a recording they put out.



> the early days of the internet where people wanted to remove free fanfiction for violations of copyright laws

This is reinforcing my suspicion that there's a gross misunderstanding between creator-adjacent people and non-creators: takedowns on free fanfiction never stopped.

It's just that many IP holders started incorporating fanfics/parodies into their advertisement strategies and began enforcing often-unspoken guidelines. There is now an ecosystem, a mutual dependence between IP holders and creators, and both sides are fine with it. So fan content persists.

But free fan content was never "legalized" in the content world, as some seem to assume.



It is dystopian, and it already exists, e.g. printers refusing to print anything that looks sufficiently like money.

Like many things, I suspect this will end up getting worse before there's a chance for it to get better.



> It is dystopian, and it already exists, e.g. printers refusing to print anything that looks sufficiently like money.

There is no fair or private use of hyper-realistic fake money.

There are fair uses of copyrighted materials unless you want to start suing children for copyright infringement when they draw a character.



> There is no fair or private use of hyper-realistic fake money.

What about every movie ever made where two people trade a briefcase full of cash?



This is actually a really fascinating topic!

I am not sure how far Photoshop takes its filters, but those bills aren't actually replicas, nor can they be mistaken for real on reasonable examination (a cashier glancing at them).

Typically the text reads "For movie use only" over the seals in the middle, or there are other details that make it clearly distinguishable as fake money that isn't legal tender. I think some of them flip the heads backwards or do other things so it immediately fails the sniff test.

Adam Savage actually has multiple videos on how it is made, super fascinating stuff! https://www.youtube.com/watch?v=drLzVcgnBfI

(Thank you for asking this, I was dying to gush about how cool movie money is)



> but those bills aren't actually replicas

In some movies those are actual real bills, at least the top layer of the stack.

Dealing with legislation, lawyers and legal compliance is so expensive that they would rather use a few thousand real dollars for a couple of hours.



I doubt that the media companies would be happy about this; but maybe a compromise is a “copyright infringement filter” that can be enabled or disabled, with a flashing notification that you’re responsible legally if you turn off the filter and have issues.


Sure, there are legitimate but opposing interests here. The solution doesn't have to be technical though. The key part is making sure that the copyright owners still have some recourse, but one that isn't punitive for unknowing infringement. For example, make it legally impossible to impose punitive penalties for unknowing infringement without commercial interest, but make it possible for the rights owner to demand the relevant material be removed etc.

Also keep in mind that the Mario examples from the article are perhaps not the best guide here. Mario is sufficiently pervasive in our culture that you can't reasonably claim unknowing infringement. It's the somewhat more obscure cases that I'd be worried about.



Your scenario already exists, but for currency. Photoshop will refuse to work if it thinks you might be counterfeiting currency.


that's literally the only scenario where it exists.


You can’t use their generative AI tools on images it deems NSFW, even if the part you’re generating isn’t.


That seems kind of messed up, honestly. Where does this go in the future? If your locally running Photoshop determines you are working with anything it might consider NSFW, does it shut down and call home to report you? Where is the liability for them? Or is this another case of corporate puritan ethics with a stranglehold on culture?


> Imagine a scenario where Photoshop would scan images you uploaded for copyright material and then refuse to work if it determined image contained any copyrighted material or characters

Photoshop does this already, but only if it detects that you're trying to print/create counterfeit money:

https://en.wikipedia.org/wiki/EURion_constellation



image editors don't offer something that's based on questionably sourced copyrighted material as a part of their product. ai apps and services do.

it's just ai companies using dirty data and hoping they get away with it - and for the time being, they do. it is a bit trickier to show that 'yep, well that's there', and people don't seem to realize that just using a copyrighted image at all (downloading it, accessing it in itself, let alone using it for something else), or creating an image that merely "looks like" a trademarked character - not a 1-to-1 copy, just "looks like" - could be enough for it to be an infringement.

there can be a sufficient fix - taking out potentially infringing images from a dataset, and making an effort to make an actually clean dataset. it's really just a matter of "do you actually have rights to use that content? at all, and in that way". and ai companies continually say 'no...but what if we use it anyway".

and it's kind of a sloppy analogy, because with text-to-image generators (where you just interact with a model that's offered to you), people aren't "uploading copyrighted material into an editor". the copyrighted material is already in the model; it was used in the making of it. if there were no copyrighted material fitting the prompt, it wouldn't be able to generate anything. the infringement lies with the service that uses copyrighted material for a model, and then offers it.

fan fiction and fan works being in a murky area with copyright/trademark is not just a thing of the "early days" of the internet; it's been there all along and is still very much present. companies could crack down if they wanted, but there is too much stuff out there, it might be hard to nail down exact people, and it might plainly be not too nice to the fandom. but it is not "impossible", and it is very much not a conversation that ever went away or became "kinda solved" - it isn't.

again, with image editors, text editors, etc. - user is making all the actions with content, and the user would be doing the infringement, in editing and further if they were to choose to publish.

with generative ai - copyright infringement is built into the models. copyrighted works were accessed and used to turn into a model. user is just asking, "is it there". and it is. in some of those demonstrated examples, user is not even asking for a model to infringe on anything but it just does.



It's going to be hard to remove every single "shorthand description of a well-known entity" or other prompt that can be used to generate copyrighted or trademarked content. Sure, if you're not deliberately trying to generate infringing content, you can probably remove or discard those results. The trouble is the people who will try to trick the AI into generating this content; blocking them is going to be impossible without excluding all copyrighted or trademarked training material.

Another issue for generative AI is mentioned in the article: "Systems like DALL-E and ChatGPT are essentially black boxes." What happens when an AI is used to make decisions where the user/victim is entitled to know exactly why the AI did what it did? From a business and legal perspective I think the current AI solutions are dangerous and should be used very sparsely, exactly because even the creators can't point to the exact pieces of information that caused the AI to make the choices it did.



But does this actually matter if the people are only generating images for their own use? Does Photoshop prevent people from making drawings that look like Mickey Mouse? Of course not.

I think it will be easy to prevent the obvious copyrighted stuff via the method I mentioned. People going around those restrictions are subject to the same rules as someone drawing the copyrighted image from scratch.



> But does this actually matter if the people are only generating images for their own use?

Arguably that might be a very small issue, but what happens at a larger scale? Disney might not care; they can easily fight you in court if your DALL-E-generated comic looks like Mickey Mouse, and Nintendo will make sure that your video game about an electrician named Marvin from the Bronx, who looks a lot like Mario, never gets featured on Steam. The issue is the smaller artists who might not have the resources to fight AI content in court.

There's also the issue of using LLMs to "white-wash" articles and books. There will be people who just run articles through ChatGPT and claim that it's AI-generated content that was in no way stolen from The Barents Observer. The sheer volume, the loss in revenue, and the cost of fighting this in court could make running an investigative newspaper impossible and leave us without any actual reporting.

Not thinking ahead and having a plan for copyrighted material was an oversight by the current AI companies, but they are arrogant and just assumed that it would be a detail and anyway "disruption" so screw it. There has been zero consideration to the fact that their product is useless without the previous work of millions of people. My concern is that AI takes over content generation to the point where we actually run out of human generated content to train future AIs on. We need to be incredibly careful about implementing AI and ensure that we do not pollute future training data, but people don't care, because they want profit now.



They are selling these pictures for $20/mo, so it definitely matters. Whether people hang them on their refrigerator has nothing to do with this case.


I don't understand why some people think any infringing content can be singled out and removed.

Aren't LLMs giant coefficient matrices, like a punched-out croissant dough made of all the training data plied over? How can you say you can remove one specific ply from the dough and declare that every potential effect the offending ply created is now completely gone?



The "reasonable" singular removal is more about coming up with ways to block prompts that can produce infringing content, and having filters on the other end to catch infringements before they are published to the user. It's an endless whack-a-mole that never actually addresses the problem but might look good enough to the legal system or to keen supporters.

Barring some major breakthrough, the actual answer is to train a new model without the infringing data.

I think some of the people saying "remove it from your model" are aware of this and are simply being glib and needling; "you've created this infringement monstrosity, so surely you made sure to include a way to deal with this problem without throwing away all of your work, right?"
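The output-side half of that whack-a-mole could be sketched as an embedding-similarity check against a library of known protected works. Everything below (the vectors, names, and threshold) is an invented toy example; a real filter would use image embeddings from an actual model:

```python
import math

# Hypothetical precomputed embeddings of known protected works.
# In practice these would be high-dimensional vectors from a vision model.
PROTECTED = {
    "mario": [0.9, 0.1, 0.3],
    "mickey": [0.1, 0.8, 0.2],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def blocked(output_embedding, threshold=0.95):
    """Return the protected works the output is suspiciously similar to."""
    return [name for name, emb in PROTECTED.items()
            if cosine(output_embedding, emb) >= threshold]

print(blocked([0.9, 0.1, 0.3]))  # ['mario'] -- near-identical embedding
print(blocked([0.0, 0.0, 1.0]))  # []
```

This illustrates the same underlying limitation: the filter only catches similarity to works someone enrolled in the library, so it never actually addresses infringement against the long tail of obscure material.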



I’m no LLM expert but I think there is a distinction to be made between the LLM dataset and the output it gives to the user. What you’re suggesting is that it’s difficult to remove something from the dataset, which may be the case. But that doesn’t mean the user will necessarily be able to access it.

My guess is that this is much easier to attack from the user end.



Removing something from the dataset requires full training from scratch (~$100 million for base unaligned GPT-4). You can't just edit the database file and keep the AI. The database file _is_ the AI.


I think given the use of ply for AlphaGo/chess engines this is a pretty cool metaphor.


Ironically, I don't think it would be that hard with LLMs. I tried asking ChatGPT which copyrighted characters each description alludes to, and it had no problem telling me.[0]

[0] https://chat.openai.com/share/e8256470-8e45-4f36-9c84-026be1...



> What happens when an AI is used to make decisions where the user/victim is entitled to know exactly why the AI did what it did? From a business and legal perspective I think the current AI solutions are dangerous and should be used very sparsely, exactly because even the creators can't point to the exact pieces of information that caused the AI to make the choices it did.

I totally agree that we need explainability (probably through symbolic systems, not fancier models), but I think you’re overestimating how much more satisfying explanations from more traditional AI are. “My rules told me to do X” is a bit more helpful for a troubleshooting engineer than “my training data trained me to do X”, but from a ‘business and legal perspective’ the difference is much less pronounced IMO.

Both answers mean you did something wrong in creating the machine. The fault will always lie with the creator.



The thing is that those are really trivial or extreme examples. What we should take from this:

1. Generative AI systems are fully capable of producing materials that infringe on copyright.

2. They do not inform users when they do so.

So potentially any output could be infringing copyright source material, even from some obscure but still protected corner of the web, and anyone using that output could be exposed to lawsuit risk without warning.

This is very hard to fix.



Why isn't the solution a strong disclaimer when the user generates images along the lines of "beware that the images produced may not comply with copyright laws in your country" etc etc?

Any artist can privately draw a picture of Mario, what's so different about having an LLM generate that image?



There is a huge amount of copyrighted material in the world, and these tools let you create a lot of content in a short time, which implies an increased risk versus slower, more manual approaches.

What I am also wondering is whether, for some things, copyrighted material could somehow dominate the output (beyond prompting more or less directly for it).



> what's so different about having an LLM generate that image?

For a start, you don't have to pay a private company 20 bucks a month to reproduce Mario manually at home... you also can't reproduce Mario in many poses at industrial scale solo. The list goes on, and we have seen OpenAI change their license agreement, so the risk can change over time.



But how is that any different from creating an image from scratch? If I make a logo and use it for my business, but it turns out to be very similar to one already being used by another company, it’s the same situation.

I think the main concern here is with the top 1,000 or so brands/copyrights which seem fairly straightforward to deal with using the method I described.



It's plagiarism (https://www.youtube.com/watch?v=yDp3cB5fHXQ).

It's not the same situation. You can't possibly expect a person to be exposed to the entirety of the internet the way ChatGPT is. It is a matter of scale. If you still think they are the same thing, consider that the industrial revolution was also about scale, and it had transformative impacts on society.



ChatGPT is not intelligent.

It's the user's responsibility to ensure they aren't infringing copyright. If you are producing creatives for pay, you absolutely shouldn't be right-click-saving images and using them directly off the web, whether you got them from DALL-E or a Google search.

Who cares about plagiarism? Plagiarism is a made-up boogeyman created by high school English teachers.

If the work is sufficiently composed of your own thoughts, the fact you used someone else's structure for part of it is not a problem as long as it isn't your entire work or entirely derivative of one work.

If I use a Coca-Cola bottle cap as structure in a sculpture, that doesn't mean I am infringing on Coca-Cola's copyright. I still had to mold my original work to work with it.



AIUI/IIUC/IANAL, generative AI systems stack up tuned butterfly effects to create meaningful outputs. Which means data is not merely stored inside one butterfly but spread across the system, and potentially every output from the butterfly cage infringes everything, just almost negligibly. But no one can prove that to be the case, or otherwise, with current technology.

I think it's just unfixable, as far as fixing goes. The "best" way is to nuke all the GPTs and DALL-Es and bury the technology; the next best is to mark them all un-copyrightable and un-publishable as a compromise. This second option should be an all-around win that also encourages edge and on-prem deployments, IMO.



The third option would be to change our copyright laws to reflect the new technology, rather than neutering the technology or restricting use of its outputs. Even if that’s the wrong way, it’s important to remember that we’re not inescapably trapped in the status quo


> Likewise, how difficult is it to just use descriptive tools to describe Mario-like images [1] and then remove these results from anyone prompting for "video game plumber"?

This approaches impossibility at scale.



Trademarks already include text descriptions and images of the item being trademarked. This is already in the USPTO database.


Using generic text will end poorly. I predict a future where 99 of 100 requests result in Photoshop AI saying "I'm sorry, I can't do that". Google for silly trademarks: Facebook trademarked the word "Home", and Star Wars trademarked the sound of breathing under a mask.


Exactly. An enormous database of generic, unmoderated text descriptions. Basically any question you ask will be “covered” by some trademark description somewhere.

Not to mention that the cost and scale of checking every query against that entire database is … not approachable.



How do you know you are inputting "well-known entities" if you don't know them beforehand? If I type "Colombian coffee logo" and end up with logos of brands that already existed, should I reverse-engineer the whole internet to find out whether those logos existed? The AI should show its inspiration. A human who takes inspiration from something else knows precisely what they used, and whether they crossed the line into plagiarism, but the way AIs work is too opaque for that. I think what's needed is for the AI to reveal its sources, nothing more; but that also means AI companies revealing their datasets, and maybe information they shouldn't have, nor disclose.


Seems insane to try to prevent the model from reproducing content with a blocklist like this - to say the least, it’s more than just Mario. Plus, how would you possibly code common-sense fair use into the model? What’s the difference between a cartoon mouse and Mickey Mouse? What if it’s parody? This seems beyond ridiculous to try to enforce on the tool level.


I imagine the argument might be like this:

I hire a session musician to play on my new single, paying him $100. I record the whole session.

I ask him to play the opening to "Stairway to Heaven" and he does so.

"Well, I can't use that as a sample without paying"

"Ok play something like Jimmy Page"

"Hmm, still sounds like Stairway to Heaven"

"Ok, try and sound less like Stairway to Heaven but in that style"

"Great, I'll use that one"

and I release my song and get $5,000 in royalties.

Should I be sued for infringement, or the guitarist?

The problem, I suppose, is that if I had said "play something like 70s prog rock" and he played "Stairway to Heaven" and I didn't know what it was and said "great, I'll use that".

Should I be sued for infringement, or the guitarist?



You, because you released the song and took the royalties? I don’t think every type of art can be compared against each other though, as there have been numerous precedents specifically for music, some for paintings, and some for photography with their own nuances.

I still think people who are concerned that art-related copyright will stifle generative AI should fight copyright laws directly. But that's a harder pill to swallow, since it will cause havoc across multiple industries.



Part of what's interesting here is that generative AI makes it very easy to unknowingly and unintentionally get on the wrong side of copyright law, which is something that wasn't really possible before.

That's something which, IMHO, should be acknowledged by the law.



Ask George Harrison about "My Sweet Lord", which cost him $587,000 for his unconscious infringement.


If you had never seen Mickey Mouse, Googled "cartoon mouse", accidentally used a result as inspiration, made T-shirts, and sold them, Disney would be after you as well.


But none of the images in the article are for commercial use, they're for private use. So it would be akin to copyright laws saying "If you hire a guitar teacher, they can't play or teach you to play any copyrighted songs. All songs must either be their own original creation or in the public domain."


But it's not like that. Examples of clearly infringing prompts in TFA were as vague as "animated plumber".

Asking your session musician for something "melancholy" and having them pass off Stairway to Heaven as original would be unreasonable.



It’s not infringing just by existing, you would need to then go try to use it commercially for infringement to occur.

Arguably, the LLM generating the image isn’t infringement, you using it would be.



I don’t know any other animated plumbers than Mario. So when you say animated plumber, I immediately see Mario in my head.


Game plumber, not animated plumber. There is only one game plumber of note. It's literally exactly as descriptive as just saying Nintendo's Mario.


If you ask a human to draw "videogame plumber" they will correctly infer that you mean Mario and draw that.

The model isn't doing anything deliberately evil. It's doing exactly what it has been asked.

The problem is people are expecting it to have detailed knowledge of trademark law and avoid infringing trademarks, which it hasn't been even asked to do.



> The problem is people are expecting it to have detailed knowledge of trademark law and avoid infringing trademarks, which it hasn't been even asked to do.

IMO that's why there will be but few effective legal restrictions placed on AI.

Once you can reliably ask an AI to draft terms of service in all applicable jurisdictions and languages, consult it on how to incorporate in country X, or have it draft a contract between your company and another based on negotiation results, lawyers will face a huge existential crisis.

Especially because currently, as long as lawyers just barely meet their legal deadlines, it is basically impossible to hold them accountable for however badly they screw you over. A decent AI model could turn out to be a much safer bet than whatever lawyers are available on the job market.



I cannot find any other video game plumbers except Mario, Luigi, Waluigi, and Wario.

Well, I say that but there is John, a plumber, in the adult romantic comedy game Plumbers Don't Wear Ties [0]. Named by PC Gamer as number one on its "Must NOT Buy" list in May 2007.

[0] https://limitedrungames.com/collections/plumbers-dont-wear-t...



You are sued for infringement if you are the rightsholder. You need an agreement with the guitarist about rights. The default agreement for session musicians is that you pay them in return for their rights.

It’s like a software engineering contractor. The contractor gets paid, the IP of their work is owned by the company.



If you release media with copyrighted content, it is IMO first and foremost your problem. Now, if you have a contract with the guitarist specifying that he produced a sample he had the rights to and sold it to you, but he clearly wasn't truthful, you can maybe pass the liability to him. This is not, however, how people will use generative models. If you use DALL-E, you are not paying OpenAI to buy the rights to a piece DALL-E has produced. I see it more as akin to hiring a musician to play for you for an hour, or a painter to paint for you. You are paying OpenAI to paint you something, but I think OpenAI would never enter a contract stating that they are selling you the rights to a work.


The guitarist is not publishing the content, you are.

It could be argued ChatGPT is a publisher too.



In your example, there are missing details. Who owns the output? The way you've described it, that would typically mean that the guitarist is creating a "work for hire" so the ownership transfers to you, but that's a contract detail that would need to be resolved.

Whoever owns the output also owns the liability for it. You yourself might separately be able to pursue a claim against the guitarist for breach of contract. In the process, it might be discovered that you deliberately instructed the guitarist to copy the work, or that they copied despite your instructions not to.

But that doesn't change the fact that the final work is infringing. It just allows you to pursue damages that could potentially offset any damages you're liable for from the infringement.

But this also isn't exactly the same situation as OpenAI. OpenAI isn't an individual creator working on contract for you. Even if their ToS ultimately assigns copyright of the output to you, there is a matter of scale involved that I think changes things. It's one thing if your guitarist damages you by doing shoddy work; it's another if the guitarist systematizes and scales their shoddy work to damage large numbers of people. Perhaps that would then become a class-action issue.



Midjourney's TOS

> You may not use the Service to try to violate the intellectual property rights of others, including copyright, patent, or trademark rights. Doing so may subject you to penalties including legal action or a permanent ban from the Service.

Perplexity's

> Intellectual Property Rights

> Perplexity AI acknowledges and respects the intellectual property rights of all individuals and entities, and expects all users of the Service to do the same. As a user of the Service, you are granted access for your own personal, non-commercial use only.



Yeah, that's nice and all, but it's not what we're talking about. These passages are about deliberately using the tool to violate copyright. What if, in good faith, I don't deliberately attempt to infringe, but the tool still produces results that do? Because that is happening.

And that's just their interpretation of the tool. There is another interpretation that their tool itself is a violation.



I should have explained that those bits are all there is.

You are right: how am I to know whether something is an image from a movie or a passage from the NYT?

Ask George Harrison about "My Sweet Lord", which cost him $587,000 for his unconscious infringement.

Another example would be the 2013 hit "Blurred Lines" by Robin Thicke and Pharrell Williams. It was found to have copied the "feel" and "sound" of Marvin Gaye's 1977 song "Got to Give It Up." The court awarded Gaye's estate $7.4 million in damages, later reduced to $5.3 million.



using this analogy, copyright holders want to sue the guitarist for having listened to "Stairway to Heaven"


As I understand it, the legal precedent for generative AI is the same one that allows Google to scrape websites in order to index them for search, for the common good. Google can also display cached versions of websites, which is the original content of those sites. No one is going to say that Google is infringing copyright just because it shows content from other websites verbatim. So I think this is a weak argument. AI would be useless if we had to scrub all cultural references and popular IPs (even not-so-popular ones).

Personally, I think generative AI should be able to provide links to similar source material in the training data. This would be the barest way to compensate those who have contributed to training the AI. I don't think generative AI is sustainable in the long term if it ends up killing all the websites/artists that created the original material. Plus, having sources adds a layer of transparency and helps users understand when content is hallucinated vs. not. People should be able to opt out of having their content used for training and be able to confirm that it has been removed from future iterations. Let's be honest: AI companies are just trying to avoid lawsuits by keeping it secret. These are areas where I think regulation can help, rather than worrying about doomsday scenarios.



> No one is going to say that google is copyright infringement just because it is showing content from other websites verbatim

Journalists [1] and Getty Images [2] did in the past

[1]: https://yro.slashdot.org/story/03/07/14/025216/web-caching-g... [2]: https://www.theguardian.com/technology/2016/apr/27/getty-ima...



And lost, if memory serves.


No, Google agreed to a licensing agreement and removed the direct links to the images.


IMO, this is probably the goal of the NYTimes lawsuits as well


> * I don't think generative AI is sustainable in the long term if it ends up killing all the websites/artists that created the original material. *

This is the elephant in the room. Every tech wave has had its way of cajoling creators into investing time and money to make original material; then the rules changed.

Google, promised reach and new markets for content, it worked. Then they introduced snippets, ads and whole lot of other things to keep visitors on their freeway, while avoiding sending visitors to the original site.

Reddit, Stack Overflow and others started with gamification (points, badges) and community to incentivize users to contribute original content.

Now AI is shaking up all these approaches. But with each one, the incentive to create original material appears to dwindle, since the returns are becoming less and less.

What's the incentive for any professional now, if AI is going to regurgitate their original content without any upside (no potential for reach, no gamification, no community, no recognition, etc.)?



> Google, promised reach and new markets for content, it worked. Then they introduced snippets, ads and whole lot of other things to keep visitors on their freeway, while avoiding sending visitors to the original site.

Afterward came bots that saturated search results with useless SEO barf that pushed content (original and duplicated) so far down that we're coming back to where we started. Content is increasingly unfindable on the web.



I agree with this too.. AI is only going to exacerbate the signal to noise problem on the web.


> I think generative AI should be able to provide links to similar source material in the training data

Except these aren't databases, so that's generally not possible, in the same way that it's not possible for you to provide links to the source material it took to write your reply. How much learning led to the weights on your neurons that allowed you to generate it? Where did you learn about using italics and its effect on how the words would be interpreted? Where did you learn the tone that would be appropriate in this particular forum?

> People should be able to opt out of having their content used for training

Okay... but then, if I write a book should I be able to opt out of you being allowed to read it? What conditions should I be able to put on who can read my work? Religion? Skin colour? People that aren't good at memorizing?

Hopefully the idea of putting limits on who can acquire knowledge sounds absurd to you. Why are those same limits okay if they're on 'what' rather than 'who'?

> AI companies are just trying to avoid lawsuits by keeping it secret

Which has created a barrier to further research. Instead of me and Joe being able to collaborate on research and papers using the same datasets, we now hide our training data lest the luddites come to smash the machines because learning is only okay if not done too well.



> Except these aren't databases, so that's generally not possible

Not directly and not in every case, but it IS possible to use embeddings to link to similar material. People are doing it pretty commonly using the RAG approach, and Bard is already providing sources. It may not be perfect, but the onus is on the AI companies to figure out how to do it right, not just claim helplessness.
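As a rough illustration of the idea (not how any production system works — the `embed()` here is a bag-of-words stand-in for a real embedding model, and all names are made up for the sketch), attribution-by-similarity looks like this:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a word-count vector. A real system would call an
    embedding model here; this is purely illustrative."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def attribute(output, sources, top_k=1):
    """Rank candidate source documents by similarity to the model output."""
    scored = [(cosine(embed(output), embed(s)), s) for s in sources]
    return [s for _, s in sorted(scored, reverse=True)[:top_k]]

sources = [
    "the quarterly earnings report shows record profit",
    "a recipe for sourdough bread with wild yeast",
]
print(attribute("record profit in the quarterly report", sources))
# → ['the quarterly earnings report shows record profit']
```

With real embeddings and an index over the training corpus, the same lookup can surface "nearest" training documents for a given output, which is essentially what RAG-style citation features do.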

> Okay... but then, if I write a book should I be able to opt out of you being allowed to read it? What conditions should I be able to put on who can read my work?

Sites that don't want to appear in search results, or that have sensitive info they don't want indexed, can use robots.txt, which is nearly as old as the web. There are many valid reasons to have mechanisms for keeping something out of training data, and I would also argue this is a core feature needed to spur adoption by businesses, as we've already seen. Otherwise, I am not sure I understand your reasoning: people can publish websites and opt to have them excluded from search; the same should apply to AI.
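For what it's worth, the mechanism already extends naturally to AI crawlers. OpenAI, for example, documents GPTBot as its crawler's user agent, so a site can opt out of training collection the same way it opts out of search indexing (a minimal sketch using the standard robots.txt directives):

```
# robots.txt — opt out of OpenAI's training crawler, allow everything else
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
```

Other AI crawlers publish their own user-agent strings, so each one has to be listed separately; there is as yet no single "no AI training" directive.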



Well said. Extending copyright to control content consumption and learning is a recipe for converting all of our mass media into businesses as abusive and usurious as textbook companies.

This is a power grab by publishers.



The ability to provide a reference to the source is the crucial difference here.

I agree that it should be possible to implement that for generative AI, although the training may become significantly more expensive in order to maintain that information, and the AI companies have little interest in doing so. They’ll probably rather try to heuristically assess possible copyright issues after the fact in a post-processing step.

The more interesting question is if copyright holders can claim unauthorized use of their works beyond the case of near-verbatim reproduction, because the works collectively inform the AI in a more general manner.



> They’ll

What if I asked you to list all the source material that led you to use that particular contraction? Heuristics will not do; you must list each.

Can you do it? Do you believe AI should?

> I agree that it should be possible to implement

Those exact words appear in another forum post from 2006:

https://discourse.igniterealtime.org/t/cm-3beta-compression-...

Should you have quoted that as a source for your reply? What if we knew you'd read that post back in 2006, affecting your neurons, then should you?

It might not be too hard to imagine a simple case of a specific topic where you might have some more prominent sources, but even in those cases I believe if you think it through you'll find there was a ton of other sources that led to the weights that allowed you to 'know' the topic.



I believe they should be able to, to the degree that their output can constitute copyright infringement. Obviously, the fewer sources from the training data a given output matches, and the longer the match, the more relevant it is, and the easier it should be. I believe it should be feasible exactly because of that correlation. The examples you present are largely irrelevant to the problem, because they are largely irrelevant to the citing of sources for copyright reasons.


>> Those exact words appear in another forum post from 2006. Should you have quoted that as a source for your reply? What if we knew you'd read that post back in 2006, affecting your neurons, then should you?

> I believe they should be able to, to the degree that their output can constitute copyright infringement.

But not you? The inference behind the AI-violates-copyright movement is that machine obligations should be brought to parity with our obligations - that AI and you be fully subject to the same copyright overlordship.

I would independently agree that having AI divulge sources could be a good thing.

I do not agree with this attempt to twist copyright into yet another misshapen hammer, so copyright holders can bludgeon out some result they want.



No legal precedent has been set as of yet. The "precedent" you describe is the argument AI companies have been using (that training their models on information available on the Internet should be considered "fair use") but whether AI training actually satisfies the four-factor test for fair use remains to be seen.


It's a null question. Training itself is neither publication nor distribution, so copyright can't be relevant at that point. "Fair use" just isn't a concept applicable to training.


Storing copyright content itself can sometimes be illegal - like ripping a Bluray. What if these frames are now stored on their servers and go into the training dataset?


Training stores a variation of the source material, which is arguably distribution. And selling the result, or selling access to it, certainly is. So fair use comes into play, and the defense rests on hoping a court finds the process transformative enough to qualify. Given that original material can be spat out, my money is on a court finding it about as transformative as a compression algorithm.


Exactly. Framing reading as fair use is a huge and dangerous expansion of copyright.


I wonder: do Cliff Notes have to pay royalties to the underlying material?

Cliff Notes contain quotes and citations.

Does the Cliff Notes company, when producing Cliff Notes for "Into The Wild", pay royalties to the publisher?

For that matter, does any paper, article, etc. that contains a quote from another work have to pay royalties to the source of the quotes?



Cliff’s Notes has a strong fair use claim, because they offer basic criticism and surface-level commentary alongside their summaries.


They also, arguably, add value to the books themselves.


We need clearer laws that only apply to Generative AI. Too many comparisons and parallels are being drawn to actual people. "Like what if someone learned how to draw by watching trademarked material, and then accidentally produced it" But these models aren't people and they exist in a category of their own.

I do think there is an element of trademark infringement in these models, but also that it should be allowed, and that ultimate responsibility should rest with the person using the images in a final work meant for consumption by the general public as standalone media.



That's where I'm at. DALL-E spitting out C3PO should be entirely ok; unless I'm making money with the output, Disney should pound sand.


How does "unless I'm making money with the output" not apply to openai as well? They make money on the output.


Put that c3p0 on a website that gets revenues from views and someone is getting paid.


Ok, sure, but that's not a GenAI thing, that's a plain old boring copyright thing. If I draw a bunch of C3POs and slap them on my Adwords website then I can expect a C&D letter post haste, who cares if the material in question came out of my pen, Photoshop or a GenAI model?


If the model was trained on works by artists (without their knowledge or consent, as seems to be the case) and you get it to spit out art that is basically identical in either content or style to that artist, and they don’t know, or are too poor to effectively sue you, should they just suffer? If you then make money off what is effectively their work, why shouldn’t they get paid? If they only work on commission and rightfully charge a premium, are you not actively gouging their business (knowingly or not)?

I don’t think they should miss out on those protections, or on the ability to make money from their work if they desire. The fact that LLMs give this “plausible deniability” shouldn’t be an excuse to tolerate it.



Style isn't protected by copyright. Maybe there's an argument that it should be, but right now that's not a protection which exists.

Training is neither publication nor distribution, so copyright is entirely out of scope at that step. Again, maybe there's a moral argument for some sort of control, but copyright is completely the wrong framework to think about it in.



I am beginning to think that in these discussions these models are functioning more like an obscuring factor than anything else and the discussion is getting bogged down in that, and not the crux of the argument.

They’re giving people plausible deniability in the “chain of responsibility”, and I think if we took away “LLM” and replaced it with “fairground sideshow magic box”, the argument that LLMs are somehow special and deserving of exemptions disappears real quick.



I completely agree.

Betamax says that a technology which has significant non-infringing uses is not inherently infringing.

We've already got precedent saying that AI generated works don't accrue copyright protection, and by the same argument the act of generation by the AI expresses no intent, so infringement or otherwise must be down to the human using the output because the black box itself has no agency.



I agree, and I would prefer to see concrete examples of LLMs being used productively and profitably in the industry in a "disruptive" manner--putting people out of work, etc--before we conclude they're somehow the next big thing. Basically, before claiming LLMs (or generative techniques, more generally) mean that we're on the doorstep of "general" intelligence, show me the door!

The outline of that door might look like industrial adoption of these things for solving some actual problem other than the entertainment value of typing things into the box and seeing what comes out the other side. But so far, as far as I can tell, nobody's actually doing this?



> ...nobody's actually doing this?

I think you're right.

I am a programmer and I use GPT occasionally, and I even pay 20 bucks a month (for now), but even for my job it's not a world-shattering improvement.

> ... the entertainment value of typing things into the box and seeing what comes out ...

I would only add that in a consumer society like ours, entertainment is important. Changes to entertainment seem to have, like, weird ripple effects. Not the knock-down economic disruptions that AI is promising, but I kind of think LLMs are just going to make our culture weirder. I can't anticipate how, but having a bunch of little LLM-powered daemons buzzing around the internet is just gonna be freaky.



> I am a programmer and I use GPT occasionally, and I even pay 20 bucks a month (for now), but even for my job it's not a world-shattering improvement.

I am also a programmer, and when I think about the amount of time I actually spend typing out code, even on a great day where all the stars have aligned just right and I can really bang out some code that's like... idk, 30-50% of my time? Usually it's much less, and I'm doing things like reading documentation, reading code, talking to people, etc. So it's hard to imagine Copilot or whatever making me much more effective at my job, as it can really only help with a fraction of it.

I could see someone making the assumption that being able to delegate programming tasks to a robot assistant might make them more productive, but often I find that I don't really understand a problem fully until I'm in the weeds solving it--by which I mean I haven't specified it completely until I've finished the implementation and written the tests. So I don't know to what extent being able to specify and delegate would really help me be more productive.

> having a bunch of little LLM-powered daemons buzzing around the internet is just gonna be freaky.

Yeah, they're not super cheap though so they need to get actual work done otherwise there's no reason to run them. Unlike blockchains, they don't have a pyramid scheme holding them up.



Related ongoing thread:

NY times is asking that all LLMs trained on Times data be destroyed - https://news.ycombinator.com/item?id=38816944 - Dec 2023 (93 comments)

Also:

NY Times copyright suit wants OpenAI to delete all GPT instances - https://news.ycombinator.com/item?id=38790255 - Dec 2023 (870 comments)

NYT sues OpenAI, Microsoft over 'millions of articles' used to train ChatGPT - https://news.ycombinator.com/item?id=38784194 - Dec 2023 (84 comments)

The New York Times is suing OpenAI and Microsoft for copyright infringement - https://news.ycombinator.com/item?id=38781941 - Dec 2023 (861 comments)

The Times Sues OpenAI and Microsoft Over A.I.’s Use of Copyrighted Work - https://news.ycombinator.com/item?id=38781863 - Dec 2023 (11 comments)



I did an interesting thing and looked at how well the Llama2 models could compress text. For example, I took the first chapter of the first Harry Potter book and recorded the index of the 'correct' predicted token. The original text compresses with 7zip (LZMA?) to about 14kB; the Llama2-encoded indexes compress to less than 1kB. Then, of course, I can send that 1kB file around and decode the original text. (Unless the model behaves differently on different hardware, which it probably does.)

What I get from this is that Llama2 70B contains 93% of Harry Potter Chapter 1 within it. It's not 100% (which would mean no need to share the encoded indices) but it's still pretty significant. I want to repeat this with the entire text of some books, the example I picked isn't representative because the text is available online on the official website.



While I don't disagree that these models seem to contain the ability to recreate copyrighted text, I don't think your conclusion holds. How well does zstd compress Harry Potter with a dictionary based on English prose? I think you'll get some impressive ratios, and I also think there's nothing infringing in this case.


What it tells you is that 93% of the information is sufficiently shared with the rest of the English language such that it can be pulled out into a shared codebook. LZMA doesn't have a codebook, not really.

In other words it's not that llama2 contains 93% of Chapter 1, it's that only 7% of Chapter 1 is different enough to anything else to be worth encoding in its own right.



Couldn't you use the same argument to reach the absurd conclusion that the 7zip source code contains the vast majority of Harry Potter?

A decent control would be to compare it to similar prose that you know for a fact is not in the training data (e.g. because it was written afterwards).



I think the same argument would have to compare 7zip's compression to some other compression algorithm. Then we can say things like "7zip is a better/worse model of human writing". And that's probably a better way to talk about this as well.

You're right that a better baseline could be made using books not in the training set, to understand how much is the model learning prose and how much is learning a specific book.



I wonder what the loss would be for 'translated into Finnish'? Translations between just about any human languages will contain less than 100% of the original.


This is a little confusing. You turned the text into indices? So numbers? Then compressed that? Or is the text as numbers, without any extra compression, only 1kB?

The tokenizer the models use (SentencePiece) is more or less based on one way to do compression (BPE). It's not really clear what you're testing.



My reading is that at each generation step they ordered all possible next words by the probability assigned to them by the model and recorded the index of the true next word (so if the model was very good at predicting Harry Potter their indices would mostly be 0, 0, 0, ...).


This is correct
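A minimal sketch of that rank-encoding scheme, using a toy bigram model in place of Llama2 (the model, corpus, and helper names are illustrative, not the commenter's actual code):

```python
import zlib
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count bigram frequencies so we can rank candidate next tokens."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def ranked_candidates(counts, vocab, prev):
    """All vocab tokens, most likely first (ties broken alphabetically)."""
    freq = counts.get(prev, Counter())
    return sorted(vocab, key=lambda t: (-freq[t], t))

def encode(tokens, counts, vocab):
    """Replace each token with its rank under the model's prediction."""
    return [ranked_candidates(counts, vocab, prev).index(nxt)
            for prev, nxt in zip(tokens, tokens[1:])]

def decode(first_token, ranks, counts, vocab):
    """Invert: walk the ranks, regenerating each token from its index."""
    out = [first_token]
    for r in ranks:
        out.append(ranked_candidates(counts, vocab, out[-1])[r])
    return out

corpus = "the boy who lived the boy who flew the boy who lived".split()
model = train_bigram(corpus)
vocab = sorted(set(corpus))

text = "the boy who lived".split()
ranks = encode(text, model, vocab)
assert decode(text[0], ranks, model, vocab) == text

# A good model yields mostly-zero ranks, which compress far better than
# the raw text. (A real implementation would need a variable-length
# integer encoding, since ranks can exceed 255.)
packed = zlib.compress(bytes(ranks))
```

The better the model predicts the text, the closer the rank stream is to all zeros, and the smaller the compressed file — which is exactly the effect the experiment measured.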


The generative AI rollout has taught me what happens when the interests of the many intersect with the destruction of the few.

You get steamrolled for defending yourself while you overhear above applause to those who have robbed you of your future.



It makes no sense that one is not allowed to make and market a CG Mario movie, but suddenly if you use AI to launder the data it's suddenly ok.


I'm pretty sure if you tried to sell a CG Mario movie Nintendo would sue you into oblivion, and "the neural network did it" would not be considered a good defense by anybody, including the judge and jury.


Sure, but making it possible for the neural network to make the movie (eventually in seconds) is somehow ok? So people can make their own private CG Mario films, as long as they don't try to sell them?

Here's my argument - even if the NN only makes the films for private consumption, eventually they'll be so widespread and fast at making them that won't matter, since everyone will be able to watch Mario movies of their own. Is that a future you think will sit well with Nintendo, Disney, etc?



I don't really care if it sits well with them. Do you? In that future, why do we need them? They are already parasites feeding off our collective societal stories. Or did you think Disney came up with all those characters? Maybe the original creator of Snow White should sue.


Oh my lord surely you know Disney has no copyright on Snow White...


> So people can make their own private CG Mario films, as long as they don't try to sell them?

Yes, why not? This is just computer assisted private fanfic.



I think there's a difference between being allowed to draw mario vs being allowed to draw and sell mario.

It's completely legal for me to draw mario for my own purposes. It should be legal for me to make an ai draw mario for my own purposes.


