(comments)

Original link: https://news.ycombinator.com/item?id=43465333

This Hacker News thread discusses an article claiming that OpenAI believes it needs access to copyrighted material to succeed in AI development. Commenters debate the ethics and legality of scraping data for training, drawing analogies to Google Search's practices. Some argue that current copyright law hinders AI progress, especially compared with countries like China that may disregard those laws. A key concern is the competitive disadvantage faced by US companies that comply with copyright law. One commenter proposes a royalty system in which AI companies pay into a fund benefiting all Americans. Others highlight the difficulty of fairly compensating content creators and the possibility that AI will disrupt existing business models. Some argue that copyright infringement is not theft, and that equating the two sounds like pro-copyright propaganda that runs counter to the progress of the arts and sciences. Ultimately, the thread highlights the tension between copyright protection and AI development, and the lack of consensus on a fair solution.


Original text
[flagged] OpenAI Says It's "Over" If It Can't Steal All Your Copyrighted Work (futurism.com)
36 points by raju 1 hour ago | 29 comments

What I don't understand is why this is always presented as a "race" that "we" have to win or else. It's just such a strange framing to me and every time I see it, it's presented as some sort of self-evident truth, but I don't think it's self-evident at all.


I suppose it makes sense if your market hegemony and social and political reputation depend on running around trying to frame everything you do as the new world-critical space race, so that people will happily point a firehose of money into your pocket. Frame it with the mindset of someone who desperately wants to be treated like a super genius, and who sees a path toward that goal while making obscene amounts of money, and it makes sense.

EDIT: words



It sounds like government continuing to honor the property rights of everyone is getting in the way of a handful of rich people's desire to take all that value for themselves.


By this logic Google Search couldn't exist. Except that Google won those cases.


How is Google Search breaking copyright?


Exact same way OpenAI does: by scraping data, ingesting it, processing it, incorporating it into its proprietary system, and using it to serve responses to queries.

This is not to say that I think any of this is wrong. I think that if what Google or OpenAI do is illegal, then the law is wrong, not Google or OpenAI.
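The scrape → ingest → index → serve pipeline described above can be sketched in miniature. This is purely an illustrative toy, not how Google or OpenAI actually work: it uses a hard-coded in-memory "corpus" in place of real HTTP fetching, and all names (PAGES, build_index, serve_query) are made up for the example.

```python
import re
from collections import defaultdict

# Toy corpus standing in for scraped web pages (no real HTTP fetching here).
PAGES = {
    "example.com/a": "Copyright law protects original works of authorship.",
    "example.com/b": "Search engines scrape and index pages to serve queries.",
}

def tokenize(text):
    # Ingest/process step: normalize text to lowercase word tokens.
    return re.findall(r"[a-z]+", text.lower())

def build_index(pages):
    # Incorporate step: inverted index mapping token -> set of page URLs.
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index

def serve_query(index, query):
    # Serve step: return pages containing every query token (AND semantics).
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = index[tokens[0]].copy()
    for token in tokens[1:]:
        results &= index[token]
    return results

INDEX = build_index(PAGES)
```

The copyright question in the thread is about the middle steps: the index necessarily contains a processed copy of every page it ingested, whether the system is a search engine or a language model.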



Last I checked, Google is not buying or pirating books for Google Search; it just grabs free data that has been provided.


What do you mean by “provided”, exactly? Just because something can be accessed via an HTTP GET request doesn’t mean it’s even legal to do so, and it does not give you an implicit license to do whatever you want with it. Google, in fact, will happily scrape, index, and serve queries from PDFs of illegally pirated copyrighted books.



ChatGPT is a substitute for the original work; Google redirects to the original work.

(and yes, the synthesis at the top of the page is a problem, I agree)



Is it the exact same though?


Sorry, do you have an actual point, or are you just trying to be pedantic? Strictly speaking, it’s not exactly the same, because no two different things are exactly the same, by definition. However, my point is that the same principle should apply to both.


The text preview underneath the search result is one thing I remember being contentious (some news websites in France took Google to court and won, if I recall correctly).

For mostly the same reasons people are against AI. If you read that text, sometimes there’s no reason to visit the website, which ‘deprives’ website owners and content creators of ad revenue they would have gotten if Google hadn’t copied the text from their website.

After all, the news is the same regardless of whether it’s written in the Google result preview or on the news website itself.



Some good news for a change!


Why is this flagged?


So basically, we know China is never going to pay the publishers/content creators (never). If we hold OpenAI to our principles (pay those you took from), they will go bankrupt. So of course they are speaking in end-game language. To suggest the race is lost even before it starts is an incredible thing.

How is it that we can theorize that the model would get better with more data, but we can't theorize that the business model would need to get bigger (pay the content creators) to train the model? Shoot first and ask questions later (or rather, BEG later).



You know, there's a creative third way which the US could take if it had the cojones.

Allow OpenAI and other AI companies to use all data for training, but require that they pay it forward through royalties on profits beyond some threshold X, where X is a number high enough to imply true AGI was reached.

The royalties could go into a fund that would be paid out, like social security payments, to every American starting at age 18. Companies could likewise request a one-time deferred payment or something like that.

It's having your cake and eating it. Also helping ease some tensions around job loss.

Sadly, what we'll likely get is a bunch of tech leaders stumbling into wild riches, hoarding it, and then having it taken from them by force after they become complacent and drunk on power without the necessary understanding of human nature or history to see why they've brought it on themselves.



Not to be funny on purpose, but we are currently having discussions in America on whether we should finance aid for poverty and the like. I love your idea, though.


It's always interesting to see how the title of a HN post radically changes the people who comment and vote. The AI friendly people are being carpet bombed by haters, but in a model release thread the haters would be flagged to oblivion.


You have to remember a company is not a social being with balanced obligations. Its obligation is to its owners and not to society.

If OpenAI’s leadership weren’t saying precisely this, they wouldn’t be doing their jobs.



I think we just need to rethink copyright for language models. I'm okay with licensing one copy of a work to an LLM model throughout its various generations. Just don't pirate it if no special license is available; buying the ebook should suffice. It should be no different from a human buying a copy. The only rule should be that it does not leak the entire work.


I'm not OK with that, though... and here we have the nut of the problem. There is no agreement as to what's acceptable and what's not.

I personally think that the odds of me being able to both publicly publish my words and code and keep them out of training data are pretty close to zero. Since that's unacceptable to me, my only option is not to publish that stuff at all.



Yes, well, in a way they're right, and I suspect everyone here knows it, no matter how high and mighty they might want to act when commenting. When foreign (here 'Chinese') competition just ignores copyright laws while 'western' companies have to abide by them for every piece of data they use to train their models, the former will have a clear advantage over the latter. This also happens to be how the USA acted in the 1800s [1]:

the United States declined an invitation to a pivotal conference in Berne in 1883, and did not sign the 1886 agreement of the Berne Convention which accorded national treatment to copyright holders. Moreover, until 1891 American statutes explicitly denied copyrights to citizens of other countries and the United States was notorious in the international sphere as a significant contributor to the "piracy" of foreign literary products. It has been claimed that American companies for the most part "indiscriminately reprinted books by foreign authors without even the pretence of acknowledgement" (Feather, 1994, 154). The tendency to freely reprint foreign works was encouraged by the existence of tariffs on imported books that ranged as high as 25 percent (see Dozer, 1949).

[1] http://socialsciences.scielo.org/scielo.php?script=sci_artte...


The product requires crime? I feel like most products do not require crime. This is not a good sales pitch.


Either that, or copyright law is bad in its current form and LLMs are yet another example of what exposes that.

Even if copyright owners can’t point to how much damage, if any, they suffer from AI, it’s seen as wrong and bad. I think it’s getting boring to hear that story about copyright repeat itself. In most crimes, you need to be able to point to damage that was done to you.

Also, while there are edge cases in some LLMs where you can make them spew some verbatim training material, often through jailbreaks or whatnot, an LLM is a destructive process involving ”fuzzy logic” where the content is generally not perfectly memorized, and it seems no more of a threat to copyright than recording broadcasts onto cassette tapes or VHS was back in the day. You’d be insane to use that stuff as a source of truth on par with the original article etc.



More like, it's interesting that big tech companies can create extremely elaborate copyright assignment, metering and payout mechanisms when it's in their interest - right down to figuring out who owns 30 seconds of incidental radio music that plays in the background during someone's speedrun video.

But for other classes of user generated content, the problem is suddenly "impossible".



something tells me that this pathetic messaging approach is not going to be the one that squares the circle between "piracy is illegal" and "information wants to be free"


Sorry, but it is actually a huge problem for the US if the DeepSeek models are able to train on sorta-illegal dumps of scientific papers (the ones paywalled by scientific journals) and US models aren't.

Everyone WILL start using hosted frontier Chinese models if they are demonstrably better at answering scientific questions than ChatGPT, sending essentially all US research questions into a Chinese data dump. This is even worse than the national security catastrophe that is TikTok (even aside from the EVEN BIGGER issue that China will have models that are staggeringly better than those in the US, because they are up to date on the science).

I understand the reflexivity against AI companies "stealing content" but we need to stay competitive and figure out the financial compensation later. This is not a case where our unbelievably generous copyright laws should take precedence over US competitiveness.



Copyright infringement is not stealing [0]. The person still has what they made. Not sure why they propagate it as theft. Seems like a pro-copyright propaganda, extremist article which goes significantly against the progress of the arts and sciences.

[0] https://en.m.wikipedia.org/wiki/Dowling_v._United_States_(19...






