(comments)

Original link: https://news.ycombinator.com/item?id=43465333

This Hacker News thread discusses an article claiming that OpenAI believes it needs access to copyrighted material to succeed in AI development. Commenters debate the ethics and legality of scraping data for training, drawing analogies to Google Search's practices. Some argue that current copyright law hinders AI progress, especially compared with countries like China that may disregard those laws. A key concern is the competitive disadvantage faced by US companies that comply with copyright law. One commenter proposes a royalty system in which AI companies pay into a fund benefiting all Americans. Others highlight the difficulty of fairly compensating content creators and the possibility that AI will disrupt existing business models. Some argue that copyright infringement is not theft, and that equating the two sounds like pro-copyright propaganda that runs counter to the progress of the arts and sciences. Ultimately, the thread highlights the tension between copyright protection and AI development, and the lack of consensus on a fair solution.


Original text
[flagged] OpenAI Says It's "Over" If It Can't Steal All Your Copyrighted Work (futurism.com)
36 points by raju 1 hour ago | 29 comments

What I don't understand is why this is always presented as a "race" that "we" have to win or else. It's just such a strange framing to me and every time I see it, it's presented as some sort of self-evident truth, but I don't think it's self-evident at all.


I suppose it makes sense if your market hegemony and social and political reputation depend on running around trying to frame everything you do as the new world-critical space race, so that people will happily point a firehose of money into your pocket. Frame it with the mindset of someone who desperately wants to be treated like a super genius, and who sees a path toward that goal while making obscene amounts of money, and it makes sense.

EDIT: words



It sounds like government continuing to honor the property rights of everyone is getting in the way of a handful of rich people's desire to take all that value for themselves.


By this logic Google Search couldn't exist. Except that Google won those cases.


How is Google Search breaking copyright?


Exact same way OpenAI does: by scraping data, ingesting it, processing it, incorporating it into its proprietary system, and using it to serve responses to queries.

This is not to say that I think any of this is wrong. I think that if what Google or OpenAI do is illegal, then the law is wrong, not Google or OpenAI.
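The scrape → ingest → index → serve pipeline described above can be sketched in miniature. This is purely an illustrative toy, not how Google or OpenAI actually work: it uses a hard-coded in-memory "corpus" in place of real HTTP fetching, and all names (PAGES, build_index, serve_query) are made up for the example.

```python
import re
from collections import defaultdict

# Toy corpus standing in for scraped web pages (no real HTTP fetching here).
PAGES = {
    "example.com/a": "Copyright law protects original works of authorship.",
    "example.com/b": "Search engines scrape and index pages to serve queries.",
}

def tokenize(text):
    # Ingest/process step: normalize text to lowercase word tokens.
    return re.findall(r"[a-z]+", text.lower())

def build_index(pages):
    # Incorporate step: inverted index mapping token -> set of page URLs.
    index = defaultdict(set)
    for url, text in pages.items():
        for token in tokenize(text):
            index[token].add(url)
    return index

def serve_query(index, query):
    # Serve step: return pages containing every query token (AND semantics).
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = index[tokens[0]].copy()
    for token in tokens[1:]:
        results &= index[token]
    return results

INDEX = build_index(PAGES)
```

The copyright question in the thread is about the middle steps: the index necessarily contains a processed copy of every page it ingested, whether the system is a search engine or a language model.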



Last I checked, Google is not buying or pirating books for Google Search; it just grabs free data that has been provided.


What do you mean by “provided”, exactly? Just because something can be accessed via an HTTP GET request doesn’t mean it’s even legal to do so, and it does not give you an implicit license to do whatever you want with it. Google, in fact, will happily scrape, index, and serve queries from PDFs of illegally pirated copyrighted books.



ChatGPT is a substitute for the original work; Google redirects to the original work.

(and yes, the synthesis at the top of the page is a problem, I agree)



Is it the exact same though?


Sorry, do you have an actual point, or are you just trying to be pedantic? Strictly speaking, it’s not exactly the same, because no two different things are exactly the same, by definition. However, my point is that the same principle should apply to both.


The text preview underneath the search result is one thing I remember being contentious (some news websites in France took Google to court and won, if I recall correctly).

For mostly the same reasons people are against AI. If you read that text, sometimes there’s no reason to visit the website, which ‘deprives’ website owners and content creators of ad revenue they would have gotten if Google hadn’t copied the text from their website.

After all, the news is the same regardless of whether it’s written in the Google result preview or on the news website itself.



Some good news for a change!


Why is this flagged?


So basically, we know China is never going to pay the publishers/content creators (never). If we hold OpenAI to our principles (pay those you took from), they will go bankrupt. So of course they are speaking in end-game language. To suggest the race is lost even before it starts is an incredible thing.

How is it that we can theorize that the model would get better with more data, but we can't theorize that the business model would need to get bigger (pay the content creators) to train the model? Shoot first and ask questions later (or rather, BEG later).



You know, there's a creative third way which the US could take if it had the cojones.

Allow OpenAI and other AI companies to use all data for training, but require that they pay it forward through royalties on profits beyond some threshold X, where X is a number high enough to imply true AGI was reached.

The royalties could go into a fund that would be paid out, like social security payments, to every American starting at age 18. Companies could likewise request a one-time deferred payment or something like that.

It's having your cake and eating it. Also helping ease some tensions around job loss.

Sadly, what we'll likely get is a bunch of tech leaders stumbling into wild riches, hoarding it, and then having it taken from them by force after they become complacent and drunk on power without the necessary understanding of human nature or history to see why they've brought it on themselves.



Not to be funny on purpose, but we are currently having discussions in America on whether we should finance aid for poverty and the like. I love your idea, though.


It's always interesting to see how the title of a HN post radically changes the people who comment and vote. The AI friendly people are being carpet bombed by haters, but in a model release thread the haters would be flagged to oblivion.


You have to remember a company is not a social being with balanced obligations. Its obligation is to its owners and not to society.

If OpenAI’s leadership weren’t saying precisely this, they wouldn’t be doing their jobs.



I think we just need to rethink copyright for language models. I'm okay with licensing one copy of a work to an LLM model throughout its various generations. Just don't pirate it if no special license is available; buying the ebook should suffice. It should be no different from a human buying a copy. The only rule should be that it does not leak the entire work.


I'm not OK with that, though... and here we have the nut of the problem. There is no agreement as to what's acceptable and what's not.

I personally think that the odds of me being able to both publicly publish my words and code and keep them out of training data are pretty close to zero. Since that's unacceptable to me, my only option is not to publish that stuff at all.



Yes, well, in a way they're right, and I suspect everyone here knows it, no matter how high and mighty they might want to act when commenting. When foreign (here 'Chinese') competition just ignores copyright laws while 'western' companies have to abide by them for every piece of data they use to train their models, the former will have a clear advantage over the latter. This also happens to be how the USA acted in the 1800s [1]:

the United States declined an invitation to a pivotal conference in Berne in 1883, and did not sign the 1886 agreement of the Berne Convention which accorded national treatment to copyright holders. Moreover, until 1891 American statutes explicitly denied copyrights to citizens of other countries and the United States was notorious in the international sphere as a significant contributor to the "piracy" of foreign literary products. It has been claimed that American companies for the most part "indiscriminately reprinted books by foreign authors without even the pretence of acknowledgement" (Feather, 1994, 154). The tendency to freely reprint foreign works was encouraged by the existence of tariffs on imported books that ranged as high as 25 percent (see Dozer, 1949).

[1] http://socialsciences.scielo.org/scielo.php?script=sci_artte...


The product requires crime? I feel like most products do not require crime. This is not a good sales pitch.


Either that, or copyright law is bad in its current form and LLMs are yet another example of what exposes that.

Even if copyright owners can’t point to how much damage, if any, they suffer from AI, it’s seen as wrong and bad. I think it’s getting boring to hear that story about copyright repeat itself. In most crimes, you need to be able to point to damage that was done to you.

Also, while there are edge cases in some LLMs where you can make them spew some verbatim training material, often through jailbreaks or whatnot, an LLM is a destructive process involving ”fuzzy logic” where the content is generally not perfectly memorized, and it seems no more of a threat to copyright than recording broadcasts onto cassette tapes or VHS was back in the day. You’d be insane to use that stuff as a source of truth on par with the original article etc.



More like, it's interesting that big tech companies can create extremely elaborate copyright assignment, metering and payout mechanisms when it's in their interest - right down to figuring out who owns 30 seconds of incidental radio music that plays in the background during someone's speedrun video.

But for other classes of user generated content, the problem is suddenly "impossible".



something tells me that this pathetic messaging approach is not going to be the one that squares the circle between "piracy is illegal" and "information wants to be free"


Sorry, but it is actually a huge problem for the US if the DeepSeek models are able to train on sorta-illegal dumps of scientific papers (the ones paywalled by scientific journals) and US models aren't.

Everyone WILL start using hosted frontier Chinese models if they are demonstrably better at answering scientific questions than ChatGPT, sending essentially all US research questions into a Chinese data dump. This is even worse than the national security catastrophe that is TikTok (even aside from the EVEN BIGGER issue that China will have models that are staggeringly better than those in the US, because they are up to date on the science).

I understand the reflexivity against AI companies "stealing content" but we need to stay competitive and figure out the financial compensation later. This is not a case where our unbelievably generous copyright laws should take precedence over US competitiveness.



Copyright infringement is not stealing [0]. The person still has what they made. Not sure why they propagate it as theft. Seems like a pro-copyright propaganda, extremist article which goes significantly against the progress of the arts and sciences.

[0] https://en.m.wikipedia.org/wiki/Dowling_v._United_States_(19...






