I Fear for the Unauthenticated Web

Aurornis · 2025-03-20T15:56:38 1742486198

Rate limiting is the first step before cutting everything off behind forced logins.

> This practice started with larger websites, ones that already had protection from malicious usage like denial-of-service and abuse in the form of services like Cloudflare or Fastly

FYI Cloudflare has a very usable free tier that’s easy to set up. It’s not limited to large websites.

blibble · 2025-03-20T15:58:30 1742486310

I get the feeling that I'm going to read a blog post in a few years telling us that the CDN companies have been selling everything pulled through their cache to the AI companies since 2022

nottorp · 2025-03-20T16:00:48 1742486448

And even if they don't, is everything depending on Cloudflare to stay online a good thing?

jmclnx · 2025-03-20T15:52:18 1742485938

I would think all you need to do is add a copyright statement of some kind.

Sad that things are getting to this point. Maybe I should add this to my site :)

(c) Copyright (my email), if used for any form of LLM processing, you must contact me and pay 1000USD per word from my site for each use.

jefftk · 2025-03-20T16:00:59 1742486459

It's reasonably likely, but not yet settled, that LLM training falls under fair use and doesn't require a license. This is what the https://githubcopilotlitigation.com/ class action (from 2022) is about, and it's still making its way through the court. This prediction market has it at 12% likely to succeed, suggesting that courts will not agree with you: https://manifold.markets/JeffKaufman/will-the-github-copilot...

Aurornis · 2025-03-20T15:59:47 1742486387

Copyright is for topics like redistribution of the source material. You can’t add arbitrary terms to a copyright claim that go beyond what copyright law supports.

I think you’re confusing copyright with a EULA. You would need users to agree to the EULA terms before viewing the material. You can’t hide contractual obligations in the footer of your website and call it copyright.

JohnFen · 2025-03-20T15:59:37 1742486377

Such a notice is legally meaningless, though. Doubly so if the courts rule that scraping for AI purposes counts as fair use.

jasperr1 · 2025-03-20T15:53:42 1742486022

The reality is that a lot of these small websites have very permissive licenses. I really hope we don't get to the point where we must all make our licenses stricter.

krapp · 2025-03-20T15:55:58 1742486158

The reality is that none of these LLM scrapers give a damn about copyright, because the entire AI industry is built on flagrant copyright violation, and the premise that they can be stopped by a magic string is laughable.

You could sue, if you can afford it; meanwhile, all of your data is already training their models.

JKCalhoun · 2025-03-20T15:57:02 1742486222

For some reason I am not really moved by a lot of the hand wringing I am seeing lately.

It's not a binary thing to me: LLMs are not god, but even without AGI, they have proven wildly useful to me. Calling them "shitty chat bots" doesn't sway me.

Further I have always assumed that everything that I post to the web is publicly accessible to everyone/everything. We lost any battle we thought we could wage some 2+ decades ago when web crawlers started hoovering up data from our sites.

ToucanLoucan · 2025-03-20T15:53:11 1742485991

Yet another entry in the long and shameful history of Silicon Valley abusing the public square for their own profit (or in this case, fantasies of profit) and the rest of us just have to learn to live with it because the justice system simply will not even try and give us recourse.

Move fast and break things apparently has a bonus clause for the things you break not being your responsibility to fix.

Analemma_ · 2025-03-20T16:00:10 1742486410

I don't think the justice system is the one to blame here. Right up until LLMs and their huge datamining operations appeared, everyone in tech was strongly for unrestricted scraping. Everybody here cheered the LinkedIn decision [0], saying "it's on the public web: if you didn't want it to be scraped, you should've put it behind authentication". LLMs change nothing about the legal landscape, they've just convinced everyone on an emotional level that unrestricted scraping is no longer an automatic good. It's not the justice system's job to react to such vibe shifts, the laws themselves have to be changed.

[0]: https://news.ycombinator.com/item?id=15012883
