I liked the analogy to Gabe Newell's "piracy is a service problem" adage, embodied in the Virgin API consumer vs. Chad third-party scraper meme: https://x.com/gf_256/status/1514131084702797827

Make it easier to get the data and put fewer roadblocks in the way of legitimate access, and you'll find fewer scrapers. Even if you make scraping _very_ hard, people will still prefer it if legitimate access is even more cumbersome, or if you refuse to offer a legitimate option at all. Admittedly, we are talking here because some people are scraping OSM when they could get the entire dataset for free... but I hope those people are outliers, and that most consume the non-profit org's data in the way it asks.

---

Arguably, trying to scale everything to the whole planet is the root cause of most of these problems. So "that won't scale to the whole planet" might, in the long view, be a feature and not a bug.

---

Many would oppose the idea, but if any service (e.g. eBay, LinkedIn, Facebook) were to dump a snapshot to S3 every month, that could be a solution. You can't prevent scraping anyway.
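For illustration, a minimal sketch of what such a monthly dump job could look like with boto3; the bucket name and dump path are hypothetical, and the export step that produces the file is assumed to exist:

```python
import datetime

import boto3  # AWS SDK for Python

# Hypothetical monthly job: take an already-exported snapshot file and
# publish it to a public S3 bucket so nobody has to scrape the site.
BUCKET = "example-public-dumps"             # hypothetical bucket name
DUMP_PATH = "/var/dumps/snapshot.jsonl.gz"  # produced by a prior export step

key = f"snapshots/{datetime.date.today():%Y-%m}.jsonl.gz"
boto3.client("s3").upload_file(DUMP_PATH, BUCKET, key)
print(f"published s3://{BUCKET}/{key}")
```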

---

Yeah, you can get dumps of Wikipedia and Stack Overflow / Stack Exchange that way.

(Not sure if they're created by the admins or a third party, but doing it once for many is better than overlapping individual efforts.)

---

> What's the endgame here?

We've had good success with:

- Cloudflare Turnstile
- Rate limiting (be careful here, as some of these scrapers use large numbers of IP addresses and user agents)
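To illustrate the rate-limiting part, a minimal sketch of a per-IP token bucket (names and numbers are hypothetical); note the caveat above: scrapers that rotate IPs and user agents will slip past a purely per-IP bucket:

```python
import time
from collections import defaultdict

# Hypothetical per-IP token bucket: each client may burst up to `capacity`
# requests, then regains `rate` tokens per second.
class TokenBucketLimiter:
    def __init__(self, rate: float = 1.0, capacity: float = 10.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = defaultdict(lambda: capacity)
        self.last_seen = defaultdict(time.monotonic)

    def allow(self, client_ip: str) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_seen[client_ip]
        self.last_seen[client_ip] = now
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens[client_ip] = min(self.capacity,
                                     self.tokens[client_ip] + elapsed * self.rate)
        if self.tokens[client_ip] >= 1.0:
            self.tokens[client_ip] -= 1.0
            return True   # serve the request
        return False      # reject, e.g. with HTTP 429

limiter = TokenBucketLimiter(rate=0.5, capacity=5)
print(limiter.allow("203.0.113.7"))  # True until the bucket drains
```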

---

You can't cache this stuff for bot consumption. Humans only want to see the popular stuff; bots download everything. The size of your cache then equals the size of your content database.
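A toy simulation of this (all numbers hypothetical): an LRU cache holding 1% of the catalogue does fine against skewed, human-like traffic and almost nothing against a uniform bot crawl:

```python
import random
from collections import OrderedDict

N_PAGES, CACHE_SIZE, N_REQUESTS = 100_000, 1_000, 200_000  # cache = 1% of pages

def lru_hit_rate(requests):
    cache, hits = OrderedDict(), 0
    for page in requests:
        if page in cache:
            hits += 1
            cache.move_to_end(page)        # mark as most recently used
        else:
            cache[page] = True
            if len(cache) > CACHE_SIZE:
                cache.popitem(last=False)  # evict least recently used
    return hits / len(requests)

# Human-like traffic: heavily skewed towards a few popular pages.
human = [min(int(random.paretovariate(1.2)), N_PAGES) for _ in range(N_REQUESTS)]
# Bot crawl: every page is equally likely to be requested.
bot = [random.randrange(N_PAGES) for _ in range(N_REQUESTS)]

print(f"human-like hit rate: {lru_hit_rate(human):.0%}")  # high
print(f"bot-crawl hit rate:  {lru_hit_rate(bot):.0%}")    # near zero
```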

---

OpenStreetMap Foundation chairperson here.

OpenStreetMap's data is available for free in bulk from https://planet.openstreetmap.org. We encourage using these dumps instead of scraping our site. Scraping puts a high load on our donated resources. We block scraping IPs, but even that takes work and time. Respecting our time and resources helps us keep the service free and accessible for everyone.
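A minimal sketch of doing exactly that: streaming the published "latest" planet file (a PBF of tens of gigabytes) to disk rather than scraping the site:

```python
import shutil
import urllib.request

# Stream the latest full planet dump (PBF) to disk instead of scraping
# the website. Expect a file in the tens of gigabytes.
URL = "https://planet.openstreetmap.org/pbf/planet-latest.osm.pbf"

with urllib.request.urlopen(URL) as resp, \
        open("planet-latest.osm.pbf", "wb") as out:
    shutil.copyfileobj(resp, out, length=1024 * 1024)  # 1 MiB chunks
```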

---

If you're talking about the new-ish data dumps provided in protobuf format, that's a heavily optimised binary format. OrganicMaps uses these files directly to store and look up whole countries locally. In this format, the dump for France is only 4.3 GB at the time of writing.

Also, instead of downloading the whole map, you can use one of the numerous mirrors like Geofabrik [0] to download only the part you're interested in.

[0] https://download.geofabrik.de/
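As a sketch of how directly usable these PBF files are, here's a small pyosmium handler counting highway ways in a regional extract (the local file name for the Geofabrik France extract is assumed):

```python
import osmium  # pip install osmium (pyosmium)

# Count ways tagged as highways in a regional PBF extract, e.g. one
# downloaded from https://download.geofabrik.de/europe/france.html
class HighwayCounter(osmium.SimpleHandler):
    def __init__(self):
        super().__init__()
        self.count = 0

    def way(self, w):
        if "highway" in w.tags:
            self.count += 1

handler = HighwayCounter()
handler.apply_file("france-latest.osm.pbf")  # assumed local file name
print(f"highway ways: {handler.count}")
```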

---

On https://www.openstreetmap.org/, click "Export" (upper left). It lets you choose a small rectangle (click "Manually select a different area") and gives you a .osm file right from the browser.

For a single point, among the map icons on the right there's an arrow with a question mark ("Query features"). With it you can click on individual features and get their data.
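The same rectangle export is also exposed by the public API's map call (API v0.6); a sketch, assuming a bounding box small enough for the server to accept:

```python
import urllib.request

# Fetch raw OSM XML for a small bounding box via the public API.
# bbox order is min_lon,min_lat,max_lon,max_lat; the server rejects
# rectangles that are too large.
bbox = "2.3488,48.8526,2.3540,48.8560"  # a few blocks of central Paris
url = f"https://api.openstreetmap.org/api/0.6/map?bbox={bbox}"

with urllib.request.urlopen(url) as resp:
    with open("extract.osm", "wb") as out:
        out.write(resp.read())
```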

---

> they might care, because they have more respect for the rules

Do they? Didn't OpenAI scrape everything regardless of licences forbidding reuse without attribution or for commercial purposes?

---

Someone recently pointed out that Aaron Swartz was threatened with prison for scraping, while today there are hundreds of billions of dollars invested in AI LLMs built from... scraping.

---

Anything they can, judging by the fact that they're hitting random endpoints instead of using the ones offered to developers. A similar thing happened to Read the Docs [1], causing a surge of costs that nobody wants to answer for.

In the Read the Docs case, one bugged crawler repeatedly scraped the same HTML files to the tune of 75 TB; something similar could (partially) be happening here with OSM.

[1] https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse...

---

I can hear the AI groaning about regular humans suddenly caring a lot about IP protection and discussing ways to add DRM to protect it.

I really hope the irony isn't lost on everyone.

---

The joke is that you can already download it for free, no donation or bandwidth reimbursement needed: https://wiki.openstreetmap.org/wiki/Planet.osm

I guess since this was posted to the osm.town Mastodon, that's assumed to be known. I was surprised to see it without context here on HN, so I can understand the confusion; apparently most people here are already aware that one can download the full OpenStreetMap data without scraping.

---

I've never heard of any NTA, or of business expenses being converted into income to me when the money never touches my account. Looking up what an NTA is in relation to taxes: they're a nonprofit themselves and don't appear to have any authority beyond PR: https://en.m.wikipedia.org/wiki/National_Tax_Association

Regardless, if this were a concern in Germany, I'm sure our boss/director would have mentioned it on that call as a simple reason, rather than finding excuses and saying we could pick a different nonprofit that needs it more to donate to. Companies donate all the time... the argument that it would be considered income makes no sense, and even if it were, just donate {income tax rate}% less if the company can't afford more, and there's no problem either.

---

How come, in this era of no privacy, pixel trackers, data brokers, etc., they can't easily stop scraping? Somehow bots have a way to stay anonymous online, yet consumers have to fight an uphill battle?

---

If.

(Unfortunately the historical example is a poor one, since Philip II proved both able and willing to make it happen, whereas AI has yet to demonstrate a path to utility.)

---

Is it really? It all boils down to power play, really. And lately lots of powerful entities, from nations to corporations, seem poised to disrupt this institution.

---

What's the endgame here? AI can already solve captchas, so the arms race for bot protection is pretty much lost.