If I say, “Hey, please don’t text me anymore. I’m going to block this number,” and you respond by buying 500 phones in five cities and texting me nonstop, is that ethical?
|
![]() |
|
The scientific aspects of Web crawling, including focused crawling (algorithms, implementations, performance evaluation), are covered by conferences like WWW, ACM SIGIR, BCS ECIR, ACM WSDM and ACM CIKM. But you may be referring to informal meetups or trade fairs; if so, google "Web Data Extraction Summit", "OxyCon Web Scraping Conference", "ScrapeCon 2024" (all past) or the forthcoming: https://www.ipxo.com/events/web-data-extraction-summit-2024/
![]() |
|
Anti-bot stuff also seems to be a security and privacy threat: blocking users who access your site from VMs, port scanning, various forms of fingerprinting.
|
![]() |
|
Even a very slight challenge is a problem for scrapers: they have to do it far more frequently. It's better than captchas and whatever Cloudflare does in terms of overall nuisance.
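To make the cost asymmetry concrete, here is a minimal sketch, assuming the "slight challenge" is a hashcash-style proof of work (the comment does not say which scheme is meant). The function names, the challenge string format, and the difficulty value are all hypothetical.

```python
import hashlib
from itertools import count


def _digest_value(challenge: str, nonce: int) -> int:
    # Hash the server-issued challenge together with the client's nonce.
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return int.from_bytes(digest, "big")


def solve(challenge: str, difficulty_bits: int) -> int:
    """Client side: brute-force a nonce whose hash has `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        if _digest_value(challenge, nonce) < target:
            return nonce


def verify(challenge: str, nonce: int, difficulty_bits: int) -> bool:
    """Server side: a single hash is enough to check the client's work."""
    return _digest_value(challenge, nonce) < (1 << (256 - difficulty_bits))


def expected_hashes(requests: int, difficulty_bits: int) -> int:
    """Average total work if every request must solve a fresh challenge."""
    return requests * (1 << difficulty_bits)
```

Under these assumptions, a visitor loading one page at 12 bits of difficulty does about 4,096 hashes on average, while a scraper fetching 100,000 pages does about 409,600,000. The per-request cost is identical; only the volume differs.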
![]() |
|
I use web scraping to identify and monitor fraud.

Exhibit A: https://archive.ph/0ZUA8 This website is used to recruit people to set up "lead generation" Google Business Profiles and leave paid reviews.

Exhibit B: https://archive.ph/WWZuw This is an example of the Craigslist ad used to initially attract people to the website above.

Exhibit C: https://archive.ph/wip/7Xig4 This is one of the Google Maps contributors who left paid reviews. If you start with the reviews on that profile, you'll find a network of Google Business Profiles for fake service-area businesses connected through paid reviews.

Web scraping allows me to collect this type of data at scale. I also use scraping to monitor the status of fake listings. If they are removed, the actor behind them will often get them reinstated. Monitoring allows me to report them again.
![]() |
|
Not that I think you shouldn't do it or you're doing something wrong, but describing it as a right rubs me the wrong way. You don't have any right to expect someone else's computers to work for you.
|
![]() |
|
I'm not sure how to phrase it except in terms of competing rights, but I take your point. At the point where I'm scraping, the data's on my computer though.
![]() |
|
If that destroys someone's business, I don't actually care. Maybe it's selfish, but my right to re-format data for my own convenience outweighs their right to make a profit. Exhibit A |
![]() |
|
> What makes you think putting data on the Internet all the sudden means I unilaterally surrender the rights to my intellectual property?

Because intellectual property doesn't exist.
![]() |
|
Considering Walgreens is still fighting shoplifters, it’s obvious they’re not in a position to restrict their merchandise. They must not own it.
|
![]() |
|
I find it aptly hilarious that your own business model at broadcastify.com is recording publicly accessible radio broadcasts and then selling access to those recordings for commercial gain.
|
![]() |
|
Google benefits from legal scraping - ban them in robots.txt and they'll stop. Please don't mix consensual and non-consensual scraping, the difference is huge.
I wrote an article about that last fall that got some attention here.
https://news.ycombinator.com/item?id=37264676
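
For readers who want to see what the consensual side looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `allowed` helper, the second bot name, and the example.com URL are made up for illustration; the sketch only shows how a well-behaved crawler would check the rules before fetching.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that bans Googlebot and allows everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/some/page"
    print(allowed(ROBOTS_TXT, "Googlebot", page))
    print(allowed(ROBOTS_TXT, "SomeOtherBot", page))
```

Nothing enforces this check: robots.txt only works against crawlers that choose to honor it, which is the consensual/non-consensual distinction drawn above.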