…does it still depend on crawlers DDoSing whatever they can get their greedy little tentacles on? While also trying to pretend they’re not AI scrapers?
Ever heard of reusing data? It’s not the AI wild west anymore. Scraping random data gives you low quality (try SD1.5 to see what I mean). Good models need high-quality datasets.
I wonder why scrapers hit my sites with millions of requests every day. Alibaba in particular is quite aggressive there.
Prove it
Here you go. Daily stats from my defense system. All those disguised bots? ~60% of them are from Alibaba’s ASN.
It is easy to verify, too: throw up any https site, and all the crawlers will be on your neck within days.
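If you want to do the same breakdown yourself, here’s a rough sketch (placeholder paths, not my actual setup): it assumes nginx-style combined access logs, where the client IP is the first field, and MaxMind’s free GeoLite2-ASN database read via the maxminddb package.

```python
# Rough sketch: count requests per ASN from an access log.
# Assumes nginx "combined" log format (client IP is the first field)
# and a GeoLite2-ASN.mmdb file downloaded from MaxMind.
from collections import Counter

import maxminddb  # pip install maxminddb

LOG_PATH = "access.log"        # placeholder path
ASN_DB = "GeoLite2-ASN.mmdb"   # placeholder path

counts = Counter()
with maxminddb.open_database(ASN_DB) as reader, open(LOG_PATH) as log:
    for line in log:
        ip = line.split(" ", 1)[0]
        try:
            rec = reader.get(ip) or {}
        except ValueError:     # not a valid IP address, skip the line
            continue
        asn = rec.get("autonomous_system_number", 0)
        org = rec.get("autonomous_system_organization", "unknown")
        counts[f"AS{asn} {org}"] += 1

total = sum(counts.values()) or 1
for name, n in counts.most_common(10):
    print(f"{name:<45} {n:>8}  ({n / total:.1%})")
```

The cloud-provider ASNs stand out immediately at the top of that list, even when the user agents pretend to be regular browsers.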
There is a reason why Anubis’s botPolicies.yaml includes Alibaba. There’s a reason why a whole lot of sites - Codeberg included - block their entire ASN on the firewall.
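And if you want the same kind of ASN-wide block, the usual approach is to dump the ASN’s announced prefixes (whois, bgp.tools, RIPEstat, take your pick), merge them, and feed them into a firewall set. A minimal sketch, assuming the prefixes are already in a text file and that an nftables set named blocked_asn already exists in table inet filter (all of those names are placeholders, not Codeberg’s or anyone else’s actual config):

```python
# Rough sketch: collapse an ASN's announced prefixes and print an nft
# command that loads them into an existing set.  Assumes prefixes.txt
# holds one CIDR per line, however you exported it.
import ipaddress

with open("prefixes.txt") as f:  # placeholder file
    nets = [ipaddress.ip_network(line.strip(), strict=False)
            for line in f if line.strip()]

# Merge adjacent/overlapping prefixes so the firewall set stays small.
v4 = ipaddress.collapse_addresses(n for n in nets if n.version == 4)

elements = ", ".join(str(n) for n in v4)
# Assumes: table inet filter { set blocked_asn { type ipv4_addr; flags interval; } }
# and a rule like: ip saddr @blocked_asn drop
print(f"nft add element inet filter blocked_asn {{ {elements} }}")
```

One dropped ASN at the firewall is a lot cheaper than serving millions of disguised requests a day.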
You’re welcome.
It seems I was wrong and they do need more data after all. Still, I think they have every right to go after their enemy’s tool of imperialism and disrupt it however they see fit.