After dabbling in the world of LLM poisoning, I realised that I simply do not have the skill set (or brain power) to effectively poison LLM web scrapers.
I am trying to work with what I know/understand. I have fail2ban installed on my static web server. Is it possible to grab a big list of IP addresses known to scrape websites and add it to the ban list?
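Roughly what I have in mind, as a sketch: pull some published feed of scraper IPs (the URL below is a placeholder, I don't know of a canonical feed) and push each address into an existing fail2ban jail via fail2ban-client:

    import subprocess
    import urllib.request

    # Placeholder URL: substitute a real feed of known scraper IPs, one per line.
    BLOCKLIST_URL = "https://example.com/scraper-ips.txt"
    JAIL = "scrapers"  # a jail you have already defined in jail.local

    with urllib.request.urlopen(BLOCKLIST_URL) as resp:
        lines = resp.read().decode().splitlines()

    for line in lines:
        ip = line.strip()
        if not ip or ip.startswith("#"):
            continue  # skip blanks and comment lines in the feed
        # fail2ban-client can ban an address in an existing jail from the CLI
        subprocess.run(["fail2ban-client", "set", JAIL, "banip", ip], check=True)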
Fail2ban is not a static security policy. It's a dynamic firewall: it ties log events to time-boxed firewall rules.
For instance, you could auto-ban for 1h any source that hits robots.txt on a web server. I've heard AI data scrapers actually use robots.txt to find the valuable content rather than respect what it asks them to avoid.
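A minimal sketch of that idea, assuming an nginx access log in the default combined format (the filter and jail names are mine, and note this will also catch well-behaved crawlers, which fetch robots.txt precisely so they can comply):

    # /etc/fail2ban/filter.d/robotstxt.conf
    [Definition]
    failregex = ^<HOST> .* "GET /robots\.txt
    ignoreregex =

    # /etc/fail2ban/jail.local
    [robotstxt]
    enabled  = true
    port     = http,https
    filter   = robotstxt
    logpath  = /var/log/nginx/access.log
    maxretry = 1
    findtime = 10m
    bantime  = 1h

(On older fail2ban versions, spell the durations out in seconds instead of using 1h/10m.)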
You could try putting a Cloudflare proxy and Turnstile (their CAPTCHA product) in front of it to help with this.
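If you go that route and wire the widget up yourself (rather than letting Cloudflare enforce a challenge at the edge), the server-side half is just a POST to their siteverify endpoint. A rough sketch, with the secret key as a placeholder:

    import json
    import urllib.parse
    import urllib.request

    SITEVERIFY = "https://challenges.cloudflare.com/turnstile/v0/siteverify"
    TURNSTILE_SECRET = "your-secret-key"  # placeholder: comes from the Cloudflare dashboard

    def token_is_valid(token: str, client_ip: str = "") -> bool:
        """Ask Cloudflare whether the token the visitor's widget produced is genuine."""
        fields = {"secret": TURNSTILE_SECRET, "response": token}
        if client_ip:
            fields["remoteip"] = client_ip
        body = urllib.parse.urlencode(fields).encode()
        with urllib.request.urlopen(SITEVERIFY, data=body) as resp:
            return json.load(resp).get("success", False)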
The truth is, though, if it's static content, then you have to be able to stop them every time. Once they get it once, they've got it. With how frequently they can try, it's going to be difficult to stop them.
No. You’d have to ban all the cloud providers. Good luck enumerating them all.
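To give a sense of scale: AWS at least publishes its address space as machine-readable JSON, but that's one provider out of many, and there's no single authoritative list covering all of them. A quick sketch:

    import json
    import urllib.request

    # AWS publishes its full IP space; most big clouds have some equivalent,
    # but you'd have to track down and merge every provider's list yourself.
    AWS_RANGES_URL = "https://ip-ranges.amazonaws.com/ip-ranges.json"

    with urllib.request.urlopen(AWS_RANGES_URL) as resp:
        data = json.load(resp)

    prefixes = {p["ip_prefix"] for p in data["prefixes"]}
    print(f"AWS alone: {len(prefixes)} IPv4 prefixes (plus a separate IPv6 list)")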