After dabbling in the world of LLM poisoning, I realised that I simply do not have the skill set (or brain power) to effectively poison LLM web scrapers.

I am trying to work with what I know and understand. I have fail2ban installed on my static web server. Is it possible to get a massive list of IP addresses known to scrape websites and add it to the ban list?
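
To make the question concrete, here is a rough sketch of what I have in mind, not a tested setup. It assumes the list arrives as plain text with one IP address or CIDR range per line, and that a jail named `scrapers` already exists; both the list format and the jail name are made up for illustration. It only builds `fail2ban-client set <jail> banip <ip>` commands and does not run them. Whether `banip` accepts CIDR ranges may depend on the fail2ban version, so that part is an assumption too.

```python
import ipaddress


def parse_blocklist(lines):
    """Validate raw blocklist lines and return collapsed, sorted networks."""
    networks = set()
    for raw in lines:
        # Drop trailing comments and surrounding whitespace.
        entry = raw.split("#", 1)[0].strip()
        if not entry:
            continue
        try:
            networks.add(ipaddress.ip_network(entry, strict=False))
        except ValueError:
            # Skip anything that is not a valid address or range.
            continue
    # collapse_addresses() cannot mix IPv4 and IPv6, so handle them separately.
    v4 = [n for n in networks if n.version == 4]
    v6 = [n for n in networks if n.version == 6]
    return list(ipaddress.collapse_addresses(v4)) + list(
        ipaddress.collapse_addresses(v6)
    )


def ban_commands(networks, jail="scrapers"):
    """Build one fail2ban-client command per network (jail name is hypothetical)."""
    commands = []
    for net in networks:
        # Write single hosts as a bare address rather than /32 or /128.
        target = str(net.network_address) if net.num_addresses == 1 else str(net)
        commands.append(["fail2ban-client", "set", jail, "banip", target])
    return commands


if __name__ == "__main__":
    # Documentation-only example addresses, not real scraper IPs.
    sample = ["192.0.2.1", "192.0.2.0/24  # example range", "not-an-ip", "198.51.100.7"]
    for cmd in ban_commands(parse_blocklist(sample)):
        print(" ".join(cmd))
```

Each command could then be run as root, for example with `subprocess.run(cmd, check=True)`. Collapsing overlapping ranges first keeps the number of bans down, which probably matters if the list really is massive.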

  • kn33@lemmy.world
    7 days ago

    You could try putting the site behind a Cloudflare proxy with Turnstile (their CAPTCHA product) to help with it.

    The truth is, though, that with static content you have to stop them every single time. Once they’ve fetched it, they have it for good. With how frequently they can retry, it’s going to be difficult to stop them.