• muimota@lemmy.ml
    English
    arrow-up
    5
    ·
    1 day ago

    The AI race is between Chinese scientists and Chinese scientists based in the US.

  • algernon@lemmy.ml
    arrow-up
    3
    arrow-down
    2
    ·
    1 day ago

    …does it still depend on crawlers DDoSing whatever they can get their greedy little tentacles on? While also trying to pretend they’re not AI scrapers?

    • m532@lemmygrad.ml
      arrow-up
      3
      arrow-down
      1
      ·
      1 day ago

      Ever heard of reusing data? It’s not the AI wild west anymore. Scraping random data gives low-quality results (try SD1.5 to see what I mean). Good models need high-quality datasets.

      • algernon@lemmy.ml
        arrow-up
        3
        arrow-down
        2
        ·
        1 day ago

        I wonder why scrapers hit my sites with millions of requests every day. Alibaba in particular is quite aggressive there.

          • algernon@lemmy.ml
            arrow-up
            3
            ·
            17 hours ago

            Here you go. Daily stats from my defense system. All those disguised bots? ~60% of them are from Alibaba’s ASN.

            It is easy to verify, too: throw up any HTTPS site, and all the crawlers will be on your neck within days.

            There is a reason why Anubis’s botPolicies.yaml includes Alibaba. There’s a reason why a whole lot of sites - Codeberg included - block their entire ASN on the firewall.

            You’re welcome.

            • m532@lemmygrad.ml
              arrow-up
              1
              arrow-down
              1
              ·
              edit-2
              12 hours ago

              It seems like I was wrong and they do need more data. But I think they have every right to go into their enemy’s imperialism tool and disrupt it however they see fit.