Dropsitenews published a list of websites Facebook uses to train its AI. Multiple Lemmy instances are on the list, as noticed by user BlueAEther.

Hexbear is on there too. Also Facebook is very interested in people uploading their massive dongs to lemmynsfw.

Full article here.

Link to the full leaked list download: Meta leaked list pdf

  • litchralee@sh.itjust.works
    24 hours ago

    IANAL either, but I’m vaguely aware that this realm of USA law involves “choice of law” provisions and the applicability of “click wrap” contracts, and it’s a thorny issue in the digital age. Essentially, the problem is whether Meta can be made reasonably aware that a ToS exists for a given web server. Unlike a “NO TRESPASSING” sign posted on a gate, or a sticker on the packaging of a physical copy of Microsoft Word 97 that says “opening this package constitutes agreement to the EULA, at this URL…”, it can be argued that unless the ToS is made blitheringly obvious to a web scraper, it might not pass muster.

    To be clear, this isn’t a problem for normal web users, because the ToS link will readily appear at the bottom of the page when rendered in a standard web browser. The issue is whether scrapers (including AI scrapers, but also bot-crawlers and even plain ol’ curl) would see the notice of the ToS. There is no convention, either de facto or in law, about where a ToS has to be placed or what format it has to take. And it would be problematic to say that all scrapers need to thoroughly search a website for a “legal.txt”, because such a file might be somewhere non-obvious and because the extra searching exacerbates the whole “scrape servers until they collapse” issue.

    So already, getting a ToS to bind Meta, or any other high-volume scraper, is an uphill battle. That is why I suggested a remedy rooted in common law, premised on the idea that actively causing expenses for the server owner is actionable, even without a ToS.

    That said, I do want to point out one other detail about choice-of-law: normally, if a contract specifies the venue for disputes, that will be honored. Example: the courts of Santa Clara County in California. But suppose the instance owner lives in Montreal and specifies the venue as the Court of Quebec, and suppose the issue with binding Meta to the ToS has been solved. Then there’s the challenge of actually targeting Meta. As a USA-domiciled corporation, Meta isn’t automatically within the jurisdiction that the Quebec courts can reach. If there’s a Canadian subsidiary, that might be a valid target. But if not, the Quebec courts wouldn’t be able to compel Meta’s lawyers to even show up, let alone rule in favor of the instance owner. And then there’s the whole aspect of getting an American court to recognize a judgement issued by a foreign court. It’s doable, but it’s so much harder than specifying a venue within the USA.

    But again, specifying a USA venue is problematic if the instance isn’t located within the USA, because then the owner must travel to the USA for their court dates. And I can’t really recommend that anyone travel to the USA except in the most critical or dire of situations.