Will's Blog: A profile of nonprofit Common Crawl, which scraped billions of web pages since 2013, including paywalled articles, to build an archive used by OpenAI and others (Alex Reisner/The Atlantic)

4 November 2025

Alex Reisner / The Atlantic:
A profile of the nonprofit Common Crawl, which has scraped billions of web pages since 2013, including paywalled articles, to build an archive used by OpenAI and others. Common Crawl claims to provide a public benefit, but it lies to publishers about its activities.
