7 February 2024

An in-depth look at Common Crawl, the 9.5PB web crawl archive dating back to 2008 run by a small nonprofit, its role in generative AI, its dataset, and more (Mozilla Foundation)

Mozilla Foundation:
Common Crawl's Impact on Generative AI; Common Crawl's mission: Enabling others to work like Google; Common Crawl's data: Machine scale analysis
