7 June 2024

Zyphra debuts Zyda, a 1.3T language modeling dataset it claims outperforms Pile, C4, arxiv - 2024-06-07 13:00:00Z

Summary: Zyphra's Zyda is an open dataset of 1.3 trillion tokens that combines RefinedWeb, StarCoder, C4, the Pile, SlimPajama, peS2o, and arXiv for training large language models.
