Title: Zyphra debuts Zyda, a 1.3T language modeling dataset it claims outperforms Pile, C4, arXiv
Summary: Zyphra's Zyda is an open 1.3T-token dataset that combines RefinedWeb, StarCoder, C4, Pile, SlimPajama, peS2o, and arXiv to help train large language models.
Link: