Will's Blog: DeepSeek researchers detail a new mHC architecture they used to train 3B, 9B, and 27B models, finding it scaled without adding significant computational burden (Vincent Chow/South China Morning Post)

1 January 2026

DeepSeek researchers detail a new mHC architecture they used to train 3B, 9B, and 27B models, finding it scaled without adding significant computational burden (Vincent Chow/South China Morning Post)

Vincent Chow / South China Morning Post:
DeepSeek researchers detail a new mHC architecture they used to train 3B, 9B, and 27B models, finding it scaled without adding significant computational burden — DeepSeek has published a technical paper co-authored by founder Liang Wenfeng proposing a rethink of its core deep learning architecture

Posted from: this blog via Microsoft Power Automate.