As inference splits into prefill and decode, Nvidia's Groq deal could enable a "Rubin SRAM" variant optimized for ultra-low latency agentic reasoning workloads (Gavin Baker/@gavinsbaker)

24 September 2024

Google Cloud debuts Gemini 1.5 Flash and 1.5 Pro with a 2M context window, twice as big as before, "grounding" for better Google search accuracy, and new agents (Larry Dignan/Constellation Research)

Larry Dignan / Constellation Research: Google Cloud launched a series of updates including new Gemini 1.5 Flash and 1.5 Pro models with a 2 million context window …
