Voiceflow named a 2026 Best Software Award winner by G2
Read now

The story of Groq just changed. On December 24, 2025, Nvidia announced a non-exclusive licensing deal worth roughly $20 billion for Groq's Language Processing Unit (LPU) inference architecture, plus the move of founder Jonathan Ross and president Sunny Madra to Nvidia. Groq continues to operate as an independent company under new CEO Simon Edwards, but the competitive landscape that defined the AI chip conversation through 2024 and most of 2025 has reorganized around this deal.
This guide covers what Groq actually is in 2026, what the Nvidia deal means for builders considering the platform, the LPU architecture that drove the whole thing, and how to use GroqCloud's current model lineup. If you're picking inference infrastructure or evaluating whether to build voice and real-time agents on Groq, this is the practical overview.
Groq is a US chipmaker focused on AI inference. The company designs and operates its own inference hardware (the LPU) and runs GroqCloud, a token-as-a-service platform that lets developers run open-source large language models like Llama 4 and DeepSeek R1 at speeds that conventional GPUs can't match.
The headline number that gets repeated about Groq: Llama 4 Scout currently runs at over 460 tokens per second on GroqCloud, compared to roughly 100–150 tok/s for the same model on Nvidia H100 hardware. That speed advantage is the entire pitch.
Before the Nvidia deal, Groq had raised roughly $1.75 billion in total equity across multiple rounds. The most recent equity event was a Series E in September 2025 that brought in $750 million at a $6.9 billion post-money valuation. The round was led by Disruptive, with participation from BlackRock, Neuberger Berman, Samsung, Cisco, DTCP, D1 Capital, Altimeter, 1789 Capital, Infinitum, and earlier backers including Tiger Global.
Three months later, the Nvidia licensing deal added a $20 billion payment for non-exclusive access to Groq's inference IP. The capital structure is unusual: the $20 billion is a licensing payment that flowed to Groq's shareholders and the company, not a traditional acquisition price, and it's why the deal closed in under four months instead of the year-plus an M&A review would have taken.
The deal has four moving parts that matter for anyone evaluating Groq:
For builders, the practical near-term implication is that GroqCloud continues to run, the model catalog continues to expand, and the speed advantage continues to hold. The medium-term question is whether Nvidia's licensed implementation of the LPU architecture will compete with GroqCloud directly. That's worth watching.
Groq's primary chip is the Language Processing Unit (LPU), originally branded as the Tensor Streaming Processor (TSP). The LPU is purpose-built for inference (running trained models) as opposed to training, which is a different and more compute-heavy workload.
The architecture difference vs. GPUs:
These choices trade flexibility for speed. LPUs are not good general-purpose chips: they're inference chips, and they're optimized for the specific pattern of running a trained transformer model token-by-token. For that one workload, they're meaningfully faster.
Inference is the act of running a trained model on new input to produce output. Training is the act of building the model in the first place by feeding it labeled data and iteratively adjusting weights. The two have different compute profiles:
Groq's bet from the start was that inference would become the dominant cost in AI, and that the chip best suited to inference would not be the same chip that trains models. The Nvidia deal validates that bet. Nvidia's H100 and Blackwell GPUs are training-optimized chips that also do inference. Groq's LPU is an inference-only chip that's faster at the inference job.
The pre-2026 framing of "Groq vs. Nvidia" was a clean rivalry. Groq the startup, faster but smaller. Nvidia the incumbent, with 80%+ of the AI accelerator market. After the December 2025 licensing deal, the framing flipped: Nvidia now has the IP, Groq still has the cloud business, and the question for builders is which one of them to deploy on.
Two reasonable paths in 2026:
The pricing dynamics aren't fully settled yet (Nvidia hadn't published commercial LPU pricing as of May 2026), but Groq's "guarantees to beat published prices per million tokens by other providers for equivalent models" stance from 2024 remains in place. For most builders, GroqCloud directly is the right starting point until Nvidia's commercial implementation is clearer.
Groq's catalog is open-source-only. No GPT-4, no Claude, no Gemini. If you need proprietary frontier models, you're going to another provider for those (and many builders run a hybrid: Groq for fast open-source inference, Anthropic or OpenAI for frontier reasoning).
As of May 2026, the supported model list includes:
Model availability changes frequently. Check the live Groq supported models page for the current catalog, and the model deprecation page before standardizing on a specific model.
Groq runs three tiers:
Get a free API key at console.groq.com/keys.
Use cases where Groq's speed advantage actually changes what you can ship:
Cases where Groq is the wrong pick:
Voice is where latency matters most. A phone-based AI agent has roughly 500–800 ms of end-to-end latency budget per turn before callers notice "weird pauses." That budget has to cover speech-to-text, LLM inference, text-to-speech, and network round-trips.
On a typical GPU inference stack, the LLM call alone consumes 400–600 ms for a moderate response. That leaves almost nothing for STT and TTS. The result is the well-known phone-agent feel of slow turn-taking.
On Groq, the same LLM call completes in 100–150 ms. That puts the full turn comfortably under the 800 ms threshold, and the agent feels natural. For AI phone calls and AI call-center agents, this isn't a "nice to have." It's the difference between an agent customers tolerate and one they hang up on.
Conversation design for voice is also easier with Groq's speed. Designers can write longer, more nuanced agent responses without worrying about whether the TTS will start before the LLM finishes generating.
Voiceflow integrates Groq as a first-party LLM provider in the Agent Builder. That means you can pick Groq-hosted Llama 4 or DeepSeek R1 from the model dropdown the same way you'd pick GPT or Claude, without a custom integration. The provider list includes Anthropic, OpenAI, Google, Groq, and others, plus an OpenRouter test tier for experimenting with anything else.
What that gives you:
Why is Nvidia buying Groq?
Nvidia paid roughly $20 billion in December 2025 for a non-exclusive license to Groq's LPU architecture, plus the move of CEO Jonathan Ross and president Sunny Madra to Nvidia. The structure is licensing-plus-acqui-hire rather than acquisition, which let the deal close in under four months without a full M&A review. The strategic logic: LPU-style deterministic inference is becoming structurally important to AI compute, and Nvidia wanted access to the IP and the team rather than letting Groq capture inference share at GPU's expense. Senators Warren and Blumenthal are probing whether the licensing structure improperly sidesteps Hart-Scott-Rodino review.
Is Groq AI free?
Yes for evaluation, with rate limits. Groq Chat (the consumer-facing interface) is free. The Groq API has a free tier with around 30 requests per minute and ~14,400 requests per day on most models. Production usage typically needs the paid On-Demand or Business tier.
Who owns Groq?
Groq is a privately held company. Original founders Jonathan Ross (former Google engineer, designed the original Google TPU) and Douglas Wightman led the founding team in 2016. Equity is held by employees and a long list of institutional investors, including Tiger Global, Disruptive, BlackRock, Samsung, Cisco, D1 Capital, and Altimeter. After the Nvidia licensing deal, Simon Edwards is the CEO.
Is Groq publicly traded?
No. Groq is a private company. Shares are not listed on any stock exchange, and the company is not required to disclose financials. Pre-IPO secondary-market platforms occasionally offer Groq shares to accredited investors.
How do I invest in Groq?
As a private company, Groq isn't accessible through public markets. The realistic paths for individual investors: pre-IPO secondary platforms (Forge, EquityZen), venture and private-equity funds that hold Groq positions, or waiting for a future IPO. Most retail investors are better off treating Groq as a strategic player to watch rather than an investable position.
Groq AI vs. Grok by Elon Musk: what's the difference?
They're different products with confusingly similar names. Groq (with a Q) is the AI inference hardware and cloud company covered in this article. Grok (with a K) is the xAI conversational AI assistant Elon Musk built for X (formerly Twitter). Groq predates Grok by years and trademarked its name first.
The Groq story in 2026 is a different story than the Groq story in 2024. The chip architecture that drove the company forward has been validated by the loudest possible signal: Nvidia paid $20 billion for a license to it. And the company continues as an independent inference cloud with new leadership and a current open-source model catalog.
For builders, the practical decision is whether you need the speed. If you're shipping voice agents, real-time conversational AI, or agentic loops, the LPU advantage is concrete and measurable. If you're shipping non-real-time batch text processing, conventional GPU inference is fine.
The Voiceflow integration makes the trial trivial. Pick Groq from the model dropdown, point your existing agent design at Llama 4 or DeepSeek R1, and measure the latency difference for yourself.