In the world of chips, the fastest doesn’t always win, but the best-timed often does.
Back in 2007, Nvidia was known mostly to gamers and graphics nerds. Then a few PhDs started using its GPUs to train neural networks. It wasn’t elegant, but it was fast, and fast enough turned out to be good enough. The rest is market cap history: Nvidia surfed the AI wave to a $3 trillion valuation, largely because it was in the right place with the right architecture when compute demands changed.
Today, another shift is underway. Less flashy than training billion-parameter models, but arguably more important for what comes next. As AI moves from research labs into real-world applications, performance is being redefined. It’s no longer about how much data you can crunch. It’s about how fast you can respond.
That’s latency: the time it takes for an AI to go from input to output. Ask a question—how long before you get a useful answer? Start typing—how quickly does your copilot complete your thought? In a world where AI agents are expected to think in real time, delays of even half a second feel broken.
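For intuition, here is a minimal sketch of how you might measure that yourself, assuming a generic OpenAI-style streaming endpoint. The URL, model name, and payload below are placeholders, not any particular vendor’s API:

```python
import time
import requests  # third-party; pip install requests

# Placeholder endpoint and payload; substitute whatever inference API you use.
URL = "https://api.example.com/v1/chat/completions"
PAYLOAD = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "What is latency?"}],
    "stream": True,  # stream tokens back as they are generated
}

start = time.perf_counter()
first_token_at = None

with requests.post(URL, json=PAYLOAD, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if line and first_token_at is None:
            # Time to first token: the delay a user actually feels.
            first_token_at = time.perf_counter()
total = time.perf_counter() - start

print(f"time to first token: {first_token_at - start:.3f}s")
print(f"full response time:  {total:.3f}s")
```

Time to first token is the number that makes an assistant feel instant or broken; total stream time matters more for long outputs.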
Low-latency inference, the art of making AI feel instant, is fast becoming the new competitive frontier. Coders want completion while they type. Doctors want image analysis before the patient leaves. Traders want predictions before the next market move.
Most chipmakers are still chasing throughput: how many tokens they can generate at scale. But a handful are asking a different question. What if speed isn’t about volume, but about immediacy?
One company bet early, and bet hard, on that premise. It didn’t just tweak GPU design. It tore up the playbook. It built a chip from scratch, optimized for inference rather than training, latency rather than scale. For years, it looked like a weird bet. Now, it looks like a wedge into one of the fastest-growing corners of the AI stack.
The company is Groq.
Groq has been building toward this moment for nearly a decade.
Founded in 2016 by Jonathan Ross, former lead architect of Google’s Tensor Processing Unit, Groq set out to rethink AI chips from the ground up. Rather than optimizing for training or repurposing GPU designs, Groq focused exclusively on one challenge: ultra-low-latency inference.
It took years of architectural work to bring that vision to life. By 2020, Groq had deployed its first chip, the GroqChip 1, with early enterprise partners. A $300 million funding round in 2021 helped scale engineering and go-to-market efforts. But it wasn’t until 2023 that Groq crossed a commercial threshold, generating $3.4 million in revenue and launching its public-facing cloud platform, GroqCloud.
Since then, growth has accelerated. Groq ended 2024 with an estimated $25 to $30 million in revenue, fueled by developer adoption and early enterprise use. GroqCloud now serves over 20,000 developers, with usage doubling every few months. These developers are powering real-time applications—autonomous agents, voice assistants, and more—where response speed matters.
Performance is Groq’s edge. As of early 2025, the platform delivers 250 tokens per second at a cost of $1.00 per million tokens; at that rate, a 500-token answer streams back in about two seconds. That’s more than triple the speed of most cloud inference services and faster than offerings from Microsoft and Amazon. Even newer startups rarely exceed 75 tokens per second.
Enterprises are beginning to take note. Saudi Aramco has deployed Groq chips on-premise, enabling secure, local LLM inference without relying on third-party clouds.
What began as a niche bet on speed is becoming a wedge into AI infrastructure. While still early in its revenue journey, Groq’s developer traction and architectural focus suggest it’s positioned for breakout scale.
Milestones
2016 – Founded by Jonathan Ross, ex-Google TPU lead
2018 – Series B at ~$6.75/share
2020 – GroqChip 1 deployed for early inference workloads
2021 – Series C at $11.54/share
2023 – $3.4M revenue and GroqCloud launch
Mid-2024 – $640M Series D led by BlackRock, joined by Cisco, Samsung, and others; share price peaks at $16.08
Late 2024 – $25M+ revenue, 20,000+ GroqCloud developers, enterprise deals expanding
2025 – Performance benchmarked at 250 tokens/second; 4nm chip rollout on track
Groq operates at the intersection of semiconductors and AI infrastructure, with a hybrid model spanning chip sales, cloud services, and enterprise deployments. While revenue is still modest, the structure is built for scale.
At its core is the Tensor Streaming Processor (TSP), a custom chip purpose-built for inference. Unlike general-purpose GPUs, Groq’s architecture was designed from scratch to prioritize latency: the time between input and output. That decision shapes every aspect of the business.
Groq’s primary revenue stream comes from GroqCloud, which lets developers run LLMs using Groq chips via API. It’s a pay-as-you-go model, priced aggressively at $0.27 per million tokens.
For context:
Microsoft Azure charges $1.60
Amazon Bedrock clocks in at $2.10
Newer startups range from $0.80–$1.20
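Those per-token prices compound quickly at volume. As a rough sketch, here is what a hypothetical workload (500 requests a day at roughly 800 tokens each, made-up numbers for illustration) would cost per month at each quoted rate:

```python
# Per-million-token prices quoted above (USD).
prices = {
    "GroqCloud": 0.27,
    "Microsoft Azure": 1.60,
    "Amazon Bedrock": 2.10,
    "Typical startup": 1.00,  # midpoint of the $0.80-$1.20 range
}

# Hypothetical workload: 500 requests/day x ~800 tokens each x 30 days.
tokens_per_month = 500 * 800 * 30  # 12 million tokens

for provider, price in prices.items():
    monthly_cost = tokens_per_month / 1_000_000 * price
    print(f"{provider:<16} ${monthly_cost:>7.2f}/month")
```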
Despite lower pricing, Groq outperforms most on speed. This is likely a land-grab strategy—subsidize usage to build loyalty and capture market share. Margins are likely close to zero today, but chip-level efficiency and scale could improve them over time.
The long game resembles AWS or Snowflake: own the full stack, grow volume, then optimize for margin.
Groq also sells chips and servers directly to enterprises. These customers—often in regulated sectors—need local compute for privacy, reliability, or speed. The model is more traditional:
One-time hardware sales
Annual support and maintenance
Potential for private GroqCloud instances
Saudi Aramco is a flagship example, running LLMs in-house with Groq’s hardware. While this side of the business is smaller today, margins can exceed 50 percent.
Groq’s chips are significantly cheaper to manufacture than Nvidia’s. Estimated wafer costs:
Groq TSP (14nm): under $6,000
Nvidia H100 (5nm + HBM): over $16,000
On a per-token basis, Groq’s raw chip cost is roughly 70 percent lower. However, Groq systems require more chips—576 versus 8 for similar throughput—raising system complexity.
Even so, Groq maintains an estimated 40 percent total cost advantage for latency-sensitive workloads. As it moves to 4nm chips in 2025, that gap could widen, with higher performance per unit and fewer chips per rack.
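That 40 percent figure is sensitive to assumptions about utilization, power, and amortization. A generic cost-per-token model makes the moving parts explicit; every input below is a placeholder to plug your own estimates into, not a Groq or Nvidia figure:

```python
def cost_per_million_tokens(system_cost_usd: float,
                            system_watts: float,
                            tokens_per_second: float,
                            lifetime_years: float = 3.0,
                            usd_per_kwh: float = 0.10) -> float:
    """Amortized hardware plus electricity cost per million tokens served."""
    seconds = lifetime_years * 365 * 24 * 3600
    capex_per_second = system_cost_usd / seconds
    power_per_second = (system_watts / 1000) * usd_per_kwh / 3600
    return (capex_per_second + power_per_second) / tokens_per_second * 1_000_000

# Entirely illustrative inputs -- swap in your own estimates.
print(cost_per_million_tokens(system_cost_usd=1_000_000,
                              system_watts=50_000,
                              tokens_per_second=100_000))
```

Small changes to aggregate throughput or assumed lifetime swing the result by large factors, which is one reason competing TCO claims in this space are hard to reconcile.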
Groq’s opportunity stems from a shift in how AI is used. As models move from batch tools to real-time agents, latency becomes the bottleneck. Groq is one of the few chipmakers built around that constraint.
Most infrastructure today is optimized for throughput—how much data can be processed in aggregate. That works for training models or running long tasks in the background. But it breaks down when users expect instant responses: chatbots, copilots, autonomous systems. These applications require hardware built for speed, not just scale.
Groq is positioned as the anti-GPU. It’s a chip designed for inference, not training; for responsiveness, not bulk. Its architecture delivers over 250 tokens per second, enabling near-instant outputs from models like Llama, Mixtral, and Mistral. That speed opens the door to new use cases, from AI voice assistants to enterprise copilots.
The market for AI inference acceleration was estimated at $12 billion in 2022 and could reach $83 billion by 2027. Groq is initially focused on the $3 to $5 billion slice that depends on ultra-low latency. That niche is growing fast as AI gets more interactive.
Many startups try to out-GPU Nvidia and struggle. Groq avoids that fight by focusing on workloads where GPUs fall short. Its architecture fits emerging needs that traditional infrastructure can’t serve efficiently.
Groq’s current chips require more units per deployment than GPUs, which limits use cases. But the move to 4nm chips in 2025 could shrink system size and improve economics, opening up broader applications.
Groq also has a chance to co-design models that better exploit its architecture. Just as Nvidia shaped the ecosystem with CUDA, Groq could partner with labs to tailor LLMs to its chip—deepening performance gains and increasing lock-in.
Real-time AI is moving into sectors where delays are costly:
Autonomous Systems – Whether in defense, aerospace, or industrial automation, split-second decisions can’t rely on distant cloud infrastructure. Groq’s on-prem inference capabilities could power autonomous drones, vehicles, or robotics with minimal lag.
Financial Services – High-frequency trading, fraud detection, and risk analysis increasingly use LLMs and predictive models. Low-latency inference could mean faster insights—and in some cases, competitive advantage.
Healthcare – Real-time medical imaging analysis, diagnostic assistants, and voice-to-text EMR systems require instant feedback. Hospitals deploying LLMs at the edge could benefit from Groq’s responsiveness and data sovereignty.
Telecom and Edge AI – As networks adopt AI-driven optimization and customer service agents, Groq’s chips could support localized, real-time language understanding and anomaly detection in decentralized environments.
Consumer Applications – AI-native user interfaces like voice assistants, augmented reality overlays, and personal copilots demand a level of responsiveness that GPUs struggle to deliver—especially on mobile or edge devices.
Each of these verticals shares a common trait: the value of AI is degraded if there’s a lag. In that sense, Groq isn’t just competing on performance. It’s competing on user experience—and increasingly, that’s the thing companies are willing to pay for.
The AI hardware market is crowded, but not all players are chasing the same prize. Groq is focused on a narrower target: low-latency inference.
Today, Groq faces pressure from three major categories of competitors: incumbent chipmakers, cloud hyperscalers, and a growing class of well-funded AI hardware startups. Each group brings different strengths, and different threats.
Nvidia remains the market’s gravitational center. Its general-purpose GPUs power nearly all model training today and a growing share of inference workloads. With a market cap exceeding $3 trillion, an end-to-end software ecosystem (CUDA), and deep cloud integrations, Nvidia is the default.
But GPUs are fundamentally designed for throughput. Groq’s performance advantage, delivering over 250 tokens per second, comes specifically in low-latency scenarios, where Nvidia’s architecture is less efficient. This gives Groq a lane to operate in without directly taking on Nvidia’s core training business.
AMD and Intel are also in the mix, with AI-focused chips like MI300 (AMD) and Gaudi (Intel), but they lag behind Nvidia in both performance and developer ecosystem depth.
The cloud giants have become chipmakers. Google has TPUs, Amazon has Inferentia, and Microsoft is developing custom silicon. These companies can bundle hardware with cloud services, giving them an advantage in pricing and distribution.
But these chips are often built for internal use and cost control, not for third-party adoption. Groq’s ability to support both cloud and on-premises deployments gives it flexibility that hyperscaler chips don’t always match.
AI chip startups raised more than $10 billion in 2021 alone. Cerebras built wafer-scale chips for model training. Graphcore and Tenstorrent have focused on more general-purpose AI compute.
Few have achieved strong commercial traction. Many struggle to differentiate on performance or build developer ecosystems.
Groq’s strength is its specificity. It targets a slice of AI—LLM inference, real-time agents, on-device copilots—where low latency matters most. That focus allows it to outperform general-purpose rivals in targeted scenarios.
Groq has raised more than $1 billion since 2016, with a steady climb in valuation and growing support from top-tier investors. Its most recent funding came in August 2024, when it closed a three-part Series D totaling $640 million at a $2.81 billion valuation.
The round was led by BlackRock and Neuberger Berman, joined by strategic investors including Cisco Investments, Samsung, Global Brain, and Type One Ventures. Many were repeat participants, reinforcing long-term conviction. Earlier rounds were backed by firms like D1 Capital, Tiger Global, The Spruce House Partnership, and Social Capital, signaling strong institutional continuity across stages.
Since its 2016 seed round at $0.97 per share, Groq’s equity has appreciated more than 16x. That upward momentum reflects demand for the company’s preferred securities and growing confidence in its technical edge and market timing.
At $25 to $30 million in estimated 2024 revenue, Groq is trading at approximately 100x forward sales. This is a long-duration bet on a differentiated architecture and a fast-emerging category. For investors, the appeal lies in following the capital: backing a company where some of the most sophisticated allocators are already in.
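Both multiples are consistent with the figures quoted above; a quick sketch of the arithmetic:

```python
valuation = 2.81e9                    # Series D valuation (USD)
revenue_2024 = (25e6 + 30e6) / 2      # midpoint of the estimated range
print(f"revenue multiple:   {valuation / revenue_2024:.0f}x")   # ~102x

seed_price, peak_price = 0.97, 16.08  # per-share prices cited above
print(f"share appreciation: {peak_price / seed_price:.1f}x")    # ~16.6x
```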
| Pros | Cons |
| --- | --- |
| Market-leading inference speed: 250+ tokens/sec, 3–5x faster than peers | Low current revenue base: estimated $25–30M in 2024; early-stage scale |
| Purpose-built architecture: custom chip optimized for low-latency AI use | High system cost: requires more chips per deployment than GPU-based setups |
| Aggressive pricing: $0.27 per million tokens to drive developer adoption | Narrower market focus: optimized for latency over throughput |
| Backed by top-tier investors: BlackRock, Cisco, Samsung, Tiger Global | Facing hyperscaler competition: AWS, Google, and Microsoft building in-house chips |
| Dual GTM model: cloud + on-prem enterprise allows broad deployment options | Unproven long-term margins: cloud unit economics may remain tight for years |
Groq is currently trading around $20 per share on the secondary market, though most opportunities come with minimums in the low six figures. While the company was among the most actively traded private securities just a few months ago, shares have become increasingly scarce.
Platforms like Hiive and StartEngine have historically hosted Groq offerings, but as of now, no active deals are available. Supply may return, but access is becoming more sporadic as holders grow more selective.
If you’d like to be notified when a new opportunity opens up, feel free to reach out at [email protected].
This material has been distributed solely for informational and educational purposes and is not a solicitation or an offer to buy any security or to participate in any trading strategy. All material presented is compiled from sources believed to be reliable, but accuracy, adequacy, or completeness cannot be guaranteed, and Cold Capital makes no representation as to its accuracy, adequacy, or completeness.
The information herein is based on Cold Capital’s beliefs, as well as certain assumptions regarding future events based on information available to Cold Capital on a formal and informal basis as of the date of this publication. The material may include projections or other forward-looking statements regarding future events, targets or expectations. Past performance of a company is no guarantee of future results. There is no guarantee that any opinions, forecasts, projections, risk assumptions, or commentary discussed herein will be realized. Actual experience may not reflect all of these opinions, forecasts, projections, risk assumptions, or commentary.
Cold Capital shall have no responsibility for: (i) determining that any opinions, forecasts, projections, risk assumptions, or commentary discussed herein is suitable for any particular reader; (ii) monitoring whether any opinions, forecasts, projections, risk assumptions, or commentary discussed herein continues to be suitable for any reader; or (iii) tailoring any opinions, forecasts, projections, risk assumptions, or commentary discussed herein to any particular reader’s objectives, guidelines, or restrictions. Receipt of this material does not, by itself, imply that Cold Capital has an advisory agreement, oral or otherwise, with any reader.
Different types of investments involve varying degrees of risk, and there can be no assurance that the future performance of any specific investment, investment strategy, company or product made reference to directly or indirectly in this material, will be profitable, equal any corresponding indicated performance level(s), or be suitable for your portfolio. Due to rapidly changing market conditions and the complexity of investment decisions, supplemental information and other sources may be required to make informed investment decisions based on your individual investment objectives and suitability specifications. All expressions of opinions are subject to change without notice. Investors should seek financial advice regarding the appropriateness of investing in any security of the company discussed in this presentation.