Artificial intelligence has become the modern-day gold rush, sparking visions of a future transformed by its potential. From revolutionizing pharmaceutical drug discovery to crafting an unbiased legal system where algorithms weigh evidence, the hype is as boundless as it is bold. Yet, amidst the glittering promises of AI's capabilities, a critical yet often overlooked ingredient powers these breakthroughs: training data.
Consider the ambitious vision of an AI judge. Before it could ever weigh in on court rulings or analyze evidence, the system would require extensive training. This isn’t just a matter of gathering mountains of legal documents or historical rulings. That raw material is useless without precise, context-rich labeling. Humans provide this direction, painstakingly tagging data to teach machines what matters and, just as importantly, what doesn’t.
Scale AI has positioned itself as the quintessential "pick and shovel" provider in this gold rush. By specializing in data labeling, Scale ensures that AI systems across industries—from self-driving cars to generative AI tools—can fulfill their promise. Without clean, labeled data, even the most sophisticated AI is nothing more than a very expensive shot in the dark.
Scale AI has grown from a niche startup focused on autonomous vehicle data labeling into a linchpin of the modern AI ecosystem. Its trajectory reflects the surging demand for high-quality data as AI permeates industries.
2016: Scale AI launches with a simple but essential idea: label data better and faster for AI applications. The company gains traction in the autonomous vehicle sector, winning early customers like Lyft, Zoox, and General Motors.
2018: ARR hits $17M, a clear signal that Scale is solving a real pain point in data labeling. The autonomous vehicle market serves as the launchpad for its rapid expansion.
2020-2022: Scale diversifies its customer base, moving into generative AI, government, and enterprise verticals, helping companies like OpenAI and Airbnb deploy models in production.
2023: ARR soars to $760M, growing 162% year-over-year, fueled by the generative AI boom and multi-year government contracts.
Key milestones include the AWS partnership in 2024, which integrates Scale’s generative AI solutions into AWS’s ecosystem, and contracts with U.S. defense agencies like the Department of Defense and the Air Force for geospatial data labeling. With 87 million generative AI data points labeled and 13 billion total annotations completed, Scale has become indispensable for companies building next-generation AI systems.
More recently, the company has launched evaluation tools, such as the SEAL Leaderboards, to help organizations assess the safety and accuracy of their AI models. These offerings position Scale as not just a data provider but an end-to-end infrastructure player for AI development.
The phrase "garbage in, garbage out" applies as much to AI as it does to software. Without accurately labeled data, AI models can make biased or unreliable decisions—a major obstacle for the ambitious systems Scale AI supports. To address this, Scale AI offers its Data Engine, which anchors its three product segments: Build AI, Apply AI, and Evaluate AI.
At its core, Scale AI provides annotated data to train machine learning models. Its offerings include:
Scale Rapid: A self-serve solution where companies can upload their data and have it annotated quickly and affordably by Scale’s workforce of over 240,000 contractors across Kenya, the Philippines, and Venezuela. Ideal for teams with urgent project timelines.
Scale Studio: Tools for enterprises that want to manage their own annotator workforce. Companies like Toyota Research Institute use Scale Studio to streamline internal labeling efforts and boost productivity.
Scale Pro: A premium, fully managed solution tailored for complex datasets like 3D LiDAR and video, with service-level guarantees to ensure accuracy.
This hybrid approach—leveraging human annotators and automation—enables Scale to handle everything from simple object labeling (e.g., identifying cars in a video) to more nuanced tasks like sentiment analysis and transcription. It’s the backbone of Scale’s business and a critical enabler for industries where precision matters.
As AI has evolved, so too has Scale’s business. Recognizing that data labeling alone won’t drive long-term differentiation, the company has invested heavily in expanding its product portfolio:
Scale Nucleus: A “data debugging” platform that allows customers to explore datasets, identify bad labels, and find failure cases. It’s especially valuable for AI teams fine-tuning generative models.
Scale GenAI Platform: A toolkit for developing and optimizing generative AI applications using retrieval-augmented generation (RAG) pipelines. This product positions Scale as a key enabler for large enterprises deploying custom LLMs.
Scale’s Donovan platform caters specifically to federal agencies, combining natural language querying with geospatial data analysis. Donovan allows operators to ask questions of maps and sensor feeds, translating insights into actionable intelligence. It’s a critical tool for the defense sector, where accurate data can mean the difference between success and failure.
Scale’s revenue comes from two core models:
Enterprise Agreements: Customized contracts with large organizations, with pricing based on data volume and complexity. These agreements account for the bulk of Scale’s ARR.
Pay-as-You-Go Tiers: Designed for startups and smaller teams, offering flexible pricing with no upfront commitment.
Scale’s 50-60% gross margins lag behind the SaaS average of 75%, reflecting the labor-intensive nature of its business. However, automation in areas like pre-labeling and model-assisted annotation could improve margins over time.
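To make the margin gap concrete, here is a rough sketch of the arithmetic. It assumes, purely for illustration, that the $760M ARR figure cited earlier can stand in for annual revenue, which the source does not state. The function name `gross_profit` is hypothetical.

```python
# Illustrative arithmetic only: treats the article's $760M ARR as a stand-in
# for annual revenue, which is an assumption, not a reported figure.

def gross_profit(revenue_musd: float, gross_margin: float) -> float:
    """Gross profit in $M for a given revenue ($M) and gross margin (0 to 1)."""
    return revenue_musd * gross_margin

ARR_MUSD = 760.0              # 2023 ARR quoted in the article, in $M
SCALE_MARGINS = (0.50, 0.60)  # Scale's gross margin range quoted in the article
SAAS_MARGIN = 0.75            # SaaS average quoted in the article

low, high = (gross_profit(ARR_MUSD, m) for m in SCALE_MARGINS)
benchmark = gross_profit(ARR_MUSD, SAAS_MARGIN)

print(f"Scale gross profit: ${low:.0f}M to ${high:.0f}M")
print(f"At a 75% margin:    ${benchmark:.0f}M")
print(f"Gap:                ${benchmark - high:.0f}M to ${benchmark - low:.0f}M")
```

Under that assumption, the difference between a 50-60% margin and a 75% margin works out to roughly $114M to $190M of gross profit per year, which is the room automation would have to close.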
Scale AI is poised to move up the AI value chain by expanding beyond data labeling into broader AI software development. With new products like Launch and Validate, the company aims to provide end-to-end solutions for data management, model training, deployment, and performance monitoring. This strategy positions Scale as a foundational tool for AI developers, akin to Atlassian in the software development space. By integrating these offerings with its existing tools, Scale hopes to carve out market share in the rapidly growing AI software market, projected to reach $120 billion by 2025.
Beyond new products, Scale’s ability to serve diverse industries offers significant opportunities. After a decline in autonomous vehicle workloads in 2022, Scale pivoted to meet rising demand from large language model developers, demonstrating its adaptability. The $22 billion data labeling market continues to grow, with demand coming from sectors like government, public enterprises, and legacy industries digitizing their operations. Recent wins, such as a $249 million contract with the U.S. Department of Defense, highlight Scale’s success in securing high-value contracts and entering new sectors.
International expansion offers another avenue for growth. While most of its revenue comes from the U.S., Europe’s AI software market is expected to grow to $26.5 billion by 2025, and China’s AI economy could contribute $600 billion annually by 2030. Competitors like Appen have already seen success in these regions, particularly in autonomous vehicles, and Scale’s expertise positions it well to compete globally.
Strategic partnerships also play a vital role in Scale’s growth. Collaborations with organizations like Toyota Research Institute and OpenAI have enhanced its capabilities and credibility, while providing access to diverse datasets and international markets. By continuing to develop targeted solutions and fostering partnerships, Scale can expand its reach and solidify its position as a critical enabler of the AI revolution.
Scale AI operates in a crowded data labeling market with competitors like Amazon Mechanical Turk, Labelbox, Appen, and Hive. These companies similarly rely on human labor to provide labeled datasets for AI models, a space often perceived as commoditized. Scale AI distinguishes itself by integrating automation into its labeling workflows, reducing costs over time while improving scalability and accuracy. This hybrid approach gives Scale an edge, as its algorithms improve with each labeled dataset, creating a virtuous data flywheel that competitors like Appen, with less automation, struggle to replicate.
Scale AI’s evolution into a broader ML infrastructure company puts it in competition with two primary categories: machine learning-focused companies like Databricks and enterprise cloud platforms like AWS, Google, and Microsoft. While Scale leverages its expertise in labeling as a wedge into AI workflows, it lacks an integrated data storage solution, making its ecosystem less seamless than those of competitors like AWS. Still, Scale’s specialized focus on high-quality data annotation and its enterprise-ready tools give it a foothold in industries requiring precision, such as defense and generative AI.
In the broader data labeling market, companies like Labelbox and Hive offer targeted solutions. Labelbox focuses on ML-specific workflows, while Hive emphasizes content moderation for B2C applications. Meanwhile, smaller players like V7 Darwin cater to niche needs in computer vision, offering accessible platforms for smaller teams. These competitors highlight Scale AI’s positioning as a provider for large-scale, high-stakes applications.
In the ML SaaS space, competitors like Databricks and C3 AI provide differentiated platforms that integrate data storage and AI application development. Databricks’ data lakehouse infrastructure, for instance, offers seamless integration for AI workflows, while Scale’s platform requires external storage solutions like AWS S3. Despite these limitations, Scale’s ability to partner with major organizations and deliver tailored solutions ensures its relevance in the evolving AI landscape.
Scale AI is uniquely positioned at the crossroads of two powerful trends: the rise of generative AI and the growing demand for high-quality data. Its share price, which has skyrocketed from less than $0.01 in 2017 to $14.55 in 2024, reflects its dominance in the space. Now, with shares available at $15.30 through a secondary opportunity, investors can buy in at the current implied valuation of $13.8 billion.
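For readers who want to sanity-check these figures, the sketch below derives a few ratios from numbers quoted in this piece. The derived values are not reported by Scale. Comparing a 2024 valuation against 2023 ARR is a simplifying assumption, and the helper names are hypothetical.

```python
# Back-of-the-envelope ratios derived from numbers quoted in the article.
# Assumption: the 2024 implied valuation is compared against 2023 ARR.

ARR_2023_MUSD = 760.0      # 2023 ARR, in $M
YOY_GROWTH = 1.62          # 162% year-over-year growth
VALUATION_MUSD = 13_800.0  # implied valuation of $13.8B, in $M
PRICE_2024 = 14.55         # 2024 share price, in $
SECONDARY_PRICE = 15.30    # secondary offering price, in $

def prior_year_arr(arr: float, growth: float) -> float:
    """ARR one year earlier, given current ARR and fractional growth."""
    return arr / (1.0 + growth)

def arr_multiple(valuation: float, arr: float) -> float:
    """Valuation expressed as a multiple of ARR."""
    return valuation / arr

def premium(new_price: float, old_price: float) -> float:
    """Fractional premium of new_price over old_price."""
    return new_price / old_price - 1.0

implied_2022_arr = prior_year_arr(ARR_2023_MUSD, YOY_GROWTH)
multiple = arr_multiple(VALUATION_MUSD, ARR_2023_MUSD)
secondary_premium = premium(SECONDARY_PRICE, PRICE_2024)

print(f"Implied 2022 ARR:  ${implied_2022_arr:.0f}M")
print(f"Valuation / ARR:   {multiple:.1f}x")
print(f"Secondary premium: {secondary_premium:.1%}")
```

On these inputs, 162% growth implies a prior-year ARR of roughly $290M, the $13.8 billion valuation is about 18 times 2023 ARR, and the $15.30 secondary price sits about 5% above the $14.55 figure.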
Pros | Cons |
---|---|
Essential to AI Development: Provides the foundational labeled data required for AI systems to function effectively. | Highly Competitive Market: Competes with established players like AWS, Appen, and Labelbox, making differentiation challenging. |
Diversified Applications: Serves industries such as generative AI, autonomous vehicles, real estate, and government. | Commoditized Sector: Data labeling is often viewed as a low-moat service, reliant on operational efficiency rather than unique innovation. |
Strong Financial Traction: Achieved $760M ARR in 2023, with 162% year-over-year growth, demonstrating strong market demand. | High Labor Dependency: Heavy reliance on a large contractor workforce may limit scalability and automation in the short term. |
Strategic Partnerships: Collaborates with major players like OpenAI, AWS, and Toyota Research Institute, enhancing its credibility and market access. | Lack of Integrated Ecosystem: Relies on external storage solutions like AWS S3, making its products potentially more expensive for customers compared to integrated offerings. |
Expanding Product Portfolio: Moves beyond data labeling to AI tools like Launch, Validate, and the SEAL Leaderboards, positioning itself as a comprehensive AI development partner. | Pressure on Margins: Gross margins of 50–60% lag behind SaaS averages (75%), reflecting the labor-intensive nature of its operations. |
Geographic Expansion Potential: U.S.-based revenue dominance leaves room to capture growth in Europe and China, where AI markets are growing rapidly. | Rising Costs of Competition: Competing globally and expanding into new markets requires significant investment in infrastructure and partnerships. |
Industry-Specific Expertise: Proven track record of serving industries with high-stakes data needs, such as defense and autonomous vehicles. | Automation Risk: As automation increases in the data labeling industry, maintaining competitive pricing and quality may become more challenging. |
Currently, accredited investors can purchase Scale AI shares through Augment, with a $5K minimum at around $15.30/share. This opportunity allows participation in the company that fuels the entire AI ecosystem.
Scale AI does not allow much trading of its stock at this level. Typically, investors would have to buy preferred shares in blocks of $1M or more, which makes this particular opportunity unique.
If you choose to invest, there are a few parting words of advice I’d like to offer:
Accredited investors only: Private market opportunities require accreditation.
Trust the platforms: Augment and Hiive are reputable operators with a strong track record.
Expect delays: Private market transactions can take time to close, and not every deal goes through. Don’t be discouraged—other opportunities will follow.
Scale AI sits at the heart of the AI revolution, solving a problem that many overlook: the painstaking work of teaching AI how to think. As the internet becomes more optimized—and sometimes less insightful—Scale’s focus on quality data ensures AI applications can remain reliable, fair, and transformative. For investors, this could very well be one of the most important infrastructure plays of the decade.
As always, if you want us to clarify anything in this material, shoot us an email and we’ll respond as soon as we can.
This material has been distributed for informational and educational purposes only and is not a solicitation or an offer to buy any security or to participate in any trading strategy. All material presented is compiled from sources believed to be reliable, but accuracy, adequacy, or completeness cannot be guaranteed, and Cold Capital makes no representation as to its accuracy, adequacy, or completeness.
The information herein is based on Cold Capital’s beliefs, as well as certain assumptions regarding future events based on information available to Cold Capital on a formal and informal basis as of the date of this publication. The material may include projections or other forward-looking statements regarding future events, targets or expectations. Past performance of a company is no guarantee of future results. There is no guarantee that any opinions, forecasts, projections, risk assumptions, or commentary discussed herein will be realized. Actual experience may not reflect all of these opinions, forecasts, projections, risk assumptions, or commentary.
Cold Capital shall have no responsibility for: (i) determining that any opinions, forecasts, projections, risk assumptions, or commentary discussed herein is suitable for any particular reader; (ii) monitoring whether any opinions, forecasts, projections, risk assumptions, or commentary discussed herein continues to be suitable for any reader; or (iii) tailoring any opinions, forecasts, projections, risk assumptions, or commentary discussed herein to any particular reader’s objectives, guidelines, or restrictions. Receipt of this material does not, by itself, imply that Cold Capital has an advisory agreement, oral or otherwise, with any reader.