Invest in the centralization of data

An investment opportunity in the private markets

History is shaped by moments when scattered systems are brought together, creating something far greater than the sum of their parts. Consider the unification of the railways in the 19th century. Before standard gauges and centralized networks, train lines operated in isolation, forcing passengers and goods to change trains repeatedly. This inefficiency stifled trade and hindered progress. But once rail systems were standardized and interconnected, transportation became faster, cheaper, and more reliable. This transformation enabled the rapid movement of goods across vast distances, opened up new markets for businesses, and created industries that had never existed before, from mass manufacturing to national retail chains. It was a turning point that fueled unprecedented economic expansion and innovation.

Databricks is driving a similar revolution in the world of data. For decades, businesses have struggled with fragmented data—silos that limited insights and stifled innovation. Databricks’ platform eliminates these barriers, unifying data into a single, seamless ecosystem. By enabling real-time analytics and scalable AI solutions, Databricks transforms how organizations extract value from their data, much like how the unified railways unlocked unprecedented possibilities for trade and transportation.

The future of data isn’t siloed—it’s connected, and Databricks is leading the charge.

Traction

Databricks has experienced extraordinary growth, firmly establishing itself as a leader in the data analytics and AI space. Its consistent revenue expansion, growing customer base, and successful product launches highlight its strong market position.

Key Milestones: A Timeline of Growth

  • 2013: Databricks was founded by the creators of Apache Spark at UC Berkeley, focusing on simplifying large-scale data processing.

  • 2017: Partnered with Microsoft to launch Azure Databricks, a pivotal move that fueled early enterprise adoption.

  • 2019: Reached a $200M annual revenue run rate by the end of Q3.

  • 2020: Surpassed $425M in annual revenue, doubling from the prior year.

  • 2022: Achieved $1B in annual recurring revenue (ARR) by August, reflecting a 70% YoY growth rate.

  • 2023: Databricks SQL, the company’s data warehousing product, hit $100M in ARR just one year after launch.

  • June 2024: ARR reached $2.4B, growing 60% year-over-year, while its customer base expanded to over 11,500 globally.

Growth Metrics

Databricks has demonstrated remarkable revenue growth, reaching $2.4 billion in ARR by June 2024, up 60% year over year and more than double the $1 billion in ARR it reached in August 2022. Its Databricks SQL product has been a standout performer, generating $400 million in ARR by April 2024, four times the $100 million it reached in 2023 about a year after launch. This rapid expansion underscores the company’s ability to innovate and capture market share in competitive segments like data warehousing.
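As a rough sanity check on those two ARR data points, the sketch below annualizes the growth between August 2022 and June 2024. The helper name and the month-based time approximation are our own assumptions, not a Databricks-reported metric.

```python
from datetime import date

def annualized_growth(start_value, end_value, start, end):
    """Compound annual growth rate between two dated values."""
    # Approximate the elapsed time in years using whole months.
    months = (end.year - start.year) * 12 + (end.month - start.month)
    return (end_value / start_value) ** (12 / months) - 1

# ARR figures quoted above, in billions of dollars.
rate = annualized_growth(1.0, 2.4, date(2022, 8, 1), date(2024, 6, 1))
print(f"Implied annualized ARR growth: {rate:.1%}")
```

The result lands close to the 60% year-over-year figure, which suggests growth held fairly steady across the period.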

The customer base has also grown significantly, climbing from 9,000+ customers in March 2023 to over 11,500 by mid-2024. Databricks counts global enterprises such as Disney, Apple, HSBC, Toyota, and Adobe among its clients, serving a broad range of industries, including finance, healthcare, and manufacturing. The company’s average contract value (ACV) has risen steadily to $208,696 as of June 2024, reflecting its ability to secure higher-value enterprise contracts.
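The ACV figure appears to be ARR divided by customer count. A minimal sketch, assuming that definition and treating "over 11,500" as exactly 11,500:

```python
arr_usd = 2_400_000_000  # ARR as of June 2024, from the text
customers = 11_500       # customer count as of mid-2024, from the text

# Assumption: ACV here is simply ARR spread evenly across all customers.
implied_acv = arr_usd / customers
print(f"Implied average contract value: ${implied_acv:,.0f}")
```

Because this is an average, it says nothing about how concentrated revenue is among the largest accounts.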

Databricks continues to excel at retaining and expanding its customer relationships, with a Net Dollar Retention Rate (NDR) of 140% in mid-2024, signaling robust upselling and cross-selling. Unit economics also remain strong, with gross margins of 80% as of 2024, though down from 85% the previous year as the company scales. This combination of retention, contract value growth, and strong gross margins highlights the resilience of Databricks’ business model.
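To make the 140% NDR concrete, here is a minimal sketch using a common definition of the metric. The cohort numbers are hypothetical and chosen only to produce 140%; Databricks' exact calculation method is not given here.

```python
def net_dollar_retention(starting_arr, expansion, contraction, churn):
    """NDR = (starting ARR + expansion - contraction - churn) / starting ARR."""
    return (starting_arr + expansion - contraction - churn) / starting_arr

# Hypothetical cohort: $100 of ARR a year ago, $50 of upsell,
# $4 of downgrades, and $6 lost to churned customers.
ndr = net_dollar_retention(100.0, expansion=50.0, contraction=4.0, churn=6.0)
print(f"Net dollar retention: {ndr:.0%}")
```

In other words, an NDR above 100% means existing customers alone grow revenue before any new logos are added.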

Noteworthy Product Impact

Databricks' products have been central to its growth:

  • Azure Databricks: Microsoft partnership boosted enterprise adoption, driving ARR from <$1M in early 2017 to $100M by 2018.

  • Databricks SQL: Rapid adoption among data analysts drove significant ARR contributions in less than two years post-launch.

  • MosaicML Acquisition: Enhanced AI/ML capabilities, positioning Databricks as a major player in the rapidly expanding AI market.

Business Model

Databricks has built a thriving business by monetizing its open-source foundation while delivering enterprise-grade solutions for data-driven organizations. Its approach simplifies complex data infrastructures and offers scalable tools for businesses of all sizes.

  1. Open-Source Foundation

    Databricks’ platform is built on popular open-source projects, including Apache Spark, Delta Lake, and MLflow. These tools enable businesses to process, analyze, and manage vast datasets for machine learning and advanced analytics. The open-source approach attracts a diverse community of users while positioning Databricks as a leader in the data engineering and AI space.

  2. Enterprise Solutions

    While the open-source tools are free to use, most companies lack the engineering resources to manage their complexities. Databricks bridges this gap by offering a fully managed platform with proprietary features, enterprise-level security, governance, and collaboration tools. Its premium and enterprise tiers cater to organizations with advanced data processing and analytics needs.

  3. Revenue Streams

    Databricks generates revenue from its suite of offerings, including:

    • Databricks SQL: A data warehousing tool enabling real-time analytics and SQL-based BI dashboards.

    • Delta Lake: A robust storage layer combining the flexibility of data lakes with the structured data management of data warehouses.

    • MLflow: A machine learning lifecycle management platform, widely adopted for developing and deploying AI models.

    • Unity Catalog: A governance solution that ensures data security and accessibility across the organization.

  4. Consumption-Based Pricing

    Databricks employs a pay-as-you-go pricing model, where customers are charged based on the compute resources consumed. The platform’s proprietary unit, the Databricks Unit (DBU), standardizes billing based on usage, tier, and cloud provider (AWS, Azure, or Google Cloud). This approach ensures customers only pay for what they use, improving cost efficiency.

  5. Lakehouse Advantage

    Databricks has pioneered the lakehouse concept, combining the cost efficiency and scalability of data lakes with the performance and reliability of data warehouses. By offering a unified platform for all data needs—structured, semi-structured, and unstructured—Databricks eliminates the need for multiple data systems. This differentiation positions it strongly against competitors like Snowflake and traditional cloud-based solutions.

  6. Dual Go-to-Market Strategy

    Databricks uses both a bottom-up and an enterprise sales approach:

    • Bottom-Up: A free community edition serves as an entry point, converting users into paying customers as they scale their data needs.

    • Enterprise Sales: Databricks leverages endorsements from data engineers and scientists already using its tools to shorten sales cycles when pitching to CIOs and other decision-makers.
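The consumption-based pricing in item 4 can be sketched as a simple lookup and multiply. The rates, workload sizes, and function name below are hypothetical placeholders for illustration, not Databricks' published prices.

```python
# Hypothetical price table: dollars per DBU keyed by (cloud, tier).
# These numbers are placeholders, not actual Databricks rates.
RATE_PER_DBU = {
    ("aws", "premium"): 0.55,
    ("azure", "premium"): 0.60,
    ("gcp", "enterprise"): 0.70,
}

def usage_cost(dbus_consumed, cloud, tier):
    """Bill = DBUs consumed x the per-DBU rate for that cloud and tier."""
    return dbus_consumed * RATE_PER_DBU[(cloud, tier)]

# Hypothetical month: (DBUs consumed, cloud, tier) for two workloads.
workloads = [
    (200, "aws", "premium"),
    (20, "azure", "premium"),
]
total = sum(usage_cost(dbus, cloud, tier) for dbus, cloud, tier in workloads)
print(f"Monthly bill: ${total:,.2f}")
```

The takeaway for investors is that revenue tracks customer usage rather than seat counts, so it can rise or fall with workload volume.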

Opportunity

Databricks is well-positioned to capitalize on several transformative trends shaping the data analytics and artificial intelligence markets. Its Lakehouse architecture, strategic acquisitions, and strong partnerships place it at the intersection of some of the fastest-growing opportunities in technology.

Market Expansion

The global data analytics market is projected to grow from $271.8 billion in 2022 to $655.5 billion by 2029, fueled by enterprises shifting from siloed systems to centralized data architectures. Databricks’ Lakehouse platform, which integrates the flexibility of data lakes with the reliability of data warehouses, is ideally suited to drive this transformation. Additionally, the data lake market itself, valued at $7.9 billion in 2019, was projected to grow at a compound annual growth rate (CAGR) of 21% to reach $20.1 billion by 2024.

The artificial intelligence (AI) market offers an even larger opportunity, forecasted to grow from $119.8 billion in 2022 to $1.6 trillion by 2030. Databricks’ AI capabilities, strengthened by its acquisition of MosaicML in 2023, enable organizations to train cost-effective large language models (LLMs). This positions Databricks as a compelling alternative for businesses seeking to control their AI workflows while reducing dependency on proprietary providers like OpenAI.
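To compare these forecasts on equal footing, the sketch below converts each start and end value into an implied CAGR. The figures come from the text above; the helper and dictionary names are our own.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate over a whole number of years."""
    return (end_value / start_value) ** (1 / years) - 1

# (start value, end value, years), values in billions of dollars.
markets = {
    "data analytics": (271.8, 655.5, 2029 - 2022),
    "data lake": (7.9, 20.1, 2024 - 2019),
    "AI": (119.8, 1600.0, 2030 - 2022),
}

implied = {name: cagr(*figures) for name, figures in markets.items()}
for name, rate in implied.items():
    print(f"{name}: {rate:.1%} per year")
```

On these numbers the AI forecast implies annual growth in the high 30s percent, well above the low-teens rate for data analytics, while the data lake figure comes out near the quoted 21%.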

Data Centralization and AI Adoption

As businesses increasingly adopt centralized data architectures, the demand for unified platforms like Databricks' Lakehouse continues to grow. The ability to integrate structured and unstructured data on one platform simplifies infrastructure management, facilitates real-time insights, and enables scalable machine learning models. This convergence addresses a key pain point for companies grappling with siloed systems and growing data complexity.

Databricks is also poised to benefit from the rapid adoption of AI. By integrating popular frameworks like TensorFlow and PyTorch and partnering with Nvidia for AI infrastructure optimization, Databricks ensures that its platform is AI-ready. These capabilities position it as a leader in helping companies deploy machine learning and AI at scale while reducing costs and complexity.

Future Opportunities

Beyond its current market focus, Databricks has several avenues for further expansion:

  • AI Boom: As enterprises race to adopt AI, Databricks’ MosaicML tools provide a cost-effective solution for building proprietary AI applications.

  • Digital Transformation: Industries increasingly prioritize data-centric strategies to improve customer experiences and streamline operations. Databricks’ ability to unify data and generate real-time insights makes it a critical partner for these transformations.

  • Data Governance: With increasing regulatory pressures, Databricks’ Unity Catalog offers a scalable solution for managing data access, lineage, and security within a unified environment.

These trends place Databricks at the center of the rapidly evolving data and AI ecosystem, providing significant long-term growth potential.

Competition

Databricks operates in a highly competitive environment, contending with both established players and emerging rivals across data warehousing, machine learning, and data governance.

Data Warehousing and Analytics

In the data warehousing space, Databricks' main competitor is Snowflake. Both companies aim to provide comprehensive solutions for storing, processing, and analyzing large volumes of data.

Snowflake has gained significant traction with its cloud-native data warehouse, offering seamless scalability and separation of storage and compute. Databricks differentiates itself with its lakehouse architecture, which combines elements of data lakes and data warehouses. This approach enables Databricks to handle both structured and unstructured data more efficiently than traditional data warehouses, making it particularly advantageous for companies dealing with diverse data types or implementing machine learning models.

Databricks’ foundation in Apache Spark also provides an edge in processing large-scale data and running complex analytics workloads.

Other competitors in this space include cloud providers such as:

  • Amazon Web Services (Redshift)

  • Google Cloud Platform (BigQuery)

  • Microsoft Azure (Synapse Analytics)

These cloud-native platforms offer integrated solutions that appeal to organizations deeply embedded in specific cloud ecosystems. Databricks counters this with multi-cloud support, allowing customers to avoid vendor lock-in and leverage services across AWS, Azure, and Google Cloud.

Machine Learning and AI Infrastructure

In the machine learning and AI infrastructure space, Databricks faces competition from specialized platforms and cloud-based services, including:

  • DataRobot

  • H2O.ai

  • Cloud-based services such as AWS SageMaker and Azure Machine Learning

Databricks distinguishes itself with an integrated, end-to-end solution that spans data preparation, feature engineering, model training, and deployment. This holistic approach simplifies ML workflows, reducing the complexity of managing multiple tools.

The $1.3 billion acquisition of MosaicML in 2023 further bolstered Databricks’ capabilities, enabling it to train and deploy large language models (LLMs). This positions Databricks to compete with proprietary model providers like OpenAI and Anthropic. Additionally, the open-source MLflow project, a platform for managing the ML lifecycle, has gained significant traction, establishing Databricks as a thought leader in the ML space and attracting potential customers for its commercial offerings.

Data Governance and Collaboration

As organizations grapple with increasing data volumes and regulatory requirements, data governance and collaboration tools have become mission-critical. In this segment, Databricks competes with:

  • Collibra

  • Alation

  • Informatica

Unity Catalog, introduced in 2022, provides a unified governance layer across all data and AI assets within an organization. Its integration with Databricks' core platform supports data discovery, access control, and lineage tracking. While specialized governance tools may offer more depth, Databricks’ advantage lies in delivering governance capabilities natively within its processing and analytics environment.

Databricks also addresses collaboration needs through enterprise-grade notebooks and workspace features, competing with tools like Jupyter and Google Colab. Its enterprise security and scalability make it a more suitable choice for large-scale production environments.

Competitive Differentiators

Despite the intense competition, Databricks stands out with its unique strengths:

  1. Lakehouse Innovation: Combining data lake scalability with data warehouse reliability reduces the complexity of managing diverse data systems.

  2. Multi-Cloud Flexibility: Compatibility with AWS, Azure, and Google Cloud helps customers avoid vendor lock-in.

  3. Open-Source Roots: Tools like Apache Spark and MLflow create a strong developer community while lowering entry barriers.

  4. AI-First Strategy: MosaicML and integrated ML capabilities position Databricks as a leader in AI infrastructure.

Challenges in the Competitive Landscape

Databricks faces significant hurdles:

  • Entrenched Players: Snowflake and traditional data warehouses are deeply integrated into enterprise workflows, complicating migration efforts.

  • Dependence on Cloud Providers: Heavy reliance on AWS, Azure, and Google Cloud exposes Databricks to potential pricing or partnership shifts.

  • Lengthy Sales Cycles: Transitioning large enterprises to the Lakehouse platform often involves extended implementation timelines, delaying revenue realization.

Valuation and Investor Growth

Databricks’ journey from a scrappy startup to a $62 billion valuation is a testament to its innovative approach and market leadership. Over the past decade, the company has secured billions in funding from top-tier investors like Andreessen Horowitz, New Enterprise Associates, and Thrive Capital. These backers have consistently supported Databricks through its rapid growth, from its early days of pioneering Apache Spark to its current dominance in data analytics and AI.

The company’s valuation trajectory reflects its increasing market relevance, climbing from under $3 billion in 2019 to $43 billion by 2023, before hitting $62 billion in its latest round in December 2024. This incredible growth highlights investor confidence in Databricks’ ability to scale its Lakehouse architecture, expand AI capabilities through acquisitions like MosaicML, and capture a larger share of the booming data and AI markets.
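One way to put the $62 billion figure in context is to divide it by ARR. This is a minimal sketch, assuming the multiple is valuation over the June 2024 ARR of $2.4 billion. The 27.1x multiple quoted in the Pros and Cons section presumably rests on a slightly different revenue base, which the second calculation backs out.

```python
valuation_usd_b = 62.0  # December 2024 round, from the text
arr_usd_b = 2.4         # June 2024 ARR, from the text

# Assumption: the multiple is simply valuation divided by ARR.
arr_multiple = valuation_usd_b / arr_usd_b
print(f"Valuation / ARR: {arr_multiple:.1f}x")

# Revenue base that the quoted 27.1x multiple would imply at this valuation.
quoted_multiple = 27.1
implied_revenue_b = valuation_usd_b / quoted_multiple
print(f"Revenue implied by {quoted_multiple}x: ${implied_revenue_b:.2f}B")
```

Small differences in the revenue figure move the multiple noticeably, so it is worth checking which base any quoted multiple uses.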

Despite this momentum, Databricks has chosen to remain private. On December 17, 2024, CEO Ali Ghodsi explained, “It’s dumb to IPO this year,” citing macroeconomic uncertainties like inflation and interest rates. By staying private, Databricks avoids the volatility of public markets and retains the flexibility to focus on long-term growth and innovation without external pressures. For now, Databricks appears content to write the next chapter of its story outside the spotlight of Wall Street.

Pros and Cons

Pros

  • Explosive Revenue Growth: Databricks’ ARR grew from $1B in 2022 to $2.4B by mid-2024, reflecting a 60% YoY growth rate.

  • Lakehouse Architecture Leadership: Combines the scalability of data lakes with the structure of data warehouses, positioning Databricks as an innovator in unified data processing.

  • Strong Customer Retention: Boasts a Net Dollar Retention Rate of 140%, showcasing its ability to expand revenue within its customer base.

  • AI and Machine Learning Tailwinds: Acquisition of MosaicML bolsters Databricks’ capabilities in training cost-effective large language models (LLMs), tapping into the $1.6T AI market.

  • Enterprise Adoption and ACV Growth: Over 11,500 global customers, including Disney, Apple, and Toyota, with an average contract value rising to $208,696 as of June 2024.

  • Multi-Cloud Flexibility: Supports AWS, Azure, and GCP, allowing customers to avoid vendor lock-in and optimize their cloud strategies.

  • Collaborative Ecosystem: Databricks’ platform unifies data engineers, data scientists, and analysts, streamlining workflows and improving productivity.

  • Global Market Leadership: Expanding international presence with enterprise-grade features like Unity Catalog for governance and Databricks SQL for BI.

  • Open-Source Foundation: Apache Spark, Delta Lake, and MLflow create flexibility and a strong developer community while lowering barriers to entry for new users.

Cons

  • High Valuation Multiples: Databricks’ implied revenue multiple of 27.1x is higher than Snowflake's 21.5x, signaling premium pricing but potential overvaluation.

  • Reliance on Cloud Partners: Databricks depends heavily on AWS, Azure, and GCP infrastructure, making it vulnerable to shifts in cloud pricing or partnerships.

  • Competitive Landscape: Faces intense competition from Snowflake, AWS Redshift, Google BigQuery, and other major players in the data warehousing and analytics space.

  • Margin Compression: Gross margins declined from 85% in 2023 to 80% in 2024 as the company scales operations and invests heavily in AI infrastructure.

  • Stickiness of Data Warehouses: Many enterprises rely on deeply entrenched data warehouse solutions, making it challenging to drive migrations to Databricks’ Lakehouse.

  • Regulatory Challenges: Increasing data privacy regulations (e.g., GDPR, CCPA) require robust governance, adding complexity and potential costs for customers.

  • High Cost of Innovation: Investments in R&D, such as MosaicML and AI initiatives, are resource-intensive and could strain profitability in the short term.

  • Lengthy Sales Cycles: Transitioning enterprises from legacy data systems often requires extended sales and implementation timelines.

  • Execution Risk: As Databricks scales aggressively, maintaining operational efficiency and consistent service quality could become increasingly difficult.

How to Invest

Currently, accredited investors can purchase Databricks shares through Augment with a $25K minimum investment, reflecting an implied valuation of $62B—on par with the company’s current private valuation. For investors who believe in Databricks’ ability to maintain its leadership in data analytics and AI, and potentially outpace competitors like Snowflake, this represents a rare opportunity to join a rapidly growing industry leader. The terms are “0/0,” meaning there are no management or carry fees, and commitments must be made by December 27th.

While Databricks’ valuation places it ahead of Snowflake, this premium reflects its broader focus on AI, machine learning, and unified data solutions, which Snowflake does not address as comprehensively. Investors should weigh this expanded market scope against the potential for slower growth as the company matures. If you choose to invest, here are a few parting words of advice:

  • Accredited investors only: Private market opportunities require accreditation.

  • Trust the platform: Augment is a reputable company with a strong track record.

  • Expect delays: Private market transactions can take time to close, and not every deal goes through. Don’t be discouraged—other opportunities will follow.

As always, if you want us to clarify anything in this material, shoot us an email at [email protected], and we’ll respond as soon as we can.

Disclaimers Below

This material has been distributed solely for informational and educational purposes and is not a solicitation or an offer to buy any security or to participate in any trading strategy. All material presented is compiled from sources believed to be reliable, but accuracy, adequacy, or completeness cannot be guaranteed, and Cold Capital makes no representation as to its accuracy, adequacy, or completeness.

The information herein is based on Cold Capital’s beliefs, as well as certain assumptions regarding future events based on information available to Cold Capital on a formal and informal basis as of the date of this publication. The material may include projections or other forward-looking statements regarding future events, targets or expectations. Past performance of a company is no guarantee of future results. There is no guarantee that any opinions, forecasts, projections, risk assumptions, or commentary discussed herein will be realized. Actual experience may not reflect all of these opinions, forecasts, projections, risk assumptions, or commentary.

Cold Capital shall have no responsibility for: (i) determining that any opinions, forecasts, projections, risk assumptions, or commentary discussed herein is suitable for any particular reader; (ii) monitoring whether any opinions, forecasts, projections, risk assumptions, or commentary discussed herein continues to be suitable for any reader; or (iii) tailoring any opinions, forecasts, projections, risk assumptions, or commentary discussed herein to any particular reader’s objectives, guidelines, or restrictions. Receipt of this material does not, by itself, imply that Cold Capital has an advisory agreement, oral or otherwise, with any reader.