Case Studies

DeepSeek AI: A Deep Dive

The True Impact of DeepSeek's Latest Release
January 31, 2025, 5:24 pm

Some thoughts on DeepSeek

This past week, when DeepSeek's AI app hit #1 on Apple's App Store, it triggered a $1 trillion market selloff and widespread panic about the future of AI companies. It also sparked numerous reactions, both good and bad. What most of these takes (and panic) missed is a trend with far wider implications beyond market movements:

AI democratization is happening faster than expected. The barriers to entry are crumbling, impacting every organization's AI strategy.

This piece will examine how DeepSeek's achievements are accelerating the commoditization of AI models and ushering in a new era focused on applications rather than model development. We will explore why this matters through three sections: the economic implications for the AI industry, the technical innovations that made it possible, and, most importantly, why you and your organization should care.

The Major Players

Before all that though, let’s walk through the relevant players and characters in the story. We’ll start with the names everyone has been talking about:

  • High-Flyer: a Chinese hedge fund founded in 2016 by three Zhejiang University graduates. The most well-known of those founders is Liang Wenfeng, now CEO of DeepSeek. Originally, their group focused on algorithmic trading and high-frequency trade strategies, which did fairly well all things considered. One important move they made was a prescient investment in 10,000 A100 GPUs in 2021, before export restrictions took effect. This hardware foundation, combined with their trading success, enabled them to self-fund an ambitious AI research spinoff without external capital, also known as:

  • DeepSeek: Led by Liang Wenfeng, with ~150 employees and a robust infrastructure of ~50,000 Hopper GPUs (including H100s, H800s, and H20s), the lab has an open-source ethos and server infrastructure worth about $1.6B in CapEx and $944M in operating costs, which they control at the data center level. This infrastructure has allowed them to establish a reputation, domestically and internationally, for their ability to optimize across the hardware stack, including bypassing CUDA and going deeper into the metal, leading to lower-cost, higher-efficiency training strategies. They’ve released several models competing with rivals in traditional chat applications, API-based inference, and now, reasoning.

  • OpenAI: Unfortunately, these guys need no introduction. As the biggest AI brand and de facto leaders in the “AI Arms Race,” they have set many of the industry’s expectations around compute requirements and capital needs. Their release of o1 in 2024 established a benchmark for reasoning models, which DeepSeek challenged. They’ve lost a lot of shine since DeepSeek’s more efficient approach became public, with a market reassessment of OpenAI’s capital-intensive strategy contributing to a broader tech selloff approaching uh a trillion dollars in market value (yep, that is with a t).

  • Anthropic: The techno-hipster’s preferred AI lab, well known for their Claude family of models and ecosystem (chatbot, APIs, etc.). Not nearly as well-capitalized as OpenAI or as well-known by Main Street, Anthropic nevertheless represents another major Western AI lab, backed by the other hyperscalers (AWS & Google are both investors). They also…often ask for more money.

  • Mistral: “Europe’s AI champion” or something along those lines. A French startup, they offer both closed and open source models. They have less capital at their disposal than their American counterparts, which means they’ve done a lot of work with efficiency and lowered training resource requirements, similar to DeepSeek. However, they haven’t really ever set off a dramatic unfolding of global narratives, so uh there’s that.

Other characters involved in slightly less direct ways include everyone’s favorite cloud providers like Microsoft and AWS. Oracle too, I guess. Probably should get used to them being involved in things from now on.

Why does it matter?

Economic Implications

As mentioned in the introduction, this past Monday, markets opened with a massive selloff affecting AI-focused companies like NVIDIA (down 13% at one point) and AI-adjacent tech firms like Microsoft and Meta.

Why? Well, DeepSeek is why. Specifically, DeepSeek R1. R1 is a recently released open-source “reasoning” model that competes with premium offerings from closed competitors like OpenAI’s o1. Dropping this model forced a reexamination of a narrative that goes something like this:

The modern machine learning industry, aka the one revolving around “generative AI,” has operated on a simple assumption: better models require massive investments in compute, data, and talent. This “scaling law” (observed and uh perpetuated by Western labs) had some pretty intense implications that boiled down to a fervent need (desire?) for more capital expenditures, infrastructure build-outs, and generally speaking, more of everything (including head count and salaries for certain people).

DeepSeek's recent releases challenge this foundational belief. While their reported $6M training cost for DeepSeek V3 doesn't include research and development expenses, even accounting for their full infrastructure costs, they've achieved results competitive with companies spending many times more. This efficiency gap explains why markets reacted so dramatically - if AI development doesn't require the massive capital expenditures previously assumed, the industry's economics need reevaluation.

Here’s another fun angle to consider. The drop-off happened this week, but DeepSeek has been releasing increasingly competitive models over the last couple years. Their paper actually dropped over a week ago. Why the lag in reaction? Let’s walk through the timeline. DeepSeek drops their paper, showing they can train models for way cheaper than anyone thought possible. Markets basically yawn. Then they release R1, which can actually go toe-to-toe with OpenAI’s latest. Markets still sleeping. But then their app hits #1 on the App Store and suddenly everyone loses their minds?

What changed? Well, two things. First, just like when OpenAI dropped ChatGPT, giving users a familiar interface makes everything real. It goes from some academic paper to a tangible thing that people can use…in this case, a thing they can use instead of ChatGPT. Second, it exposed the fragility of model development moats. If a little-known company could build something this good, this cheap, and get this much adoption this quickly...what does that say about all those billions being poured into AI companies?

The market underestimated how quickly AI could be commoditized. Think about it - even a year ago, the story was "you need tens of billions of dollars and the best researchers in the world to compete in AI." DeepSeek proved that's not true anymore. And when markets finally got that memo, via an app store ranking of all things, they had to dramatically recalculate these companies' worth.

Technical Implications

Now that the money talk is out of the way, let’s jump into the other relevant factors: technology. Costs don’t drop for no reason. The underlying trends driving the improved economics are the innovations in training techniques in the DeepSeek R1 paper.

I’m no expert researcher, so I’ll avoid the extremely technical details, but we can explore the high-level ideas. DeepSeek’s technical achievements stem from a counterintuitive source: limitation. The export controls mentioned above had several effects (probably not as large or as dramatic as some have painted), but the main one is that they placed constraints on teams in China like DeepSeek, creating a need for alternative solutions. Necessity is the mother of invention.

Instead of being able to basically just throw more capital and more compute at problems (aka buying better chips), DeepSeek’s team had to find ways around the limitations of the hardware they did manage to gather before the restrictions hit. This led them to explore solutions that people in the West simply didn’t need to, because the West had…well…uh they had a lot of money and a lot more freedom to procure things. It’s not a case of a pure intellectual advantage (the take here is not necessarily “China is leapfrogging the US in innovation”); rather, it’s the advantage created by properly aligned incentives. I’m quite confident the American labs could have come up with some of the innovations behind DeepSeek’s success. They just…never had to; they simply weren’t incentivized to. While Western labs could throw virtually unlimited resources at AI development, export controls forced DeepSeek’s team to find creative solutions with constrained resources. That necessity drove innovations in training techniques that have now been published openly, accelerating the commoditization of AI models.

However, just to quickly play a bit of devil’s advocate on myself, here’s some nuance to the perspective on the role of export controls. The reality of export controls’ impact is a bit more complex than initially appears. While restrictions theoretically limit access to cutting-edge hardware in China, DeepSeek’s story actually challenges this simplistic narrative. According to SemiAnalysis, their infrastructure investments exceed $500M even after accounting for restrictions - hardly the picture of a resource-starved operation. So while it’s true that DeepSeek’s team did not have easy access to the most advanced components, it’s not like they were playing with just a couple server racks and old graphics cards. Export controls may have played a role, but perhaps not as large as some originally thought.

Anyway, the implications of the innovation shown with this release extend far beyond DeepSeek, even with a more nuanced take on the export restrictions. Their work demonstrates that competitive AI development is possible without hyperscaler-level resources. First, their model architecture leverages a mixture-of-experts technique, which is like having a bunch of specialist consultants (what researchers call “experts”) working together, with a really smart secretary (the “gating network”) figuring out which specialist should handle which part of your request. This increases efficiency because you’re not bothering the whole team for every little thing, as it were.

Additionally, their work on Multi-head Latent Attention (MLA) and Multi-Token Prediction (MTP) produced major reductions in memory requirements and computational costs. This includes cutting the memory needed for processing conversations (the “KV cache”) by over 93% and using additional attention modules to predict several upcoming tokens instead of just one at a time. In plain English? These models use less energy, cost far less to run, and can handle longer conversations without choking, at a scale conventional wisdom considered impossible. And this is just scratching the surface, as they have done great (albeit controversial) work in other areas like model distillation as well.
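The mixture-of-experts idea can be sketched in a few lines of plain Python. This is a toy illustration only, assuming made-up "experts" and hand-picked gating weights (none of this is DeepSeek's actual architecture): a gating network scores every expert, but only the top-k highest-scoring ones actually run for a given token.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(token_vec, gate_weights, experts, top_k=2):
    """Score every expert with the gating network, but run only the top_k,
    mixing their outputs by the renormalized gate scores."""
    logits = [sum(w * x for w, x in zip(row, token_vec)) for row in gate_weights]
    scores = softmax(logits)
    active = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:top_k]
    norm = sum(scores[i] for i in active)
    output = sum((scores[i] / norm) * experts[i](token_vec) for i in active)
    return output, active

# Toy setup: four "experts", each just scaling the input sum differently.
experts = [lambda v, k=k: k * sum(v) for k in (1.0, 2.0, 3.0, 4.0)]
# Fixed, hand-picked gating weights (one row per expert).
gate_weights = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.5, 0.5, 0.5],
]

out, active = route([0.5, -0.2, 0.9], gate_weights, experts, top_k=2)
print(active)  # → [2, 3]: only 2 of the 4 experts were consulted
```

The efficiency win is exactly the "not bothering the whole team" point above: because only `top_k` experts execute, compute per token scales with `top_k` rather than the total number of experts, even as the total parameter count grows.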

So, with all that said, what does this mean for technologists? Well, if the release marked a negative moment for markets, you can argue that it should mark an extremely positive moment for developers, builders, and, generally, anyone not at a blue-chip lab or hyperscaler. Unlike some prior eras, there was initially a limited sense of tinkering and bottom-up innovation in the generative AI boom. The prevailing wisdom was that to play this game, you needed state-level infrastructure, whether through strategic partnerships, monopoly profits, or sovereign wealth. But this release blows that up. DeepSeek has shot to the top of the App Store, researchers are already replicating their techniques, and major players like Microsoft’s Azure are already offering the model through their cloud platform. This is not a pure win for DeepSeek (they uh still have some issues to take care of), and there are plenty of concerns from a privacy and governance perspective, but the fact remains:

This moment should serve as a warning shot to the entire ecosystem that the competition is still wide open.

No model can resist commoditization, which means it’s highly unlikely the real winners will come from this layer of the stack. Instead, it’s increasingly likely that they will be built on top of it. The era of applications has begun.

Conclusion

Okay, so some interesting ideas and things to think about from geopolitical, economic, and technical perspectives. Why should you or your organization care? Well, DeepSeek's rise represents more than just market drama or technical breakthroughs - it signals a fundamental restructuring of the AI landscape. In short, the reason is (forgive the cliche)…democratization.

This democratization manifests in three ways:

First, the economics have fundamentally changed. The implications of cheaper, more widely available models, research techniques, and overall development mean that there will be more efficient and cost-effective ways to tinker, experiment, and adopt. It’s not just DeepSeek by itself. It’s the proliferation of other open source models like Llama as well. When you can get ~SOTA performance at a fraction of previously assumed costs, the entire industry has to rebuild its assumptions.

You can already see this playing out: in response to open-source competition, OpenAI adjusted their o1 pricing this week. Because of the open-source alternatives, consumers and enterprises will have more leverage with vendors, and builders can get, if not equal, then close-to-equal quality of inference from models that can be hosted and managed by internal teams instead of a vendor.

Second, the technical innovations - from MLA to efficient training techniques - aren't just staying locked in research labs. They're being published, replicated, and improved upon by a growing community of developers. What was once the domain of hyperscalers is becoming accessible to teams of all sizes.

Finally, and most importantly for organizations, the practical barriers to adoption are dissolving. IT teams can now treat AI like any other technology procurement. Legal and product teams can find middle ground with models that can be hosted and managed internally. The focus is shifting from who can build the biggest model to who can apply these tools most effectively.

While this may spell the end of a certain period in the AI story, a brand new one is just starting, with more space for the rest of us.

Disclaimer: this is most definitely not legal or financial advice. Just some things to think about.

Further Reading & Credits

There have been a lot of sharp and insightful pieces that I used as both references and inspiration throughout the week, as well as open-sourced documentation dropped by DeepSeek. Full credit to them:

  • https://www.404media.co/deepseek-mania-shakes-ai-industry-to-its-core/
  • https://semianalysis.com/2025/01/31/deepseek-debates/
  • https://www.bloomberg.com/news/newsletters/2025-01-27/17-thoughts-about-the-big-deepseek-selloff
  • https://www.chinatalk.media/p/deepseek-what-the-headlines-miss
  • https://epoch.ai/blog/trends-in-the-dollar-training-cost-of-machine-learning-systems
