Why behind AI: Build vertically, open horizontally
As part of GTC week, there are a lot of recorded conversations between Jensen and investors, reporters, and other executives. Most of them are just a rehash of the GTC keynote that I've already covered, but occasionally we get somebody opinionated (and well-plugged-in) like Ben Thompson to push Jensen into providing stronger arguments for some of his ideas.
BT: “Just to go back to that software bit, you mentioned Excel wasn’t designed to be used by AI. You have things like Claude has this new functionality to use Excel, so when you talk about that, you want to invest in these libraries, is that to enable models like that to do better? Or is that something for Microsoft or for enterprises?”
JH: “AI is going to use Excel, AI is going to use Photoshop, AI is going to use logic synthesis tools, Synopsys tools, and Cadence tools. Those tools have to be super-accelerated, they’re going to use databases, they have to be super-accelerated, because AIs are fast. And so I think in this era, we need to get all of the world’s software accelerated now, as fast as possible, and then put them in front of AI so that AI could agentically use them.”
BT: “They’re gonna need to do it way faster.”
JH: “They’re gonna need to do it way faster.”
One of the hardest mental models for most people is where enterprise software is going in the (very near) future. If we see agentic workflows as leading, then logically all software becomes infrastructure for agents. The problem with the current incumbents is that they focus on a UX that isn't suitable for agents, and their vision of agentic workflows is to integrate them within the same architecture. That doesn't make a lot of sense, even if you only take the basic difference between doing a task with clicks and scrolling versus at API speed. The vendors who adapt their data layers to this future are likely to capture a lot of those workloads rapidly.
BT: “You’ve talked a lot about accelerated computing, I think you’ve trash talked as it were, maybe the CPUs to the day, they’re all gonna be removed, like everything’s gonna be accelerated. Suddenly CPUs are hot again. It turns out they’re pretty useful and important to the extent you are selling CPUs now, how’s it feel to be a CPU salesman?”
JH: “There’s no question that Moore’s law is over. Accelerated computing is not parallel computing... We were never against CPUs, we don’t want to violate Amdahl’s Law. Accelerated computing, in fact, inside our systems, we choose the best CPUs, we buy the most expensive CPUs, and the reason for that is because that CPU, if not the best and not the most performant, holds back millions of dollars of chips.”
BT: “When it comes to branch prediction, you worried about wasting CPU time, now you’re worried about wasting GPU time.”
JH: “That’s right, you just never can have GPUs be squandered, GPU time be idle... The way that CPUs were designed in the last decade, they were all designed for hyperscale cloud and the way that hyperscale cloud monetizes CPUs is by the CPU core. So you want to design CPUs that have as many cores as possible that are rentable, the performance of it is kind of secondary... For tool use, where you have this GPU waiting for the tool use — you want the fastest single-threaded computer you can possibly get. Vera’s bandwidth-per-CPU is three times higher than any CPU that’s ever been designed.”
What we do with existing and future CPU compute is an interesting topic, since for all intents and purposes the majority of hyperscaler datacenters are filled with CPUs, not GPUs. With agentic AI focused on applications, Nvidia can try to give "accelerated" computing a push (i.e. let existing applications leverage the GPU to complete certain workflows faster), but practically speaking, CPUs are still needed to reach a full outcome. The main gap right now, and it's a fair one, is that if GPUs and accelerated compute produce lots of fast agentic workflows that are still stuck waiting on slow existing CPU architectures, there will have to be some sort of redesign to alleviate those bottlenecks. That could be a whole build-out cycle by itself, outside of the current AI compute investment cycle.
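To make the bottleneck concrete, here is a rough Amdahl's Law sketch with made-up numbers: overall speedup = 1 / ((1 − p) + p / s), where p is the fraction of a workflow that gets accelerated and s is the acceleration factor. If 90% of an agentic workflow runs 100x faster on GPUs while the remaining 10% stays on a slow CPU path, the overall speedup is 1 / (0.1 + 0.9 / 100), or roughly 9x rather than 100x. The serial CPU leg caps the whole system, which is exactly why Jensen insists on pairing his GPUs with the fastest single-threaded CPUs he can buy.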
BT: “I do have to say, last year you had a slide talking about this Pareto Curve, and you talked about, I think it was when you introduced Dynamo, how your GPUs could cover the whole thing, and so you didn’t have to think about it, just buy an Nvidia GPU, and Dynamo will do both. But now you’re here saying, ‘Well, it doesn’t quite cover the whole thing’.”
JH: “We cover the whole thing still better than any system that can do it. Where we could extend that Pareto is particularly on the extremely high token rates and extremely low latency... because of coding agents, because they’re now AI agents that are producing really, really great economics, and because the agents are being attached to humans that are actually making extremely, I mean, they’re extremely valuable.”
BT: “They’re even more expensive than GPUs.”
JH: “And so I want to give my software engineers the highest token rate service, and so if Anthropic has a tier of Anthropic Claude Code that increases coding rate by a factor of 10, I would pay for it, I would absolutely pay for it.”
BT: “So you’re building this product for yourself?”
JH: “I think most great products are kind of because you see a pain point and you feel the pain point and you know that that’s where the market’s going to go.”
If we take the logic of the Groq acquisition two steps forward, two things stand out. The first is that Jensen spent a lot of time in recent years claiming that the latency extreme of inference was irrelevant, which it clearly is not. The second is that the talk track is now shifting to admit that not only is the latency extreme important, it's arguably where the highest ROI per token sits. While commodity workloads (text editing) might trend toward near-zero inference cost, the value outcomes of fast coding or medical research can be measured in millions of dollars. The companies that can offer these premium knowledge-plus-inference-speed models will likely be the biggest revenue winners, which to some extent we can already see in Anthropic's rapid growth over the last nine months.
BT: “Was that a bit of a problem with Blackwell? I’ve heard mutters that the training runs were maybe a little more difficult than they were sort of previously.”
JH: “The challenge with Blackwell was 100% NVLink 72, NVLink 72 was backbreaking work. And it was the only time that I thanked the audience for working with us.”
BT: “Yeah, I noticed when you said that today, it came across as very sincere.”
JH: “Yeah, because we tortured everybody, but everybody loves it now.”
One of the most underreported themes of 2025 was that while reasoning was extremely useful for improving practical outcomes (and token usage), companies like OpenAI really struggled to deliver better base models because of the difficulty of procuring, deploying, and then scaling Blackwell capacity.
BT: “Am I right to tease out a consistent element here, where you’re happy to supply the leading provider, or the inventor in a space with chips, but then you’re going to fast follow what they do for everyone else that is threatened by them? So you simultaneously broaden your customer base, you’re not just dependent on the leaders, but then also the leaders are helping you sell to everyone else because they’re worried about being left behind.”
JH: “No, nothing like that. We’re at the frontier on so many different domains. In a lot of ways, we are the leader in many of these domains, but we never turn them into products. We’re a technology stack and so we have to be at the frontier, we have to be the world leader of the technology stack, but we’re not a solutions manufacturer, we’re not a service provider.”
BT: “Will that always be the case?”
JH: “Yeah, always be the case. There’s no reason to, and we’re delighted not to.”
Nvidia supplies Tesla with chips while building its own autonomous vehicle stack. It supplies OpenAI while building open source models. It supplies the hyperscalers while building AI factory software. Jensen calls this building vertically and opening horizontally.
The reality is that every infrastructure company that has ever said it would never compete with its customers has eventually faced the moment where the economics of not competing became too painful to sustain. Nvidia is not there yet, but pretending that this “will always be the case” and that they’ll remain at the infrastructure layer is optimistic, to say the least.
BT: “Well, it’s funny though, if you go back to like your boards, for example, like the products you ship, more and more of that, there’s what, 30,000 specific SKUs in a rack today or something like that. More and more of those are defined by you. Is there a bit where that’s gonna happen on the software side too?”
JH: “We create a thing vertically and then we open it horizontally and so everybody could use whatever piece they would like... We have to build it vertically, we have to integrate it vertically and optimize it vertically. But afterwards, we give them source, we give them — they just figure out how they want to do it.”
BT: “As long as they’re running on Nvidia chips?”
JH: “Whatever piece they would like, they don’t have to use all Nvidia chips, they don’t have to use all Nvidia software.”
The catch is that the performance result that makes the vertical integration worthwhile only fully materializes on Nvidia hardware. The openness is real, but the optimization is not portable. This is the CUDA dynamic restated for the systems era: CUDA was open in the sense that anyone could write CUDA code, but CUDA code only ran on Nvidia GPUs. Dynamo is open in the sense that anyone can deploy it, but Dynamo's full capability only surfaces on Nvidia infrastructure.
BT: “Is it fair to say, is there a bit where Nvidia is actually the biggest beneficiary of scarcity, though, to the extent it exists? Like, if there’s a power scarcity, you’re the most efficient chip, so you’re going to be utilizing that power better. Or if there’s fab capacity, like you just said, you’ve been out there securing the supply chain, you got it sort of sorted, are you the big winners in that regard?”
JH: “Well, we’re the largest company in this space, and we did a good job planning. And we plan upstream of the supply chain, we plan downstream of the supply chain and so I think we’ve done a really good job preparing everyone for growth.”
The Nvidia performance-per-watt advantage means that in a power-constrained world, customers achieve more intelligence per megawatt on Nvidia hardware than on alternatives. Their supply chain depth, built over years of planning upstream and downstream simultaneously, means they have priority access to constrained fab capacity that competitors cannot replicate in the short term. TSMC capacity, CoWoS advanced packaging, HBM allocation: Nvidia has preferred access across all three. Every quarter that supply remains tight is a quarter in which that moat compounds.
BT: “Right, but is this a bit where, at its core, why not having access to the Chinese market maybe is a threat? Like if China ends up with plenty of power and plenty of chips, even though those chips are only 7nm, they have the capacity to build up an ecosystem to potentially rival CUDA in the long run, is that the concern that you have?”
JH: “There’s no question we need to have American tech stack in China... No country contributes more to open source software than China does and we also know that 50% of the world’s AI researchers come from China... DeepSeek is not a nominal piece of technology, it’s really, really good. And Kimi is really good, and Qwen is really good... To the extent that American tech stack is what the world builds on top of, then when that technology diffuses out of China, which it will, because it’s open source, and when it comes out of China, it goes into American industries, it goes into Southeast Asia, it goes into Europe, the American tech stack will be prepared to receive them.”
BT: “Yeah, when we talked last time, the Trump administration had banned the H20. Were you surprised you were able to get the Trump administration to see your point of view? And then were you even more surprised that now you’re stymied by the Chinese government?”
JH: “I’m not surprised by us being stymied by them and the reason for that is because, of course, China would like to have their tech stack develop. In the time that we’ve left that market, you know how fast the Chinese industry moves, and Huawei achieved a record year for their company’s history.”
I think that over the last two years Jensen has been pretty consistent in advocating for Nvidia chips being made available in the Chinese market, and he even won part of the argument, with some of the US export controls being lifted. What happened afterwards, of course, was a slap-back from the Chinese government, which preferred to support its domestic ecosystem while still acquiring compute through black-market channels, as seen in the Supermicro scandal. While I think it's better for Nvidia to double down on ensuring Western companies are the ones that win the AI race, there is an argument to be made that having relevant tech leaders act as brokers (and almost diplomats) between the two sides is a net benefit.
BT: “Everyone was scared instead of optimistic.”
JH: “That’s right, and I think it has two fundamental problems. In this Industrial Revolution, if we don’t allow the technology to diffuse across the United States and we don’t take advantage of it ourselves, what will happen to us is what happened to Europe in the last Industrial Revolution... I hope that we have the historic wisdom, that we have the technological understanding and not get trapped in science fiction, doomerism, these incredible stories that are being invented to scare the living daylights out of policy makers who don’t understand technology very well.”
BT: “I think a characteristic you see all the time is people put on their big thinking hats and try to tease out all these nuances and forget the fact that actual popular communication is done in broad strokes. You don’t get to say, ‘Oh, you’re a little scared of this, but not this XYZ’ — you’re just communicating fear as opposed to communicating optimism.”
JH: “Yeah, and somehow it makes them sound smarter... sometimes it helps them with their fundraising and sometimes it helps them secure regulatory capture. So there’s a lot of different reasons why they do it, and these are incredibly smart people but I would just warn them that most of these things will likely backlash.”
How Washington resolves the doomer-versus-accelerationist tension will shape datacenter buildout timelines, export policy, and model deployment rules for the next decade. The current administration is still leaning toward the e/acc side, and Anthropic did not do itself any favors by getting into conflict with the DoW.
BT: “This is the second time we’ve had a chance to talk in person, and my takeaway when I met you previously in Taipei was the extent that Nvidia still feels like a small company. Are you worried about getting stretched too thin, or do you still think you have sort of that CUDA-esque flywheel where, ‘It looks like we’re doing a lot, we’re just kind of doing the same thing over and over again?’”
JH: “The reason why Nvidia can move so fast is because we always have a unifying theory for the company, and that’s my job, I need to come up with a unifying theory for what’s important and why things connect together and how they connect together and then create an organization, an organism that’s really, really good at delivering on that unifying theory.”
BT: “That whole first hour of the keynote felt like you talking to your employees, reminding them of what you do.”
JH: “It’s important that we’re always constantly reminded of what’s important to us and AI is important to us, but of course CUDA-X and all of the solvers and all of the applications that we can accelerate is really important to us.”
I’ll conclude this article by highlighting the idea that what separates good companies from great ones is a unified vision (or theory) of the organization’s purpose, structure, and agenda. Nvidia’s unified theory is that it provides accelerated computing across the full stack, which is what allows it to build not simply GPUs but full datacenter hardware and software infrastructure, robotics, open-source models, self-driving tools, weather-prediction models, and many other initiatives that still fit the core vision. Funnily enough, the unified theory is also a constraint. The problem with most large SaaS players, for example, is that they don’t have a unified vision; they offer a portfolio of products and then slap a marketing story about being a platform on top. One simple test is to ask someone to explain the unified theory of their company in a single sentence.
Most companies fail that test. Nvidia doesn’t, and that unified theory drove it to become the (unlikely) most valuable company in the world.