Infra Play #122: The semiconductor angle
In the age of “you can just do things”, some have decided to go all in.
Last week we took a look at the future of AI from the perspective of the researchers working on training new models. In the spirit of exploring different mental models, this time we will pull back in a completely different direction: the hardware that drives both training and inference.
There has been a lot of back and forth on the timeline about the recent SemiAnalysis article on Google’s TPUs. A genuine shift away from NVIDIA and CUDA’s dominance would have significant implications for the industry, with many arguing it could be anything from a net positive to a complete disaster.
This article will serve as a primer on the alternative hardware used in datacenters and how it relates to both training and serving models. It will also explore directional bets based on this information.
The key takeaway
For tech sales and industry operators: When technology shifts create genuinely new value chains, technical validation replaces relationship selling as the primary purchase driver; customers aren’t buying based on who took them to dinner but on who can demonstrably solve problems that couldn’t be solved before. In the case of TPUs, this dynamic is visible in the widening gap between leaders investing in compiler teams, custom kernels, and hardware relationships that create defensible positions, and laggards running the same open-source inference stack as everyone else while competing purely on application-layer differentiation that can be replicated in months. The same technical sophistication that separates winning buyers now separates winning sellers; if your customer’s infrastructure team can out-argue you on TCO math, you’ve already lost the deal before pricing discussions begin. This means the winning sales motion in cloud infrastructure is fundamentally different from enterprise software sales of the past decade: if you cannot credibly explain the architecture, walk through the unit economics, and defend technical decisions against sophisticated buyer scrutiny, you will lose to competitors who can. The implications for GTM teams are significant: generalist enterprise sellers who relied on process mastery and relationship cultivation are being displaced by technical specialists who combine domain expertise with commercial capability. The organizations building sales teams of “selling engineers” rather than “engineers who support sellers” are capturing a disproportionate share of AI infrastructure spend. The relationship sellers aren’t disappearing; they’re migrating upmarket into executive alignment roles or downmarket into transactional volume sales, but the technical middle, where complex deals get won, is no longer their territory. The surest way to escape commodity competition in this market is to develop genuine technical fluency that most sellers consider either beneath them or beyond them.
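To make the TCO point concrete, here is a minimal sketch of the kind of math a buyer’s infrastructure team will run before a deal. Every number in it (hourly rates, throughput, utilization) is an illustrative assumption, not a vendor figure.

```python
# Minimal TCO sketch: effective serving cost per 1M output tokens.
# All inputs are illustrative assumptions, not vendor quotes.

def cost_per_million_tokens(hourly_rate_usd: float,
                            tokens_per_second: float,
                            utilization: float = 0.6) -> float:
    """Cost to generate 1M tokens on one accelerator.

    hourly_rate_usd:   all-in hourly cost (rental price or amortized Capex + Opex)
    tokens_per_second: sustained decode throughput on the target model
    utilization:       fraction of the hour spent doing useful work
    """
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return hourly_rate_usd / tokens_per_hour * 1_000_000


# Hypothetical comparison between two accelerators serving the same model.
incumbent = cost_per_million_tokens(hourly_rate_usd=4.00, tokens_per_second=900)
challenger = cost_per_million_tokens(hourly_rate_usd=2.50, tokens_per_second=800)

print(f"Incumbent:  ${incumbent:.2f} per 1M tokens")
print(f"Challenger: ${challenger:.2f} per 1M tokens")
```

If a seller cannot walk through this arithmetic, and defend the utilization and throughput assumptions behind it, the buyer’s own team will do it for them.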
For investors and founders: NVIDIA’s gross margins in the 70%+ range are extraordinary by any historical standard and imply either an insurmountable moat or an unsustainable position that competitors will eventually arbitrage away. The pattern of margin compression following the arrival of viable alternatives played out with PCs, servers, networking equipment, and storage; the question is whether AI accelerators follow the same trajectory or whether CUDA’s ecosystem lock-in breaks the pattern. The next four to six quarters will reveal which interpretation is correct as TPUv7 and Trainium3 reach production scale. The critical leading indicator is not revenue growth but gross margin trajectory: if NVIDIA maintains margins while competitors ship viable alternatives, the moat thesis is validated; if margins compress despite growing revenue, the monopoly-rent thesis is validated and terminal value estimates need significant revision. For portfolio construction, the question is whether you want exposure to NVIDIA specifically or to AI infrastructure broadly; the latter can be achieved through diversified positions across hyperscalers, neoclouds, and frontier labs that benefit regardless of which hardware platform dominates. For founders, the Anthropic playbook is the clearest template available: secure commitments from multiple infrastructure providers, drive active competition between them for your workloads, and capture the margin benefit that flows from their desperation to win flagship AI customers. The recent 67% cut to Opus 4.5 API pricing, enabled by lower-TCO TPU inference, demonstrates that infrastructure optionality translates directly into pricing power that can be weaponized against competitors locked into single-vendor economics. Your infrastructure vendor selection should weight operational execution and deployment track record at least as heavily as benchmark specifications: the fastest chip in the world creates no value sitting in a warehouse waiting for datacenter readiness, and the cheapest compute creates no value if your team lacks the compiler expertise to utilize it efficiently. The founders who treat infrastructure as a strategic function, not a cost center managed by ops teams, are the ones building durable cost advantages that compound over multiple product cycles.
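As a rough illustration of how lower serving cost becomes pricing power, the sketch below shows how a cheaper cost per token leaves room for a headline price cut without giving up gross margin. The costs and the old price are hypothetical placeholders; only the 67% cut comes from the discussion above.

```python
# Illustrative arithmetic only: how cheaper inference turns into pricing power.
# Costs and prices are hypothetical, chosen only to show the mechanism.

def gross_margin(price_per_mtok: float, cost_per_mtok: float) -> float:
    """Gross margin on selling one million output tokens."""
    return (price_per_mtok - cost_per_mtok) / price_per_mtok

old_price, old_cost = 75.00, 25.00   # $/1M tokens on the old serving stack (assumed)
new_cost = 8.00                      # assumed lower-TCO hardware
new_price = old_price * (1 - 0.67)   # a 67% headline price cut

print(f"Margin before the cut: {gross_margin(old_price, old_cost):.0%}")
print(f"Margin after the cut:  {gross_margin(new_price, new_cost):.0%}")
```

Under these assumptions the margin survives the price cut roughly intact, which is exactly the position a competitor locked into single-vendor economics cannot match.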
Vertical integration is the name of the game
The two best models in the world, Anthropic’s Claude Opus 4.5 and Google’s Gemini 3, run the majority of their training and inference on Google’s TPUs and Amazon’s Trainium. Google is now also selling physical TPUs to multiple outside firms. Is this the end of NVIDIA’s dominance?
The AI era is here, and the cost structure of AI-driven software deviates considerably from that of traditional software. Chip microarchitecture and system architecture play a central role in how these new forms of software are developed and scaled. The hardware infrastructure on which AI software runs has a far larger impact on Capex, Opex, and ultimately gross margins than in earlier generations of software, where developer costs were the relatively larger line item. Consequently, optimizing your AI infrastructure deserves far more attention than infrastructure required in the past: firms that have an advantage in infrastructure will also have an advantage in deploying and scaling AI applications.
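A toy cost-structure model makes the margin point explicit. The figures below are invented; the shape of the result is what matters: in classic SaaS, hardware is a rounding error in COGS, while in an AI product, a worse hardware TCO flows almost directly into gross margin.

```python
# Toy cost-structure comparison. All figures are invented for illustration.

def gross_margin(revenue: float, cogs: float) -> float:
    return (revenue - cogs) / revenue

revenue = 100.0                   # arbitrary unit of revenue

saas_cogs = 12.0                  # classic SaaS: thin web/database hosting
ai_cogs = 45.0                    # AI product: COGS dominated by inference compute
ai_cogs_worse_tco = 45.0 * 1.3    # same product on hardware with 30% worse TCO

print(f"Classic SaaS margin:        {gross_margin(revenue, saas_cogs):.0%}")
print(f"AI product margin:          {gross_margin(revenue, ai_cogs):.0%}")
print(f"AI product, 30% worse TCO:  {gross_margin(revenue, ai_cogs_worse_tco):.0%}")
```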
While hardware has always mattered for what you can run in a datacenter, for most applications in the cloud era it has not been a significant point of discussion. Companies like to overstate their “complex technology,” but most of the code running in production today can run on surprisingly low-end hardware. It’s an imperfect comparison, but a recent MacBook Pro is more powerful than most of the AWS EC2 instances available to rent today (i.e., the most widely used rented hardware in a datacenter). The comparison is simplistic because the point of the cloud is to scale many of these instances behind a single application, and other factors (location, the stack used to deploy the VMs, networking) also shape performance. The cloud’s real contribution has been making on-demand compute scalable, rather than offering the kind of hardware you could never afford as a company. In fact, for many SaaS organizations, the move from their own datacenters to the cloud has been counterproductive if the goal was to save costs.
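To illustrate that last point, here is a back-of-the-envelope rent-vs-buy comparison for a steady, predictable workload. The prices are assumptions, not real AWS or hardware quotes; the takeaway is only that at constant high utilization, owning can come out cheaper, which is why lift-and-shift migrations often fail to save money.

```python
# Back-of-the-envelope rent-vs-buy comparison for a steady-state workload.
# All prices are assumptions for illustration, not real AWS or hardware quotes.

HOURS_PER_YEAR = 24 * 365

def cloud_cost(hourly_rate: float, instances: int, years: float) -> float:
    """Total cost of renting the same instances around the clock."""
    return hourly_rate * instances * HOURS_PER_YEAR * years

def owned_cost(server_price: float, servers: int, years: float,
               yearly_opex_per_server: float) -> float:
    """Purchase price plus power, space, and ops over the same period."""
    return servers * (server_price + yearly_opex_per_server * years)

years = 3
rented = cloud_cost(hourly_rate=1.00, instances=50, years=years)   # assumed mid-size VM rate
owned = owned_cost(server_price=10_000, servers=50, years=years,
                   yearly_opex_per_server=2_000)                   # assumed comparable box

print(f"Rented for {years} years: ${rented:,.0f}")
print(f"Owned for {years} years:  ${owned:,.0f}")
```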
AI dramatically changes this dynamic.