Why behind AI: The Google performative comeback

May 20, 2026

Google I/O is the most important developer event for the company and it's happening at a critical moment, as Claude Code and OpenAI Codex have overtaken the Gemini models in industry-wide coding use case adoption. While GCP is showing its highest growth in years (thanks to some significant changes in their sales teams and approach to customers), the biggest moneymaker in AI today remains coding. As such, this year's Google I/O was a great opportunity to strike back at the frontier labs and put Google back on the map.

Let's review their announcements, in the order that Google themselves presented them:

AI momentum across the full stack

These stories of how people are using AI are the best measure of progress. To understand the scale at which people are adopting AI, there is another great proxy — tokens, the fundamental units of data our models process, many representing a problem being solved.

Two years ago, we were processing 9.7 trillion tokens a month across our surfaces — a huge number. Last year at I/O, that grew to roughly 480 trillion tokens. Fast forward to today, that number jumped 7x to over 3.2 quadrillion per month.

It tells an important story about our products and how others are building as well — especially developers and enterprises:

Over 8.5 million developers are now building new apps and experiences with our models monthly.
Our model APIs are now processing roughly 19 billion tokens per minute.
Over the past 12 months, over 375 Google Cloud customers each processed more than one trillion tokens, representing incredible demand for AI from across industries.

The AI token factory is no longer a meme, as the total consumption of tokens has ballooned exponentially. The figures themselves, however, are quite revealing, when we account for stats like the OpenRouter usage (where Gemini models have traditionally been widely adopted):

According to this, Gemini models have generated around 11T tokens on OpenRouter so far in May. That can be read as quite big (roughly as much as Google's full monthly volume two years ago!) or as vanishingly small (about 0.35% of the current monthly run rate). What this tells us is that the significant reduction in Gemini model usage for coding is not imaginary.
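For anyone who wants to check the arithmetic behind these ratios, here is a quick back-of-the-envelope sketch in Python. It uses only the figures quoted above; the 30-day month and the variable names are my own assumptions, and the OpenRouter number is a month-to-date estimate from a single third-party router, so treat the output as a rough sanity check rather than a like-for-like comparison.

```python
# Back-of-the-envelope check of the token figures quoted above.
# Assumptions (not from Google): a 30-day month, and "over 3.2 quadrillion"
# taken as exactly 3.2 quadrillion.
TRILLION = 10**12

monthly_two_years_ago = 9.7 * TRILLION   # all Google surfaces, two years ago
monthly_last_year = 480 * TRILLION       # all Google surfaces, last I/O
monthly_now = 3200 * TRILLION            # all Google surfaces, today
api_tokens_per_minute = 19 * 10**9       # model APIs only
openrouter_may_to_date = 11 * TRILLION   # Gemini on OpenRouter, partial month

# Year-over-year growth multiple for total monthly tokens.
growth_multiple = monthly_now / monthly_last_year

# Convert the per-minute API rate into a monthly volume.
api_monthly = api_tokens_per_minute * 60 * 24 * 30
api_share_pct = 100 * api_monthly / monthly_now

# OpenRouter volume relative to today's run rate and to the old monthly total.
openrouter_share_pct = 100 * openrouter_may_to_date / monthly_now
openrouter_vs_old_total = openrouter_may_to_date / monthly_two_years_ago

print(f"Growth since last I/O: {growth_multiple:.1f}x")
print(f"API tokens per month: {api_monthly / TRILLION:.0f}T "
      f"({api_share_pct:.0f}% of the total)")
print(f"OpenRouter share of monthly run rate: {openrouter_share_pct:.2f}%")
print(f"OpenRouter vs. total two years ago: {openrouter_vs_old_total:.2f}x")
```

Under these assumptions the developer-facing APIs account for only about a quarter of the total, which fits the point that most tokens are flowing into Google's own consumer surfaces.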

Momentum with our products

Today we have 13 products with over a billion users each. Five of those have more than 3 billion users.

Our Gemini models are a big reason more people are using our products, and why they’re using our products more.

It all starts with Search, which is bringing the benefits of generative AI to more people than any other product in the world. AI Overviews now has over 2.5 billion monthly active users. And AI Mode has been a revelation, our biggest upgrade to Search ever. People love it, and in just a year, it’s already surpassed 1 billion monthly active users.

When people use our AI-powered features in Search, they use Search more. Search has become less about individual queries and feels more like an ongoing conversation, giving you deeper insights and connecting you with the vastness of the web.

Another place where we’ve been rapidly innovating is in the Gemini app. Last year at I/O, the Gemini app had 400 million monthly active users. Today, we’ve surpassed 900 million, more than doubling in a year. In that same time, daily requests have grown over seven times.

We’ve been adding a lot of unique features like Personal Intelligence, which make responses more customized and helpful. And to date more than 50 billion images have been generated with our Nano Banana image generation models. It was a breakout star this past year, showing how much latent creativity there is in the world.

So there is a place where all of these tokens are going, and it's clearly not in coding usage. Google has been redirecting its resources toward powering AI experiences in its search products, most of it essentially subsidized.

Natural, conversational AI in products

There’s also a lot of latent productivity to be unlocked. Over the last year, we’ve been bringing the ability to have more natural conversations with Gemini directly inside our products. Recently, Maps got its biggest upgrade in a decade, including a new feature called Ask Maps. People are using Ask Maps for more complex, and much longer questions.

Now we’re bringing more natural conversational AI to more products.

Ask YouTube

People come to YouTube every day to ask a lot of questions. There are a lot of great videos, but sometimes it’s hard to know where to start.

Ask YouTube entirely reimagines the experience, making information much more digestible and easy to navigate. You’ll see videos that best match your interest, and most importantly, it jumps right to the part of the video most relevant to you.

A search results page from "Ask YouTube" answering the question, "How to teach my 3 year old how to ride a pedal bike, they already know how to ride a balance bike?" and showing a video of a child on a bike.

We’re starting to test Ask YouTube now, and it will roll out broadly in the U.S. this summer.

Those tokens are now also going to be burned across traditional Google surfaces such as Maps and YouTube. On the question of why I need to go specifically to YouTube to find these insights instead of bringing them into the supposedly widely adopted Gemini app with 900M users, well, that's a question for another day.

Voice-powered Docs Live

There are a lot of times I want to get things done at the speed of my voice. That is much more possible today thanks to technical leaps in our audio models.

A new feature called Docs Live takes this to another level. To create a doc with Gemini before, you had to type out a precise prompt. With Docs Live, you can just verbally “brain dump” whatever is on your mind, and let Gemini do the rest. Here’s a demo in real-time:

In the future, you’ll be able to create new docs and edit them directly, all with your voice. Docs Live is rolling out for subscribers this summer, and powerful voice capabilities will come to Gmail and Keep then too.

We are now into the third announcement for the developer event and Sundar is pitching how you can yell at a docs file and get a formatted version out of it.

Infrastructure supporting innovation at scale

It’s incredible to see the pace of innovation rolling out across our products. Supporting all of this scale for our users, while also serving enterprises and developers around the world, requires massive investments in infrastructure. We’ve been investing for now and for the future. In 2022, we were spending $31 billion annually in capex. This year, we expect that number to be about six times that, approximately $180 to $190 billion. A key part of this investment is our custom silicon.

A decade ago, we announced our very first commercial tensor processing unit, or TPU, on the I/O stage. Since then, we have transformed how the industry builds for AI. We recently announced our 8th generation of TPUs at Cloud Next. For the first time, we’ve taken a dual chip approach with specialized architectures for training and inference: TPU 8t and 8i.

TPU 8t is optimized for large-scale pretraining, and it’s nearly three times the raw computing power of our previous generation. We’ve taken a fundamentally different approach with our training infrastructure. With JAX and Pathways, our training is no longer constrained by the limits of a single, massive data center. Instead, we can now seamlessly distribute training across multiple sites, scaling training across more than 1 million TPUs globally. This gives us the ability to create the largest training cluster in the world. For model builders, this means training larger, more capable models in weeks rather than months.
TPU 8i is designed for inference. We have dramatically improved speed at every step. Because if we learned anything in 27 years of working on Search, it’s that latency matters.

In addition to speed, we’re also thinking about scaling sustainably. Both chips are more energy efficient, delivering up to two times better performance-per-watt.

While for developers it might not be very exciting to hear that the company will spend roughly six times its 2022 CapEx this year, it's definitely bullish for The Infra Play portfolio focused on the bottlenecks of the AI buildout:

Infra Play #141: Q2'26 Infra Play portfolio

The Deal Director

Apr 26

Being in the public markets over the last year has been a rocky experience. After most of 2025 was dominated by the back-and-forth on tariffs, 2026 has been overshadowed by what has long been considered one of the worst possible scenarios: a hot war with Iran.

Gemini Omni

This progress with TPUs is how we can make compute advances across models, coding and agents. With world models, AI is moving from predicting text to simulating reality. We have been working to push the boundaries of what these models can do.

Gemini Omni is our new model that is capable of generating samples in any output modality from any input. We’re starting with video outputs, and over time we’ll enable image and text. This new model combines Gemini’s intelligence with our generative media models — a huge leap forward in world understanding. We’re launching the first model in the Omni family: Gemini Omni Flash.

Gemini Omni Flash is available starting today. You will be able to try it on the Gemini app, Google Flow and on YouTube Shorts. We’ll also be rolling it out to developers and enterprise customers via APIs in the coming weeks.

Gemini Omni looks genuinely exciting, assuming it can actually be productized properly. It’s difficult to overstate how badly Google misplayed the Nano Banana lead that they had, as they offered very limited generations and censored usage significantly. OpenAI’s own image models today are able to deliver much more interesting visual work (including for public figures) without any of these limitations. So a great video world model like this will likely struggle to see actual adoption if the same poor execution post-launch continues.

New SynthID updates and partners

As generative AI gets better, so does the need for greater transparency. Research shows people can correctly identify high-quality deepfake videos only about a quarter of the time. Three years ago, we launched SynthID, our watermark that is invisible to the naked eye. Since launch, SynthID has now watermarked over one hundred billion images and videos, along with sixty thousand years of audio assets.

Millions of people are using our SynthID detector in the Gemini app to verify AI-generated content. And now we’re going a step further and adding Content Credentials verification across products. This will show you if the origin of the content was AI or a camera, and if it’s been edited with generative AI tools. We want more people to have easy access to these tools, so we’re expanding both Content Credentials and SynthID verification to Search and Chrome.

We are at the sixth announcement, which is focusing on corpo compliance for fake images. Is Google even trying to grab the attention of developers?

Gemini 3.5 Flash

Today, we’re introducing Gemini 3.5, our latest family of models combining frontier intelligence with action. This represents a major leap forward in building more capable, intelligent agents. We’re kicking off the series by releasing 3.5 Flash. It delivers frontier performance for agents and coding, excelling at complex long-horizon tasks that deliver real-world utility.

3.5 Flash is available today to billions of people globally:

For everyone via the Gemini app and AI Mode in Google Search
For developers in our agent-first development platform Google Antigravity and Gemini API in Google AI Studio and Android Studio
For enterprises in Gemini Enterprise Agent Platform and Gemini Enterprise.

We’re also hard at work on 3.5 Pro. It’s already being used internally, and we look forward to rolling it out next month.

At long last, an actual announcement that matters (besides the TPU ramp-up, thank you on behalf of my semiconductor bags). Traditionally the Gemini Flash models have been seen as a great value, offering reasonable capabilities at a significantly lower cost.

According to the internal benchmarks by the team, the model is trading blows with the other leading frontier models.

The first thing that we need to acknowledge is that the "value play" here is completely gone: the price per token has almost tripled, and 3.5 Flash lands in a significantly higher cost envelope than any Flash model before it.

If the frontier models are expensive but get the most difficult jobs done, then 3.5 Flash is not making a meaningful step for Google in either direction. It remains more expensive to run than the open-source models and the latest xAI Grok version, while not being best in class either (not even on benchmarks).

Cursor also runs its own evals (CursorBench) and the results do not look particularly exciting either. If anything, the latest Composer 2.5 model completely outplays the Gemini launch, which potentially could be a very interesting early sign of success for SpaceXAI if they acquire Cursor.

User feedback has not been kind either:

Which brings us to the most “hidden” announcement, the push for Antigravity as a product:

Antigravity 2.0

We’re also bringing 3.5 Flash to developers in Antigravity.

Antigravity is expanding beyond the coding environment, turning it into a platform to develop and manage cohorts of autonomous AI agents. This includes Antigravity 2.0, a new standalone desktop application that acts as a central home for agent interaction, where anyone can orchestrate agents for all sorts of tasks. And we developed an even more optimized version of Flash: not just 4x but 12x faster than other frontier models.

Users in Antigravity can get a taste of this experience starting today. Read more about Antigravity 2.0 here.

Gemini Spark is your 24/7 agent

Gemini 3.5 and Antigravity are unlocking a new world of agents and agentic capabilities. We’ve been bringing agents to developers and enterprises for a while. Now we are super focused on bringing the power of agents, safely and securely, to consumers so that it works for everyone. You’ll see agentic experiences across many of our products today.

I’m particularly excited for Gemini Spark, your personal AI agent in Gemini app that helps you navigate your digital life, taking action on your behalf and under your direction.

It runs on dedicated virtual machines on Google Cloud. And it’s 24/7 so you don’t need to keep your laptop open.
It’s powered by Gemini 3.5 and the Google Antigravity harness, which allows it to perform long-horizon tasks easily in the background.
Spark will integrate seamlessly with tools, starting with our own, and in the coming weeks with third-party tools through MCP.
And you can work with Spark however is most convenient: in the Gemini app or soon, through email and chat.
On Android, you will be able to view live updates and task progress of agents like Spark through a new UI space called Android Halo, coming later this year. Later this summer, Spark will operate directly within Chrome, acting as your agentic browser across the web.

We’re starting to roll out Gemini Spark to trusted testers this week and the Beta is coming to Google AI Ultra subscribers in the U.S. next week.

Google will actually retire Gemini CLI, an open-source coding harness that was widely used by the open-source model providers to build their own offerings on top. It is being replaced with the closed-source Antigravity CLI (born out of the acquihire of Windsurf), which will now be pushed further as the "best way" to utilize Gemini models. That is ironic, since the public reception has been quite negative.

This is not surprising, since Google is essentially getting outplayed on the "surfaces" side, as Anthropic was able to heavily popularize the usage of their Claude app for chat, Cowork, and Code, while OpenAI is trying to expand on its successful launch of the dedicated coding application Codex. If SpaceXAI is able to acquire Cursor, they will also offer their own widely adopted coding surface, leaving Google as the only player struggling to consolidate around a primary product.

There is one silver lining here, which is enterprise agentic usage.

As long as Gemini remains usable for enterprise agentic use cases, the GCP team will be able to keep selling it aggressively at a discount. Still, unless something completely changes with the Gemini 3.5 Pro launch, it's becoming obvious that Google is struggling to be competitive in coding. At a time when coding is driving exploding revenues at Anthropic and OpenAI, that is a mistake that might be very difficult to recover from. Meta considers coding so existential to training new models that it is now recording all of the computer use of its developers in order to somehow catch up on the high-quality training data required to progress (and Musk is willing to spend $60B for Cursor).
