Why behind AI: xAI progress update
We are training AGI in the volatile makeshift offices of a carnival company
In my deep dive on xAI I concluded:
For tech sales: xAI has had a chaotic year, making significant progress on product and infrastructure while bungling their hiring process, signaling that Elon Musk has far less control over this area than at his previous companies. He hasn’t hired or managed a software sales team since the early 2000s, so one of the investors has been “helping.” Currently, your odds of getting hired are minimal unless you come from a playbook company and have personal connections with the VC driving the hiring process. Since they are currently burning $1B monthly and will be lucky to reach $500M ARR by the end of the year, it’s difficult to see this wave of hiring as anything other than pigs lined up for slaughter, while the cleanup crew gets all the spoils.
For investors: The value proposition of xAI depends on whether AI is a high-variance, power-law game or a low-variance, bell-curve one. If all frontier labs benefit from a massive adoption cycle, xAI has positioned itself to capture value. Taking a long-term view toward AGI, the future looks murkier. They’ve had their source code stolen, key researchers remain poaching targets, Grok 4 Fast is their first potentially competitive product, and the company is massively leveraged and under pressure to deliver wins. On the plus side, outside of DeepMind they’re the only player with their own infrastructure (for now), and they have Elon Musk deeply committed to reaching AGI. A lot rides on Grok 5 and whether they can pull off a “Claude Code” moment. Normally, xAI would be considered a contrarian bet, but given the current dynamics there are too many pain points needing urgent attention, while they are up against multiple heavily funded organizations executing at a higher level in almost every category. The bear thesis here is that they need other players in the ecosystem to help them drive adoption, and none of them has any incentive to do so.
This was back in late September ’25, or as it’s known in AI, ten thousand years ago. Since then, xAI has raised another $20B in an upsized Series E round, invested further in the Macrohard team, deployed 1GW of compute capacity with Colossus 2, and released Grok 4.1.
For this article, we’ll take a look at what working at xAI looks like from an engineering point of view, as experienced by an insider. It appears that Sulaiman has since left xAI (the implication being that he was either fired over this interview or was already in the process of departing), so we will have to wait for the actual products to understand the direction of the company.
What’s happening at xAI right now?
We don’t really have due dates. It’s always yesterday.
There’s no blockers for anything—at least nothing artificial. The whole Elon thing about going down to the root, the fundamental, whatever the physical thing is—we get there pretty quick, as quick as we can. Which is funny in software. It’s not really something you think about, the physics, but we do try quite a bit.
We’re not really fully a software company, given all the infrastructure built out. Kind of hardware at this point. It’s hardware constrained. That’s probably our biggest edge—the hardware—because nobody else is even close on the deployment there. All the talent that ended up on software is incredible. I’ve never been anywhere like that. It’s really cool.
The benefit of working at an Elon Musk company is that the pecking order is very clear: anything engineering needs to solve a problem will be made available, as long as they are also trying to solve it in the most economically viable way.
How does Elon’s focus on bottlenecks work day to day?
Elon is very good at figuring out what the bottlenecks will be, even a couple months or years in the future, and then working backwards from that to make sure he’s in a really good position.
Usually when we spin something up new, either one of us or he comes up with a metric. It’s usually very core to either the financial or the physical return, or both sometimes. Everything is just focused on driving that metric.
There’s never a fundamental limitation to it. Or whatever the fundamental limitation is, it better be rooted deep down and not something artificial. There are a lot of perceived limitations, especially in the software world coming from the last 10 years of webdev and all these things. People just assume or accept certain limitations, especially when it comes to speed and latency, and they’re not true.
You can get rid of a lot of overhead. There’s a lot of stupid stuff in the stack. If you can knock out a lot of that, you can usually 2x to 8x most anything. At least anything invented relatively recently.
Probably the most important reason why Anthropic's IPO is more interesting than OpenAI's is the strong focus on scaling models and delivering inference efficiently. xAI benefits from a similar focus; the big question is whether they can truly compete on the performance vs. efficiency graph against Anthropic.
How fast are you iterating on models?
Most recently, it’s our model iterations on Macrohard.
We’re working on some novel architectures, actually multiple at the same time. We’re coming out with new iterations daily, sometimes multiple times a day, which is from pre-train in some cases. That’s not something you ordinarily see.
It comes from having a pretty great supercompute team. They’ve knocked out a lot of the typical barriers it takes to train, even with how variable our hardware is. Within a day of standing up a rack, you can usually be training, sometimes within the same day. Even within a few hours in some cases.
This is not normal. Normally the timelines are days or weeks. In most cases over the last 10 years, you abstract this away and let Amazon or Google take care of it. Whatever their capacity is, that’s what their capacity is. But you can’t have that be the case and win in AI now. So the only solution is to build it yourself.
xAI’s biggest edge right now is how quickly compute is delivered to the engineering teams. Other teams have done a good job of this as well (AWS, Nebius), but the pipeline of “put Blackwell GPU in a data rack → instance now available for your cluster” is nearly instant. Satya Nadella got a lot of flak for saying that Azure had such a backlog of GPUs to install that they were just sitting there until the data center got refurbished.
What is Macrohard and the human emulator concept?
The basic concept is very simple. With Optimus, you’re taking any physical task a human can do and allowing a robot to do it automatically at a fraction of the cost, with 24/7 uptime. We’re doing the same with anything a human does digitally.
Anything where they need to input keyboard and mouse, look at a screen, and make decisions—we just emulate what the human is doing directly. No adoption from any software is required at all. We can deploy in any situation where a human currently is.
One thing that we’re thinking about: we’re building this human emulator with Macrohard. How do we deploy it? If we want to deploy a million human emulators, we need a million computers. How do we do that?
The answer showed up two days later in the form of the Tesla computer. Those things are actually very capital efficient. We can run our model and the full computer that a human would otherwise work at on the Tesla computer for much cheaper than on a VM on AWS or Oracle, or even just buying hardware from Nvidia.
So we want a million VMs? There are about 4 million Tesla cars in North America alone. Maybe two-thirds or half of them have Hardware 4. Somewhere between 70-80% of the time, they’re sitting there idle, probably charging. They have networking, cooling, power. We can potentially pay owners to lease time off their car and let us run a human emulator on it. They get their lease paid for, we get a full human emulator we can put to work. That’s something without any buildout requirement—purely a software implementation.
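Taking the quoted ranges at face value, the back-of-the-envelope math does land around the “million VMs” target. Here is a quick sketch of that arithmetic (mine, using only the figures mentioned above, none of which are official):

```python
# Rough fleet-capacity check using only the ranges quoted in the interview;
# none of these figures are official.
fleet = 4_000_000                # Tesla cars in North America (quoted estimate)
hw4_share = (0.50, 0.67)         # "maybe two-thirds or half" have Hardware 4
idle_share = (0.70, 0.80)        # "70-80% of the time" the car sits idle

low = fleet * hw4_share[0] * idle_share[0]    # ~1.4 million
high = fleet * hw4_share[1] * idle_share[1]   # ~2.1 million
print(f"idle HW4 computers at any moment: {low:,.0f} to {high:,.0f}")
```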
The concept around Macrohard will either be a massive flop or some of the most interesting work we’ve seen yet in AI. While companies like Shortcut have delivered a lot of value by adding AI features within the spreadsheet format, rethinking the AI-native workflow as a “spreadsheet emulator that the human never touches” could be a very different productivity uplift, particularly since it sounds like it has low compute requirements (thousands in hardware costs vs. tens of thousands).
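To make the idea concrete, here is a minimal sketch of what such an observe-decide-act loop could look like. Everything in it is hypothetical (the function and type names are stand-ins, not xAI’s actual stack); the point is simply that the agent works through the same screen, keyboard, and mouse channel a person would, which is why per-step model latency matters so much.

```python
# Illustrative sketch of a "human emulator" loop: look at the screen, ask a
# model for the next action, apply it through keyboard/mouse input, repeat.
# All names here are hypothetical stand-ins, not xAI's actual APIs.
from dataclasses import dataclass
from typing import Optional
import time

@dataclass
class Action:
    kind: str                    # "click" | "type" | "done"
    x: Optional[int] = None
    y: Optional[int] = None
    text: Optional[str] = None

def capture_screen() -> bytes:
    """Stand-in for a real screenshot of the machine the agent is driving."""
    return b""

def propose_action(task: str, frame: bytes, history: list) -> Action:
    """Stand-in for the model call; it runs once per UI step, so a small,
    fast model compounds into a large end-to-end speedup."""
    return Action(kind="done")

def apply_action(action: Action) -> None:
    """Stand-in for OS-level keyboard/mouse injection (what a human would do)."""
    pass

def run_emulator(task: str, max_steps: int = 200) -> None:
    history: list = []
    for _ in range(max_steps):
        frame = capture_screen()
        action = propose_action(task, frame, history)
        if action.kind == "done":
            return
        apply_action(action)
        history.append(action)
        time.sleep(0.1)          # let the UI settle before the next observation

run_emulator("reconcile this spreadsheet against last month's invoices")
```

Because no target application has to integrate anything, a loop like this can sit on any machine a person would otherwise use, which is what makes the Tesla-computer deployment idea above plausible.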
What was the biggest strategic decision that shaped everything?
One of them was certainly the decision to go with a model that would be at least 1.5 times faster than a human for Macrohard. It’s looking like significantly faster than that. 8x maybe? Maybe more.
For other human emulator type attempts at other labs, the approach has been: let’s do more reasoning and build a bigger model. That decision put us on totally the opposite track of what everyone else is doing. Everything we’re doing is downstream of that.
It was very early that this was decided. It was expected, especially given the analog to Full Self-Driving. No one’s going to wait around 10 minutes for the computer to do something they could have done in five. But if it can be done in 10 seconds? I’d be happy to pay whatever amount of money for that. It’s just obvious.
Going with the smaller model, we’re able to iterate much faster. So not only does the model react to situations faster and can be more tolerant of time frames, you can also deploy iterations much faster. If it was 4 weeks before, maybe it’s one week now. That actually goes back to experimentation—why we can have 20 different experiments going in parallel is a result of that particular decision early on.
A small model that can run tasks immediately during a demo is likely to be perceived very positively in sales meetings. If it’s also accurate, it will become the next big thing.
How does engineering work with so few people?
I’ve jumped around a lot of different projects, mostly just because someone asked for my help and I kept helping. Then I ended up owning some of the stack, or a lot of the stack. This is the case for everyone. This is just how it is. If you have any particular experience or can iterate on something very quickly, within days you own that component. There’s no formal anything.
The fuzziness between teams and what everyone is responsible for is definitely not what I expected, and I don’t think exists nearly as much in any large company. If I need to fix something on our VM infrastructure, I will do it, show it to the guy who owns that, they’ll be like “okay,” and it’s merged immediately and deployed.
Everyone is allowed to update everything. There’s some checks for dangerous things, but largely you’re trusted to do the right thing and do it right.
We did the math earlier this week. Right now we’re at about $2.5 million per commit to the main repo. And I did 5 today.
So engineers add $2.5M in value to the company at every commit, while sales does…
What does engineering culture look like?
When I joined, I think every manager also wrote code. And largely today they still do. Not as much now that some have 100+ people reporting to them, but everyone’s an engineer.
I remember on my first week, I sat down for dinner and this guy sits next to me. I asked “What team are you on?” He tells me “I’m on sales, enterprise deals.” And I was like, I don’t want to talk to this guy, a sales guy. Then he starts telling me about this model he’s training. He’s an engineer too. The sales team are all engineers. Everyone is an engineer.
I think at the time, it was probably less than 8 people who were not engineers at the company in some capacity.
For hiring, Elon would go on for 10 minutes: “Engineers. Just engineers. It doesn’t matter. Good engineers. Someone who’s fundamentally a problem solver. It doesn’t matter if they did this particular architecture or infrastructure.”
Thankfully, this is no longer true, as the sales team has been replaced with playbook hires without cloud infrastructure software experience. I wouldn't want to be on a sales leadership call with Elon.
What’s the most fun thing about working there?
No one tells me no.
If I have a good idea, I can usually go and implement it that same day and show it off. We’ll see if it makes sense, run whatever eval, show it to a customer or show it to Elon or whoever, and we’ll get an answer usually that same day as to whether or not that was the right move.
There’s no deliberation, no waiting for any bureaucracy. I like that a lot.
I was expecting to sacrifice some amount of this coming from extremely small startups to a larger company. Joining at 100 people—to me that was a 10x leap from anywhere else I’ve been. But relative to Elon companies it’s pretty small, and it does feel very small. There’s not a lot of overhead in anything.
The biggest handicap of most large organizations today is that once they go public, almost no role can “just do things” anymore. In the age of AI, this is no longer simply a downside of working at a large company; it’s an existential threat.
War stories from building Colossus?
Tyler took this bet with Elon. We were setting up new racks. Elon said “You get a Cybertruck tonight if you can get a training run on these GPUs in 24 hours.” And we were training that night. Yeah, he got it. I see it from our lunch window in the cafeteria.
For power, we have to collaborate very tightly with the municipal and state power companies. When load goes high on their end, we have to shut off and go fully on the 80+ mobile generators we brought in on trucks. We have to do that seamlessly without interrupting anyone’s extremely volatile training runs on extremely volatile hardware, which scales up and down by megawatts in milliseconds. It’s a lot.
The lease for the land itself was actually technically temporary. It was the fastest way to get the permitting through and actually start building things. I think there’s a special exception within local and state government that says if you want to modify this ground temporarily—it’s for carnivals and stuff. xAI is actually just a carnival company.
That was the way to get it done quickly. 122 days.
We are training AGI in the volatile makeshift offices of a carnival company.
What happens when something breaks publicly?
When Elon sees something went wrong on X, he shows us what went wrong. Quickly, whoever is awake at the time will start up a thread to go and solve it. Usually individually, pull in a few others if need be. Then give a postmortem on what happened and everyone will understand what went wrong and how to avoid it in the future.
Generally, making mistakes once is okay. Making the same mistake twice is a big problem.
This is an interesting twist on the developer concept of "blameless postmortems". They still do it, but only once: the next time, you get blamed out the door.
Internal testing stories with human emulators?
We started testing some of our human emulators internally within the company as employees. In some cases we didn’t really tell anyone about this. Someone will be doing some work and someone says “Hey, can you help me with this thing?” The virtual employee says “Yeah sure, come to my desk.” They go there and there’s nothing there.
Multiple times I’ve gotten a ping saying “Hey, this guy on the org chart reports to you, is he not in today or something?” It’s just an emulation. It’s an AI. It’s a virtual employee.
This is one of the key insights here, together with the general Macrohard vision.
Basically, they are trying to create “human emulators”, i.e., agents that behave the same way as humans operating existing interfaces. This should reduce adoption friction, but more importantly it offers something different: a significant speedup in how those tasks get performed.
It appears that at this stage they are able to run a lot of training runs across a variety of tasks, and that this also helps with adjacent activities they might not be explicitly training for.
It’s not unlikely that a similar type of work is happening over at Tesla for Optimus. If we look at the Series E announcement:
2025 was a year of breakthrough momentum, where the xAI team advanced a multitude of key initiatives including:
Data Centers: xAI continues to expand its decisive compute advantage with the world’s largest AI supercomputers at Colossus I and II, ending the year with over one million H100 GPU equivalents.
Grok 4 Series: our frontier language models are built on the best-in-class training infrastructure powered by Colossus. xAI has pushed reinforcement learning training to unprecedented levels, refining Grok’s intelligence, reasoning, and agency using pretraining-scale compute.
Grok Voice: the most intelligent voice agent driving real-time conversations in voice mode and available via agent API. Grok Voice delivers low-latency speech in dozens of languages, tool calling, and real-time data access, serving millions of users across the Grok mobile app and in Tesla vehicles.
User metrics: our reach spans approximately 600 million monthly active users across the 𝕏 and Grok apps.
Grok Imagine: our lightning-fast image and video generation models that bring state-of-the-art multimodal understanding, editing, and generation capabilities.
Grok on 𝕏: leveraging the 𝕏 platform to understand what’s happening in the world in real-time, utilizing xAI’s most powerful model to date.
Looking ahead, Grok 5 is currently in training, and we are focused on launching innovative new consumer and enterprise products that harness the power of Grok, Colossus, and 𝕏 to transform how we live, work, and play.
This financing will accelerate our world-leading infrastructure buildout, enable the rapid development and deployment of transformative AI products reaching billions of users, and fuel groundbreaking research advancing xAI’s core mission: Understanding the Universe.
Interestingly enough, 2026 might look quite different from a product perspective for the “larger” Musk group of companies. If he actually decides to take SpaceX public, and more importantly merge it with xAI, the funding required to keep scaling these efforts will be significantly easier to find and maintain, even if the company remains deeply unprofitable.
Still, they have to solve the GTM challenges, and right now it’s not obvious how Grok wins with consumers or enterprises.

