Why we invested in Standard Kernel
Fixing the software layer that’s holding AI back.
There’s a strange paradox at the heart of the AI boom. The world has never invested more in compute. Hyperscalers are spending hundreds of billions on data centers. NVIDIA’s data center revenue hit $130.5B last year, up 142% year over year. And yet average GPU utilization sits below 30%. Most of the world’s most expensive hardware is sitting idle.
This isn’t a hardware problem. It’s a software problem, and it’s one most people in AI haven’t thought much about.
The Hidden Bottleneck
Between your AI model and the GPU running it, there’s a layer of software that almost nobody talks about: the instructions that tell the chip exactly how to execute each computation. These are called GPU kernels — and how well they’re written determines how much of your hardware’s potential you actually get. Think of it like the difference between a car running at 60% fuel efficiency versus 100%. The engine is the same. The outcome is dramatically different.
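For readers who have never seen one, a kernel is just a small function compiled to run across thousands of GPU threads at once. Here is a minimal, purely illustrative CUDA sketch (textbook code, not Standard Kernel’s):

```cuda
// A minimal GPU kernel: single-precision a*x + y (SAXPY).
// Each of thousands of GPU threads computes one output element in parallel.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // this thread's index
    if (i < n) {
        y[i] = a * x[i] + y[i];
    }
}

// Host-side launch. Even the grid and block sizes here are tuning
// decisions, and they are the least of it.
// saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, d_x, d_y);
```

Kernels like this are easy. The ones that matter for AI workloads are not.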
The problem is that writing high-quality kernels is among the most specialized work in software engineering. It requires deep expertise in how a specific chip is built, down to how it moves data internally and executes parallel instructions at the hardware level. Every chip generation is different. Every model architecture can require a different set of optimizations.
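To make that concrete, below is the classic first step in that direction: a tiled matrix multiply that stages data through fast on-chip shared memory so each value fetched from slow global memory is reused many times. This is a simplified textbook sketch (it assumes square N x N row-major matrices with N divisible by the tile size); a production kernel for a specific chip layers on tensor cores, asynchronous copies, swizzled memory layouts, and more, and the right choices change with every hardware generation.

```cuda
#define TILE 32  // threads per block: 32 x 32 = 1024

// Illustrative tiled matmul: C = A * B, row-major N x N, N % TILE == 0.
// Each block stages TILE x TILE sub-matrices of A and B in shared
// memory, so every global-memory load is reused TILE times.
__global__ void matmul_tiled(const float *A, const float *B, float *C, int N) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    for (int t = 0; t < N / TILE; ++t) {
        // Cooperative load: one global-memory read per thread per tile.
        As[threadIdx.y][threadIdx.x] = A[row * N + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * N + col];
        __syncthreads();  // wait until the whole tile is staged

        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // don't overwrite tiles still being read
    }
    C[row * N + col] = acc;
}
```

Getting this right for one operation on one chip is the easy version of the job.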
And there are only a few thousand engineers in the world capable of doing this at a high level. Even the best teams at the best companies spend weeks or months tuning individual kernels. Even then, they’re often working off-cycle from the hardware releases that made optimization necessary in the first place.
The result is a market-wide tax on AI performance. Compute that’s been purchased, deployed, and paid for, but never delivers its full value. And with each new model architecture and chip generation, the gap between what’s theoretically possible and what teams actually achieve keeps widening.
A Problem That Scales With AI
This matters more now than ever, and it’s going to matter more still.
GPU supply is already constrained, with demand from training and inference running years ahead of availability. And the obvious fix, building more compute, isn’t as simple as it sounds. New data centers take years to plan, permit, and construct. Power infrastructure is increasingly scarce. The capital requirements are staggering. Even with unlimited funding, timelines can only compress so much.
Against that backdrop, low utilization is not just inefficient; it’s untenable. There’s no quick hardware-side fix for the gap between theoretical and realized performance. The only realistic way to “create supply” is to extract more from the infrastructure that already exists.
Meanwhile, companies building AI products are burning months of scarce engineering time on low-level performance work rather than advancing their models or shipping products. For teams that can’t attract or afford elite systems talent, this becomes a hard ceiling. For teams that can, it’s still an enormous drag.
In a market where inference costs, latency, and throughput are increasingly competitive differentiators, performance at this layer compounds in ways that matter to the business.
This is the problem Standard Kernel is built to solve.
What Standard Kernel Does
Standard Kernel is building AI systems that do what, until now, only a handful of elite engineers could: automatically generate highly optimized, hardware-specialized software that extracts peak performance from modern GPUs.
The vision is straightforward but powerful. You hand Standard Kernel your AI workload. The platform produces the best possible version of the underlying code for your exact hardware configuration. No manual kernel tuning. No months of performance debugging. No waiting on engineering cycles. No settling for generic libraries that were never built for your use case.
At the core is an automated feedback loop: generate candidate implementations, test and benchmark against real hardware, refine based on results, and repeat. Each cycle narrows the gap between what’s running and what the chip is actually capable of. The result is software that’s tailored to the workload, not one-size-fits-all solutions that leave performance and efficiency on the table.
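As a toy analogy, and emphatically not Standard Kernel’s actual system, the skeleton of that loop looks something like the sketch below: run candidate configurations on the real device, time them, keep the winner. Here the only knob is the launch configuration of a trivial kernel; the real search space is the generated code itself.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Toy generate -> benchmark -> refine loop: sweep one tuning knob
// (threads per block), time each candidate on the actual hardware,
// and keep the fastest.
int main() {
    const int n = 1 << 24;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int candidates[] = {64, 128, 256, 512, 1024};
    int best_block = 0;
    float best_ms = 1e30f;

    for (int block : candidates) {
        int grid = (n + block - 1) / block;
        saxpy<<<grid, block>>>(n, 2.0f, x, y);   // warm-up launch
        cudaEventRecord(start);
        saxpy<<<grid, block>>>(n, 2.0f, x, y);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms;
        cudaEventElapsedTime(&ms, start, stop);
        printf("block=%4d  %.3f ms\n", block, ms);
        if (ms < best_ms) { best_ms = ms; best_block = block; }
    }
    printf("best: block=%d (%.3f ms)\n", best_block, best_ms);

    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

The point of the sketch is the shape of the loop: every candidate is judged by measured wall-clock time on the target device, not by a model of the hardware.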
The performance numbers bear this out. In partner testing on NVIDIA H100 GPUs, Standard Kernel has demonstrated end-to-end speedups ranging from 1.8x to 4x. In some cases, it has outperformed NVIDIA’s own highly optimized libraries.
That’s not a small delta. In infrastructure, where performance wins are typically measured in single-digit percentages, those numbers are remarkable.
From an investment standpoint, what makes this particularly compelling is that Standard Kernel delivers a rare double unlock. Engineers spend less time writing and tuning kernels, and the kernels they produce perform better. Most infrastructure tools deliver one or the other; Standard Kernel delivers both.
And because every point of GPU utilization recaptured is compute that has already been purchased and deployed, the economic impact compounds. This is not theoretical efficiency. It is stranded capital being put back to work.
The Team
Anne Ouyang and Chris Rinard are exactly the kind of founders you want building this company.
Anne spent time on the cuDNN team at NVIDIA, writing the same hand-tuned CUDA kernels that Standard Kernel is now automating. She went on to Stanford for her PhD in CS, where she co-authored KernelBench, the first standardized, open-source benchmark for evaluating LLM-generated GPU kernels. It’s since become a reference point across the industry, cited by NVIDIA in its own developer evaluations.
Chris brings equally deep systems expertise and a technical rigor that matches the ambition of what they’re building. The two met while teaching Performance Engineering at MIT, which tells you something about how they think about this problem: rigorously, from first principles, and with a clear view of where performance actually breaks down.
Founder-market fit in technical infrastructure is hard to fake. This team has written these kernels by hand. They built the benchmark to measure how well AI can generate them. Now they’re building the company to automate the process entirely.
The distance between where they’ve been and what they’re building is exactly zero.
Why We Invested
The AI compute crunch isn’t going away. Global data center demand is expected to nearly triple by 2030, with the majority driven by AI workloads. Hardware supply will remain constrained. And as model complexity increases, the pressure to extract maximum performance from every deployed chip will only intensify.
In that environment, the most important companies aren’t just the ones selling access to more compute. They’re the ones solving the problems that compound and unlocking more value from what’s already there.
Standard Kernel sits squarely in that category. This isn’t a productivity tool at the margin. It’s infrastructure for making the entire AI economy more efficient.
We’re proud to lead Standard Kernel’s $20mm seed and to partner with Anne, Chris, and the team. We look forward to building alongside them.
This article is for informational purposes only and does not constitute investment advice. Jump Capital is an investor in Standard Kernel. Views expressed represent the opinions of the authors and Jump Capital. Forward-looking statements involve risks and uncertainties, and references to specific companies and their capabilities do not constitute investment recommendations or guarantee future performance.