Two research preprints

Your GPU is fast enough.
The software isn't.

GPU frameworks waste 92% of their time on overhead — sending tasks one by one instead of all at once. I proved it and fixed it. Two preprints, two domains, one technique.

458× faster transformer inference (parallel fused kernel)
720× faster sequential compute (CUDA, same T4)
0 things to install (just open Chrome)

How it works (simply)

No jargon. Here's the intuition.

1. The problem: 92% overhead

GPU frameworks (PyTorch, JAX, WebLLM) send one small task to the GPU, wait for it to finish, send the next one. For a 64-token generation with 4 layers, that's 1,024 separate round-trips. Each round-trip takes longer than the actual math.
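The round-trip tally and the overhead figure can be sketched with a toy cost model. The per-layer operation count (four) and the latency constants here are illustrative assumptions, not measurements from the preprints:

```python
# Illustrative cost model for per-operation GPU dispatch.
# Assumed numbers (not from the papers): 4 GPU operations per
# transformer layer, ~46 µs of launch overhead per dispatch,
# ~4 µs of actual math per operation on a small model.

tokens, layers, ops_per_layer = 64, 4, 4
dispatches = tokens * layers * ops_per_layer
print(dispatches)  # 1024 separate round-trips

launch_us, math_us = 46.0, 4.0                 # per dispatch, illustrative
total_us = dispatches * (launch_us + math_us)
overhead = dispatches * launch_us / total_us
print(f"overhead: {overhead:.0%}")             # overhead: 92%
```

With these assumed constants, waiting on launches eats 92% of the wall time even though the math itself is tiny.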

2. The fix: one dispatch

Pack the entire computation — all tokens, all layers, all operations — into a single GPU instruction. The GPU loops internally. No round-trips. No waiting. Same math, same result.
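The difference between the two strategies comes down to how many times you pay the launch overhead. A minimal simulation, with the same illustrative (assumed, not measured) constants as above:

```python
# Toy simulation of the two dispatch strategies. Each simulated
# dispatch pays a fixed launch overhead; the math is identical.
# All constants are illustrative assumptions, not measured values.

OVERHEAD = 46e-6  # seconds of launch latency per dispatch (assumed)
MATH = 4e-6       # seconds of useful compute per operation (assumed)

def per_op_dispatch(tokens, layers, ops_per_layer):
    """One round-trip per operation: overhead paid every time."""
    t = 0.0
    for _ in range(tokens * layers * ops_per_layer):
        t += OVERHEAD + MATH
    return t

def fused_dispatch(tokens, layers, ops_per_layer):
    """One round-trip total: the GPU loops internally."""
    return OVERHEAD + tokens * layers * ops_per_layer * MATH

slow = per_op_dispatch(64, 4, 4)
fast = fused_dispatch(64, 4, 4)
print(f"speedup: {slow / fast:.0f}x")
```

Under these assumptions the fused version is about an order of magnitude faster; the real speedups in the preprints are larger because the measured launch-overhead-to-math ratio on actual hardware is more extreme than this toy model.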

3. The proof: two preprints

Paper 1: 159-720× speedups on sequential fitness evaluation across CUDA, WebGPU, JAX, and Triton. Paper 2: 66-458× speedups on transformer inference with a parallel kernel. Both measured on the same hardware as the baselines. All code and data are public.

4. The result: AI in the browser

16,000 tokens per second at D=32 in Chrome. No Python, no CUDA, no cloud. A laptop running a browser outperforms PyTorch on the same GPU by up to 161×.

What actually changes

Not theoretical. Here's what's different tomorrow.

Before

ChatGPT in your browser types 5 words per second. You assume your laptop isn't powerful enough.

After

Your GPU was idle 92% of the time. The waiting is eliminated. Same GPU, 6-458× faster.

Before

Running AI locally means installing Python, CUDA, PyTorch, downloading model weights, debugging driver conflicts.

After

Open a browser tab. That's it. The AI runs on the GPU you already have, at near-native speed.

Before

Every AI feature costs $2-4/hour in cloud GPU. 100K users = $50K/month in servers.

After

The user's GPU does the work. Server cost: $0. The browser IS the infrastructure.

Before

A student in rural India can't afford a GPU cluster or cloud API credits to learn AI.

After

A $300 phone with Chrome can run transformer inference locally. No internet needed after model download.

Who this is for

💬

Anyone who uses AI chatbots

Browser-based AI assistants could respond 6-458× faster. Not by buying better hardware — by fixing how the software talks to your GPU.

🏫

Teachers and students

Run AI models live in the classroom. Every student's laptop becomes an AI workstation. No lab, no cloud account, no IT department.

🔬

AI researchers

Ship a live demo of your model as a URL. Reviewers run it in their browser instead of fighting with your Docker container.

🚀

Startups building AI products

Add AI features to your web app without GPU servers. Your users' devices do the compute. Scale to millions at zero marginal cost.

🔒

Privacy-sensitive industries

Healthcare, legal, finance — the AI runs on the device. Data never leaves the laptop. Compliance by architecture.

🌍

The developing world

3 billion people have a WebGPU-capable device. Browser-native AI makes intelligence a capability your device already has, not a service you rent.

See it for yourself

Run the benchmarks on your hardware, right now.