Google Splits Next-Gen TPU Into Two Chips, Betting on 'Agentic Era' AI
Via Tech in Asia, Malay Mail, Interesting Engineering, TechCrunch, and Ars Technica
- Google introduced the TPU 8t for training and the TPU 8i for inference, splitting work previously handled by a single chip design.
- Google frames the new chips around the "agentic era," targeting autonomous digital agents as a primary workload.
- Apple, Microsoft, Meta, and Amazon are simultaneously expanding their custom AI chip efforts, per Tech in Asia.
- Both new TPUs are faster and cheaper than their predecessors, though Google Cloud still incorporates Nvidia hardware.
- The chips are proprietary to Google Cloud, unlike Nvidia's broadly available GPU accelerators.
What Happens Next
- Google's train/infer split establishes a design template that Apple, Microsoft, Meta, and Amazon (all actively developing custom AI silicon) are likely to replicate, fragmenting the AI chip market into specialized subcategories within 12-18 months.
- By optimizing the TPU 8i specifically for inference, Google lowers per-query costs for agentic workloads on its cloud, creating pricing pressure on AWS and Azure for inference-heavy enterprise contracts and pulling workloads toward Google Cloud.
- The proprietary, cloud-locked nature of the new TPUs deepens vendor lock-in for Google Cloud customers adopting agentic architectures, raising switching costs and reinforcing Google's position as the default platform for autonomous agent deployment.
Near-term: Google Cloud customers running large-scale inference workloads begin migrating to the TPU 8i within existing contracts, driving measurable cost reductions that Google uses as competitive benchmarks against AWS and Azure pricing.

Long-term: The AI hardware market splits into distinct training and inference ecosystems with divergent supply chains, reducing Nvidia's dominance in inference while cloud providers internalize a growing share of chip design, shrinking the addressable market for general-purpose GPU accelerators.