AWS makes its most powerful custom Arm processor generally available, positioning 192-core density and formally verified isolation as an essential orchestration fabric for multi-agent workloads at hyperscale.
What Is Covered in This Article:
- AWS Graviton5 general availability via M9g and M9gd instances
- The CPU’s repositioning as agentic AI orchestration infrastructure
- Meta’s commitment to tens of millions of Graviton cores
- Formally verified security through the Nitro Isolation Engine
- Real-time context platforms validating latency and throughput gains
The News: AWS announced that Graviton5-powered Amazon EC2 M9g and M9gd instances are now generally available. First previewed at re:Invent 2025, Graviton5 features 192 cores, a 5x larger L3 cache, DDR5-8800 memory, PCIe Gen 6, and up to 33% lower inter-core communication latency compared to its predecessor.
Early access customers reported significant gains: ClickHouse achieved a 36% performance boost with zero code changes, Honeycomb saw 36% better throughput per core across a six-month production test, and HubSpot observed query durations drop by up to 60% on MySQL databases. Meta is deploying Graviton at scale, starting with tens of millions of cores to support its agentic AI efforts, making Meta one of the largest Graviton customers in the world. Tacnode, an AI-native real-time data platform, benchmarked M9g against its current Graviton4 fleet and reported 20-30% throughput improvement and P99 tail latency more than halving under load.
Xiaowei Jiang, Tacnode CEO and Chief Architect, stated: “Graviton5’s higher memory bandwidth is a particularly good match for Tacnode’s bandwidth-sensitive mixed read/write paths at scale. Graviton5 will become the default compute tier for Tacnode on AWS.”
AWS Graviton5 Reframes the CPU as Agentic AI Infrastructure
Analyst Take: AWS Graviton5 represents a deliberate architectural thesis that agentic AI workloads require CPU infrastructure optimized for massive concurrency rather than single-thread peak performance. The general availability of M9g and M9gd instances, combined with Meta’s commitment to tens of millions of cores, signals that the industry’s largest AI builders view high-core-count Arm CPUs as essential scaffolding for orchestrating autonomous agent fleets. Tacnode’s benchmarks, showing P99 tail latency more than halving under load, validate that the gains translate directly into the stability profile that high-stakes agentic decisioning demands. This positions Graviton5 not as a GPU substitute but as the coordination fabric that keeps accelerators fed and agents responsive across concurrent execution environments. The central question is whether AWS can convert this architectural advantage into agentic workload gravity.
192 Cores Redefine Concurrency Economics for Agent Orchestration
The decision to pack 192 cores into a single socket with 33% lower inter-core latency reflects AWS’s view that agentic AI creates a fundamentally different compute profile than traditional cloud workloads. Agents that reason, generate code, and coordinate multi-step tasks require processors capable of sustaining large numbers of lightweight, latency-sensitive threads simultaneously without degrading under concurrent load. Graviton5’s 5x larger L3 cache and DDR5-8800 memory support reduce the data-fetch penalties that throttle concurrent workloads on lower-density architectures, while the move to PCIe Gen 6 ensures I/O does not become a secondary bottleneck as agent fleets scale.
AWS explicitly frames this generation around the shift from AI answering questions to AI taking actions: running code, using tools, evaluating results, and orchestrating multi-step tasks. In that framing, the CPU is the substrate on which these operations execute concurrently. The 25% compute improvement over Graviton4 compounds with the architectural changes to create a generation that targets workload density rather than peak per-core speed. CPU selection criteria for agentic AI workloads are shifting from peak compute throughput to sustained concurrent density and inter-core communication efficiency.
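To make the density argument concrete, the following back-of-the-envelope sketch estimates the theoretical peak bandwidth of a DDR5-8800 memory channel and the share available to each of 192 cores. The 8,800 MT/s rate and the core count come from the announcement; the 64-bit channel width is the standard DDR5 data width. The channel count (`ASSUMED_CHANNELS`) is a hypothetical placeholder, since the number of memory channels is not stated here, and real sustained bandwidth will be lower than the theoretical peak.

```python
def peak_channel_bandwidth_gbs(transfer_rate_mts: int, bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth of one memory channel, in decimal GB/s."""
    bytes_per_transfer = bus_width_bits / 8
    # MT/s * bytes per transfer = MB/s; divide by 1000 for GB/s
    return transfer_rate_mts * bytes_per_transfer / 1000


def per_core_bandwidth_gbs(transfer_rate_mts: int, channels: int, cores: int) -> float:
    """Peak bandwidth per core if every core pulls from memory at once."""
    return peak_channel_bandwidth_gbs(transfer_rate_mts) * channels / cores


DDR5_RATE_MTS = 8800   # from the announcement (DDR5-8800)
CORES = 192            # from the announcement
ASSUMED_CHANNELS = 12  # ASSUMPTION: illustrative only, not a published figure

per_channel = peak_channel_bandwidth_gbs(DDR5_RATE_MTS)
per_core = per_core_bandwidth_gbs(DDR5_RATE_MTS, ASSUMED_CHANNELS, CORES)

print(f"Peak per channel: {per_channel:.1f} GB/s")
print(f"Peak per core with {ASSUMED_CHANNELS} assumed channels: {per_core:.1f} GB/s")
```

The point of the arithmetic is that per-core bandwidth shrinks as core counts rise, which is why a larger L3 cache and faster memory matter more at 192 cores than they would at lower densities.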
Meta’s Core Commitment Validates the CPU-as-AI-Infrastructure Thesis
Meta’s pledge to deploy tens of millions of Graviton cores for agentic AI represents one of the largest known CPU procurement commitments for AI-specific workloads. This commitment reframes the conventional narrative that AI infrastructure investment flows exclusively toward GPUs and accelerators, revealing a complementary demand for CPU fabric capable of managing context, coordinating tool use, and sustaining the reasoning loops that sit between inference calls.
Meta’s decision suggests that orchestrating agent fleets at scale requires dedicated CPU resources optimized for concurrency, memory bandwidth, and low latency, a combination that general-purpose processors cannot deliver at equivalent density. The breadth of Graviton’s existing footprint, powering over 350 instance types serving more than 120,000 customers across eight years of continuous investment, means Meta joins an established ecosystem rather than pioneering an unproven platform. AWS positions this customer concentration as evidence that CPU demand from agentic workloads has moved from speculative to structural across the industry.
Real-Time Context Platforms Expose the Latency Demands of Agentic Decisioning
Tacnode’s qualification benchmarks provide direct evidence that Graviton5’s architectural improvements translate into measurable gains for the most demanding agentic AI workloads. Tacnode Context Lake, an AI-native real-time data platform providing agents with millisecond-fresh context for fraud and risk decisions, reported 20-30% throughput improvements across standard workloads and P99 tail latency reductions of more than 50% on its most demanding context-serving paths. CEO and Chief Architect Xiaowei Jiang attributed the fit to Graviton5’s higher memory bandwidth, describing it as a particularly good match for Tacnode’s bandwidth-sensitive mixed read/write paths at scale.
The decision to make Graviton5 the default compute tier for Tacnode on AWS demonstrates that data infrastructure vendors are treating this generation as a step-change rather than an incremental upgrade. Platforms such as Tacnode sit at the intersection of agent orchestration and real-time decision-making, precisely where latency variance translates directly into operational risk for high-stakes use cases such as fraud prevention. These early proof points will still need to carry over to full multi-agent workflows to show that CPUs can work effectively alongside GPUs in autonomous inference pipelines.
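The distinction between average performance and tail latency can be illustrated with a short sketch. The latency samples below are synthetic and chosen only for illustration; they are not Tacnode’s measurements. The helper `percentile_nearest_rank` is a hypothetical name that implements the common nearest-rank percentile definition.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def mean(samples):
    return sum(samples) / len(samples)


# SYNTHETIC request latencies in microseconds: 98% fast requests, 2% slow outliers.
baseline = [2_000] * 980 + [40_000] * 20
upgraded = [1_800] * 980 + [15_000] * 20

for name, data in (("baseline", baseline), ("upgraded", upgraded)):
    print(f"{name}: mean={mean(data):.0f} us, p99={percentile_nearest_rank(data, 99)} us")

mean_drop = 1 - mean(upgraded) / mean(baseline)
p99_drop = 1 - percentile_nearest_rank(upgraded, 99) / percentile_nearest_rank(baseline, 99)
print(f"mean improves {mean_drop:.1%}, p99 improves {p99_drop:.1%}")
```

In this made-up data set, the mean falls by roughly a quarter while P99 falls by more than half. That is the kind of divergence that matters for fraud and risk decisioning, where the slowest requests, not the typical ones, determine whether a decision arrives in time.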
Formally Verified Security Establishes a New Isolation Standard for Multi-Tenant AI
The Nitro Isolation Engine introduces formal verification into production cloud security, establishing a new baseline for workload isolation in multi-tenant environments running sensitive AI operations. This approach moves beyond conventional testing by mathematically proving that the hypervisor behaves as intended across all possible states, not merely in specific test cases, which rules out categories of security vulnerabilities that testing alone cannot exclude. AWS describes the Nitro Isolation Engine as a purpose-built component responsible for mediating all access to virtual machine memory, CPU register state, and I/O devices through a minimal set of APIs, reducing attack surface by architectural constraint rather than through layered mitigation.
Nitro represents a structural claim about the security standard required for autonomous agents that access sensitive data and execute consequential actions across organizational boundaries in shared infrastructure. For regulated industries where agentic AI adoption has been constrained by governance and isolation concerns, formally proven separation of tenant workloads addresses a specific barrier to deployment at scale. The competitive pressure this places on other hyperscalers to demonstrate equivalent isolation assurances could reshape procurement criteria for AI workloads in financial services, healthcare, and government sectors.
What to Watch:
- Whether Meta’s deployment translates into publicly disclosed agentic AI products or remains an infrastructure-layer investment.
- How Intel and AMD respond with competitive core density, inter-core latency, and agentic workload benchmarks for their next-generation server processors.
- Whether the Nitro Isolation Engine’s formal verification approach becomes a procurement requirement for regulated-industry AI deployments.
- How real-time context infrastructure vendors such as Tacnode shape purchasing criteria around tail latency predictability rather than average throughput.
- The degree to which competing hyperscalers accelerate their own custom CPU programs in response to AWS’s agentic AI positioning.
Read the full announcement on the AWS website.
Declaration of generative AI and AI-assisted technologies in the writing process: This content has been generated with the support of artificial intelligence technologies. Due to the fast pace of content creation and the continuous evolution of data and information, The Futurum Group and its analysts strive to ensure the accuracy and factual integrity of the information presented. However, the opinions and interpretations expressed in this content reflect those of the individual author/analyst. The Futurum Group makes no guarantees regarding the completeness, accuracy, or reliability of any information contained herein. Readers are encouraged to verify facts independently and consult relevant sources for further clarification.
Disclosure: Futurum is a research and advisory firm that engages or has engaged in research, analysis, and advisory services with many technology companies, including those mentioned in this article. The author does not hold any equity positions with any company mentioned in this article.
Analysis and opinions expressed herein are specific to the individual analyst and are based on data and other information that might have been provided for validation; they are not those of Futurum as a whole.
Brendan is Research Director, Semiconductors, Supply Chain, and Emerging Tech. He advises clients on strategic initiatives and leads the Futurum Semiconductors Practice. He is an experienced tech industry analyst who has guided tech leaders in identifying market opportunities spanning edge processors, generative AI applications, and hyperscale data centers.
Before joining Futurum, Brendan consulted with global AI leaders and served as a Senior Analyst in Emerging Technology Research at PitchBook. At PitchBook, he developed market intelligence tools for AI, highlighted by one of the industry’s most comprehensive AI semiconductor market landscapes encompassing both public and private companies. He has advised Fortune 100 tech giants, growth-stage innovators, global investors, and leading market research firms. Before PitchBook, he led research teams in tech investment banking and market research.
Brendan is based in Seattle, Washington. He has a Bachelor of Arts Degree from Amherst College.

