
MemryX Inc., a company delivering production AI inference acceleration, announced its strategic roadmap for the MX4. The next-generation accelerator is engineered to scale the company’s “at-memory” dataflow architecture from edge deployments into the data center, leveraging 3D hybrid-bonded memory to eliminate the industry’s most pressing bottleneck: the “memory wall.”
MemryX is currently in production with its MX3 silicon, delivering >20× better performance per watt than mainstream GPUs for targeted AI inference applications. With MX4, MemryX is extending that production-proven foundation to address data center workloads increasingly constrained not by compute, but by memory capacity, bandwidth, and energy efficiency.
MemryX has now signed an agreement with a next-generation 3D memory partner to execute a dedicated 2026 test chip program, validating a targeted ~5µm-class hybrid-bonded interface and direct-to-tile memory integration. The partner is not disclosed at this time.
The announcement comes as the semiconductor industry increasingly prioritizes deterministic inference architectures for the next era of AI processing, reinforced by recent multibillion-dollar licensing and investment activity across AI hardware, such as Nvidia’s $20B deal with Groq, which underscores the massive strategic value of efficient inference solutions. While the first generation of dataflow solutions proved the efficiency of 2D SRAM, MemryX is moving into the third dimension to address the power, cost, and complexity constraints of frontier AI workloads.
Software Continuity: Leveraging the MX3 Compiler Foundation
MemryX plans to leverage its mature, production-proven MX3 software stack, including its compiler and runtime, as the foundation for MX4. While MX4 introduces new capabilities to support larger memory footprints and data center-scale configurations, the roadmap is designed to preserve key elements of the MX3 programming model and toolchain to accelerate adoption and shorten time-to-deployment for existing and new customers.
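For context, the MX3 compile-and-deploy flow is a two-step process today. The snippet below is a sketch of that flow based on the publicly documented MX3 Python SDK; the class and method names (NeuralCompiler, AsyncAccl) follow that documentation, but exact signatures should be treated as illustrative, and no MX4-specific API has been published.

```python
# Sketch of the MX3-style compile-and-deploy flow that MemryX says will
# carry over to MX4. Names follow the publicly documented MX3 Python SDK,
# but treat signatures as illustrative; "model.onnx" is a placeholder
# for any supported trained model.
import numpy as np
from memryx import NeuralCompiler, AsyncAccl

# Step 1: compile the model into a Dataflow Program (DFP) for the chip.
dfp = NeuralCompiler(models="model.onnx").run()

# Step 2: run inference asynchronously. Input/output callbacks mirror
# the producer/consumer dataflow of the hardware itself.
frames = iter(np.random.rand(4, 224, 224, 3).astype(np.float32))

def send_input():
    # Returning None signals end-of-stream in this sketch.
    return next(frames, None)

def receive_output(*outputs):
    print("output shapes:", [o.shape for o in outputs])

accl = AsyncAccl(dfp)
accl.connect_input(send_input)
accl.connect_output(receive_output)
accl.wait()  # block until all queued inputs have been processed
```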
Beyond LLMs: Powering Frontier Inference
While Large Language Models (LLMs) remain a priority, the data center is rapidly evolving toward Large Action Models (LAMs), high-resolution multimodal vision, and real-time recommendation engines. These “frontier workloads” require massive memory capacity and predictable throughput that traditional 2.5D HBM-based architectures struggle to provide efficiently.
The MX4 addresses this by physically bonding high-bandwidth memory directly to compute tiles, shifting the focus from data movement back to high-efficiency computation.
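The headroom that direct bonding buys can be seen with simple arithmetic: vertical connection density scales with the inverse square of the pitch. A minimal sketch, taking the ~5µm figure from MemryX’s stated target and assuming a typical ~40µm micro-bump pitch for today’s 2.5D HBM-style assemblies:

```python
# Vertical interconnect density scales as 1/pitch^2. The 5 um pitch is
# MemryX's stated hybrid-bonding target; 40 um is an assumed, typical
# micro-bump pitch for current 2.5D HBM-style assemblies.
def links_per_mm2(pitch_um: float) -> float:
    return (1000.0 / pitch_um) ** 2

hybrid_bond = links_per_mm2(5.0)   # 40,000 links/mm^2
micro_bump = links_per_mm2(40.0)   # 625 links/mm^2

print(f"hybrid bonding: {hybrid_bond:,.0f} links/mm^2")
print(f"micro-bumps:    {micro_bump:,.0f} links/mm^2")
print(f"density gain:   ~{hybrid_bond / micro_bump:.0f}x")  # ~64x
```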
The Asynchronous Advantage: Scalability Without Bottlenecks
The MX4 represents a fundamental departure from synchronous chip designs. Many current accelerators rely on a global synchronous clock, which can introduce clock skew and thermal challenges as designs scale using 3D stacks.
Like the MX3, the MX4 utilizes a data-driven producer/consumer flow-control model (sketched in code after the list below) and avoids the centralized memory bottlenecks common in traditional architectures by enabling direct interfaces from 3D memory to compute tiles. However, rather than using 2D embedded SRAM like the MX3, the MX4 connects compute tiles directly to 3D memories without a single shared controller.
- Asynchronous Scaling: Tiles operate independently, processing only when data is available and downstream consumers are ready. This naturally manages backpressure and reduces the switching overhead and clocking complexities inherent in synchronous architectures.
- Direct-to-Tile 3D Interface: By targeting a ~5µm-class hybrid bonding pitch, MX4 enables a distributed vertical interconnect in which individual compute engines access memory layers directly, without relying on the single shared memory controller used in today’s HBM-based designs.
- Technology Agnostic: The architecture is designed to support multiple 3D direct-to-memory formats, including today’s stacked DRAM and emerging FeRAM-class technologies.
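The flow-control model referenced above can be illustrated in software. The following is a minimal sketch (plain Python threads and bounded queues, not MemryX code): each tile fires only when input data is present and stalls when its downstream queue is full, so backpressure propagates naturally with no global clock or shared controller.

```python
# Minimal sketch of data-driven producer/consumer flow control.
# Tiles are modeled as threads; bounded queues model per-link buffers,
# so a full downstream queue stalls the producer (backpressure).
import queue
import threading

QUEUE_DEPTH = 4  # small per-link buffer, as between hardware tiles

def tile(inbox, outbox, fn):
    while True:
        item = inbox.get()        # fire only when data is available
        if item is None:          # sentinel: forward shutdown downstream
            outbox.put(None)
            return
        outbox.put(fn(item))      # blocks when full -> backpressure

# Two-stage pipeline: scale then bias, each an independent "tile".
q_in, q_mid, q_out = (queue.Queue(QUEUE_DEPTH) for _ in range(3))
threading.Thread(target=tile, args=(q_in, q_mid, lambda x: 2 * x)).start()
threading.Thread(target=tile, args=(q_mid, q_out, lambda x: x + 1)).start()

for x in range(8):                # producer feeds activations
    q_in.put(x)
q_in.put(None)                    # end of stream

results = []
while (y := q_out.get()) is not None:
    results.append(y)
print(results)                    # [1, 3, 5, 7, 9, 11, 13, 15]
```

In hardware, the “queues” would be short physical links between vertically bonded tiles and firing would be event-driven rather than thread-scheduled; the sketch illustrates the control discipline, not the mechanism.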
“The industry has recognized that deterministic dataflow is a compelling path forward for AI inference, but both efficiency and scale are critical,” said Keith Kressin, CEO of MemryX. “By combining our production-proven architecture, including an asynchronous flow model, with 3D hybrid bonding, we are removing the physical barriers to power-efficient trillion-parameter scalability. We aren’t just building a faster chip; we are building a more practical roadmap for the future of AI.”

