
MemryX Inc., a company delivering production AI inference acceleration, announced its strategic roadmap for the MX4. The next-generation accelerator is engineered to scale the company’s “at-memory” dataflow architecture from edge deployments into the data center, leveraging 3D hybrid-bonded memory to eliminate the industry’s most pressing bottleneck: the “memory wall.”
MemryX is currently in production with its MX3 silicon, delivering >20× better performance per watt than mainstream GPUs for targeted AI inference applications. With MX4, MemryX is extending that production-proven foundation to address data center workloads increasingly constrained not by compute, but by memory capacity, bandwidth, and energy efficiency.
MemryX has now signed an agreement with a next-generation 3D memory partner to execute a dedicated 2026 test chip program, validating a targeted ~5µm-class hybrid-bonded interface and direct-to-tile memory integration. The partner is not disclosed at this time.
The announcement comes as the semiconductor industry increasingly prioritizes deterministic inference architectures for the next era of AI processing, reinforced by recent multibillion-dollar licensing and investment activity across AI hardware, such as Nvidia’s $20B deal with Groq, which underscores the massive strategic value of efficient inference solutions. While the first generation of dataflow solutions proved the efficiency of 2D SRAM, MemryX is moving into the third dimension to address the power, cost, and complexity constraints of frontier AI workloads.
Software Continuity: Leveraging the MX3 Compiler Foundation
MemryX plans to leverage its mature, production-proven MX3 software stack, including its compiler and runtime, as the foundation for MX4. While MX4 introduces new capabilities to support larger memory footprints and data center-scale configurations, the roadmap is designed to preserve key elements of the MX3 programming model and toolchain to accelerate adoption and shorten time-to-deployment for existing and new customers.
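For context, the MX3 compile-and-deploy flow is a two-step process today. The snippet below is a sketch of that flow based on the publicly documented MX3 Python SDK; the class and method names (NeuralCompiler, AsyncAccl) follow that documentation, but exact signatures should be treated as illustrative, and no MX4-specific API has been published.

```python
# Sketch of the MX3-style compile-and-deploy flow that MemryX says will
# carry over to MX4. Names follow the publicly documented MX3 Python SDK,
# but treat signatures as illustrative; "model.onnx" is a placeholder
# for any supported trained model.
import numpy as np
from memryx import NeuralCompiler, AsyncAccl

# Step 1: compile the model into a Dataflow Program (DFP) for the chip.
dfp = NeuralCompiler(models="model.onnx").run()

# Step 2: run inference asynchronously. Input/output callbacks mirror
# the producer/consumer dataflow of the hardware itself.
frames = iter(np.random.rand(4, 224, 224, 3).astype(np.float32))

def send_input():
    # Returning None signals end-of-stream in this sketch.
    return next(frames, None)

def receive_output(*outputs):
    print("output shapes:", [o.shape for o in outputs])

accl = AsyncAccl(dfp)
accl.connect_input(send_input)
accl.connect_output(receive_output)
accl.wait()  # block until all queued inputs have been processed
```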
Beyond LLMs: Powering Frontier Inference
While Large Language Models (LLMs) remain a priority, the data center is rapidly evolving toward Large Action Models (LAMs), high-resolution multimodal vision, and real-time recommendation engines. These “frontier workloads” require massive memory capacity and predictable throughput that traditional 2.5D HBM-based architectures struggle to provide efficiently.
The MX4 addresses this by physically bonding high-bandwidth memory directly to compute tiles, shifting the focus from data movement back to high-efficiency computation.
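The headroom that direct bonding buys can be seen with simple arithmetic: vertical connection density scales with the inverse square of the pitch. A minimal sketch, taking the ~5µm figure from MemryX’s stated target and assuming a typical ~40µm micro-bump pitch for today’s 2.5D HBM-style assemblies:

```python
# Vertical interconnect density scales as 1/pitch^2. The 5 um pitch is
# MemryX's stated hybrid-bonding target; 40 um is an assumed, typical
# micro-bump pitch for current 2.5D HBM-style assemblies.
def links_per_mm2(pitch_um: float) -> float:
    return (1000.0 / pitch_um) ** 2

hybrid_bond = links_per_mm2(5.0)   # 40,000 links/mm^2
micro_bump = links_per_mm2(40.0)   # 625 links/mm^2

print(f"hybrid bonding: {hybrid_bond:,.0f} links/mm^2")
print(f"micro-bumps:    {micro_bump:,.0f} links/mm^2")
print(f"density gain:   ~{hybrid_bond / micro_bump:.0f}x")  # ~64x
```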
The Asynchronous Advantage: Scalability Without Bottlenecks
The MX4 represents a fundamental departure from synchronous chip designs. Many current accelerators rely on a global synchronous clock, which can introduce clock skew and thermal challenges as designs scale using 3D stacks.
Like the MX3, the MX4 utilizes a data-driven producer/consumer flow-control model (sketched in code after the list below) and avoids the centralized memory bottlenecks common in traditional architectures by enabling direct interfaces from 3D memory to compute tiles. However, rather than using 2D embedded SRAM like the MX3, the MX4 connects compute tiles directly to 3D memories without a single shared controller.
- Asynchronous Scaling: Tiles operate independently, processing only when data is available and downstream consumers are ready. This naturally manages backpressure and reduces the switching overhead and clocking complexities inherent in synchronous architectures.
- Direct-to-Tile 3D Interface: By targeting a ~5µm-class hybrid bonding pitch, MX4 enables a distributed vertical interconnect in which individual compute engines access memory layers directly, without relying on the single shared memory controller used in today’s HBM-based designs.
- Technology Agnostic: The architecture is designed to support multiple 3D direct-to-memory formats, including today’s stacked DRAM and emerging FeRAM-class technologies.
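The flow-control model referenced above can be illustrated in software. The following is a minimal sketch (plain Python threads and bounded queues, not MemryX code): each tile fires only when input data is present and stalls when its downstream queue is full, so backpressure propagates naturally with no global clock or shared controller.

```python
# Minimal sketch of data-driven producer/consumer flow control.
# Tiles are modeled as threads; bounded queues model per-link buffers,
# so a full downstream queue stalls the producer (backpressure).
import queue
import threading

QUEUE_DEPTH = 4  # small per-link buffer, as between hardware tiles

def tile(inbox, outbox, fn):
    while True:
        item = inbox.get()        # fire only when data is available
        if item is None:          # sentinel: forward shutdown downstream
            outbox.put(None)
            return
        outbox.put(fn(item))      # blocks when full -> backpressure

# Two-stage pipeline: scale then bias, each an independent "tile".
q_in, q_mid, q_out = (queue.Queue(QUEUE_DEPTH) for _ in range(3))
threading.Thread(target=tile, args=(q_in, q_mid, lambda x: 2 * x)).start()
threading.Thread(target=tile, args=(q_mid, q_out, lambda x: x + 1)).start()

for x in range(8):                # producer feeds activations
    q_in.put(x)
q_in.put(None)                    # end of stream

results = []
while (y := q_out.get()) is not None:
    results.append(y)
print(results)                    # [1, 3, 5, 7, 9, 11, 13, 15]
```

In hardware, the “queues” would be short physical links between vertically bonded tiles and firing would be event-driven rather than thread-scheduled; the sketch illustrates the control discipline, not the mechanism.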
“The industry has recognized that deterministic dataflow is a compelling path forward for AI inference, but both efficiency and scale are critical,” said Keith Kressin, CEO of MemryX. “By combining our production-proven architecture, including an asynchronous flow model, with 3D hybrid bonding, we are removing the physical barriers to power-efficient trillion-parameter scalability. We aren’t just building a faster chip; we are building a more practical roadmap for the future of AI.”

