CIO Influence
CIO Influence Interview with Sid Sheth, Founder and CEO of d-Matrix

Sid Sheth, Founder and CEO of d-Matrix, chats about the company’s first-generation AI inference platform and takes us through some of d-Matrix’s other enhancements that benefit IT teams in this CIO Influence interview:

_______

Hi Sid, tell us about d-Matrix and the platform’s journey over the years?

We started d-Matrix in 2019 with a singular focus: inference acceleration at datacenter scale. Back then, I’d walk into investor meetings and get asked “what is inference?” because the industry was still entirely focused on training. We saw something different: we knew intuitively that the whole world would want to use AI, which means running inference, but not everyone would train models.

The timing required some luck, but our vision was deliberate. We built our platform around compute and memory integration from day one, what we call digital in-memory compute, or DIMC. We took a chiplet-based approach for modularity and went deep on solving the memory bandwidth bottleneck. Then ChatGPT happened, then DeepSeek, and the world kept moving toward exactly what we’d been building for. Today, Corsair, our first-generation AI inference platform, is beginning to ramp toward commercial scale, and our next-gen Raptor accelerator with 3D DRAM stacking, which we call 3DIMC, is already of great interest to our customers. We’re more relevant than ever.

We’d love to hear about your recent acquisition of GigaIO and how this will impact end users?

GigaIO brought us something critical: strong expertise in rack-scale systems and scale-up interconnect technology to accelerate our ability to meet the demand for heterogeneous infrastructure for running AI with extreme efficiency. For end users, a disaggregated heterogeneous approach to AI changes the economics and flexibility of deployment; you’re not locked into buying massive, expensive clusters upfront. You can start smaller, scale as demand grows, and reconfigure on the fly as workloads shift. Inference isn’t one-size-fits-all, as you’ve got frontier model inference, disaggregated inference, and sub-frontier model inference, all with different resource needs. GigaIO’s technology lets us build systems that adapt to how inference runs best.

How do rack-scale infrastructure and high-performance interconnects help with heavy AI workloads?

Most AI infrastructure today was built for training AI: big, monolithic systems optimized for raw compute throughput. But inference, especially modern inference with reasoning models and multi-stage workflows, doesn’t have the same requirements. You need fast memory access, predictable latency, and orchestration across different tasks that don’t all need the same resources at the same time.

Rack-scale architecture with high-performance interconnects lets you treat the entire rack as a composable resource pool rather than a stack of isolated servers. You can move data where it needs to be without bottlenecking, dynamically allocate compute and memory based on what each stage of the pipeline requires, and do it all with consistent, low-latency performance. It’s the difference between forcing every workload onto the same processor versus designing a system where the right resource handles the right task, which is what heterogeneous compute is all about.
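As a rough illustration of treating a rack as a composable resource pool, the sketch below allocates compute and memory per pipeline stage and returns them to the pool when the stage finishes. It is a toy model under stated assumptions, not d-Matrix or GigaIO software: the `RackPool` class, the stage names, and all of the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RackPool:
    """Hypothetical rack-wide pool; units and sizes are illustrative only."""
    compute_units: int
    memory_gb: int

    def allocate(self, compute_units: int, memory_gb: int) -> bool:
        """Reserve resources if available; otherwise change nothing and return False."""
        if compute_units > self.compute_units or memory_gb > self.memory_gb:
            return False
        self.compute_units -= compute_units
        self.memory_gb -= memory_gb
        return True

    def release(self, compute_units: int, memory_gb: int) -> None:
        """Return resources to the pool once a stage is done."""
        self.compute_units += compute_units
        self.memory_gb += memory_gb


# Assumed pipeline stages with different needs: (compute units, memory in GB).
STAGES = {
    "prefill": (8, 64),   # assumed compute-heavy stage
    "decode": (2, 256),   # assumed memory-heavy stage
}


def run_pipeline(pool: RackPool, stages: dict) -> list:
    """Run stages in order, holding resources only while each stage is active."""
    completed = []
    for name, (compute_units, memory_gb) in stages.items():
        if not pool.allocate(compute_units, memory_gb):
            break  # not enough free resources for this stage
        completed.append(name)
        pool.release(compute_units, memory_gb)
    return completed


pool = RackPool(compute_units=16, memory_gb=512)
done = run_pipeline(pool, STAGES)
```

The point of the sketch is only that each stage asks for what it needs from a shared pool, rather than every stage being pinned to an identical server.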

Why are AI inference platforms becoming the need of the hour? What should businesses keep in mind when choosing among platforms today?

Inference has moved from simple prompt-response tasks to multi-stage, interactive systems. You’re seeing reasoning models, agentic workflows, real-time code generation, and interactive video, all of which demand consistent latency and predictable performance, not just peak throughput. The applications people want to build today can’t run on infrastructure designed for a different era.

When businesses evaluate platforms, they need to ask: Does this architecture solve for how inference actually runs today, or is it optimized for yesterday’s workload? Look at memory architecture; if it’s HBM-based, you’re slowed down by the physics of needing to transfer data back and forth. You’re also competing directly with NVIDIA’s supply chain and cost structure. You also need to ask about latency predictability, not just raw speed. And flexibility is key: can you deploy an inference solution within your existing infrastructure, or are you locked into a single vendor’s rack-and-cooling solution? The platforms that succeed will be the ones designed around real-world inference constraints: cost per query, latency consistency, and the ability to coexist with what you’ve already invested in.

What are some of the limits surrounding latency, cost, and energy that AI and tech innovators should be more conscious of?

Latency kills interactivity, so if your reasoning model takes 30 seconds to think, your user experience collapses. Cost per query determines whether your application is economically viable at scale. You can’t build a mass-market product if every inference call burns through expensive GPU cycles. And energy is the real constraint nobody wants to talk about: you can ship all the chips you want, but if there’s no power to turn them on, they’re just expensive, untapped inventory.

The mistake I see repeatedly is conflating peak performance with real-world performance. A chip that delivers incredible FLOPs under ideal conditions but has terrible latency variance or burns excessive power per query isn’t a win; it’s a bottleneck dressed up as a benchmark. The smartest innovators are the ones asking: what’s my total cost of ownership? What’s my energy footprint per inference? Can I deploy this at the edge, in the datacenter, across sovereigns with different power availability? Because scaling AI isn’t just about making chips perform faster; it’s about making AI sustainable, affordable, and deployable everywhere.
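The questions above can be turned into simple arithmetic. The sketch below computes cost per query, energy per query, and a tail-latency figure alongside the median. All function names and input numbers are hypothetical and chosen only to show the calculations; they are not measurements of any real system.

```python
import statistics


def cost_per_query(hourly_cost_usd: float, queries_per_hour: float) -> float:
    """Amortized cost of one query, given an assumed all-in hourly cost."""
    return hourly_cost_usd / queries_per_hour


def energy_per_query_joules(avg_power_watts: float, queries_per_second: float) -> float:
    """Watts are joules per second, so dividing by queries per second gives joules per query."""
    return avg_power_watts / queries_per_second


def latency_summary(latencies_ms: list) -> dict:
    """Median and nearest-rank 99th percentile of observed latencies."""
    ordered = sorted(latencies_ms)
    rank = -(-99 * len(ordered) // 100)  # integer ceiling of 0.99 * n
    return {"median": statistics.median(ordered), "p99": ordered[rank - 1]}


# Made-up sample: most requests are fast, a few are very slow.
sample_latencies_ms = [20] * 98 + [400, 900]

summary = latency_summary(sample_latencies_ms)
usd_per_query = cost_per_query(hourly_cost_usd=4.0, queries_per_hour=8000)
joules_per_query = energy_per_query_joules(avg_power_watts=600.0, queries_per_second=20.0)
```

In this made-up sample the median looks healthy while the 99th percentile is far higher, which is the gap between peak and real-world performance that a single headline number hides.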

A few thoughts on the future of AI and technology before we wrap up?

We’re entering the era of heterogeneous compute. The GPU-only approach served us well for training and early inference, but that model doesn’t scale for today, and it certainly doesn’t for what’s coming next. You’re going to see specialized architectures working together: GPUs handling what they do best, inference accelerators like ours handling low-latency, cost-sensitive workloads, and maybe quantum for specific problem classes down the road. Specialization isn’t fragmentation; it’s how every major computing platform has evolved.

The other thing I’m watching closely: the application layer is finally catching up to the infrastructure. For years, we’ve been infrastructure-limited, and now we’re seeing reasoning models, agentic systems, interactive video generation, and other applications that assume inference is fast, cheap, and ubiquitous. Companies that recognize this shift and build infrastructure that respects how modern AI must run will be able to scale most quickly. We are past the point where brute force wins; now, systems thinking wins.

d-Matrix is pioneering accelerated computing for AI inference, breaking through the limits of latency, cost and energy. Its Corsair compute accelerators, JetStream IO accelerators, and Aviator software deliver fast, sustainable AI inference at data center scale.

Sid Sheth is Founder and CEO of d-Matrix.
