
Expedera Inc., a leading provider of scalable Neural Processing Unit (NPU) semiconductor intellectual property (IP), today launched its new Origin Evolution NPU IP. The technology advances Generative Artificial Intelligence (GenAI) capabilities in edge devices. Origin Evolution manages the unique workload requirements of running large language models (LLMs) on resource-constrained devices, as well as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs).
Running LLM inference on edge hardware is crucial because it reduces latency and mitigates the security concerns associated with cloud-based implementations. However, deploying LLMs in resource-constrained systems is challenging because of their large model sizes and significant computational requirements. Consequently, edge designs require specialized hardware that can effectively address their unique constraints, including power, performance, area (PPA), latency, and memory requirements. Innovative software optimizations are also essential, including model compression, hardware-aware optimization, attention optimization, and dedicated frameworks to manage computational and energy constraints at the edge.
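Model compression is the most widely used of these optimizations. As a hypothetical illustration of the idea (not a description of Expedera's implementation), the sketch below shows post-training symmetric int8 quantization, which shrinks weight storage roughly 4x versus float32 at a small accuracy cost; all names and values are invented for the example:

```python
# Illustrative sketch of post-training int8 weight quantization, one of
# the edge-oriented compression techniques mentioned above. Values and
# function names are invented for this example.

def quantize_int8(weights):
    """Symmetric per-tensor quantization: w_q = round(w / scale)."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 codes."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.99, -0.76]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored value lies within half a quantization step of the
# original, while each weight now occupies one byte instead of four.
```

In practice, production toolchains apply such quantization per channel or per group and calibrate scales on sample data, but the storage-versus-precision trade-off is the same.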
“Origin Evolution is a radical advancement providing an AI inference engine with out-of-the-box compatibility with popular LLM and CNN networks that produces ideal results in applications as varied as smartphones, automobiles, and data centers,” said Siyad Ma, CEO and co-founder of Expedera. “It builds on our years of engineering advancements and is incredibly exciting for our customers and the myriad brands that want to utilize GenAI in their products.”
Scalable to 128 TFLOPS in a single core and to PetaFLOPS and beyond with multiple cores, Origin Evolution can be configured to produce optimal PPA results across a wide range of applications. Origin Evolution significantly reduces memory and system power needs while increasing processor utilization. Compared to alternative solutions, its packet-based processing reduces external memory movement by more than 75% for Llama 3.2 1B and Qwen2 1.5B. Even in highly memory-bound use cases, Origin Evolution delivers thousands of effective TFLOPS and dozens of tokens per second per mm² of silicon.
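Why memory movement dominates here is easy to see with back-of-envelope arithmetic: autoregressive decoding must stream essentially the full weight set from external memory for every generated token, so memory bandwidth, not compute, caps the token rate. The sketch below uses illustrative numbers (the 16 GB/s bandwidth figure is an assumption, not an Expedera specification):

```python
# Back-of-envelope estimate showing why edge LLM decoding is
# memory-bound. All numbers are illustrative assumptions.

def decode_tokens_per_second(params_billion, bytes_per_param, bandwidth_gb_s):
    """Upper bound on decode rate when each generated token must
    stream the full weight set from external memory."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / weight_bytes

# A 1B-parameter model with int8 weights over an assumed 16 GB/s
# LPDDR interface yields a ceiling of ~16 tokens/s.
rate = decode_tokens_per_second(1.0, 1, 16)
```

Under this simple model, cutting external memory traffic by 75% raises the achievable ceiling roughly fourfold, which is why memory-movement reduction translates directly into tokens per second.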
Origin Evolution can support custom and ‘black box’ layers and networks, while offering out-of-the-box support for today’s most popular networks, including Llama 3, ChatGLM, DeepSeek, Qwen, MobileNet, YOLO, MiniCPM, and many others. Origin Evolution NPU IP solutions are available now, production-ready, and silicon-proven in customer production designs.
Origin Evolution allows users to deploy existing trained models with no loss of accuracy and no retraining, with confidence in achieving ideal PPA. It uses Expedera’s unique packet-based architecture to achieve unprecedented NPU efficiency. Packets, contiguous fragments of neural networks, overcome the hurdles of large memory movements and differing network layer sizes, which LLMs exacerbate. The architecture routes packets through discrete processing blocks, including Feed Forward, Attention, and Vector, which accommodate the varying operations, data types, and precisions required when running LLM and CNN networks simultaneously or separately. Origin Evolution includes a high-speed external memory streaming interface compatible with the latest DRAM and HBM standards. Complementing the hardware is an advanced software stack that supports network representations from HuggingFace, Llama.cpp, TVM, and others; full integer and floating-point precisions (including mixed modes); layer fusion and fission; and centralized control of multiple cores within a chip, chiplet, or system.

