Fermyon Serverless AI Efficiently Timeshares Each GPU for Thousands of AI Developers Concurrently
Fermyon Technologies, the serverless WebAssembly company, announced Fermyon Serverless AI, a new capability that radically alters the landscape of AI, the technology industry’s largest and most dominant paradigm shift in decades. Serverless AI is now available on Fermyon Cloud’s free tier, showcasing Fermyon’s supersonic startup time for AI inferencing with LLMs.
Fermyon will be showcasing Serverless AI in Fermyon Cloud and Spin this week at the Civo Navigate conference as well as at the Linux Foundation’s WasmCon conference.
“Enterprises wishing to build AI applications that go beyond simple chat services face a largely insurmountable dilemma – it’s either cost prohibitive or it’s abysmally slow and, therefore, [they] often abandon plans to build AI apps. Fermyon has used its core WebAssembly-based cloud compute platform to run fast AI inferencing workloads. It achieves this by using its technology to only use the GPU for the duration of the inferencing request, thus multiplexing thousands of requests into a single GPU,” said Omdia analyst Roy Illsley.
Inferencing on large language models (LLMs) is one of the most popular workloads in computing today. Demand for GPUs is high, but the equipment itself is scarce and expensive. As a result, developers tasked with building and running enterprise AI apps on LLMs like LLaMA2 face a 100x compute expense for access to GPUs at $32/instance-hour and upwards. Alternatively, they can use on-demand services, but then they experience abysmal startup times. Either way, it is impractical to deliver enterprise AI apps affordably.
Fermyon Serverless AI has solved this problem by offering 50 millisecond cold start times, over 100x faster than other on-demand AI infrastructure services. This breakthrough is made possible by the serverless WebAssembly technology powering Fermyon Cloud, the fastest, most secure, most flexible and most affordable serverless solution on the market. Fermyon Cloud is architected for sub-millisecond cold starts and high-volume time-slicing of compute instances, which has been shown to increase compute densities by a factor of 30. Extending this runtime profile to GPUs makes Fermyon Cloud the fastest AI inferencing infrastructure service.
“At Fermyon, we set out to build the next wave of cloud computing by squeezing every last bit of efficiency out of CPU utilization. With the boom in AI interest, we extended this same performance profile to high-end GPUs. GPUs are essential to AI. But compared to CPUs, GPUs are massively more expensive. The solution is to improve efficiency and time-sharing of GPU usage. And we do that with a WebAssembly-powered serverless platform that boasts supersonic startup speed, a strong security sandbox, and most of all, platform neutrality that extends beyond just OS and CPU, but to GPU architecture as well. Fermyon’s new Serverless AI is the easiest, fastest and cheapest way to build enterprise AI inferencing apps,” said Matt Butcher, co-founder and CEO of Fermyon.
Fermyon Serverless AI brings a new tool to the fullstack developer’s toolbox. Combined with Fermyon’s NoOps SQL Database and Key Value Storage, developers can quickly build advanced AI-enabled serverless applications without needing external vector databases or storage.
Fermyon Serverless AI has been added to both Fermyon Cloud and Spin and is currently in private beta. Developers can work locally with the AI inferencing technology in Spin, the popular open source product that is the easiest way for developers to build WebAssembly serverless apps; it has more than 3,900 GitHub stars and over 105,000 downloads. With one command, they can then deploy the application to Fermyon Cloud, taking advantage of powerful AI-grade GPUs. Developers can sign up to join Fermyon’s private beta.
Fermyon Serverless AI is powered by Civo’s GPU compute service, and will extend to Civo’s carbon-neutral GPU offering announced at their Navigate Europe 2023 conference. This sustainable compute solution is built on the groundbreaking use of compute waste heat to meet the energy needs of heat-intensive industries.
“We are incredibly excited about this world-first breakthrough from Fermyon running on Civo’s core GPU compute. It promises a new era for how developers get up and running with AI, at a critical time for the technology. Civo’s compute is well placed to drive the success of Fermyon Serverless AI. Our service is inherently WebAssembly-ready, easy to use and quick to launch, with transparent pricing that meets a great price-performance point. We look forward to introducing our customers to this capability at our flagship conference, Civo Navigate,” said Mark Boost, co-founder and CEO at Civo, a leading cloud native service provider.