First solution to combine dense, sparse, and image embeddings with vector search in one managed environment. Reduces latency, cuts network costs, and simplifies hybrid and multimodal search.
Qdrant, the leading provider of high-performance, open-source vector search, announced the launch of Qdrant Cloud Inference, a fully managed service that lets developers generate text and image embeddings using integrated models directly within Qdrant Cloud, its managed vector search engine.
With Qdrant Cloud Inference, users can generate, store, and index embeddings in a single API call, turning unstructured text and images into search-ready vectors without leaving the environment. Integrating model inference directly into Qdrant Cloud removes the need for separate inference infrastructure, manual pipelines, and redundant data transfers. For developers, this simplifies workflows, accelerates development cycles, and eliminates unnecessary network hops.
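As a rough sketch of what that single-call workflow could look like, the request below sends raw text plus a model name instead of a precomputed vector, so the cloud service embeds, stores, and indexes in one step. The endpoint path, field names, and model identifier here are illustrative assumptions, not confirmed API details:

```python
import json

def build_inference_upsert(collection: str, doc_id: int, text: str, model: str):
    """Return a hypothetical (path, body) pair for a points-upsert call
    that delegates embedding generation to the server.

    The path and body shape are assumptions for illustration only.
    """
    path = f"/collections/{collection}/points"
    body = {
        "points": [
            {
                "id": doc_id,
                # Raw text + model name in place of a vector: the cloud
                # generates, stores, and indexes the embedding in this
                # one call, with no separate inference pipeline.
                "vector": {"text": text, "model": model},
                "payload": {"source": "example"},
            }
        ]
    }
    return path, body

path, body = build_inference_upsert(
    collection="docs",
    doc_id=1,
    text="Qdrant Cloud Inference turns text into search-ready vectors.",
    model="minilm",  # placeholder model id
)
print(path)
print(json.dumps(body, indent=2))
```

Compared with the traditional split workflow, the application never handles raw vectors at ingest time, which is where the removed network hops and pipeline code come from.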
“Traditionally, embedding generation and vector search have been handled separately in developer workflows,” said André Zayarni, CEO and Co-Founder of Qdrant. “With Qdrant Cloud Inference, it feels like a single tool: one API call with optimal resources for each component.”
“Qdrant helped us unify and simplify our entire search infrastructure. Hybrid search performs with precision and speed, and that’s been instrumental in boosting agent performance. As we advance further into agentic applications, every millisecond matters, so we see Qdrant Cloud Inference as a strong enabler for accelerating our pipelines. The fact that it supports both text and image embeddings is an added benefit, since multimodal search is part of our roadmap,” said Kshitiz Parashar, Founding Engineer and Vector Infra Lead, Alhena AI.
“Embedding generation and management is often a fragmented, complicated workflow for developers building AI-driven applications. Seventy-five percent of organizations are using 6 to 15 tools for management,” said Paul Nashawaty, Practice Lead for theCUBE Research. “Qdrant Cloud Inference improves the developer experience by unifying these capabilities into their cloud. By uniquely offering both text and image embedding models within the same managed vector database, Qdrant enables developers to accelerate development cycles, simplify infrastructure management, and deliver richer, multimodal AI experiences.”
Dense and sparse text embeddings and dense image embeddings for every use case
Qdrant Cloud Inference is the only managed vector database offering multimodal inference (using separate image and text embedding models), natively integrated in its cloud. Supported models include MiniLM, SPLADE, BM25, Mixedbread Embed-Large, and CLIP for both text and image. The new offering includes up to 5 million free tokens per model each month, with unlimited tokens for BM25. This enables teams to build and iterate on real AI features from day one.
Together, these capabilities simplify and accelerate building applications with multimodal search, RAG, and hybrid search. Additional models will become available over time as Qdrant expands its integrated inference capabilities, giving developers the flexibility to choose the right model for each workload and use case.
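To make the hybrid-search idea concrete, the sketch below builds a single query body that runs a dense (semantic) branch and a sparse (BM25-style keyword) branch over the same text, then merges the two result lists with reciprocal rank fusion. The field names, vector names, and the idea of passing raw text for server-side inference are assumptions for illustration, not confirmed API details:

```python
def build_hybrid_query(text: str, dense_model: str, sparse_model: str,
                       limit: int = 10) -> dict:
    """Return a hypothetical request body for a hybrid search that fuses
    dense and sparse results. All field names are illustrative."""
    return {
        "prefetch": [
            # Dense branch: semantic similarity over dense embeddings.
            {
                "query": {"text": text, "model": dense_model},
                "using": "dense",
                "limit": limit * 2,  # over-fetch each branch before fusion
            },
            # Sparse branch: keyword-style matching (e.g. BM25/SPLADE).
            {
                "query": {"text": text, "model": sparse_model},
                "using": "sparse",
                "limit": limit * 2,
            },
        ],
        # Merge both candidate lists with reciprocal rank fusion.
        "query": {"fusion": "rrf"},
        "limit": limit,
    }

q = build_hybrid_query("multimodal vector search",
                       dense_model="minilm",   # placeholder model ids
                       sparse_model="bm25")
```

Over-fetching each branch before fusion is a common pattern: it gives the fusion step enough candidates from both sides to produce a stable merged top-k.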

