CIO Influence

Alluxio Boosts AI/ML Support for Its Hybrid and Multi-Cloud Data Orchestration Platform

New features drastically improve I/O efficiency for data loading and preprocessing stages of an AI/ML training pipeline to reduce end-to-end training time and costs

Alluxio, the developer of open source data orchestration software for large-scale workloads, announced the immediate availability of version 2.7 of its Data Orchestration Platform. The new release delivers 5x better I/O efficiency for machine learning (ML) training at significantly lower cost by parallelizing the data loading, data preprocessing, and training pipelines. Alluxio 2.7 also adds enhanced performance insights and support for open table formats such as Apache Hudi and Iceberg, making it easier to scale access to data lakes for faster Presto- and Spark-based analytics.

“Alluxio 2.7 further strengthens Alluxio’s position as a key component for AI, machine learning, and deep learning in the cloud,” said Haoyuan Li, Founder and CEO, Alluxio. “With the age of growing datasets and increased computing power from CPUs and GPUs, machine learning and deep learning have become popular techniques for AI. The rise of these techniques advances the state of the art for AI, but also exposes some challenges for access to data and storage systems.”

“We deployed Alluxio in a cluster of 1000 nodes to accelerate the data preprocessing of model training on our game AI platform. Alluxio has proven to be stable, scalable and manageable,” said Peng Chen, Engineer Manager in the big data team at Tencent. “As more and more big data and AI applications are containerized, Alluxio is becoming the top choice for large organizations as an intermediate layer to accelerate data analytics and model training.”

“Data teams with large-scale analytics and AI/ML computing frameworks are under increasing pressure to make a growing number of data sources more easily accessible, while also maintaining performance levels as data locality, network IO, and rising costs come into play,” said Mike Leone, Analyst, ESG. “Organizations want to use more affordable and scalable storage options like cloud object stores, but they want peace of mind knowing they don’t have to make costly application changes or experience new performance issues. Alluxio is helping organizations address these challenges by abstracting away storage details while bringing data closer to compute, especially in hybrid cloud and multi-cloud environments.”

Alluxio 2.7 Community and Enterprise Edition features new capabilities, including:

Alluxio and NVIDIA’s DALI for ML

NVIDIA’s Data Loading Library (DALI) is a commonly used Python library that supports CPU and GPU execution for data loading and preprocessing to accelerate deep learning. With release 2.7, the Alluxio platform has been optimized to work with DALI for Python-based ML applications that include a data loading and preprocessing step ahead of model training and inference. By accelerating the I/O-heavy stages and letting them run in parallel with the compute-intensive training that follows, end-to-end training on the Alluxio data platform achieves significant performance gains over traditional solutions. The solution scales out, unlike alternatives that are suited only to smaller datasets.
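The overlap of I/O-heavy loading with compute-heavy training described above can be illustrated with a minimal, standard-library sketch. It does not use DALI or Alluxio; `load_and_preprocess`, `train_step`, and `run_pipelined` are hypothetical stand-ins, and the sleeps only simulate latency.

```python
import queue
import threading
import time


def load_and_preprocess(batch_id):
    """Stand-in for an I/O-heavy stage (hypothetical; no real storage is read)."""
    time.sleep(0.01)  # simulated read and decode latency
    return [batch_id * 10 + i for i in range(4)]


def train_step(batch):
    """Stand-in for a compute-heavy training step."""
    time.sleep(0.01)  # simulated compute time
    return sum(batch)


def run_sequential(num_batches):
    """Baseline: each batch is fully loaded before training on it starts."""
    return [train_step(load_and_preprocess(b)) for b in range(num_batches)]


def run_pipelined(num_batches, prefetch=2):
    """Load upcoming batches in a background thread while the current one trains."""
    buffer = queue.Queue(maxsize=prefetch)  # bounded, so loading cannot run far ahead
    sentinel = object()

    def producer():
        for batch_id in range(num_batches):
            buffer.put(load_and_preprocess(batch_id))
        buffer.put(sentinel)  # signal that no more batches will arrive

    threading.Thread(target=producer, daemon=True).start()

    results = []
    while True:
        batch = buffer.get()
        if batch is sentinel:
            break
        results.append(train_step(batch))
    return results
```

Both functions return the same per-batch results; the pipelined version simply hides loading time behind training time, which is the effect the release attributes to parallelizing these stages.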

Data Loading at Scale

At the heart of Alluxio’s value proposition are data management capabilities that complement caching and the unification of disparate data sources. As Alluxio is increasingly used for compute and storage spanning multiple geographic locations, the software continues to evolve to keep scaling, now with a new technique for batching data management jobs. Batched jobs, such as data loading, run on an embedded execution engine, which reduces the resource requirements of the management controller and lowers the cost of provisioned infrastructure.
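As a rough illustration of why batching lowers coordinator overhead, the sketch below groups per-file load requests into batch jobs, so a controller would track one job per batch rather than one per file. This is not Alluxio's implementation; `batched`, `plan_load_jobs`, and the example paths are assumptions made for illustration.

```python
from itertools import islice


def batched(paths, batch_size):
    """Yield successive lists of at most batch_size paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(paths)
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            return
        yield chunk


def plan_load_jobs(paths, batch_size):
    """Build one job description per batch of paths (hypothetical job format)."""
    return [
        {"job_id": job_id, "paths": chunk}
        for job_id, chunk in enumerate(batched(paths, batch_size))
    ]


# Ten hypothetical files become three jobs instead of ten.
example_paths = [f"/data/part-{i}" for i in range(10)]
example_jobs = plan_load_jobs(example_paths, batch_size=4)
```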

Ease of Use on Kubernetes

Alluxio now supports a native Container Storage Interface (CSI) driver for Kubernetes, as well as a Kubernetes operator for ML, making it easier to operate ML pipelines on the Alluxio platform in containerized environments. The Alluxio volume type is now natively available in Kubernetes environments. Agility and ease of use are a constant focus in this release.

Insight Driven Dynamic Cache Sizing for Presto

A new capability called Shadow Cache makes it easier to balance high performance against cost by dynamically reporting how cache size affects response times. For multi-tenant Presto environments at scale, this self-managing feature significantly reduces management overhead.
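The general idea of measuring how cache size affects hit rate, without actually resizing the real cache, can be sketched as follows. This is a simplified illustration and not Alluxio's Shadow Cache implementation: it replays an access trace against key-only LRU "shadow" caches of several candidate sizes. The class `ShadowLRU` and the function `estimate_hit_ratios` are hypothetical names.

```python
from collections import OrderedDict


class ShadowLRU:
    """Tracks keys only (no data) to estimate the hit ratio of an LRU cache of a given size."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.keys = OrderedDict()  # insertion order doubles as recency order
        self.hits = 0
        self.requests = 0

    def access(self, key):
        self.requests += 1
        if key in self.keys:
            self.hits += 1
            self.keys.move_to_end(key)  # mark as most recently used
        else:
            self.keys[key] = None
            if len(self.keys) > self.capacity:
                self.keys.popitem(last=False)  # evict the least recently used key

    def hit_ratio(self):
        return self.hits / self.requests if self.requests else 0.0


def estimate_hit_ratios(trace, capacities):
    """Replay one access trace against a shadow cache per candidate capacity."""
    shadows = {capacity: ShadowLRU(capacity) for capacity in capacities}
    for key in trace:
        for shadow in shadows.values():
            shadow.access(key)
    return {capacity: shadow.hit_ratio() for capacity, shadow in shadows.items()}
```

For a trace that cycles through three keys, a capacity of 2 never hits while a capacity of 3 hits on every repeat access, which is the kind of size-versus-benefit signal an operator could use when choosing a cache size.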

“Data platform teams utilize Alluxio to streamline data pre-processing and loading phases in a world where storage is separated from ML computation,” said Adit Madan, Senior Product Manager, Alluxio. “This simplicity enables maximum utilization of GPUs with frameworks such as Spark ML, TensorFlow and PyTorch. The Alluxio solution is available on multiple cloud platforms such as AWS, GCP, and Azure Cloud, and now also on Kubernetes in private data centers or public clouds.”

Alluxio is the developer of open source data orchestration software for the cloud. Alluxio moves data closer to data analytics and machine learning compute frameworks in any cloud, across clusters, regions, clouds, and countries, providing memory-speed data access to files and objects. Intelligent data tiering and data management deliver consistent high performance to customers in financial services, high tech, retail, and telecommunications.
