Databricks has announced new machine learning (ML) capabilities for its persona-based platform, Databricks Machine Learning. The platform gives organizations a streamlined way to standardize the full data and machine learning lifecycle at any scale.
Databricks Machine Learning is now live with two cutting-edge features.
These are:
- Databricks AutoML to augment model creation without sacrificing control and transparency, and
- Databricks Feature Store to improve discoverability, governance, and reliability of model features.
With Databricks Machine Learning, new and existing ML capabilities on the Databricks Lakehouse Platform are organized into a collaborative, role-based product surface. It gives ML engineers everything they need to build, train, deploy, and manage ML models from experimentation to production, uniquely combining data and the full ML lifecycle.
What is Databricks AutoML?
Jumpstart New Projects and Automate Tedious ML Tasks with Databricks AutoML
The introduction of new AutoML capabilities within Databricks ML allows data teams to quickly produce trained models through either a UI or an API.
It also delivers the underlying experiments and notebooks used to create those models, so data scientists can easily validate an unfamiliar dataset or check the direction of a new ML project.
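As an illustration, here is a minimal sketch of launching an AutoML run through the Python API in a Databricks notebook. The table name, label column, and timeout are hypothetical, and the exact API surface may differ by Databricks Runtime version.

```python
# Sketch: kick off an AutoML classification run from a notebook.
# The "ml_demo.churn" table and "churned" label column are placeholders.
from databricks import automl

# Load a Delta table registered in the metastore ("spark" is the notebook's SparkSession).
df = spark.table("ml_demo.churn")

# AutoML trains and tunes a set of candidate models and returns a summary of all trial runs.
summary = automl.classify(
    dataset=df,
    target_col="churned",
    timeout_minutes=30,
)

# Each trial is logged as an MLflow run with a generated notebook, so the
# best model's code can be inspected and modified.
print(summary.best_trial.model_path)
print(summary.best_trial.notebook_url)
```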
How AutoML Improves Agility in ML Projects
AutoML has the potential to let data teams build ML models more quickly by automating much of the heavy lifting involved in the experimentation and training phases. But customers who want to use AutoML tools today often struggle to get a machine learning model into production.
This happens because the tools provide no visibility into how they arrive at their final model, which makes it difficult to tune the model's performance or troubleshoot it when edge cases in the data lead to low-confidence predictions. Additionally, customers can struggle to satisfy compliance requirements that require them to explain how a model works, because they lack visibility into the model's code.
Databricks’ unique ‘glass-box’ approach to AutoML offers transparency into how a model operates and allows users to take control at any time. Additionally, all AutoML experiments are integrated with the rest of the Databricks Lakehouse Platform, tracking the parameters, metrics, artifacts, and models associated with every trial run, making it easy to compare models and deploy them to production.
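Because every trial is logged to MLflow, comparing and promoting models can rely on standard MLflow tracking calls. The sketch below assumes a hypothetical experiment path, metric name, and registered model name.

```python
# Sketch: compare AutoML trial runs and register the best one via MLflow.
import mlflow

# Every AutoML trial is logged as an MLflow run under one experiment
# (the experiment path here is illustrative).
experiment = mlflow.get_experiment_by_name("/Users/me/churn_automl")

# Pull all trial runs and rank them by a validation metric (assumed name).
runs = mlflow.search_runs(
    experiment_ids=[experiment.experiment_id],
    order_by=["metrics.val_f1_score DESC"],
)
print(runs[["run_id", "metrics.val_f1_score"]].head())

# Register the best run's model so it can be promoted to production.
best_run_id = runs.iloc[0]["run_id"]
mlflow.register_model(f"runs:/{best_run_id}/model", "churn_classifier")
```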
What is Databricks Feature Store?
Streamline ML at Scale With Simplified Feature Sharing and Discovery
Machine learning models are built using features, which are the attributes a model uses to make a decision. To work most efficiently, data scientists need to be able to discover what features exist within their organization, how they are built, and where they are used. Otherwise, it’s far too easy to waste significant time reinventing features that already exist. Additionally, feature code needs to be kept consistent between offline use cases (training and batch inference), where data engineering teams own the code, and online use cases (real-time inference), where application development teams own the code; otherwise the predictions will be inconsistent. Managing code changes between these teams is a major source of friction in quickly deploying and iterating ML models.
Databricks’ Feature Store is the first feature store to be co-designed with a data and MLOps platform. Feature Store allows data teams to easily facilitate the reuse of features across different models to avoid rework and feature duplication, which can save data teams months in developing new models. Features are stored in Delta Lake’s open format and can be accessed through Delta Lake’s native APIs.
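A minimal sketch of publishing a reusable feature table with the Feature Store client follows; the database, table, and column names are placeholders, and the exact client method names may vary between client releases.

```python
# Sketch: compute features once and register them as a shared feature table.
from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# Compute features (for example, per-customer aggregates over raw transactions;
# "raw.transactions" is a hypothetical source table).
customer_features = spark.sql("""
    SELECT customer_id,
           COUNT(*)    AS num_purchases,
           SUM(amount) AS lifetime_value
    FROM raw.transactions
    GROUP BY customer_id
""")

# Register the DataFrame as a feature table backed by Delta Lake so other
# teams can discover and reuse it.
fs.create_table(
    name="feature_store.customer_features",
    primary_keys=["customer_id"],
    df=customer_features,
    description="Per-customer purchase aggregates",
)
```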
Customers Can Completely Avoid the Challenge of Keeping Feature Code in Sync
Additionally, Feature Store supports both batch and real-time access for training, batch inference, and low-latency streaming inference use cases. Because both training and inference access the Feature Store, customers can avoid the challenge of keeping feature code in sync. Through integration with MLflow, feature references are also embedded in the model itself. This allows customers to update features without requiring any changes to the client by the application development team, significantly simplifying the model deployment process.
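A hedged sketch of that flow is shown below: the feature lookups used at training time are packaged with the logged model, so batch scoring only needs the lookup keys. Table, column, and model names are illustrative assumptions, and parameter names may differ slightly across Feature Store client versions.

```python
# Sketch: train against feature-table lookups, log the model with its
# feature references, then score a batch by lookup key only.
import mlflow
from sklearn.linear_model import LogisticRegression
from databricks.feature_store import FeatureStoreClient, FeatureLookup

fs = FeatureStoreClient()

# Reference features from the shared feature table (hypothetical names).
feature_lookups = [
    FeatureLookup(
        table_name="feature_store.customer_features",
        feature_names=["num_purchases", "lifetime_value"],
        lookup_key="customer_id",
    )
]

# The label table holds only keys and labels; feature values are joined in.
labels_df = spark.table("ml_demo.churn_labels")
training_set = fs.create_training_set(
    df=labels_df,
    feature_lookups=feature_lookups,
    label="churned",
)
train_pdf = training_set.load_df().toPandas()

model = LogisticRegression().fit(
    train_pdf[["num_purchases", "lifetime_value"]], train_pdf["churned"]
)

# Logging through the Feature Store client embeds the feature lookups in
# the packaged model.
fs.log_model(
    model,
    artifact_path="model",
    flavor=mlflow.sklearn,
    training_set=training_set,
    registered_model_name="churn_classifier",
)

# Batch inference: pass rows containing just customer_id; the features are
# resolved from the feature table at scoring time.
scored = fs.score_batch(
    "models:/churn_classifier/1", spark.table("ml_demo.customers_to_score")
)
```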
With Databricks Machine Learning, data scientists have the most complete and collaborative environment for the entire ML workflow. Data scientists can easily build high-quality data sets on the most up-to-date data, using familiar languages and a consistent format so data engineering and ML teams can work collaboratively and confidently to deploy and manage models that drive business value at an unmatched scale.