Amazon SageMaker HyperPod reduces time to train foundation models by up to 40% by providing purpose-built infrastructure for distributed training at scale
Amazon SageMaker Inference reduces foundation model deployment costs by 50% on average and latency by 20% on average by optimizing the use of accelerators
Amazon SageMaker Clarify now makes it easier for customers to evaluate and select foundation models quickly based on parameters that support responsible use of AI
Amazon SageMaker Canvas capabilities help customers accelerate data preparation using natural-language instructions and model building using foundation models in just a few clicks
BMW Group, Booking.com, Hugging Face, Perplexity, Salesforce, Stability AI, and Vanguard among the customers and partners using new Amazon SageMaker capabilities
At AWS re:Invent, Amazon Web Services, (AWS), an Amazon.com, Inc. company announced five new capabilities within Amazon SageMaker to help accelerate building, training, and deployment of large language models and other foundation models. As models continue to transform customer experiences across industries, SageMaker is making it easier and faster for organizations to build, train, and deploy machine learning (ML) models that power a variety of generative AI uses cases. However, to use models successfully, customers need advanced capabilities that efficiently manage model development, usage, and performance. That’s why most industry leading models such as Falcon 40B and 180B, IDEFICS, Jurassic-2, Stable Diffusion, and StarCoder are all trained on SageMaker. Today’s announcements include a new capability that further enhances SageMaker for scaling with models by accelerating model training time. Another new SageMaker capability optimizes managed ML infrastructure operations by reducing deployment costs and latency of models. AWS is also introducing a new SageMaker Clarify capability that makes it easier to select the right model based on quality parameters that support responsible use of AI. To help customers apply these models across organizations, AWS is also introducing a new no-code capability in SageMaker Canvas that makes it faster and easier for customers to prepare data using natural-language instructions. Additionally, SageMaker Canvas continues to democratize model building and customization by making it easier for customers to use models to extract insights, make predictions, and generate content using an organization’s proprietary data. These advancements build on SageMaker’s extensive capabilities to help customers innovate with ML at scale. To get started with Amazon SageMaker,
CIO INFLUENCE News: AnChain.AI and BlockTrace Join Forces to Provide Cutting Edge Solutions to National Security Sector Partners
Recent advancements in ML, along with ready availability of scalable compute capacity and the massive proliferation of data, have led to the rise of models that contain billions of parameters, making them capable of performing a wide range of tasks like writing blog posts, generating images, solving math problems, engaging in dialog, and answering questions based on a document. Today, tens of thousands of customers like 3M, AstraZeneca, Ferrari, LG AI Research, RyanAir, Thomson Reuters, and Vanguard are using SageMaker to make more than 1.5 trillion inference requests every month. In addition, customers like AI21 Labs, Stability AI, and Technology Innovation Institute are using SageMaker to train models with up to billions of parameters. As customers move from building mostly task-specific models to the large, general-purpose models that power generative AI, they work with massive datasets and more complex infrastructure setups—all while optimizing for cost and performance. Customers also want to build and customize their own models to create unique customer experiences, embodying the company’s voice, style, and services. With more than 380 capabilities and features added since the service was launched in 2017, SageMaker offers customers everything they need to build, train, and deploy production-ready models at scale.
“Machine learning is one of the most profound technological developments in recent history, and interest in models has spread to every organization,” said Bratin Saha, vice president of Artificial Intelligence and Machine Learning at AWS. “This growth in interest is presenting new scaling challenges for customers who want to build, train, and deploy models faster. From accelerating training, optimizing hosting costs, reducing latency, and simplifying the evaluation of foundation models, to expanding our no-code model-building capabilities, we are on a mission to democratize access to high-quality, cost-efficient machine learning models for organizations of all sizes. With today’s announcements, we are enhancing Amazon SageMaker with fully managed, purpose-built capabilities that help customers make the most of their machine learning investments.”
New capabilities make it easier and faster for customers to train and operate models to power their generative AI applications
As generative AI continues to gain momentum, many emerging applications will rely on models. But most organizations struggle to adapt their infrastructure to meet the demands of these new models, which can be difficult to train and operate efficiently at scale. Today, SageMaker is adding two new capabilities that help ease the burdens of training and deploying models at scale.
- SageMaker HyperPod accelerates FM training at scale: Many organizations want to train their own models using graphics processing units (GPU)-based and Trainium-based compute instances at low cost. However, the volume of data, size of the models, and time required for training models has exponentially increased the complexity of training a model, requiring customers to further adapt their processes to deal with these emerging demands. Customers often need to split their model training across potentially hundreds or thousands of accelerators. They then run trillions of data computations in parallel for weeks or months at a time, which is time consuming and requires specialized ML expertise. The number of accelerators and training time increases substantially compared to training task-specific models, so the likelihood of rare, small errors, like a single accelerator failure, compounds. These errors can disrupt the entire training process and require manual intervention to identify, isolate, debug, repair, and recover from the issue, further delaying progress. During the FM training process, customers are frequently required to pause, evaluate ongoing performance, and make optimizations to the training code. For uninterrupted model training, developers have to continuously save training progress (commonly known as checkpointing), so they don’t lose progress and can resume training from where they last left off. These challenges increase the time it takes to train a model, driving up costs and delaying the deployment of new generative AI innovations. SageMaker HyperPod removes the undifferentiated heavy lifting involved in building and optimizing ML infrastructure for training models, reducing training time by up to 40%. SageMaker HyperPod is pre-configured with SageMaker’s distributed training libraries that enable customers to automatically split training workloads across thousands of accelerators, so workloads can be processed in parallel for improved model performance. SageMaker HyperPod also ensures customers can continue model training uninterrupted by periodically saving checkpoints. When a hardware failure occurs during training, SageMaker HyperPod automatically detects the failure, repairs or replaces the faulty instance, and resumes the training from the last saved checkpoint, removing the need for customers to manually manage this process and helping them train for weeks or months in a distributed setting without disruption.
CIO INFLUENCE News: New Synopsys Research Reveals Vast Majority of Organizations Report DevOps Delays Due to Critical Security Issues
- SageMaker Inference reduces model deployment costs and latency: As organizations deploy models, they are constantly looking for ways to optimize their performance. To reduce deployment costs and decrease response latency, customers use SageMaker to deploy models on the latest ML infrastructure accelerators, including AWS Inferentia and GPUs. However, some models do not fully utilize the accelerators available with those instances, leading to an inefficient use of hardware resources. Some organizations also deploy multiple models to the same instance to better utilize all of the available accelerators, but this requires complex infrastructure orchestration that is time consuming and difficult to manage. When multiple models share the same instance, each model has its own scaling needs and usage patterns, making it challenging to predict when customers need to add or remove instances. For example, one model may be used to power an application where usage can spike during certain hours, while another model may have a more consistent usage pattern. In addition to optimizing costs, customers want to provide the best end-user experience by reducing latency. Because models outputs could range from a single sentence to an entire blog post, the time it takes to complete the inference request varies significantly, leading to unpredictable spikes in latency if the requests are routed randomly between instances. SageMaker now supports new inference capabilities that help customers reduce deployment costs and latency. With these new capabilities, customers can deploy multiple models to the same instance to better utilize the underlying accelerators, reducing deployment costs by 50% on average. Customers can also control scaling policies for each model separately, making it easier to adapt to model usage patterns while optimizing infrastructure costs. SageMaker actively monitors instances that are processing inference requests and intelligently routes requests based on which instances are available, achieving 20% lower inference latency on average.
New capability helps customers evaluate any model and select the best one for their use case
Today, customers have a wide range of options when choosing a model to power their generative AI applications, and they want to compare these models quickly to find the best option based on relevant quality and responsible AI parameters (e.g., accuracy, fairness, and robustness). However, when comparing models that perform the same function (e.g., text generation or summarization) or that are within the same family (e.g., Falcon 40B versus Falcon 180B), each model will perform differently across various responsible AI parameters. Even the same model fine-tuned on two different datasets could perform differently, making it challenging to know which version works best. To start comparing models, organizations must first spend days identifying relevant benchmarks, setting up evaluation tools, and running assessments on each model. While customers have access to publicly available model benchmarks, they are often unable to evaluate the performance of models on prompts that are representative of their specific use cases. In addition, these benchmarks are often hard to decipher and are not useful for evaluating criteria like brand voice, relevance, and style. Then an organization has to go through the time-consuming process of manually analyzing results, and repeating this process for every new use case or fine-tuned model.
SageMaker Clarify now helps customers evaluate, compare, and select the best models for their specific use case based on their chosen parameters to support an organization’s responsible use of AI. With the new capability in SageMaker Clarify, customers can easily submit their own model for evaluation or select a model via SageMaker JumpStart. In SageMaker Studio, customers choose the models that they want to compare for a given task, such as question answering or content summarization. Customers then select evaluation parameters and upload their own prompt dataset or select from built-in, publicly available datasets. For sensitive criteria or nuanced content that requires sophisticated human judgement, customers can choose to use their own workforce, or a managed workforce provided by SageMaker Ground Truth, to review the responses within minutes using feedback mechanisms. Once customers finish the setup process, SageMaker Clarify runs its evaluations and generates a report, so customers can quickly evaluate, compare, and select the best model based on performance criteria.
New Amazon SageMaker Canvas enhancements make it easier and faster for customers to integrate generative AI into their workflows
Amazon SageMaker Canvas helps customers build ML models and generate predictions without writing a single line of code. Today’s announcement expands on SageMaker Canvas’ existing, ready-to-use capabilities that help customers use models to power a range of use cases, in a no-code environment.
- Prepare data using natural-language instructions: Today, the visual interface in SageMaker Canvas makes it easy for those without ML expertise to do their own data preparation, but some customers want a faster, more intuitive way to navigate their datasets. Customers can now get started quickly with sample queries and ask ad-hoc questions throughout the process to streamline data preparation. Customers can also do complex transformations, using natural-language instructions to fix common data problems like filling in missing values in a column. With this new no-code interface, customers can dramatically simplify how they work with data on SageMaker Canvas, reducing time spent preparing data from hours to minutes.
- Leverage models for business analysis at scale: Customers use SageMaker Canvas to build ML models and generate predictions for a variety of tasks, including demand forecasting, customer churn prediction, and financial portfolio analysis. Earlier this year, SageMaker Canvas made it possible for customers to access multiple models on Amazon Bedrock, including models from AI21 Labs, Anthropic, and Amazon, along with models from MosaicML and TII and through SageMaker Jumpstart. With the same no-code interface they use today, customers can upload a dataset and select a model, and SageMaker Canvas automatically helps customers build custom models to generate predictions immediately. SageMaker Canvas also displays performance metrics, so customers can collaborate easily to generate predictions using models and understand how well the FM is performing on a given task.
Hugging Face is a leading machine learning company and open platform for AI builders, offering open foundation models, and the tools to create them. “Hugging Face has been using SageMaker HyperPod to create important new open foundation models like StarCoder, IDEFICS, and Zephyr, which have been downloaded millions of times,” said Jeff Boudier, head of Product at Hugging Face. “SageMaker HyperPod’s purpose-built resiliency and performance capabilities have enabled our open science team to focus on innovating and publishing important improvements to the ways foundation models are built, rather than managing infrastructure. We especially liked how SageMaker HyperPod is able to detect ML hardware failure and quickly replace the faulty hardware without disrupting ongoing model training. Because our teams need to innovate quickly, this automated job recovery feature helped us minimize disruption during the foundation model training process, helping us save hundreds of hours of training time in just a year.”
Salesforce is a leading AI customer relationship management (CRM) platform, driving productivity and trusted customer experiences powered by data, AI, and CRM. “At Salesforce, we have an open ecosystem approach to foundation models, and Amazon SageMaker is a vital component, helping us scale our architecture and accelerate our go-to-market,” said Bhavesh Doshi, vice president of Engineering at Salesforce. “Using the new SageMaker Inference capability, we were able to put all our models onto a single SageMaker endpoint that automatically handled all the resource allocation and sharing of the compute resources, accelerating performance and reducing deployment cost of foundation models.”
Thomson Reuters is a leading source of information, including one of the world’s most trusted news organizations. “One of the challenges that our engineers face is managing customer call resources during peak seasons to ensure the optimal number of customer service personnel are hired to handle the influx of inquiries,” said Maria Apazoglou, vice president of Artificial Intelligence, Business Intelligence and Data Platforms at Thomson Reuters. “Historical analysis of call center data containing call volume, wait time, date, and other relevant metrics is time consuming. Our teams are leveraging the new data preparation and customization capabilities in SageMaker Canvas to train models on company data to identify patterns and trends that impact call volume during peak hours. It was extremely easy for us to build ML models using our own data, and we look forward to increasing the use of foundational models—without writing any code—through Canvas.”
Workday, Inc. is a cloud-based software vendor specializing in human capital management (HCM) and financial management applications. “More than 10,000 organizations around the world rely on Workday to manage their most valuable assets—their people and their money,” said Shane Luke, vice president of AI and Machine Learning at Workday. “We provide responsible and transparent solutions to customers by selecting the best foundation model that reflects our company’s policies around the responsible use of AI. For tasks such as creating job descriptions, which must be high quality and promote equal opportunity, we tested the new model evaluation capability in Amazon SageMaker and are excited about the ability to measure foundation models across metrics such as bias, quality, and performance. We look forward to using this service in the future to compare and select models that align with our stringent responsible AI criteria.”
CIO INFLUENCE News: Virgin Pulse Collaborates with AWS to Accelerate Homebase for Health Platform Innovation and User Experience
[To share your insights with us, please write to sghosh@martechseries.com]