New SEI Tool Enhances Machine Learning Model Test and Evaluation

Software systems with a machine learning (ML) component often fail in production. One reason is that ML models are frequently developed in isolation, making it impossible to test and evaluate them against system and operational requirements and constraints. The Software Engineering Institute (SEI) at Carnegie Mellon University (CMU) today announced the release of a new tool to help teams developing ML-enabled software systems mitigate this problem. Machine Learning Test and Evaluation (MLTE), available for download from GitHub, is a semi-automated process and infrastructure for testing ML models based on stakeholder-generated quality attribute requirements.

ML model developers often work in silos. They lack knowledge of the overarching system or its operational environment. Without this context, developers can only evaluate a model on its accuracy, or the predictability of its output. Once the model is delivered, software engineers and quality assurance teams often have no specifications or knowledge to guide its testing. None of the groups can evaluate how well the model will work in production.

“The bottom line is that many models fail in production because they are not tested properly,” said Grace Lewis, a principal researcher at the SEI and lead of its Tactical and AI-Enabled Systems Initiative. “When ML-enabled systems fail operational tests because of problems with the model, it creates huge delays in system delivery, especially if new data needs to be collected to retrain the model.”

To fill this gap in the development of ML-enabled software, Lewis and her team at the SEI collaborated with the U.S. Army Artificial Intelligence Integration Center (AI2C) and Christian Kästner, an associate professor in the CMU School of Computer Science.

They created MLTE, which applies best practices from traditional software development to ML model test and evaluation (T&E). The process brings together all the stakeholders of an ML-enabled software project, not just the ML developers, to negotiate the model’s quality attribute requirements based on system needs. Those attributes become specifications for automated internal and system-dependent testing. Test results populate reports that developers and other stakeholders can use to decide if the model is ready for production. If it is not, the reports can inform further iteration and testing. Special libraries within the MLTE infrastructure automate parts of the process.
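The flow described above, from negotiated requirements to automated checks to a report, can be sketched in a few lines. This is only an illustrative sketch in plain Python and does not use the MLTE library or its API: the `Requirement` class, the `evaluate` and `ready_for_production` helpers, and the attribute names and thresholds are all hypothetical, chosen only to show the shape of the idea.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Requirement:
    """A negotiated quality attribute requirement (hypothetical structure)."""
    name: str
    description: str
    check: Callable[[float], bool]  # True when the measured value is acceptable


def evaluate(requirements: List[Requirement],
             measurements: Dict[str, float]) -> List[dict]:
    """Compare each measurement with its requirement and build report rows."""
    report = []
    for req in requirements:
        value: Optional[float] = measurements.get(req.name)
        if value is None:
            status = "missing"  # no evidence was collected for this requirement
        else:
            status = "pass" if req.check(value) else "fail"
        report.append({"requirement": req.name, "measured": value, "status": status})
    return report


def ready_for_production(report: List[dict]) -> bool:
    """The model is ready only if every requirement has passing evidence."""
    return all(row["status"] == "pass" for row in report)


# Hypothetical requirements that stakeholders might agree on.
requirements = [
    Requirement("accuracy", "Fraction of correct predictions", lambda v: v >= 0.90),
    Requirement("latency_ms", "Inference time on target hardware", lambda v: v <= 50.0),
    Requirement("model_size_mb", "Size of the deployed model", lambda v: v <= 100.0),
]

# Hypothetical measurements; model_size_mb was never measured.
measurements = {"accuracy": 0.93, "latency_ms": 72.0}

report = evaluate(requirements, measurements)
for row in report:
    print(row)
print("Ready for production:", ready_for_production(report))
```

In this sketch a failed or missing row keeps the model out of production, which mirrors the article's point that the reports inform further iteration and testing.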

“MLTE provides system and operational context for ML model developers to make informed decisions about design and development,” said Lewis. “Other stakeholders can better understand whether the requirements for models are realistic so that problems can be detected and fixed early in the process, not discovered in operational tests or production.”

MLTE is a system-centric, quality-attribute-driven, semi-automated process and infrastructure to enable negotiation, specification, and testing of ML model and system qualities. It incorporates TEC, an earlier SEI tool that detects mismatched expectations among the teams building an ML component. Both TEC and MLTE are part of an SEI effort to establish integrated T&E of ML capabilities throughout the Department of Defense.

PR Newswire
