Appen Limited, a leading provider of high-quality data for the AI lifecycle, announced the launch of two new products that will enable customers to launch high-performing large language models (LLMs) whose responses are helpful, harmless, and honest, reducing bias and toxicity. These solutions are:
- AI Chat Feedback — a tool that empowers domain experts to assess a live multi-turn conversation, enabling them to review, rate, and rewrite each response.
- Benchmarking — a solution designed to help customers evaluate model performance across various dimensions, such as accuracy and toxicity.
The rise of LLM-based chatbots and assistants has accelerated demand for more sophisticated conversational AI that can support multiple tasks. It is important to test an LLM's contextual understanding and coherence in complex conversations that extend over multiple turns or dialogues, mirroring real-world applications. This helps identify strengths and weaknesses in handling extended interactions, ultimately enhancing the quality of user experiences and the model's practical utility. Appen's AI Chat Feedback manages the end-to-end flow of data through multiple rounds of evaluation and provides customers with the data required to improve their models.
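As an illustration of the kind of data such a multi-turn evaluation workflow might produce, here is a minimal sketch in Python. The field names and structure are hypothetical, not Appen's actual schema; they simply show how a reviewer's rating, flags, and optional rewrite could be attached to each assistant turn in a conversation.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TurnFeedback:
    """Reviewer feedback on a single assistant turn (hypothetical schema)."""
    rating: int                                      # e.g., 1 (poor) to 5 (excellent)
    flags: list[str] = field(default_factory=list)   # e.g., ["toxicity", "factual_error"]
    rewrite: Optional[str] = None                    # reviewer-supplied improved response

@dataclass
class Turn:
    """One turn in a multi-turn conversation."""
    role: str                                 # "user" or "assistant"
    text: str
    feedback: Optional[TurnFeedback] = None   # only assistant turns are rated

# A short exchange with feedback attached to the assistant response.
conversation = [
    Turn(role="user", text="How do I reset my account password?"),
    Turn(
        role="assistant",
        text="You can't. Contact support.",
        feedback=TurnFeedback(
            rating=2,
            flags=["unhelpful"],
            rewrite="Go to Settings > Security > Reset Password and follow the emailed link.",
        ),
    ),
]

# Collect the turns a reviewer rewrote, e.g. as candidate fine-tuning data.
rewrites = [(t.text, t.feedback.rewrite) for t in conversation
            if t.feedback and t.feedback.rewrite]
print(rewrites)
```

Structuring feedback per turn rather than per conversation is what allows every individual response to be reviewed, rated, and rewritten, as described above.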
Appen’s Benchmarking tool addresses a critical decision businesses face while under pressure to enter the AI market quickly: which LLM to choose for a specific enterprise application. Model selection has strategic implications for many dimensions of an application, including user experience, ease of maintenance, and profitability. With the Benchmarking solution, customers can evaluate the performance of various models along commonly used or fully custom dimensions. Combined with a curated crowd of Appen’s AI Training Specialists, the tool evaluates performance along demographic dimensions of interest such as gender, ethnicity, and language. A configurable dashboard enables efficient comparison of multiple models across various dimensions of interest.
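To make cross-dimension model comparison concrete, the following sketch aggregates per-dimension scores for several candidate models into a weighted ranking. The model names, dimensions, scores, and weights are invented for illustration; a real benchmarking workflow would source these values from human and automated evaluations.

```python
# Hypothetical per-dimension scores (0-100) for three candidate models.
scores = {
    "model_a": {"accuracy": 82, "safety": 91, "helpfulness": 78},
    "model_b": {"accuracy": 88, "safety": 74, "helpfulness": 85},
    "model_c": {"accuracy": 79, "safety": 95, "helpfulness": 80},
}

# Weights express which dimensions matter most for a given application.
weights = {"accuracy": 0.5, "safety": 0.3, "helpfulness": 0.2}

def weighted_score(dims: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of a model's scores across evaluation dimensions."""
    return sum(dims[d] * w for d, w in weights.items())

# Rank models by weighted score, highest first.
ranking = sorted(scores, key=lambda m: weighted_score(scores[m], weights), reverse=True)
for model in ranking:
    print(f"{model}: {weighted_score(scores[model], weights):.1f}")
```

Changing the weights reorders the ranking, which is the point of custom dimensions: the best model for one enterprise application may not be the best for another.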
“As AI Chatbots grow more advanced, the stakes are higher for enterprises to get them right before they’re released into the world, or they risk harmful biases and dangerous responses that could have long-term impacts on the business,” said Appen CEO Armughan Ahmad. “Appen’s new evaluation products provide our customers with an essential trust layer that ensures they are releasing AI tools that are truly helpful and not harmful to the public. This trust layer is backed by robust datasets and processes that have proven effective in our 27 years of AI training work, and a team of over a million human experts who are attending to the nuances of the data.”
Human feedback has been shown to be critical to the performance of LLMs. Appen’s world-class technology is reinforced by its global crowd of more than 1 million AI Training Specialists who evaluate datasets for accuracy and bias. The AI Chat Feedback tool directly connects an LLM’s output with specialists so that the model can learn from diverse, natural chat data. Appen drew on more than two decades of experience building intuitive, efficient annotation platforms to design a chat interface that is familiar and easy to use. Specialists chat live with a model, whether a customer’s model or a third party’s, and rate, flag, and provide context for their evaluation. This white-glove service extends to project-dedicated staff who meticulously analyze each batch of data, uncovering edge cases and optimizing data quality.