From Experiment to ROI: The Next Phase of GenAI Development for Enterprises

Generative AI (GenAI) is no longer just a fun tool to experiment with. Executives are moving into a “prove-it” phase. According to IBM, 72% of top-performing CEOs believe that “competitive advantage depends on who has the most advanced” GenAI models. That directly translates into a desire for tangible ROI, but per Deloitte, only 18-36% of businesses are actually realizing expected benefits to the extent they need to justify the investment. The pressure is now on.

Businesses that haven’t settled on a coherent GenAI strategy don’t want to be completely left in the dust. And businesses that have don’t want to lose what remains of their time advantage. Companies with GenAI models in production are continually trying to improve performance, including reducing inaccuracies and outright hallucinations. Amazon just launched its Rufus AI shopping assistant ahead of Prime Day with caveats that its responses and recommendations may not always be accurate. Customers will accept that now, but they may not be so tolerant in the near future, and bottom lines would suffer.

CIOs have four choices when it comes to implementing GenAI: build your own model, customize a foundation model in one of two ways, or simply leverage an off-the-shelf model. Choosing the best path depends on a few factors: how quickly you want to get a model into production, how much you are willing to spend getting it there, and what kinds of data you have.

The first option is to create your own model from scratch. This method provides the greatest business advantage, but it usually requires a significant investment and it certainly demands a great deal of time and effort. The bar for starting your own GenAI model is now higher than ever. OpenAI, Meta, Google, Anthropic, and smaller model providers like Cohere and Mistral have all made significant progress, and every indication is that they won’t slow down anytime soon. As a consequence, you may end up falling further and further behind with nothing to show for it.

On the other hand, if you choose to work with an existing foundation model, you’ll have to assess your data situation and what you need the model to do to determine your approach. Key questions here include:

  • Is your use case domain specific (i.e., requiring specialized knowledge)?
  • Do you already have data that addresses those specific needs, such as a repository of existing documents specific to your business?
  • Where are you sourcing your data from?
  • If you need to fine-tune a model, is there a third party you trust to work with?
  • How complex is the work your model will be doing?

There are two main methods of model customization: supervised fine-tuning (SFT) and Retrieval-Augmented Generation (RAG) embedding. SFT trains a model for specific outcomes, which could include domain specificity, writing style, or task optimization. RAG, on the other hand, retrieves information from various knowledge bases to improve the accuracy and reliability of model responses.

SFT typically takes longer, predominantly because it involves updating all of the model’s parameters based on a large dataset of labeled examples. This may be done in-house or with the help of a third-party annotator (which comes with its own cost in time and capital). However, because of its tailored nature, SFT helps create a model suited to specific, specialized tasks. A customer service model is one example that benefits from SFT: you want to ensure the model makes the right decisions about what is and isn’t eligible for a return.
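
To make that concrete, here is a minimal SFT sketch using the Hugging Face Transformers library. The base model, dataset file, and hyperparameters are all illustrative assumptions rather than vendor specifics; a real fine-tuning run would be far larger.

    # Minimal supervised fine-tuning sketch (Hugging Face Transformers).
    # "gpt2" and "returns_policy_examples.jsonl" are illustrative placeholders.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token  # gpt2 defines no pad token

    # Hypothetical labeled examples: {"text": "Customer: ... Agent: ..."} per line.
    dataset = load_dataset("json", data_files="returns_policy_examples.jsonl")["train"]

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=256)

    tokenized = dataset.map(tokenize, batched=True,
                            remove_columns=dataset.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="sft-out", num_train_epochs=1,
                               per_device_train_batch_size=4),
        train_dataset=tokenized,
        # mlm=False builds next-token labels from the inputs for us
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()  # updates all of the model's parameters on the labeled data

In practice, parameter-efficient techniques such as LoRA can reduce the compute cost of updating a large model, but the labeled-dataset requirement remains.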

Alternatively, RAG systems often rely on pre-trained language models and search engines, allowing for quicker updates and adaptations. RAG embedding has been described as a “general-purpose fine-tuning recipe.” It takes the least time and money to complete because it doesn’t use any specialized resources or a specialized model; companies like NVIDIA even offer introductory workflows to get you started. While a RAG system still requires a knowledge base, it generally doesn’t need the extensive labeled dataset that supervised fine-tuning does.
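
As a sketch of the mechanics, assume a small embedding model and a toy document store (both placeholders for whatever your stack actually uses): documents are embedded once, the most relevant ones are retrieved per query, and they are stitched into the prompt.

    # Minimal RAG sketch: embed documents, retrieve the closest ones per query,
    # and prepend them to the prompt. All names and documents are illustrative.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("all-MiniLM-L6-v2")

    # Hypothetical knowledge base: your existing policy documents.
    documents = [
        "Returns are accepted within 30 days with a receipt.",
        "Final-sale items are not eligible for return.",
        "Refunds post to the original payment method within 5 business days.",
    ]
    doc_vectors = embedder.encode(documents, normalize_embeddings=True)

    def retrieve(query, k=2):
        """Return the k documents most similar to the query."""
        q = embedder.encode([query], normalize_embeddings=True)[0]
        scores = doc_vectors @ q  # normalized dot product = cosine similarity
        return [documents[i] for i in np.argsort(scores)[::-1][:k]]

    query = "Can I return a clearance item?"
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    # `prompt` then goes to whichever foundation model you've chosen.
    print(prompt)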

Finally, SFT can also be performed in concert with RAG. Together, they typically produce a model that comes close to the performance a from-scratch model can achieve (thanks to the advances by the model providers mentioned above).

To illustrate: say you are building a customer service model.

For a retailer with an easy returns policy, performing both SFT and RAG may not make the most sense. If you very rarely reject returns, you do not need the model to handle the more complex cases that require outside resources or the more specialized information a labeled dataset provides; all you may need is RAG embedding.

However, if the model will be used for a more complex business case, such as insurance or banking, SFT and RAG together offer a better-performing model. RAG provides the knowledge resources the model needs to offer responses that are correct according to regulations; SFT allows the model to synthesize that information with the specialized data it has already been trained on.
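
Plumbing-wise, combining the two is straightforward: retrieval supplies current, authoritative passages, and the fine-tuned model generates the answer. A rough sketch, reusing the hypothetical retrieve helper and the fine-tuned model and tokenizer from the earlier sketches:

    # RAG + SFT together: retrieval supplies regulatory context, and the
    # fine-tuned model from the SFT sketch generates the final answer.
    from transformers import pipeline

    generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

    question = "What disclosures must accompany this annuity product?"
    context = "\n".join(retrieve(question))  # e.g., relevant compliance passages

    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    print(generator(prompt, max_new_tokens=120)[0]["generated_text"])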

In general, the more complex the task, the more likely you are to need both SFT and RAG. But choosing this path means you can select an off-the-shelf model as a starting point.

However, not all models are created equal, and claims around performance can be confusing to navigate. For example, Anthropic’s recently launched Claude 3.5 Sonnet claims to be its best-performing model yet. What’s the difference between GPT-4o and GPT-4o mini? What about Google and Gemini? The questions can be nearly endless.

To make matters more difficult, we’re now reaching a point where models are becoming too smart for the most common benchmark tests, meaning new tests need to be designed that can actually evaluate performance.

As a result, choosing a model family to work with in your business can be confusing. Not only do you have to select a model family, you then have to select a specific model within it (for example, Anthropic has three Claude 3 models). There is no be-all, end-all best model out there. What will be best for your company depends on several factors, including your data type, desired outcomes, and the budget you’re working with.

To choose the right model to get started with, follow this checklist:

  • Input and output: Do you need text-only, or a model capable of handling text, images, video, and audio? Are the tasks themselves particularly complex? Do you need the model modified to suit your business?
  • External factors: Are you prioritizing performance or budget? Does your company have an existing arrangement with major cloud providers like Microsoft Azure or Google Cloud Platform?
  • Safety and regulation: How critical is AI assistant safety for your usage? What regulations (such as the EU AI Act) will your model need to comply with?

While there are a number of models and model families available, answering these questions will likely point you to one of a smaller number of prominent models. OpenAI’s range includes GPT-4o, launched in May, the newly launched GPT-4o mini, and GPT-4, launched in 2023. Google offers Gemini 1.5 Pro and the recently launched Gemini 1.5 Flash (which prioritizes efficiency while still offering the multimodal functions of its more powerful sibling). As previously mentioned, Anthropic is the company behind the Claude models; its most recent is Claude 3.5 Sonnet, and Claude 3 Haiku, Sonnet, and Opus, which differ in performance levels and cost, are also available. It’s also worth considering Meta’s open-source Llama models, including Llama 2 and the newer Llama 3.

Simply put, no two models are the same, not even two iterations of the same off-the-shelf model used by different companies: over time, they’ll learn from different data. And choosing how to incorporate GenAI goes beyond looking for the “best” model, because there no longer is just one. Instead, you’re looking for the best model for your business use case.

If possible, I encourage experimenting with multiple models to get a better sense of which one meets your needs. Making the wrong choice with a tool as powerful as GenAI could affect more than just your budget, after all; it could impact your entire business future.
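
One low-effort way to run that experiment is to send an identical prompt to two providers and compare the answers side by side. Here is a minimal sketch using the official OpenAI and Anthropic Python clients; API keys are assumed to be set in the environment, and model IDs change over time.

    # Compare two hosted models on the same prompt. Assumes OPENAI_API_KEY
    # and ANTHROPIC_API_KEY are set; model IDs are current as of mid-2024.
    from openai import OpenAI
    from anthropic import Anthropic

    prompt = "Summarize our returns policy for a customer in two sentences."

    openai_reply = OpenAI().chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content

    claude_reply = Anthropic().messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=200,
        messages=[{"role": "user", "content": prompt}],
    ).content[0].text

    print("OpenAI:", openai_reply)
    print("Anthropic:", claude_reply)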

