Generative AI is transforming many of the daily activities of knowledge workers. Large Language Model (LLM) assistants, like ChatGPT and Gemini, are becoming essential tools for boosting productivity. These assistants manage a wide array of tasks, including but certainly not limited to:

  • generating content and ideas
  • proofreading text
  • providing personalized recommendations
  • summarizing documents
  • writing code

LLM assistants are built to understand and respond to user questions and instructions. In that sense, they show great promise to become first-class information sources. Nevertheless, they come with several drawbacks. Some LLMs are trained on data with a cut-off date, so they are unaware of recent events. They may also lack traceability, making it difficult to verify the reliability of their responses. LLMs produce responses based on next-word prediction, without any real notion of what is right or wrong. As a result, responses may sometimes be factually inaccurate, which we have come to call "hallucinations". These drawbacks pose problems in any context, but their impact is amplified in an enterprise setting. Moreover, enterprises are more vulnerable to reputational damage, which is one of the key risks of extensive and public reliance on LLM applications.

Despite the issues above, there are ways to mitigate the untrustworthy behavior of LLMs. This article compares three key methods for optimizing LLM behavior: retrieval-augmented generation (RAG), fine-tuning, and prompt engineering. A fourth, emerging approach, AI agents, is also briefly introduced. These techniques show how a well-designed LLM architecture can improve the accuracy, reliability, and overall utility of enterprise LLM applications.

Methods to optimize your generative AI application

Retrieval-Augmented Generation (RAG)

RAG ties an LLM to real-world, up-to-date, and accurate data sources. Instead of relying solely on its internal language representation, the LLM leverages a direct connection to separate information sources to augment the responses it generates. This paves the way for a variety of cutting-edge information retrieval applications, well-suited for enterprise use.

RAG applications are engineered to generate responses that embed or rephrase whole sections of a specialized source document, without requiring additional training or adjustments to the underlying model. RAG also comes with increased auditability, as the retrieved content can be traced back to the relevant data source. Moreover, it reduces the likelihood of hallucinations, since the model relies on hand-picked data sources to formulate a more informed response.

However, RAG is only as good as your data, and it requires a highly data-centric architecture. The LLM is handled as a separate concern in a RAG architecture. As such, RAG applications can surf the wave of LLM development by swapping in the latest model as needed. On the other hand, if a model provider decides to discontinue a model, it is important to have a backup plan. Lastly, as information retrieval is the cornerstone of a RAG application, great care should be taken to ensure the relevant context is correctly retrieved and subsequently provided to the end user in the desired format.
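To make the retrieval-then-generate flow more concrete, here is a minimal sketch in Python. It retrieves the snippets most relevant to a question with a naive word-overlap score, assembles them into a grounded prompt, and leaves the actual model call as a placeholder. The knowledge base contents, the scoring function, and the `call_llm` stub are illustrative assumptions; a production RAG system would typically use vector embeddings, a vector store, and a specific model provider's client.

```python
# Minimal RAG sketch: retrieve the most relevant snippets, then ground the
# LLM prompt in them. Knowledge base, scoring, and call_llm are illustrative
# placeholders, not a specific product API.

KNOWLEDGE_BASE = [
    {"source": "hr_policy.md", "text": "Employees accrue 20 vacation days per year."},
    {"source": "it_faq.md", "text": "VPN access requires multi-factor authentication."},
    {"source": "expenses.md", "text": "Travel expenses must be filed within 30 days."},
]

def score(query: str, text: str) -> int:
    """Naive relevance score: count overlapping lowercase words."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve(query: str, k: int = 2) -> list[dict]:
    """Return the k snippets that best match the query."""
    ranked = sorted(KNOWLEDGE_BASE, key=lambda d: score(query, d["text"]), reverse=True)
    return ranked[:k]

def build_prompt(query: str, snippets: list[dict]) -> str:
    """Augment the user question with retrieved, citable context."""
    context = "\n".join(f"[{s['source']}] {s['text']}" for s in snippets)
    return (
        "Answer the question using only the context below. "
        "Cite the source file for every claim.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

def call_llm(prompt: str) -> str:
    # Placeholder for any chat/completion endpoint; swap in your provider's client.
    raise NotImplementedError

if __name__ == "__main__":
    question = "How many vacation days do employees get?"
    print(build_prompt(question, retrieve(question)))
```

Because retrieval and generation are decoupled in this setup, the underlying model can be swapped without touching the knowledge base, which is exactly what makes RAG architectures flexible.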

Fine-tuning

Fine-tuning involves further training a pre-existing LLM on specific data or tasks to develop language capabilities in a particular domain. This process is especially beneficial for organizations with technical language, such as legal or medical institutions. LLMs are typically pre-trained to recognize general language and context, but fine-tuning aims to enhance models to meet specific needs and goals by focusing on select skills and tasks. By analogy, it is easier to teach a general physician to specialize in pediatrics than to train someone from scratch. Fine-tuning allows the model to become more specialized, improving accuracy in complex tasks. More specifically, the fine-tuned model will show differences in behavior, writing style, and vocabulary.

Effective fine-tuning requires high-quality, representative, and sufficiently large training data. It may also incur additional infrastructure costs to host the customized LLM. Optimizing model hyper-parameters to achieve the desired quality is often a complex process of trial and error, and regular fine-tuning may become necessary to maintain optimal performance. Hence, fine-tuning in a safe, compliant way requires the involvement of experienced data science profiles.
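As a rough illustration of the data-preparation side of fine-tuning, the sketch below writes a handful of domain-specific instruction/response pairs into a chat-style JSONL file, a common input format for supervised fine-tuning. The example records, the system message, and the exact schema are assumptions; the required format depends on the fine-tuning service or training framework you use, and a real dataset would need to be far larger and more representative.

```python
# Sketch of preparing a supervised fine-tuning dataset. The examples and the
# chat-style JSONL layout are illustrative; the exact schema depends on the
# fine-tuning service or training framework.
import json

domain_examples = [
    {
        "instruction": "Summarize the liability clause in plain language.",
        "response": "The supplier is only liable for direct damages, capped at the contract value.",
    },
    {
        "instruction": "What does 'force majeure' cover in our standard contract?",
        "response": "Events outside either party's control, such as natural disasters or war.",
    },
]

with open("finetune_train.jsonl", "w", encoding="utf-8") as f:
    for ex in domain_examples:
        record = {
            "messages": [
                {"role": "system", "content": "You are a legal assistant for Acme Corp."},
                {"role": "user", "content": ex["instruction"]},
                {"role": "assistant", "content": ex["response"]},
            ]
        }
        f.write(json.dumps(record) + "\n")

print("Wrote", len(domain_examples), "training examples to finetune_train.jsonl")
```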

Prompt engineering

Prompt engineering is the most straightforward and accessible method to optimize Generative AI applications. It involves crafting prompts that provide the LLM with a more nuanced context and consequently lead to a more appropriate output. Like writing instructions for an algorithm to complete a task and obtain the intended results, prompts should provide precise instructions that enable LLMs to generate an accurate response. In prompt engineering, natural language is the new programming language: it requires a thorough understanding of natural language nuances and of the specific task at hand.

Prompt engineering includes techniques such as few-shot learning, where relevant examples are part of the prompt input, and chain-of-thought prompting, which encourages the LLM to break down queries into simpler components. Using prompt templates can also help guide responses. There are many more creative ways to steer the LLM towards the desired response.
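The sketch below illustrates how few-shot examples and a chain-of-thought style instruction can be combined into a single prompt template. The worked example and the wording are illustrative assumptions; in practice both would be tuned to the task and the model at hand.

```python
# Sketch of a few-shot, chain-of-thought style prompt template. The example
# and wording are illustrative; tune both to your own task and model.

FEW_SHOT_EXAMPLES = [
    {
        "question": "A meeting room holds 12 people and 30 people registered. How many rooms are needed?",
        "reasoning": "30 people divided by 12 seats per room is 2.5, so round up to 3 rooms.",
        "answer": "3",
    },
]

def build_prompt(question: str) -> str:
    """Compose instructions, worked examples, and the new question."""
    parts = ["Answer the question. Think step by step before giving the final answer."]
    for ex in FEW_SHOT_EXAMPLES:
        parts.append(
            f"Question: {ex['question']}\nReasoning: {ex['reasoning']}\nAnswer: {ex['answer']}"
        )
    parts.append(f"Question: {question}\nReasoning:")
    return "\n\n".join(parts)

print(build_prompt("A bus seats 40 and 95 students are going on a trip. How many buses are needed?"))
```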

Designing your generative AI system

A RAG application may sometimes require additional prompt engineering and model fine-tuning to achieve the desired quality. Conversely, for a simple custom chatbot, minor prompt engineering might suffice. The optimization techniques that are eventually implemented depend entirely on the specific use case and often involve trial and error. The table below summarizes the three methods.

Method: RAG
Description: Augment LLM responses by retrieving relevant information from external data sources.
Key features:
  • Improve factual accuracy
  • Leverage specialized knowledge bases
  • Increase auditability

Method: Fine-tuning
Description: Training pre-existing LLMs on a set of data or tasks to gain domain expertise.
Key features:
  • Gain proficiency in a specific domain
  • Learn uncommon skills and tasks

Method: Prompt engineering
Description: Crafting prompts that guide LLMs towards a desired output.
Key features:
  • Customize LLM responses
  • Limited technical skills required
  • Hard to establish guardrails

Considerations for RAG

The key concept of RAG systems is knowledge. Common use cases include advanced question-answering systems and information retrieval systems. For example, in legal research, RAG can load knowledge bases updated with the latest legislation and retrieve specific context on demand. This reduces the time and effort required compared to traditional research methods, where finding the relevant piece of information is sometimes akin to finding a needle in a haystack.

Considerations for fine-tuning

Fine-tuning involves identifying patterns. Custom LLMs are trained to perform domain-specific functions. This ensures consistent and branded user experiences by adopting critical formats, voices, tones, and guidelines. Even though this method is complex compared to the other techniques, fine-tuning is ideal for tasks that are hard to articulate in a prompt, especially when they involve non-public information.

Considerations for prompt engineering

It makes sense to combine prompt engineering with one of the aforementioned techniques. Hybrid solutions that combine RAG, fine-tuning, and prompt engineering are likely to result in highly potent systems.
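A minimal sketch of such a hybrid setup is shown below: retrieved context (RAG), an engineered prompt template, and a fine-tuned model combined in one call. The model identifier and both helper functions are hypothetical stand-ins rather than real APIs.

```python
# Sketch of a hybrid pipeline combining the three techniques. The model id and
# both helper functions are hypothetical stand-ins, not real APIs.

FINE_TUNED_MODEL = "acme-legal-assistant-v1"   # assumed custom fine-tuned model

def retrieve(question: str) -> list[str]:
    # Stand-in for the retrieval step of a RAG system.
    return ["The supplier's liability is capped at the contract value."]

def call_llm(prompt: str, model: str) -> str:
    # Stand-in for any completion endpoint; swap in your provider's client.
    return f"[{model}] would answer here based on:\n{prompt}"

def answer(question: str) -> str:
    context = "\n".join(retrieve(question))              # RAG: fetch relevant passages
    prompt = (                                           # prompt engineering: structured template
        "Use only the context below and cite it in your answer.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return call_llm(prompt, model=FINE_TUNED_MODEL)      # fine-tuned model does the generation

print(answer("What is the liability cap?"))
```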

What about AI agents?

The ongoing shift towards AI agents promises to expand beyond narrowly scoped and specialized AI systems. Autonomous LLM-based AI agents are designed to provide support across a broad range of domains. The general idea is that multi-agent workflows break down complex tasks into smaller tasks that can be executed by agents, each of which has a clearly defined role and objective. The agents then plan and perform multi-step tasks until their goals are achieved. For example, if the goal is to build a website, the roles might be front-end developer, back-end developer, and testing engineer. Agents can reason about their actions and initiate new actions based on human feedback. In this way, AI agents can continuously adapt their knowledge, skills, and behavior.
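As a rough sketch of this idea, the example below runs a sequence of agents, each with its own role and objective, and hands the intermediate result from one agent to the next. The roles, the task, and the `call_llm` stub are illustrative assumptions and not tied to any specific agent framework.

```python
# Minimal multi-agent workflow sketch: each agent has a role and an objective,
# and the output of one step feeds the next. Roles, task, and call_llm are
# illustrative assumptions, not a specific agent framework.

def call_llm(prompt: str) -> str:
    # Stand-in for a real model call.
    return f"<draft produced for prompt: {prompt[:60]}...>"

AGENTS = [
    {"role": "front-end developer", "objective": "Produce the HTML/CSS for the landing page."},
    {"role": "back-end developer", "objective": "Design the API that serves the page content."},
    {"role": "testing engineer", "objective": "List test cases covering the page and the API."},
]

def run_workflow(goal: str) -> list[str]:
    """Run the agents in sequence, passing each result to the next step."""
    results = []
    context = goal
    for agent in AGENTS:
        prompt = (
            f"You are a {agent['role']}. Overall goal: {goal}\n"
            f"Your objective: {agent['objective']}\n"
            f"Work done so far:\n{context}"
        )
        output = call_llm(prompt)
        results.append(output)
        context = output  # hand the intermediate result to the next agent
    return results

for step in run_workflow("Build a simple company website"):
    print(step)
```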

Conclusion

Generative AI, particularly in the form of LLM assistants, is significantly enhancing productivity for knowledge workers. However, to truly leverage the potential of LLMs in enterprise settings, it is important to address their inherent limitations. Techniques such as RAG, fine-tuning, and prompt engineering offer viable solutions to improve the accuracy, reliability, and overall utility of these applications. By implementing these methods, enterprises can mitigate the risks associated with LLM usage and harness the full power of generative AI, ensuring that it serves as a robust and dependable tool in their operations. As AI technology continues to evolve, staying abreast of LLM optimization strategies will be key to maintaining a competitive edge and safeguarding enterprise reputation.


Author: Jorgo Haezaerts