Large Language Models (LLMs) have revolutionized various natural language processing tasks, demonstrating remarkable capabilities in general knowledge reasoning. However, adapting these models to specialized domains, such as legal documents, medical records, or company-specific information, remains a challenge.
This paper focuses on improving LLMs' performance in domain-specific question answering (QA) tasks. Existing approaches, such as supervised fine-tuning and Retrieval-Augmented Generation (RAG), have limitations. Supervised fine-tuning often fails to leverage external knowledge sources, while RAG struggles to handle irrelevant information effectively.
RAFT addresses these limitations by introducing a novel training recipe that combines the strengths of supervised fine-tuning and RAG. The core idea is to train LLMs to differentiate between relevant and irrelevant documents while answering questions in an "open-book" setting. This enables the model to focus on pertinent information and generate accurate answers.
Imagine preparing for an open-book exam. You wouldn't simply memorize the entire textbook; instead, you'd learn to identify and focus on relevant sections. RAFT applies this principle to LLMs, training them to ignore "distractor" documents and extract key information from relevant ones. This improves their ability to answer questions accurately within a specific domain.
RAFT involves the following steps (a sketch of the data construction appears after this list):

1. For each training question, gather the oracle document (the document that actually contains the answer) along with several distractor documents that do not.
2. Generate a chain-of-thought style answer that reasons over, and quotes from, the oracle document.
3. For a fraction P of the training examples, keep the oracle document in the context alongside the distractors; for the remaining examples, include only distractors.
4. Fine-tune the LLM on these examples, mapping the question plus retrieved documents to the chain-of-thought answer.
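To make the recipe concrete, here is a minimal sketch of how a single RAFT training example could be assembled. This is not the paper's code: the function name, P_ORACLE, and NUM_DISTRACTORS are hypothetical, and the prompt template is only illustrative of the structure described above.

```python
import random

# Minimal sketch of RAFT-style training-data construction (illustrative only).
# Each example pairs a question with distractor documents; the oracle document
# is kept in the context for a fraction P of examples and dropped for the rest,
# which pushes the model to learn the domain rather than always copy from context.

P_ORACLE = 0.8        # fraction of examples that keep the oracle document (P = 80%)
NUM_DISTRACTORS = 3   # distractor documents shown per example

def build_raft_example(question, oracle_doc, distractor_pool, cot_answer):
    """Assemble one RAFT training example as a prompt/target pair."""
    docs = random.sample(distractor_pool, NUM_DISTRACTORS)
    if random.random() < P_ORACLE:
        docs.append(oracle_doc)      # oracle included in only P% of examples
    random.shuffle(docs)             # oracle position should not be a learnable cue

    context = "\n\n".join(f"Document {i + 1}:\n{d}" for i, d in enumerate(docs))
    prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
    return {"prompt": prompt, "target": cot_answer}  # target is a chain-of-thought answer
```

The P = 80% split here is the same mixing ratio discussed later in this post: some examples deliberately omit the oracle document so the model cannot rely on the context always containing the answer.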
RAFT consistently outperforms existing methods on various benchmarks, demonstrating significant improvements in domain-specific question answering. The inclusion of distractor documents during training enhances the model's robustness and ability to handle irrelevant information effectively.
Compared with the base Llama-2 instruction-tuned model, RAFT combined with RAG does much better at extracting information and is more robust to distractors. The gain can be as large as 35.25% on HotpotQA and 76.35% on the Torch Hub evaluation. Compared with domain-specific fine-tuning (DSF) on the same dataset, our model is better at relying on the provided context to solve the problem: RAFT does much better on tasks like HotpotQA and the HuggingFace dataset (30.87% on HotpotQA and 31.41% on HuggingFace).
Incorporating a reasoning chain not only guides the model to the right answer but also helps it understand the task, improving overall accuracy and enhancing robustness.
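The chain-of-thought targets in RAFT quote the supporting passage before stating the final answer. The sketch below shows one way such a target could be formatted; the quote delimiters follow the style used in the RAFT paper, while the helper name and exact template are assumptions for illustration.

```python
# Illustrative sketch of a reasoning-first training target that cites its evidence.
# The ##begin_quote##/##end_quote## markers mirror the RAFT paper's style; the
# helper name, template, and example values are hypothetical.

def format_cot_target(reasoning: str, evidence: str, final_answer: str) -> str:
    """Build a target that reasons, quotes the supporting passage, then answers."""
    return (
        f"{reasoning} "
        f"##begin_quote## {evidence} ##end_quote## "
        f"Therefore, the answer is: {final_answer}"
    )

target = format_cot_target(
    reasoning="The question asks when the company was founded.",
    evidence="Acme Corp was founded in 1962 in Springfield.",
    final_answer="1962",
)
print(target)
```

Training on targets like this teaches the model to ground its answer in a specific span of the context rather than answering from the prompt as a whole.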
A key finding here is:
incorporating a portion of the training data without the oracle document in the context (i.e., keeping the oracle document in only P = 80% of the training examples) appears to enhance the model's performance on RAG tasks.
That is, sometimes training your LLM without the correct corresponding context can be beneficial for the downstream task of answering questions about the documents. This is counterintuitive, and I explain why this happens in the next section.
The technique maintains consistent performance even when the number of documents retrieved at test time varies, because the model is fine-tuned to ignore irrelevant text (distractor documents). In real-world applications, the amount of retrieved information can fluctuate; for example, a search engine might return a different number of results depending on the query.
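One way to check this kind of robustness is to evaluate the same model while sweeping the number of retrieved documents. The sketch below assumes placeholder retriever.search and model.generate_answer interfaces rather than any specific library API; swap in your own retrieval and inference calls.

```python
# Hedged sketch: measuring answer accuracy as the number of retrieved documents varies.
# `retriever.search` and `model.generate_answer` are assumed placeholder interfaces.

def evaluate_across_topk(model, retriever, eval_set, topk_values=(1, 3, 5, 10)):
    """Return accuracy for each top-k retrieval setting."""
    results = {}
    for k in topk_values:
        correct = 0
        for example in eval_set:
            docs = retriever.search(example["question"], top_k=k)   # larger k adds more distractors
            answer = model.generate_answer(example["question"], docs)
            correct += int(answer.strip() == example["answer"].strip())
        results[k] = correct / len(eval_set)
    return results  # a RAFT-trained model should degrade gracefully as k grows
```

A model trained only on oracle documents typically loses accuracy as k grows, while a RAFT-trained model should stay comparatively flat across these settings.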
Many RAG pipelines that fine-tune only on oracle (relevant) documents, without distractor documents, end up generating imperfect output because the model is never trained on which parts of the prompt to ignore.
RAFT generalizes because of the way it is trained: the model learns which parts of the prompt are irrelevant and ignores them. In fact, this generalizes to answering real-time search queries via an LLM (the Perplexity AI use case) as well, since the model recognizes the irrelevant parts of the retrieved context and disregards them, generating a more accurate answer.
Beyond basic question-answering applications, RAFT has significant potential to benefit businesses in a variety of domain-specific settings where LLMs must be grounded in an organization's own documents.
RAFT presents a promising approach for training LLMs to excel in domain-specific question answering. Its ability to handle irrelevant information and adapt to different retrieval settings makes it a valuable tool for various business applications. As research in this area continues, we can expect to see even more powerful and efficient LLMs that can effectively leverage domain-specific knowledge to solve complex problems.
RAFT gives better results than existing methods for domain-specific question answering because it trains LLMs to be more robust to irrelevant information. This is achieved by including distractor documents during the training process.
In essence, RAFT prepares the LLM for the "open-book" exam scenario by simulating the presence of distractor documents during training. This allows the model to develop the ability to sift through irrelevant information and focus on what's truly important for answering the question accurately.