The HARG Truth: AI’s Need for the Human Element

Introduction

In the evolving landscape of AI-powered systems, combining human intuition with machine efficiency can create robust and reliable solutions. The Human-Augmented Retrieval Generation (HARG) method builds upon Retrieval-Augmented Generation (RAG) by adding a human review step to the pipeline.

To understand HARG, it's essential to first understand how RAG operates:

  1. Query: The user asks a question.
  2. Retrieval Step: The question is parsed and documents relevant to it are retrieved.
  3. Documents and original query: The retrieved documents and the original query are fed to the language model.
  4. Response: The model generates an answer based on the documents and the original query.

This is a distilled overview. Each step contains its own intricacies and nuances, but the basic framework is as outlined above.
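
The four steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a production pipeline: the corpus is made up, `retrieve` ranks documents by simple word overlap as a stand-in for a real search index or embedding model, and `generate` is a hypothetical callable wrapping whatever language model is in use.

```python
import re
from typing import Callable, List

# Made-up corpus; in practice this would be a document store or vector index.
CORPUS = [
    "Manchester United match report for the current season.",
    "A guide to the Eiffel Tower in Paris.",
    "An introduction to the Python programming language.",
]


def tokens(text: str) -> set:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Step 2: rank documents by word overlap with the query (toy retrieval)."""
    q = tokens(query)
    ranked = sorted(corpus, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: List[str]) -> str:
    """Step 3: combine the retrieved documents with the original query."""
    context = "\n".join(f"- {d}" for d in docs)
    return f"Context:\n{context}\n\nQuestion: {query}"


def rag_answer(query: str, corpus: List[str],
               generate: Callable[[str], str], k: int = 2) -> str:
    """Steps 1-4: query -> retrieval -> prompt -> generated response."""
    docs = retrieve(query, corpus, k)
    return generate(build_prompt(query, docs))
```

Passing `generate=lambda prompt: prompt` returns the assembled prompt itself, which is a convenient way to inspect what the model would receive.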

While RAG has many promising applications, it sometimes falls short. The retrieved documents can be similar to the query without being strictly relevant. For instance, if someone asks about Manchester United's performance this season and the system retrieves documents about the '74-'75, '78-'79, and '87-'88 seasons, the response will be imprecise. A human reviewing the results would likely notice this discrepancy and either adjust the query (for example, by adding the current year to it) or manually pick the correct documents as context.
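
The gap between "similar" and "relevant" can be shown with a toy example. In the sketch below, the `Doc` type, its `season` field, and the value "2023-24" (standing in for the current season) are all assumptions made for illustration. An unconstrained search matches every season review equally well, while the constraint a human would add narrows the results to the one that matters.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    title: str
    season: str  # hypothetical metadata field


# Made-up documents; "2023-24" stands in for the current season.
DOCS = [
    Doc("Manchester United season review", "1974-75"),
    Doc("Manchester United season review", "1978-79"),
    Doc("Manchester United season review", "1987-88"),
    Doc("Manchester United season review", "2023-24"),
]


def tokens(text: str) -> set:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def search(query: str, docs: List[Doc], season: Optional[str] = None) -> List[Doc]:
    """Return documents sharing at least one word with the query.

    If `season` is given, keep only documents from that season; this is
    the constraint a human reviewer would add after seeing stale results.
    """
    q = tokens(query)
    hits = [d for d in docs if q & tokens(d.title)]
    if season is not None:
        hits = [d for d in hits if d.season == season]
    return hits
```

Calling `search("Manchester United performance this season", DOCS)` returns all four reviews, because the words "this season" carry no date the toy matcher can use. Adding `season="2023-24"` leaves only the current one.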

Adding a human step

HARG is designed for knowledge-intensive tasks that rely not only on accurate retrieval of information but also on human judgment to select the most appropriate context. Unlike RAG, which automatically concatenates the retrieved documents as context, HARG adds a step where a human reviews the suggestions made by the retrieval component. This helps ensure that the selected context is both relevant and appropriate, further reducing the chance of “hallucination”, that is, the generation of incorrect or irrelevant information.

Here’s how HARG operates:

  1. Query: The user asks a question.
  2. Retrieval Step: Just like RAG, HARG retrieves a set of relevant or supporting documents from a source (e.g., Wikipedia) based on the input.
  3. Human Selection Step: Instead of automatically feeding the retrieved documents to the generator, a human expert reviews the suggestions and selects the most pertinent context.
  4. Documents and original query: The selected documents and the original query are fed to the language model.
  5. Response: The model generates an answer based on the selected documents and the original query.

The inclusion of the human element in HARG serves a dual purpose: enhancing reliability by minimizing machine errors and ensuring the context aligns well with human intuition and understanding.
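
One way to picture the human selection step is as a callback that sits between retrieval and generation. The sketch below is an assumption about how such a pipeline could be wired, not a reference implementation: `retrieve` is a toy word-overlap ranker, `console_select` is a hypothetical terminal-based reviewer, and `generate` stands for any callable that wraps a language model.

```python
import re
from typing import Callable, List

# A selector receives the query and candidate documents and returns the
# subset that the human reviewer approved.
Selector = Callable[[str, List[str]], List[str]]


def tokens(text: str) -> set:
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query: str, corpus: List[str], k: int = 3) -> List[str]:
    """Retrieval step: rank documents by word overlap (toy retrieval)."""
    q = tokens(query)
    ranked = sorted(corpus, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]


def console_select(query: str, candidates: List[str]) -> List[str]:
    """Human selection step: ask a reviewer in the terminal what to keep."""
    print(f"Query: {query}")
    for i, doc in enumerate(candidates):
        print(f"[{i}] {doc}")
    raw = input("Indices to keep (comma-separated): ")
    picked = [int(s) for s in raw.split(",") if s.strip().isdigit()]
    return [candidates[i] for i in picked if 0 <= i < len(candidates)]


def harg_answer(query: str, corpus: List[str], select: Selector,
                generate: Callable[[str], str], k: int = 3) -> str:
    """Query -> retrieval -> human selection -> prompt -> response."""
    candidates = retrieve(query, corpus, k)
    context = select(query, candidates)  # only approved documents pass
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"
    return generate(prompt)
```

Because the selector is just a function, the same pipeline can be driven from a terminal with `select=console_select`, from a web UI, or from a scripted selector in tests.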

With the growing emphasis on human-in-the-loop AI systems, HARG combines the best of both worlds: it keeps the efficiency and adaptability of retrieval-based generation while adding a layer of human verification that makes responses more accurate and contextually appropriate.

Optimal use-cases for HARG

HARG might not be the optimal solution when the user is purely searching for answers to questions, because such a user might not know which documents are relevant to the query. The strongest use cases for HARG lie in co-pilot-like applications, where the user is generating something, e.g., code or parts of legal documents. In these cases, the user usually has some knowledge of whether the retrieved documents are relevant and contain answers.

One use case would be a helper tool for a tech support operator. The operator might have a traditional chat UI for conversations with users. During a chat, a HARG-enabled agent could analyze the conversation, fetch relevant documents based on user information, questions, etc., and surface them in the UI for the operator. The operator can then pick the relevant documents and ask the agent to generate possible answers to the user's questions based on this human-augmented context.

While doing this, all the generated question/answer pairs can be stored to later improve the agent itself, for example, by fine-tuning. The same logic would apply to numerous co-pilot-like applications.
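
As a rough sketch of how such pairs could be stored, the snippet below appends each interaction to a JSON Lines file, one record per line. The record schema (`question`, `context`, `answer`) and the function names are assumptions chosen for illustration; an actual fine-tuning workflow would dictate its own format.

```python
import json
from pathlib import Path
from typing import Dict, List


def log_example(path: str, question: str, context: List[str], answer: str) -> None:
    """Append one question/context/answer record as a single JSON line."""
    record = {"question": question, "context": context, "answer": answer}
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_examples(path: str) -> List[Dict]:
    """Read all stored records back, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Storing the human-selected context alongside each pair keeps a record of which documents the operator judged relevant, which could later serve as a training signal for the retrieval step as well.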