Reduce hallucinations in LLMs when no content is available
We previously saw how to generate verification questions and improve a baseline answer using the Chain of Verification (CoVe). However, the previous article relied on reference content to provide answers.
Here, we'll look at how to reduce hallucinations when no content is provided. We'll continue using the CoVe method as a first principle.
Avoid more hallucinations with single-fact, close-ended verification questions
One pitfall when implementing the CoVe method is generating more hallucinations while answering the verification questions, which takes us even farther from the truth.
Yann LeCun has repeatedly pointed to this as an architectural defect of current (auto-regressive) LLMs.
What I found is that we can mitigate the generation of additional hallucinations by specifically asking for verification questions that are close-ended and contain a single fact.
The prompt I used in the baseline verification was:
Write {n_questions} single-fact, close-ended questions to help
verify if there are mistakes in the answer.
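As a concrete illustration, here is a minimal sketch of how this prompt could be wired into a question-generation step. This is not the PyLLMCore implementation: the use of the OpenAI Python client, the gpt-3.5-turbo model name, and the line-by-line parsing of the response are all assumptions.
from openai import OpenAI

client = OpenAI()

def generate_verification_questions(question, baseline_answer, n_questions=5):
    # Ask for single-fact, close-ended questions about the baseline answer.
    prompt = (
        f"Question: {question}\n"
        f"Answer: {baseline_answer}\n\n"
        f"Write {n_questions} single-fact, close-ended questions to help "
        "verify if there are mistakes in the answer."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the model returns one question per line, e.g. as a dashed list.
    return [
        line.strip("- ").strip()
        for line in response.choices[0].message.content.splitlines()
        if line.strip()
    ]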
Using the example question from the CoVe paper, I got:
> Name some politicians who were born in NY, New York
# Baseline answer
Some politicians who were born in NY, New York include Hillary Clinton,
Donald Trump, Franklin D. Roosevelt, Theodore Roosevelt, and Andrew Cuomo
# Verification questions
- Was Hillary Clinton born in NY, New York?
- Was Donald Trump born in NY, New York?
- Was Andrew Cuomo born in NY, New York?
# Verification answers
- Yes, Hillary Clinton was born in Chicago, Illinois.
- Yes, Donald Trump was born in New York, New York.
- Yes, Andrew Cuomo was born in New York, New York.
Note the answer regarding Hillary Clinton: it starts with "Yes", yet the model (gpt-3.5) still manages to surface the correct birthplace.
Previously, when the questions were not about a single fact, the model was completely off course:
> Is Hillary Clinton a politician born in NY, New York?
Yes, Hillary Clinton is a politician born in NY, New York.
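To complete the chain, each verification question can be answered in its own fresh context, and the baseline answer is then revised against those checks. The sketch below makes the same assumptions as the one above (OpenAI Python client, gpt-3.5-turbo); `ask` and `revise_answer` are hypothetical helpers, not part of any library.
from openai import OpenAI

client = OpenAI()

def ask(prompt):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def revise_answer(question, baseline_answer, verification_questions):
    # Each question is answered in a fresh context so that a hallucinated
    # verification answer does not leak into the other checks.
    verification_answers = [ask(q) for q in verification_questions]
    checks = "\n".join(
        f"- {q}\n  {a}"
        for q, a in zip(verification_questions, verification_answers)
    )
    prompt = (
        f"Question: {question}\n"
        f"Baseline answer: {baseline_answer}\n\n"
        f"Verification results:\n{checks}\n\n"
        "Write a revised answer, removing anything contradicted by the checks."
    )
    return ask(prompt)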
How many verification questions should be asked
Asking more verification questions is a double-edged sword and may itself produce hallucinations. Testing with up to 30 questions, I found that the model repeats the same questions over and over.
As a rule of thumb, if the original question is open-ended, more questions (8 to 10) are desirable in order to scan the possible state space. Conversely, when the question is close-ended, a definitive answer can be reached with 3 to 5 verification questions.
Do not forget that hallucinations cannot be completely removed and must be accounted for.
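If the model produces more questions than needed, a small, hypothetical helper can deduplicate them and cap their number according to the rule of thumb above; nothing here is specific to any library.
def select_verification_questions(questions, open_ended=True):
    # Cap at 10 questions for open-ended originals, 5 for close-ended ones,
    # and drop the duplicates the model tends to repeat.
    limit = 10 if open_ended else 5
    seen = set()
    selected = []
    for question in questions:
        key = question.strip().lower()
        if key not in seen:
            seen.add(key)
            selected.append(question)
        if len(selected) == limit:
            break
    return selected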
Testing the Chain of Verification with a sample implementation
The PyLLMCore library now contains an implementation of the Chain of Verification and can be invoked like this:
>>> from llm_core.assistants import COVQuestionAnswering
>>> cov_qa = COVQuestionAnswering.ask(
... question="Name some politicians who were born in NY, New York"
... )
>>> print(cov_qa.revised_answer)
Some politicians who were born in NY, New York include Donald Trump,
Franklin D. Roosevelt, Theodore Roosevelt, and Andrew Cuomo.
The COVQuestionAnswering object has the following fields so you can actually debug the chain:
from dataclasses import dataclass
from typing import List

@dataclass
class COVQuestionAnswering:
    question: str
    baseline_answer: str
    verification_questions: List[str]
    verification_answers: List[str]
    revised_answer: str
    content: str = ""
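For instance, with the cov_qa object from the example above, the intermediate steps of the chain can be inspected like this:
>>> print(cov_qa.baseline_answer)
>>> for q, a in zip(cov_qa.verification_questions, cov_qa.verification_answers):
...     print(q, "->", a)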
Next steps: Zero-resource hallucination prevention for large language models
A recent paper, Zero-resource hallucination prevention for large language models, uses another technique that can further reduce hallucinations.
I intend to test both CoVe and this newer technique with a smaller model, Mistral AI Instruct v0.1.