Anyone who has used an LLM like OpenAI’s ChatGPT knows that you need to fact-check anything an LLM tells you. This problem, known as hallucination, recently came to the forefront when two lawyers cited six fake cases that ChatGPT had made up. Many people are working on minimizing the problem, but others in the industry say that it may never go away.
Why do LLMs hallucinate?
At its core, an LLM is an AI model that does one thing. It predicts the next word. That’s it. LLMs do not have any innate knowledge of the world or any specific subject. They may seem confident in their responses, but they are simply predicting which words to write next based on the data set that they’ve been trained on. As a result, an LLM will simply make up sentences that sound like the real thing when it doesn’t have a direct answer.
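To make the "predict the next word" idea concrete, here is a deliberately tiny sketch. It is not how ChatGPT works internally (real LLMs are neural networks over tokens, not word-count tables), and the training text is invented for this illustration. It only shows that a next-word predictor returns whatever continuation was most common in its training data, whether or not that continuation is true.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers

def predict_next_word(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    candidates = model.get(word.lower())
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]

# Hypothetical training text, invented for this illustration.
corpus = "the bank failed . the bank failed . the bank reopened . the market fell"
model = train_bigram_model(corpus)

print(predict_next_word(model, "bank"))    # most frequent follower of "bank"
print(predict_next_word(model, "market"))  # most frequent follower of "market"
print(predict_next_word(model, "crypto"))  # a word that never appeared in training
```

The toy model at least returns nothing for a word it has never seen. An LLM generalizes far better, which is also why it can produce a fluent continuation where it has no real information.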
One of the more common examples of LLM hallucination is when you ask about something that happened after the model was trained. Since LLMs are trained on data from a point in time, they won’t inherently know anything that happened after that point.
At the time of writing, ChatGPT has been trained on data from before mid-2021. So, if you ask ChatGPT about something that happened in 2023, it has nothing to draw on and is likely to hallucinate an answer. To counter this, the companies that make LLMs try to put in guardrails that prevent their models from answering questions about events more recent than the training data.
The Achilles heel of running LLM apps in production
If you’re using LLMs in a production application, hallucinations can be a problem. If you can’t trust the answers that come back, you can’t build a reliable application. So, how do we control hallucinations?
The easiest way to control hallucinations is to use LLMs primarily for their strengths. LLMs are great at summarizing and editing text. They are not research tools in and of themselves. Let’s take a look at an example.
I asked ChatGPT the following question:
Explain the failure of silicon valley bank
This is what ChatGPT responds with:
OpenAI has thankfully put some guardrails around the response so the model doesn’t hallucinate, but it still doesn’t give me the correct answer.
The solution to this problem is to provide context to the model before asking it to answer a question. So instead, I update the prompt to include some context from the Wikipedia page about the collapse of SVB:
Context: Seeking higher investment returns from its burgeoning deposits, SVB had dramatically increased its holdings of long-term securities since 2021, accounting for them on a hold-to-maturity basis. The market value of these bonds decreased significantly through 2022 and into 2023 as the Federal Reserve raised interest rates to curb an inflation surge, causing unrealized losses on the portfolio. Higher interest rates also raised borrowing costs throughout the economy and some Silicon Valley Bank clients started pulling money out to meet their liquidity needs. To raise cash to pay withdrawals by its depositors, SVB announced on Wednesday, March 8 that it had sold over US$21 billion worth of securities, borrowed $15 billion, and would hold an emergency sale of some of its treasury stock to raise $2.25 billion. The announcement, coupled with warnings from prominent Silicon Valley investors, caused a bank run as customers withdrew funds totaling $42 billion by the following day.
Question: Explain the failure of SVB
Now the result is much more accurate:
If I then ask ChatGPT to summarize the results:
Now I have the correct answer.
This is an example of what’s called retrieval-augmented generation (RAG). Instead of relying on the underlying LLM to answer the question from its training data alone, you augment your prompt with external data that the LLM uses to answer the question. In the case above, I manually added information from Wikipedia to the prompt, which gave the LLM enough information to answer correctly. I then fed the generated answer into another prompt to summarize it for easy consumption.
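The manual step above can be sketched as a small helper that assembles the prompt. The function name `build_rag_prompt` and the instruction sentence at the top of the prompt are my own assumptions for illustration. The `Context:` / `Question:` layout mirrors the prompt shown earlier, and the context sentence is taken from the same Wikipedia excerpt.

```python
def build_rag_prompt(context, question):
    """Combine retrieved context and the user's question into a single prompt."""
    return (
        # Assumed instruction line; wording is illustrative, not a fixed recipe.
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say you don't know.\n\n"
        f"Context: {context.strip()}\n\n"
        f"Question: {question.strip()}"
    )

# One sentence from the Wikipedia excerpt used earlier in this article.
context = (
    "The announcement, coupled with warnings from prominent Silicon Valley "
    "investors, caused a bank run as customers withdrew funds totaling "
    "$42 billion by the following day."
)

prompt = build_rag_prompt(context, "Explain the failure of SVB")
print(prompt)
```

The resulting string is what you would send to the model in place of the bare question.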
RAG, Prompt Chains, and Vector Databases
Now, how do you do this programmatically? There are a couple of open source technologies that make this pretty easy these days.
LangChain is a relatively new open source technology that makes complicated tasks with LLMs easier. Its first benefit is that it allows you to chain prompts together. As you saw in the example above, I broke the answer down into two prompts. In the first, I asked the LLM to explain the failure of SVB. In the second, I asked it to summarize the explanation. With LangChain, you can do this automatically.
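The idea behind a prompt chain can be sketched in plain Python. This is not LangChain’s API, just an illustration of the pattern. `run_chain`, `fake_llm`, and the two templates are hypothetical names and wording; `fake_llm` is a stand-in that echoes its prompt so the sketch runs without a model or API key.

```python
from typing import Callable, List

def run_chain(llm: Callable[[str], str], templates: List[str], initial_input: str) -> str:
    """Run prompts in order, feeding each output into the next template's {input} slot."""
    text = initial_input
    for template in templates:
        prompt = template.format(input=text)
        text = llm(prompt)
    return text

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption: swap in your own client here)."""
    return f"[model output for: {prompt}]"

# The two steps from the SVB example: explain first, then summarize the explanation.
templates = [
    "Explain the following topic: {input}",
    "Summarize this explanation in two sentences: {input}",
]

result = run_chain(fake_llm, templates, "the failure of SVB")
print(result)
```

With a real model in place of `fake_llm`, the second prompt would receive the model’s explanation rather than an echoed string.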
The second benefit of LangChain is that it allows you to add third-party technologies to your prompts with relative ease. You can fetch a relevant Wikipedia article, or you can query a database for the information you need to answer a question. This allows you to provide the LLM with the context that it needs to answer the question without hallucinating.
Langflow is another open source tool that provides a user interface on top of LangChain. It provides the easiest way I’ve found to experiment with prompt chains, and allows you to iterate on complex prompts with relative ease. Above is an example of a prompt chain that queries a website, stores the result in a vector database, and makes the information from the website available for context in any query.
As a word of warning though, Langflow is very much under active development. Documentation is basically nonexistent as of writing this article, and it doesn’t support everything that LangChain has to offer yet. That said, once you pick it up, it’s a great tool to use.
Vector databases are how you can give an LLM memory. Think of a vector database like Google for an LLM. When queried, a vector database responds with information relevant to the query, which can then be provided to the LLM as context.
In the Langflow example above, the prompt chain scrapes a web page, breaks the page into digestible chunks, translates those chunks into a format that a machine can understand (vectors, aka embeddings), and then stores them in the vector database.
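Here is a minimal sketch of the chunk, embed, and store steps, leaving out the scraping. Everything in it is a simplification I am assuming for illustration: the "embedding" is a plain word-count vector rather than the output of a learned embedding model, the "vector database" is a Python list, and the sample text is abridged from the Wikipedia excerpt quoted earlier.

```python
import re
from collections import Counter

def split_into_chunks(text, max_words=40):
    """Break text into consecutive chunks of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text):
    """Toy embedding: a sparse word-count vector (real systems use a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def build_index(text, max_words=40):
    """Return (chunk, vector) pairs; this list stands in for a vector database."""
    return [(chunk, embed(chunk)) for chunk in split_into_chunks(text, max_words)]

# Abridged from the Wikipedia excerpt used earlier in this article.
page_text = (
    "SVB had dramatically increased its holdings of long-term securities since 2021. "
    "The market value of these bonds decreased significantly through 2022 and into 2023. "
    "Customers withdrew funds totaling $42 billion by the following day."
)

index = build_index(page_text, max_words=12)
for chunk, vector in index:
    print(len(vector), "distinct words:", chunk)
```

Splitting on a fixed word count is the crudest possible strategy and cuts sentences in half. Real pipelines usually split on sentence or paragraph boundaries and overlap adjacent chunks.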
When an LLM query is submitted, the vector database is queried for information relevant to that query, which is then provided as context to the LLM. With the right context in the prompt, you are far more likely to get the correct answer the first time.
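The query-time lookup can be sketched as a similarity search. As assumptions for illustration, this uses a toy word-count embedding and cosine similarity over a handful of chunks held in a list. A real vector database does the same kind of nearest-neighbor lookup with learned embeddings and an index built for speed. The chunk texts are abridged from the Wikipedia excerpt quoted earlier.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: a sparse word-count vector (real systems use a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, chunks, top_k=1):
    """Return the `top_k` chunks most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, embed(chunk)),
        reverse=True,
    )
    return ranked[:top_k]

# Abridged from the Wikipedia excerpt used earlier in this article.
chunks = [
    "SVB sold over US$21 billion worth of securities to raise cash.",
    "The Federal Reserve raised interest rates to curb an inflation surge.",
    "Customers withdrew funds totaling $42 billion, causing a bank run.",
]

question = "Why did the Federal Reserve raise interest rates?"
context = retrieve(question, chunks, top_k=1)[0]
prompt = f"Context: {context}\n\nQuestion: {question}"
print(prompt)
```

Word-count matching only finds chunks that share vocabulary with the question. Learned embeddings are what let a vector database match on meaning instead of exact words.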
Hallucinations are a major problem with LLMs today, and one that isn’t likely to go away in the near future. If you’re building an application that uses an LLM, chances are you’ll need to provide context to the LLM to get the responses you’re looking for. With a combination of prompt chains and a vector database, you’ll be able to provide the LLM with the context it needs to give you that answer.