Question Answering

This notebook walks through how to use LangChain for question answering over a list of documents. It covers four different chain types: stuff, map_reduce, refine, and map_rerank. The original notebook is part of the LangChain examples; this version shows how to wrap the LLM with GPTCache so that the LLM's responses are cached.

Go into GPTCache

Please install gptcache first; then we can initialize the cache. There are two ways to initialize it: the map cache (exact-match cache) and the database cache (similarity-search cache). The second is recommended, but it requires installing the related dependencies.

Before running the example, make sure the OPENAI_API_KEY environment variable is set; you can check it with echo $OPENAI_API_KEY. If it is not set, set it with export OPENAI_API_KEY=YOUR_API_KEY on Unix/Linux/macOS or set OPENAI_API_KEY=YOUR_API_KEY on Windows. Both cache setups below use a get_content_func, which extracts only the question from the prompt to use as the cache key:

# get the content (only the question) from the prompt to use as the cache key
def get_content_func(data, **_):
    return data.get("prompt").split("Question")[-1]
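To see what this function actually keys the cache on, here is a quick check against a made-up prompt. The prompt text is an assumption modeled on a typical question-answering template; the prompt LangChain really sends may be worded differently.

```python
# Same function as above: keep only the text after the last "Question".
def get_content_func(data, **_):
    return data.get("prompt").split("Question")[-1]


# Hypothetical prompt; the real template text may differ.
sample = {
    "prompt": (
        "Use the following pieces of context to answer the question at the end.\n\n"
        "<retrieved document text>\n\n"
        "Question: What did the president say about Justice Breyer\n"
        "Helpful Answer:"
    )
}

key = get_content_func(sample)
print(repr(key))  # ': What did the president say about Justice Breyer\nHelpful Answer:'
```

The split is case-sensitive, so the lowercase "question" in the instruction line is ignored. Because the retrieved context is dropped from the key, the same question asked over different documents maps to the same cache entry: that is what lets repeated questions hit the cache, but it also means a cached answer can be returned for a different document set.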

1. Init for exact match cache

# from gptcache import cache
# cache.init(pre_embedding_func=get_content_func)
# cache.set_openai_key()
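Conceptually, an exact-match cache is a lookup table keyed on the extracted content, so only an identical question hits. The class below is a hypothetical, standard-library-only sketch of that behavior; it is not GPTCache's implementation.

```python
from typing import Callable, Dict, Optional


def get_content_func(data, **_):
    return data.get("prompt").split("Question")[-1]


class ExactMatchCache:
    """Hypothetical illustration: cache answers keyed on the extracted question text."""

    def __init__(self, pre_embedding_func: Callable[[dict], str]):
        self.pre_embedding_func = pre_embedding_func
        self._store: Dict[str, str] = {}

    def get(self, data: dict) -> Optional[str]:
        # Exact string lookup: any wording change produces a different key.
        return self._store.get(self.pre_embedding_func(data))

    def put(self, data: dict, answer: str) -> None:
        self._store[self.pre_embedding_func(data)] = answer


exact_cache = ExactMatchCache(get_content_func)
exact_cache.put({"prompt": "ctx A\nQuestion: Who is Breyer?"}, "<cached answer>")

print(exact_cache.get({"prompt": "ctx B\nQuestion: Who is Breyer?"}))  # same question, other context: hit
print(exact_cache.get({"prompt": "ctx A\nQuestion: Who's Breyer?"}))   # reworded question: miss (None)
```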

2. Init for similar match cache

This requires faiss.

from gptcache import cache
from gptcache.embedding import Onnx
from gptcache.manager import CacheBase, VectorBase, get_data_manager
from gptcache.similarity_evaluation.distance import SearchDistanceEvaluation


onnx = Onnx()
data_manager = get_data_manager(CacheBase("sqlite"), VectorBase("faiss", dimension=onnx.dimension))
cache.init(
    pre_embedding_func=get_content_func,
    embedding_func=onnx.to_embeddings,
    data_manager=data_manager,
    similarity_evaluation=SearchDistanceEvaluation(),
)
cache.set_openai_key()
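The similar-match cache instead embeds the question and returns a stored answer when a close enough vector is found. The sketch below mimics that flow with the standard library only, under loudly stated substitutions: a toy bag-of-words embedding stands in for the Onnx model, a linear scan stands in for faiss, and a cosine-similarity threshold stands in for SearchDistanceEvaluation. The embedding, the 0.8 threshold, and the class name are all assumptions for illustration.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple


def toy_embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SimilarMatchCache:
    """Hypothetical sketch: return the answer of the most similar stored question."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # assumed cut-off, not a GPTCache default
        self._entries: List[Tuple[Counter, str]] = []

    def put(self, question: str, answer: str) -> None:
        self._entries.append((toy_embed(question), answer))

    def get(self, question: str) -> Optional[str]:
        query = toy_embed(question)
        best_score, best_answer = 0.0, None
        for vec, answer in self._entries:  # linear scan instead of a vector index
            score = cosine(query, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None


sim_cache = SimilarMatchCache()
sim_cache.put("What did the president say about Justice Breyer", "<cached answer>")
print(sim_cache.get("What did the president say about Breyer?"))  # similar wording: hit
print(sim_cache.get("How is the economy doing"))                 # unrelated: miss (None)
```

Lowering the threshold yields more cache hits but raises the risk of answering a different question with a stored answer. A real embedding model also captures paraphrases that this word-overlap measure would miss.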

After initializing the cache, you can use the LangChain LLMs through gptcache.adapter.langchain_models, and GPTCache will cache the answers. The only difference from the original example is that llm = OpenAI(temperature=0) becomes llm = LangChainLLMs(llm=OpenAI(temperature=0)); the original line is left commented out in the code block.

When the same or a similar question is asked again, the answer is served from the cache instead of calling the LLM, so it returns faster. Let's try it.
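The wrapping step can be pictured as a thin adapter that consults the cache before calling the underlying model and stores the result on a miss. This is a hypothetical sketch: CachedLLM and fake_llm are illustrative names, not the LangChainLLMs implementation, and the cache is a plain dict keyed on the extracted question.

```python
from typing import Callable, Dict


def get_content_func(data, **_):
    return data.get("prompt").split("Question")[-1]


class CachedLLM:
    """Hypothetical adapter: look up the cache first, call the wrapped LLM only on a miss."""

    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm
        self._cache: Dict[str, str] = {}

    def __call__(self, prompt: str) -> str:
        key = get_content_func({"prompt": prompt})
        if key in self._cache:
            return self._cache[key]  # cache hit: no model call
        answer = self.llm(prompt)
        self._cache[key] = answer
        return answer


calls = []


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model; records each call so they can be counted."""
    calls.append(prompt)
    return "<model answer>"


llm = CachedLLM(fake_llm)
prompt = "<context>\nQuestion: What did the president say about Justice Breyer\nHelpful Answer:"
llm(prompt)
llm(prompt)
print(len(calls))  # 1: the second call was served from the cache
```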

Prepare Data

First we prepare the data. For this example we do similarity search over a vector database, but these documents could be fetched in any manner (the point of this notebook is to highlight what to do AFTER you fetch the documents).

from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.docstore.document import Document
from langchain.prompts import PromptTemplate
from langchain.indexes.vectorstore import VectorstoreIndexCreator
with open("./state_of_the_union.txt") as f:
    state_of_the_union = f.read()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_text(state_of_the_union)

embeddings = OpenAIEmbeddings()
docsearch = Chroma.from_texts(texts, embeddings, metadatas=[{"source": str(i)} for i in range(len(texts))]).as_retriever()
Using embedded DuckDB without persistence: data will be transient
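The splitting step above packs the speech into chunks of at most chunk_size characters, each tagged with a "source" index. As a rough mental model, the sketch below greedily packs "\n\n"-separated pieces into chunks. It is a simplified assumption, not LangChain's CharacterTextSplitter source: it ignores chunk_overlap (0 in this example) and lets a single oversized piece become its own chunk.

```python
from typing import List


def split_text(text: str, chunk_size: int = 1000, separator: str = "\n\n") -> List[str]:
    """Greedily pack separator-delimited pieces into chunks of at most chunk_size characters.

    Simplified: no overlap, and a single piece longer than chunk_size becomes its own chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for piece in text.split(separator):
        piece = piece.strip()
        if not piece:
            continue
        # Cost of appending this piece, including the separator that would join it.
        added = len(piece) + (len(separator) if current else 0)
        if current and current_len + added > chunk_size:
            chunks.append(separator.join(current))
            current, current_len = [], 0
            added = len(piece)
        current.append(piece)
        current_len += added
    if current:
        chunks.append(separator.join(current))
    return chunks


texts = split_text("aaaa\n\nbbbb\n\ncccc", chunk_size=10)
metadatas = [{"source": str(i)} for i in range(len(texts))]
print(texts)      # ['aaaa\n\nbbbb', 'cccc']
print(metadatas)  # [{'source': '0'}, {'source': '1'}]
```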
query = "What did the president say about Justice Breyer"
docs = docsearch.get_relevant_documents(query)
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI

from gptcache.adapter.langchain_models import LangChainLLMs

Quickstart

If you just want to get started as quickly as possible, this is the recommended way to do it:

# llm = OpenAI(temperature=0)  # original line; the line below wraps the LLM to cache with GPTCache
llm = LangChainLLMs(llm=OpenAI(temperature=0))
chain = load_qa_chain(llm, chain_type="stuff")
query = "What did the president say about Justice Breyer"
chain.run(input_documents=docs, question=query)
' The president said that Justice Breyer has dedicated his life to serve the country and thanked him for his service.'

If you want more control and understanding over what is happening, please see the information below.

The stuff Chain

This section shows the results of using the stuff chain for question answering.

chain = load_qa_chain(llm, chain_type="stuff")
query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)
{'output_text': ' The president said that Justice Breyer has dedicated his life to serve the country and thanked him for his service.'}
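The stuff chain puts every retrieved document into a single prompt and makes one LLM call. A minimal sketch of that idea, assuming a template modeled loosely on LangChain's default stuff prompt (the exact wording is an assumption) and a fake LLM in place of OpenAI:

```python
from typing import Callable, List

# Assumed template; LangChain's actual default prompt text may differ.
TEMPLATE = (
    "Use the following pieces of context to answer the question at the end.\n\n"
    "{context}\n\n"
    "Question: {question}\n"
    "Helpful Answer:"
)


def stuff_chain(llm: Callable[[str], str], docs: List[str], question: str) -> dict:
    """Join all documents into one context block and make a single LLM call."""
    prompt = TEMPLATE.format(context="\n\n".join(docs), question=question)
    return {"output_text": llm(prompt)}


prompts = []


def fake_llm(prompt: str) -> str:
    """Stand-in model that records the prompt it receives."""
    prompts.append(prompt)
    return " <answer>"


result = stuff_chain(fake_llm, ["doc one", "doc two"], "What did the president say about Justice Breyer")
print(result)        # {'output_text': ' <answer>'}
print(len(prompts))  # 1: all documents went into one prompt
```

With a template like this, the question sits after "Question" at the end of the prompt, which is what get_content_func extracts as the cache key. The trade-off of stuffing is that it only works while all documents fit in the model's context window.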

The refine Chain

This section shows the results of using the refine chain for question answering.

chain = load_qa_chain(llm, chain_type="refine")
query = "What did the president say about Justice Breyer"
chain({"input_documents": docs, "question": query}, return_only_outputs=True)
{'output_text': '\n\nThe president said that Justice Breyer has dedicated his life to serve the country and thanked him for his service, and also offered a Unity Agenda for the Nation to beat the opioid epidemic.'}
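The refine chain instead walks through the documents one at a time: it answers from the first document, then asks the LLM to revise that answer in light of each following document. The sketch below shows that loop with hypothetical prompt wording and a fake LLM that just numbers its answers; it also records each intermediate answer.

```python
from typing import Callable, List


def refine_chain(llm: Callable[[str], str], docs: List[str], question: str) -> dict:
    """Answer from the first document, then refine the answer with each remaining one."""
    steps: List[str] = []
    # Hypothetical prompt wording for the initial and refine calls.
    answer = llm(f"Context: {docs[0]}\nQuestion: {question}\nAnswer:")
    steps.append(answer)
    for doc in docs[1:]:
        answer = llm(
            f"Existing answer: {answer}\nNew context: {doc}\n"
            f"Refine the existing answer if the new context helps.\nQuestion: {question}\nAnswer:"
        )
        steps.append(answer)
    return {"intermediate_steps": steps, "output_text": answer}


def make_fake_llm() -> Callable[[str], str]:
    """Stand-in model that returns 'answer v1', 'answer v2', ... on successive calls."""
    count = 0

    def fake_llm(prompt: str) -> str:
        nonlocal count
        count += 1
        return f"answer v{count}"

    return fake_llm


result = refine_chain(make_fake_llm(), ["doc 1", "doc 2", "doc 3"], "What did the president say about Justice Breyer")
print(result["intermediate_steps"])  # ['answer v1', 'answer v2', 'answer v3']
print(result["output_text"])         # answer v3
```

Because each call depends on the previous answer, the calls run sequentially, one per document; the four intermediate steps in the output below therefore correspond to four retrieved documents.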

Intermediate Steps

We can also return the intermediate steps of the refine chain if we want to inspect them, by passing the return_refine_steps argument.

chain = load_qa_chain(llm, chain_type="refine", return_refine_steps=True)
chain({"input_documents": docs, "question": query}, return_only_outputs=True)
{'intermediate_steps': [' The president said that Justice Breyer has dedicated his life to serve the country and thanked him for his service.',
  '\n\nThe president said that Justice Breyer has dedicated his life to serve the country and thanked him for his service, and also offered a Unity Agenda for the Nation to beat the opioid epidemic.',
  '\n\nThe president said that Justice Breyer has dedicated his life to serve the country and thanked him for his service, and also offered a Unity Agenda for the Nation to beat the opioid epidemic.',
  '\n\nThe president said that Justice Breyer has dedicated his life to serve the country and thanked him for his service, and also offered a Unity Agenda for the Nation to beat the opioid epidemic.'],
 'output_text': '\n\nThe president said that Justice Breyer has dedicated his life to serve the country and thanked him for his service, and also offered a Unity Agenda for the Nation to beat the opioid epidemic.'}
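When inspecting intermediate_steps, it can help to check which documents actually changed the answer. In the output above, the last three steps are identical, so the answer stopped changing after the second document. A small helper (hypothetical, not part of LangChain) makes that check explicit, ignoring the leading-whitespace differences visible in the output:

```python
from typing import List


def changed_steps(steps: List[str]) -> List[int]:
    """Return indices of refine steps whose answer differs from the previous step.

    Step 0 (the initial answer) is always included; surrounding whitespace is ignored.
    """
    return [i for i, step in enumerate(steps) if i == 0 or step.strip() != steps[i - 1].strip()]


first = " The president said that Justice Breyer has dedicated his life to serve the country and thanked him for his service."
refined = (
    "\n\nThe president said that Justice Breyer has dedicated his life to serve the country and thanked him "
    "for his service, and also offered a Unity Agenda for the Nation to beat the opioid epidemic."
)
steps = [first, refined, refined, refined]  # the intermediate_steps shown above
print(changed_steps(steps))  # [0, 1]
```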