Retrieval-Augmented Generation (RAG) is especially valuable for companies that work with private documents. By combining retrieval with generation, it grounds a language model's responses in relevant internal data, improving accuracy without sending that data to external APIs. The result is better confidentiality alongside contextually accurate, coherent answers for applications like chatbots and virtual assistants.
In this blog post, I will create a Streamlit application that allows users to index documents and ask questions about them. I will use Elasticsearch for document storage and retrieval, and a local language model API for generating responses.
Setup
First, ensure you have all the required dependencies installed:
pip install streamlit elasticsearch sentence-transformers requests
Initializing Elasticsearch and SentenceTransformer
import streamlit as st
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer
import requests

# Connect to a local Elasticsearch instance.
es = Elasticsearch(
    hosts=[{'host': 'localhost', 'port': 9200, 'scheme': 'http'}]
)

# all-MiniLM-L6-v2 produces 384-dimensional sentence embeddings.
model_name = 'all-MiniLM-L6-v2'
sentence_model = SentenceTransformer(model_name)
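Loading the model is a good moment to check its embedding dimensionality, since the index mapping below has to match it. This is purely an optional sanity check:

# Encode a test sentence and confirm the embedding size.
test_embedding = sentence_model.encode("hello world")
print(test_embedding.shape)  # (384,) for all-MiniLM-L6-v2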
Creating the Elasticsearch Index
Create an Elasticsearch index to store the documents and their embeddings if it doesn't already exist. The dense_vector field is declared with 384 dimensions so that it matches the output size of all-MiniLM-L6-v2.
def create_index():
    if not es.indices.exists(index="documents"):
        es.indices.create(
            index="documents",
            body={
                "mappings": {
                    "properties": {
                        "text": {"type": "text"},
                        "embedding": {"type": "dense_vector", "dims": 384}
                    }
                }
            }
        )

create_index()
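If you want to confirm the index came out as expected, you can print the mapping; this optional check simply echoes what Elasticsearch stored:

# Optional: inspect the mapping that was just created.
print(es.indices.get_mapping(index="documents"))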
Indexing Documents
The following function indexes a new document by generating its embedding and storing both the raw text and the vector in Elasticsearch.
def index_document(doc_text):
    # Encode the document text into a 384-dimensional vector.
    embedding = sentence_model.encode(doc_text)
    es.index(
        index="documents",
        body={
            "text": doc_text,
            "embedding": embedding.tolist()
        }
    )
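As a quick illustration (the sample texts are invented), you can index a couple of documents and refresh the index so they are immediately searchable:

index_document("Our refund policy allows returns within 30 days of purchase.")
index_document("Support is available Monday through Friday, 9am to 5pm CET.")
# Refresh so the new documents are visible to the very next search.
es.indices.refresh(index="documents")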
Handling User Questions
Generate an embedding for the user's question, using the same model that encoded the documents:

def handle_question(question):
    query_embedding = sentence_model.encode(question)
Retrieving Relevant Documents
Still inside handle_question, a script_score query ranks every document by the cosine similarity between its stored embedding and the query embedding. Elasticsearch requires scores to be non-negative, which is why 1.0 is added to the raw cosine similarity.
    response = es.search(
        index="documents",
        body={
            "query": {
                "script_score": {
                    "query": {"match_all": {}},
                    "script": {
                        "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                        "params": {"query_vector": query_embedding.tolist()}
                    }
                }
            },
            "size": 5
        }
    )
    # Concatenate the top hits into a single context string, then hand the
    # context and the question to the local model (defined below).
    retrieved_docs = [hit['_source']['text'] for hit in response['hits']['hits']]
    context = " ".join(retrieved_docs)
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"Context: {context}\n\n"
        f"Question: {question}"
    )
    return call_local_model(prompt)
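To sanity-check what the script_score query computes, the same score can be reproduced locally with NumPy (the sentences here are made up, purely for illustration):

import numpy as np

doc_vec = sentence_model.encode("Elasticsearch stores both text and vectors.")
query_vec = sentence_model.encode("Where are the vectors stored?")

# cosineSimilarity ranges over [-1, 1]; adding 1.0 shifts it into [0, 2],
# matching the _score Elasticsearch assigns in the query above.
cosine = float(np.dot(doc_vec, query_vec)
               / (np.linalg.norm(doc_vec) * np.linalg.norm(query_vec)))
print(cosine + 1.0)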
Calling the Local Language Model API
This helper posts the prompt to a locally hosted model. The endpoint and payload follow Ollama's chat API; adjust the host address and model name to match your own setup.
def call_local_model(user_input):
    # Ollama-style chat endpoint; replace the host with your server's address.
    url = "http://192.168.1.10:11434/api/chat"
    payload = {
        "model": "llama3",
        "messages": [
            {"role": "user", "content": user_input}
        ],
        "stream": False
    }
    headers = {"Content-Type": "application/json"}
    response = requests.post(url, json=payload, headers=headers)
    try:
        return response.json()
    except ValueError:
        st.error("Failed to decode JSON response")
        return None
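With streaming disabled, Ollama returns a single JSON object whose message field holds the assistant's reply, so the answer text can be extracted like this (a minimal sketch, assuming the server is reachable):

result = call_local_model("What is our refund policy?")
if result:
    # The reply text lives under message.content in the chat response.
    print(result.get("message", {}).get("content", ""))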
Streamlit App
Initialize the Streamlit session state variables (set up here as placeholders for keeping a conversation history across reruns).
if "conversation" not in st.session_state:
    st.session_state.conversation = None
if "chat_history" not in st.session_state:
    st.session_state.chat_history = []
Add a text area where the user can paste a new document, plus a button to index it.
st.header("Ask Your Documents :books:")
new_document = st.text_area("Add a new document to the index:")
if st.button("Index Document"):
    if new_document:
        index_document(new_document)
        st.success("Document indexed successfully!")
Finally, take the user's question, run it through the retrieval-and-generation pipeline defined above, and display the model's answer:

user_question = st.text_input("Ask questions about the indexed documents:")
if user_question:
    result = handle_question(user_question)
    if result:
        # Show the answer text from the model's chat response.
        st.write(result.get("message", {}).get("content", ""))
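Save everything above into a single script, for example app.py (the filename is arbitrary), and launch the app with streamlit run app.py; Streamlit will open it in your browser.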
Conclusion
In this blog post, I demonstrated how to create a Streamlit application that indexes documents and answers questions about them, using Elasticsearch for vector storage and retrieval and a locally hosted language model API for generation. Users can interactively add documents and get answers grounded in the most relevant indexed content, with no data ever leaving their own infrastructure.