RAG Using Llama3
Introduction

Retrieval-Augmented Generation (RAG) combines document retrieval with text generation, which makes it especially valuable for companies that work with private documents. Because the relevant internal data is retrieved locally rather than sent to an external API, the approach preserves data security and confidentiality while still producing contextually accurate, coherent answers for applications such as chatbots and virtual assistants.

In this blog post, I will create a Streamlit application that lets users index documents and ask questions about them. I will use Elasticsearch for document storage and retrieval, and a locally served Llama 3 model (via Ollama's chat API) for generating responses.

Setup

First, ensure you have all the required dependencies installed:

pip install streamlit elasticsearch sentence-transformers requests

Initializing Elasticsearch and SentenceTransformer

import streamlit as st
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer
import requests

# Connect to a locally running Elasticsearch node
es = Elasticsearch(
    hosts=[{'host': 'localhost', 'port': 9200, 'scheme': 'http'}]
)

# all-MiniLM-L6-v2 produces 384-dimensional embeddings, which must match
# the dims of the dense_vector mapping created below
model_name = 'all-MiniLM-L6-v2'
sentence_model = SentenceTransformer(model_name)

Creating the Elasticsearch Index

Create an Elasticsearch index to store the documents and their embeddings if it doesn't already exist.

def create_index():
    if not es.indices.exists(index="documents"):
        es.indices.create(
            index="documents",
            body={
                "mappings": {
                    "properties": {
                        "text": {"type": "text"},
                        "embedding": {"type": "dense_vector", "dims": 384}
                    }
                }
            }
        )

create_index()

Indexing Documents

The following function indexes a new document by generating its embedding and storing both the text and the vector in Elasticsearch.

def index_document(doc_text):
    embedding = sentence_model.encode(doc_text)
    es.index(
        index="documents",
        body={
            "text": doc_text,
            "embedding": embedding.tolist()
        }
    )
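Indexing an entire document as a single embedding works poorly for long texts, since one vector cannot represent many topics at once. Below is a minimal sketch of a chunking helper you could run before index_document; the function name, chunk size, and overlap are my own choices, not part of the app above:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping chunks of roughly chunk_size words."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
        # Stop once the final words have been covered
        if start + chunk_size >= len(words):
            break
    return chunks
```

Each chunk would then be passed to index_document() individually, so a question can match the specific passage that answers it.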

Handling User Questions

The question handler first generates an embedding for the user's question:

def handle_question(question):
    query_embedding = sentence_model.encode(question)

Retrieving Relevant Documents

The query below continues inside handle_question. A script_score query ranks every stored document by the cosine similarity between its embedding and the question embedding (the + 1.0 shift keeps scores non-negative), and the top results are joined into a context string that is sent to the local model:

    # Rank all documents by cosine similarity to the question embedding
    response = es.search(
        index="documents",
        body={
            "query": {
                "script_score": {
                    "query": {"match_all": {}},
                    "script": {
                        "source": "cosineSimilarity(params.query_vector, 'embedding') + 1.0",
                        "params": {"query_vector": query_embedding.tolist()}
                    }
                }
            },
            "size": 5
        }
    )

    retrieved_docs = [hit['_source']['text'] for hit in response['hits']['hits']]
    context = " ".join(retrieved_docs)

    # Combine the retrieved context with the question and query the model
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    answer = call_local_model(prompt)
    if answer:
        st.write(answer)
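To make the script_score formula concrete, here is a small pure-Python version of the cosineSimilarity(params.query_vector, 'embedding') + 1.0 score that Elasticsearch computes for each document. This is a sketch for intuition only, not the actual Painless implementation:

```python
import math

def es_style_score(query_vector, doc_vector):
    """Cosine similarity between two vectors, shifted by +1.0 so the
    score is always non-negative, mirroring the script_score query."""
    dot = sum(q * d for q, d in zip(query_vector, doc_vector))
    norm_q = math.sqrt(sum(q * q for q in query_vector))
    norm_d = math.sqrt(sum(d * d for d in doc_vector))
    return dot / (norm_q * norm_d) + 1.0
```

Identical vectors score 2.0, orthogonal vectors 1.0, and opposite vectors 0.0, so higher scores always mean closer semantic matches.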

Calling the Local Language Model API

def call_local_model(user_input):
    url = "http://192.168.1.10:11434/api/chat"
    payload = {
        "model": "llama3",
        "messages": [
            { "role": "user", "content": user_input }
        ],
        "stream": False
    }
    headers = {
        "Content-Type": "application/json"
    }
    response = requests.post(url, json=payload, headers=headers, timeout=120)

    try:
        # With "stream": False, Ollama returns a single JSON object whose
        # reply text lives under message.content
        response_json = response.json()
        return response_json["message"]["content"]
    except (ValueError, KeyError):
        st.error("Failed to parse the model response")
        return None
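An alternative to packing the retrieved context into a single user message is to carry it in a system message, keeping the user's question clean. Below is a sketch of how such a request payload could be built; the build_chat_payload helper is my own, not part of Ollama's API, and only the chat endpoint's messages format is assumed:

```python
def build_chat_payload(question, context, model="llama3"):
    """Build a non-streaming chat payload with the retrieved context in a
    system message and the question in a separate user message."""
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": f"Answer using only this context:\n{context}"},
            {"role": "user", "content": question},
        ],
        "stream": False,
    }
```

The resulting dictionary can be posted to the same /api/chat endpoint in place of the inline payload above.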

Streamlit App

Initialize Streamlit session state variables.

if "conversation" not in st.session_state:
    st.session_state.conversation = None
if "chat_history" not in st.session_state:
    st.session_state.chat_history = []

Add a text area to input and index a new document.

st.header("Ask Your Documents :books:")

new_document = st.text_area("Add a new document to the index:")
if st.button("Index Document"):
    if new_document:
        index_document(new_document)
        st.success("Document indexed successfully!")

Finally, take a question from the user and answer it with the handle_question function defined earlier.

user_question = st.text_input("Ask a question about the indexed documents:")
if user_question:
    handle_question(user_question)

Conclusion

In this blog post, I demonstrated how to create a Streamlit application that indexes documents and answers questions about them using Elasticsearch and a local language model API. This application allows users to interactively add documents and retrieve relevant information based on their queries.