
Creating a Simple RAG in Python with AzureOpenAI and LlamaIndex. Part 2


Following my previous post, where I successfully embedded text from PDF documents and made queries based on that data, I’m now going to save that embedded and vectorised data into a database.

I’ll be using Azure Cosmos DB for MongoDB, but you can also use a regular MongoDB instance, such as one hosted on MongoDB Atlas, since under the hood it’s essentially the same MongoDB.

PS: I LOVE Azure for this. They didn’t try to reinvent MongoDB like AWS did with their DocumentDB, which unfortunately lacks some significant features.
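
If you do go the regular MongoDB route, LlamaIndex also ships an Atlas vector store. Here’s a rough sketch of what the equivalent setup could look like; it assumes the llama-index-vector-stores-mongodb package, a MONGODB_ATLAS_URI variable of my own naming, and a vector search index already created on the collection in Atlas:

import os
import pymongo
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch

# MONGODB_ATLAS_URI is a placeholder name for your Atlas connection string
mongodb_client = pymongo.MongoClient(os.environ.get("MONGODB_ATLAS_URI"))

# The collection also needs a vector search index created in Atlas beforehand
store = MongoDBAtlasVectorSearch(
    mongodb_client=mongodb_client,
    db_name="gambitai",
    collection_name="uk",
)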

Refactoring

I’m going to split my scripts into two parts, ingestion and querying, so that we don’t re-ingest the data into our database every time. I’ll also move our setup into a separate file, including setting the LLM and embedding models, loading the .env file, and initialising the vector store, since we’ll need all of that in both scripts.

As a result, we’ll have a setup file, setup.py, that looks like this:

import os
import pymongo
from dotenv import load_dotenv
from llama_index.llms.azure_openai import AzureOpenAI
from llama_index.embeddings.azure_openai import AzureOpenAIEmbedding
from llama_index.core import Settings
from llama_index.vector_stores.azurecosmosmongo import (
    AzureCosmosDBMongoDBVectorSearch,
)

load_dotenv()

# LLM used for answering queries
Settings.llm = AzureOpenAI(
    engine="gpt-4o-mini",
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
    api_version="2024-05-01-preview",
)

# Embedding model used to vectorise both the documents and the queries
Settings.embed_model = AzureOpenAIEmbedding(
    model="text-embedding-3-small",
    deployment_name="text-embedding-3-small",
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    azure_endpoint=os.environ.get("AZURE_OPENAI_EMBEDDING_ENDPOINT"),
    api_version="2023-05-15",
)

# Vector store backed by Azure Cosmos DB for MongoDB
mongodb_client = pymongo.MongoClient(os.environ.get("AZURE_COSMOSDB_URI"))
store = AzureCosmosDBMongoDBVectorSearch(
    mongodb_client=mongodb_client,
    db_name="gambitai",
    collection_name="uk",
)
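
The setup above expects a .env file with the four environment variables referenced in the code, something along these lines (the values are placeholders for your own keys and endpoints):

AZURE_OPENAI_API_KEY=<your Azure OpenAI key>
AZURE_OPENAI_ENDPOINT=https://<your-openai-resource>.openai.azure.com/
AZURE_OPENAI_EMBEDDING_ENDPOINT=https://<your-embedding-resource>.openai.azure.com/
AZURE_COSMOSDB_URI=<your Cosmos DB for MongoDB connection string>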

Our previous main.py will be renamed to ingest.py, and we’ll keep only the embedding functionality:

import setup  # runs the shared LLM, embedding and vector store setup
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import StorageContext


def ingest():
    # Load every document from the local "data" directory
    documents = SimpleDirectoryReader("data").load_data()
    # Point the storage context at our Cosmos DB vector store
    storage_context = StorageContext.from_defaults(vector_store=setup.store)
    # Chunk, embed and persist the documents
    VectorStoreIndex.from_documents(
        documents,
        transformations=[SentenceSplitter(chunk_size=1024, chunk_overlap=20)],
        storage_context=storage_context,
    )


ingest()

That’s it for the ingestion part.
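
If you want a quick sanity check that the ingestion actually worked, you can count the stored chunks with the same pymongo client from setup.py. A minimal sketch, assuming you run it from the project root:

import setup

# Each chunk produced by the SentenceSplitter ends up as a separate document
count = setup.mongodb_client["gambitai"]["uk"].count_documents({})
print(f"{count} chunks stored in the 'uk' collection")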

Querying

Previously, for querying, we had three lines:

  1. the index built from the documents,
  2. the query engine, and
  3. the response, which was the result of our query.

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
response = query_engine.query("Query me this...")

Since we’ve already ingested the data and it now lives in our database, we no longer need to build a VectorStoreIndex from documents. Instead, we’ll point the index at our Cosmos DB vector store:

index = VectorStoreIndex.from_vector_store(vector_store=setup.store)
query_engine = index.as_query_engine()

The QueryEngine initialisation remains the same.
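
One optional tweak while we’re here: as_query_engine() accepts retrieval parameters, such as how many chunks to fetch from the vector store for each query. A small sketch using LlamaIndex’s similarity_top_k parameter (the value here is arbitrary):

# Retrieve the 5 most similar chunks for each query instead of the default
query_engine = index.as_query_engine(similarity_top_k=5)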

And that’s it. Now when we query, we’ll get exactly the same result, except we’ll go straight to the database instead of processing our PDF documents again. I’ve put the content I want to analyse into a text file and read it from there, so the query looks something like this:

# Read the content we want the LLM to analyse
with open("test/test.txt", "r") as file:
    content = file.read()

response = query_engine.query(
    """
You are acting on behalf of the Gambling Commission of the United Kingdom.
You need to go through that content and analyse if any of the sections, words,
images or more break any rules or laws.
Go through the content, find what should not be there and report why.
You can also suggest what should be there instead.
Here is the content: {}
""".format(content)
)

print(response)

As always, see the GitHub repo for the full code.