
PostgreSQL with pgvector as a Vector Database for RAG

Implementing vector search and Retrieval-Augmented Generation using PostgreSQL

October 7, 2024

We have explored vector search and Retrieval-Augmented Generation (RAG) in previous blog posts like RAG from Scratch and Building an AI Chatbot Powered by Your Data. In the last one, we actually built a full-stack RAG chatbot application using Redis as the vector database.

But there is another very well-known database that many developers already have in their toolkit: PostgreSQL. Wouldn't it be great if we could just use it as a vector database too?

PostgreSQL is a powerful and widely used open-source relational database. It's also incredibly versatile, allowing you to store and manipulate JSON data (similar to NoSQL and document databases) and providing a rich set of extensions with added functionalities, such as PostGIS for geospatial data or pg_cron for job scheduling.

Thanks to the pgvector extension, Postgres can now also perform efficient similarity searches on vector embeddings. This opens up many possibilities for RAG and AI applications, with the added benefit of using a familiar database you might already have in your stack. It also means that you can combine relational data, JSON data and vector embeddings in a single system, enabling complex queries that involve both structured data and vector searches.

There are, of course, specialized vector databases like Qdrant, Pinecone or Weaviate that offer optimized performance for very large datasets and more advanced functionalities. However, Postgres with pgvector is a very compelling alternative if you want to keep all your application data more deeply integrated and minimize the number of database systems in your stack, reducing costs and complexity.

In this post, we'll explore how to set up and use Postgres as a vector database and how it can be used for vector search and Retrieval-Augmented Generation applications in Python.

#Setting Up PostgreSQL and pgvector

Before we get started, we need to install PostgreSQL, pgvector and the Python libraries we will be using:

  1. Download and install PostgreSQL following the instructions in the official documentation for your operating system.

  2. Install the pgvector extension, following the installation notes in the pgvector repository.

  3. Install the required Python packages. As well as the pgvector Python library, we will be using SQLAlchemy as the ORM, the asyncpg driver to connect to Postgres asynchronously using asyncio, and the openai package for the embedding and chat examples later in this post:

    pip install sqlalchemy pgvector asyncpg openai
  4. Create a new database and enable the pgvector extension:

    createdb rag_db
    psql rag_db
    CREATE EXTENSION vector;
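You can check that the extension was enabled correctly by querying the pg_extension system catalog from psql (the version shown will depend on your installation):

    SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';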

#Creating a Vector Database with PostgreSQL

Now that the database is set up, let's create a SQLAlchemy model to represent our vector data:

from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
from sqlalchemy import Text
from sqlalchemy.dialects.postgresql import JSONB
from pgvector.sqlalchemy import Vector as PgVector  # aliased to avoid clashing with our model name

class Base(DeclarativeBase):
    pass

class Vector(Base):
    __tablename__ = 'vectors'

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
    text: Mapped[str] = mapped_column(Text)
    vector = mapped_column(PgVector(1024))
    metadata_: Mapped[dict | None] = mapped_column('metadata', JSONB)

    def __repr__(self):
        return f'Vector(id={self.id}, text={self.text[:50]}..., metadata={self.metadata_})'

We're using SQLAlchemy 2.0 syntax, which allows us to use Python's type hints with Mapped and mapped_column to conveniently derive database column types from the type annotations.

The Vector model defines the following columns in the vectors table:

- id: an auto-incrementing integer primary key.
- text: the text content of the chunk.
- vector: the embedding of the text, stored as a 1024-dimensional pgvector column.
- metadata: additional information about each chunk, such as the source document and the chunk's position in it, stored as JSONB.

⚠️Note that we use "metadata_" as the attribute name in the model because "metadata" is a reserved attribute name in SQLAlchemy's declarative models, but the actual column name in the database will be "metadata".

To create the database table defined in our model, we can use the following SQLAlchemy code:

from sqlalchemy.ext.asyncio import create_async_engine

DB_URL = 'postgresql+asyncpg://admin:postgres@localhost:5432/rag_db'

engine = create_async_engine(DB_URL)

async def db_create():
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)

The 'postgresql+asyncpg' prefix in the database URL is necessary because we are using the asyncpg driver to enable asynchronous connections with asyncio.
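Since db_create is a coroutine, it needs to run inside an event loop. As a one-off setup step, you can simply run it with asyncio:

import asyncio

asyncio.run(db_create())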

Once the vectors table has been created, you can use pgAdmin to explore the table structure and run queries with a user-friendly interface.

#Embedding Documents into PostgreSQL

The next step is to process, embed and store information in our Postgres vector database. If you are familiar with RAG systems or have read our previous posts, this process involves the following steps:

  1. Extract the text from the source document.
  2. Split the text into smaller chunks.
  3. Generate a vector embedding for each chunk.
  4. Store the chunks, their embeddings and any useful metadata in the database.

Let's take a look at an example function that carries out these steps:

import os

from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

engine = create_async_engine(DB_URL)
Session = async_sessionmaker(engine, expire_on_commit=False)

async def add_document_to_vector_db(doc_path):
    # Extract the raw text from the document
    text = extract_text(doc_path)
    doc_name = os.path.splitext(os.path.basename(doc_path))[0]

    # Split the text into chunks, attaching the source document
    # and chunk index as metadata
    chunks = []
    text_splitter = TextSplitter(chunk_size=512)
    text_chunks = text_splitter.split(text)
    for idx, text_chunk in enumerate(text_chunks):
        chunks.append({
            'text': text_chunk,
            'metadata_': {'doc': doc_name, 'index': idx}
        })

    # Embed all the chunk texts in a single batch
    vectors = await create_embeddings([chunk['text'] for chunk in chunks])
    for chunk, vector in zip(chunks, vectors):
        chunk['vector'] = vector

    # Store the chunks and their embeddings in the vectors table
    async with Session() as db:
        for chunk in chunks:
            db.add(Vector(**chunk))
        await db.commit()

In the code above:

- extract_text and TextSplitter are placeholders for whatever text extraction and chunking tools you prefer.
- Each chunk keeps its text together with metadata identifying the source document and the chunk's position in it.
- create_embeddings (implemented below) embeds all the chunk texts in a single batch.
- Finally, each chunk is stored as a row in the vectors table using the Vector model.

To create the vector embeddings, you can use any embedding model of your choice. Just make sure that the embedding model's dimensions match the dimensions of the vector column, as defined in the Vector model (1024 in our example).

Here's an example implementation of create_embeddings using OpenAI's text-embedding-3-large model:

import os

from openai import AsyncOpenAI

client = AsyncOpenAI(api_key=os.environ['OPENAI_API_KEY'])

async def create_embeddings(texts):
    res = await client.embeddings.create(input=texts, model='text-embedding-3-large', dimensions=1024)
    return [item.embedding for item in res.data]

#Vector Search with PostgreSQL

Now that we have our documents embedded and stored in our PostgreSQL database, we can perform vector similarity searches to retrieve the most relevant information for our queries. This is a key step in building a Retrieval-Augmented Generation (RAG) system.

The following function shows how we can implement a basic similarity search in PostgreSQL with pgvector:

from sqlalchemy import select

async def vector_search(query_vector, top_k=3):
    async with Session() as db:
        # cosine_distance translates to pgvector's <=> operator
        distance = Vector.vector.cosine_distance(query_vector).label('distance')
        query = (
            select(Vector.text, Vector.metadata_, distance)
            .order_by(distance)
            .limit(top_k)
        )
        res = await db.execute(query)
        return [{
            'text': text,
            'metadata': metadata,
            'score': 1 - dist  # cosine similarity = 1 - cosine distance
        } for text, metadata, dist in res]

Let's break down this function and the search process:

- cosine_distance compares the query vector against every stored vector using pgvector's cosine distance operator (<=>).
- Ordering by the distance and applying limit(top_k) returns the top_k most similar chunks.
- The cosine distance is converted into a similarity score with 1 - distance, so higher scores mean more relevant results.
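Because everything lives in Postgres, you can also combine vector search with regular filters in the same query. As an example, here is a sketch of a search restricted to the chunks of a single document, filtering on the JSONB metadata we stored earlier (vector_search_in_doc is a hypothetical variant, not part of the code above):

from sqlalchemy import select

async def vector_search_in_doc(query_vector, doc_name, top_k=3):
    async with Session() as db:
        distance = Vector.vector.cosine_distance(query_vector).label('distance')
        query = (
            select(Vector.text, Vector.metadata_, distance)
            # JSONB filter combined with the vector search in a single query
            .where(Vector.metadata_['doc'].astext == doc_name)
            .order_by(distance)
            .limit(top_k)
        )
        res = await db.execute(query)
        return [{'text': text, 'metadata': metadata, 'score': 1 - dist}
                for text, metadata, dist in res]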

It's also important to highlight that pgvector performs exact nearest neighbor search by default, which provides perfect recall (all of the nearest neighbors are found), but it can be slower with large datasets.

In those cases, you can create an index to speed up searches and trade a bit of accuracy for much faster search times. The supported index types are IVFFlat (Inverted File Flat) and HNSW (Hierarchical Navigable Small World). You can read more about indexing in the pgvector documentation.
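For example, this is how you would create an HNSW index for cosine distance on our vectors table, using pgvector's default index parameters:

    CREATE INDEX ON vectors USING hnsw (vector vector_cosine_ops);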

#RAG in Action

With all the vector search functionality in place, it's now time to put everything together into a basic Retrieval-Augmented Generation example, using OpenAI's GPT-4o model, that can answer questions about the documents we embedded in the PostgreSQL database.

First, let's take a look at the prompts that we are using:

SYSTEM_PROMPT = """
You are an AI assistant that answers questions about documents in your knowledge base.
"""

RAG_PROMPT = """
Use the following pieces of context to answer the user question.
You must only use the facts from the context to answer.
If the answer cannot be found in the context, say that you don't have enough information to answer the question and provide any relevant facts found in the context.

Context:
{context}

User Question:
{question}
"""

And this is how we can implement the basic RAG system:

import os

from openai import AsyncOpenAI

client = AsyncOpenAI(api_key=os.environ['OPENAI_API_KEY'])

async def answer_question_with_rag(question):
    # Embed the question and retrieve the most relevant chunks
    query_vector = (await create_embeddings([question]))[0]
    top_chunks = await vector_search(query_vector, top_k=3)

    # Build the prompt with the retrieved context and ask the model
    context = '\n\n---\n\n'.join([chunk['text'] for chunk in top_chunks]) + '\n\n---'
    user_message = RAG_PROMPT.format(context=context, question=question)
    messages = [
        {'role': 'system', 'content': SYSTEM_PROMPT},
        {'role': 'user', 'content': user_message}
    ]
    response = await client.chat.completions.create(model='gpt-4o', messages=messages)
    return response.choices[0].message.content

This function encapsulates the core RAG process: it converts the user's question to a vector, performs a similarity search in our Postgres vector database and includes the retrieved information as context for the GPT-4o model to generate an informed response.

To use this in your application, you might do something like this:

question = "What are the main challenges in renewable energy adoption?"
answer = await answer_question_with_rag(question)
print(answer)

You can now apply all the code we've seen to your own applications. You can embed your own documents and use the combination of PostgreSQL with pgvector and GPT-4o (or any other LLM of your choice) to answer questions based on those documents.

And you can use these ideas to build more advanced applications like AI chatbots and assistants, with a simple architecture that takes advantage of PostgreSQL's power and versatility and keeps all your data integrated in a single place.