
PostgreSQL with pgvector as a Vector Database for RAG

Implementing vector search and Retrieval-Augmented Generation using PostgreSQL

October 7, 2024

We have explored vector search and Retrieval-Augmented Generation (RAG) in previous blog posts like RAG from Scratch and Building an AI Chatbot Powered by Your Data. In the last one, we actually built a full-stack RAG chatbot application using Redis as the vector database.

But there is another very well-known database that many developers already have in their toolkit: PostgreSQL. Wouldn't it be great if we could just use it as a vector database too?

PostgreSQL is a powerful and widely used open-source relational database. It's also incredibly versatile, allowing you to store and manipulate JSON data (similar to NoSQL and document databases) and providing a rich set of extensions with added functionalities, such as PostGIS for geospatial data or pg_cron for job scheduling.

Thanks to the pgvector extension, Postgres can now also perform efficient similarity searches on vector embeddings. This opens up many possibilities for RAG and AI applications, with the added benefit of using a familiar database you might already have in your stack. It also means that you can combine relational data, JSON data and vector embeddings in a single system, enabling complex queries that involve both structured data and vector searches.

There are, of course, specialized vector databases like Qdrant, Pinecone or Weaviate that offer optimized performance for very large datasets and more advanced functionalities. However, Postgres with pgvector is a very compelling alternative if you want to keep all your application data more deeply integrated and minimize the number of database systems in your stack, reducing costs and complexity.

In this post, we'll explore how to set up and use Postgres as a vector database and how it can be used for vector search and Retrieval-Augmented Generation applications in Python.

#Setting Up PostgreSQL and pgvector

Before we get started, we need to install PostgreSQL, pgvector and the Python libraries we will be using:

  1. Download and install PostgreSQL following the instructions in the official documentation for your operating system.

  2. Install the pgvector extension, following the installation notes in the pgvector repository.

  3. Install the required Python packages. As well as the pgvector Python library, we will be using SQLAlchemy as the ORM, the asyncpg driver to connect to Postgres asynchronously using asyncio, and the openai package for the embedding and chat examples later in this post:

    pip install sqlalchemy pgvector asyncpg openai
  4. Create a new database and enable the pgvector extension:

    createdb rag_db
    psql rag_db
    CREATE EXTENSION vector;
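You can check that the extension was enabled correctly by querying the pg_extension system catalog from psql (the version shown will depend on your installation):

    SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';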

#Creating a Vector Database with PostgreSQL

Now that the database is set up, let's create a SQLAlchemy model to represent our vector data:

from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
from sqlalchemy import Text
from sqlalchemy.dialects.postgresql import JSONB
from pgvector.sqlalchemy import Vector as PgVector  # aliased to avoid clashing with our model name

class Base(DeclarativeBase):
    pass

class Vector(Base):
    __tablename__ = 'vectors'

    id: Mapped[int] = mapped_column(primary_key=True, autoincrement=True)
    text: Mapped[str] = mapped_column(Text)
    vector = mapped_column(PgVector(1024))
    metadata_: Mapped[dict | None] = mapped_column('metadata', JSONB)

    def __repr__(self):
        return f'Vector(id={self.id}, text={self.text[:50]}..., metadata={self.metadata_})'

We're using SQLAlchemy 2.0 syntax, which allows us to use Python's type hints with Mapped and mapped_column to conveniently derive database column types from the type annotations.

The Vector model defines the following columns in the vectors table:

- id: an auto-incrementing integer primary key.
- text: the text content of the chunk.
- vector: the embedding of the text, stored as a 1024-dimensional pgvector column.
- metadata: additional information about each chunk, such as the source document and the chunk's position in it, stored as JSONB.

⚠️Note that we use "metadata_" as the attribute name in the model because "metadata" is a reserved attribute name in SQLAlchemy's declarative models, but the actual column name in the database will be "metadata".

To create the database table defined in our model, we can use the following SQLAlchemy code:

from sqlalchemy.ext.asyncio import create_async_engine

DB_URL = 'postgresql+asyncpg://admin:postgres@localhost:5432/rag_db'

engine = create_async_engine(DB_URL)

async def db_create():
    async with engine.begin() as conn:
        await conn.run_sync(Base.metadata.create_all)

The 'postgresql+asyncpg' prefix in the database URL is necessary because we are using the asyncpg driver to enable asynchronous connections with asyncio.
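Since db_create is a coroutine, it needs to run inside an event loop. As a one-off setup step, you can simply run it with asyncio:

import asyncio

asyncio.run(db_create())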

Once the vectors table has been created, you can use pgAdmin to explore the table structure and run queries with a user-friendly interface.

#Embedding Documents into PostgreSQL

The next step is to process, embed and store information in our Postgres vector database. If you are familiar with RAG systems or have read our previous posts, this process involves the following steps:

  1. Extract the text from the source document.
  2. Split the text into smaller chunks.
  3. Generate a vector embedding for each chunk.
  4. Store the chunks, their embeddings and any useful metadata in the database.

Let's take a look at an example function that carries out these steps:

import os

from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

engine = create_async_engine(DB_URL)
Session = async_sessionmaker(engine, expire_on_commit=False)

async def add_document_to_vector_db(doc_path):
    # Extract the raw text from the document
    text = extract_text(doc_path)
    doc_name = os.path.splitext(os.path.basename(doc_path))[0]

    # Split the text into chunks, attaching the source document
    # and chunk index as metadata
    chunks = []
    text_splitter = TextSplitter(chunk_size=512)
    text_chunks = text_splitter.split(text)
    for idx, text_chunk in enumerate(text_chunks):
        chunks.append({
            'text': text_chunk,
            'metadata_': {'doc': doc_name, 'index': idx}
        })

    # Embed all the chunk texts in a single batch
    vectors = await create_embeddings([chunk['text'] for chunk in chunks])
    for chunk, vector in zip(chunks, vectors):
        chunk['vector'] = vector

    # Store the chunks and their embeddings in the vectors table
    async with Session() as db:
        for chunk in chunks:
            db.add(Vector(**chunk))
        await db.commit()

In the code above:

- extract_text and TextSplitter are placeholders for whatever text extraction and chunking tools you prefer.
- Each chunk keeps its text together with metadata identifying the source document and the chunk's position in it.
- create_embeddings (implemented below) embeds all the chunk texts in a single batch.
- Finally, each chunk is stored as a row in the vectors table using the Vector model.

To create the vector embeddings, you can use any embedding model of your choice. Just make sure that the embedding model's dimensions match the dimensions of the vector column, as defined in the Vector model (1024 in our example).

Here's an example implementation of create_embeddings using OpenAI's text-embedding-3-large model:

import os

from openai import AsyncOpenAI

client = AsyncOpenAI(api_key=os.environ['OPENAI_API_KEY'])

async def create_embeddings(texts):
    res = await client.embeddings.create(input=texts, model='text-embedding-3-large', dimensions=1024)
    return [item.embedding for item in res.data]

#Vector Search with PostgreSQL

Now that we have our documents embedded and stored in our PostgreSQL database, we can perform vector similarity searches to retrieve the most relevant information for our queries. This is a key step in building a Retrieval-Augmented Generation (RAG) system.

The following function shows how we can implement a basic similarity search in PostgreSQL with pgvector:

from sqlalchemy import select

async def vector_search(query_vector, top_k=3):
    async with Session() as db:
        # cosine_distance translates to pgvector's <=> operator
        distance = Vector.vector.cosine_distance(query_vector).label('distance')
        query = (
            select(Vector.text, Vector.metadata_, distance)
            .order_by(distance)
            .limit(top_k)
        )
        res = await db.execute(query)
        return [{
            'text': text,
            'metadata': metadata,
            'score': 1 - dist  # cosine similarity = 1 - cosine distance
        } for text, metadata, dist in res]

Let's break down this function and the search process:

- cosine_distance compares the query vector against every stored vector using pgvector's cosine distance operator (<=>).
- Ordering by the distance and applying limit(top_k) returns the top_k most similar chunks.
- The cosine distance is converted into a similarity score with 1 - distance, so higher scores mean more relevant results.
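Because everything lives in Postgres, you can also combine vector search with regular filters in the same query. As an example, here is a sketch of a search restricted to the chunks of a single document, filtering on the JSONB metadata we stored earlier (vector_search_in_doc is a hypothetical variant, not part of the code above):

from sqlalchemy import select

async def vector_search_in_doc(query_vector, doc_name, top_k=3):
    async with Session() as db:
        distance = Vector.vector.cosine_distance(query_vector).label('distance')
        query = (
            select(Vector.text, Vector.metadata_, distance)
            # JSONB filter combined with the vector search in a single query
            .where(Vector.metadata_['doc'].astext == doc_name)
            .order_by(distance)
            .limit(top_k)
        )
        res = await db.execute(query)
        return [{'text': text, 'metadata': metadata, 'score': 1 - dist}
                for text, metadata, dist in res]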

It's also important to highlight that pgvector performs exact nearest neighbor search by default, which provides perfect recall (all of the nearest neighbors are found), but it can be slower with large datasets.

In those cases, you can create an index to speed up searches and trade a bit of accuracy for much faster search times. The supported index types are IVFFlat (Inverted File Flat) and HNSW (Hierarchical Navigable Small World). You can read more about indexing in the pgvector documentation.
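For example, this is how you would create an HNSW index for cosine distance on our vectors table, using pgvector's default index parameters:

    CREATE INDEX ON vectors USING hnsw (vector vector_cosine_ops);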

#RAG in Action

With all the vector search functionality in place, it's now time to put everything together into a basic Retrieval-Augmented Generation example, using OpenAI's GPT-4o model, that can answer questions about the documents we embedded in the PostgreSQL database.

First, let's take a look at the prompts that we are using:

SYSTEM_PROMPT = """
You are an AI assistant that answers questions about documents in your knowledge base.
"""

RAG_PROMPT = """
Use the following pieces of context to answer the user question.
You must only use the facts from the context to answer.
If the answer cannot be found in the context, say that you don't have enough information to answer the question and provide any relevant facts found in the context.

Context:
{context}

User Question:
{question}
"""

And this is how we can implement the basic RAG system:

import os

from openai import AsyncOpenAI

client = AsyncOpenAI(api_key=os.environ['OPENAI_API_KEY'])

async def answer_question_with_rag(question):
    # Embed the question and retrieve the most relevant chunks
    query_vector = (await create_embeddings([question]))[0]
    top_chunks = await vector_search(query_vector, top_k=3)

    # Build the prompt with the retrieved context and ask the model
    context = '\n\n---\n\n'.join([chunk['text'] for chunk in top_chunks]) + '\n\n---'
    user_message = RAG_PROMPT.format(context=context, question=question)
    messages = [
        {'role': 'system', 'content': SYSTEM_PROMPT},
        {'role': 'user', 'content': user_message}
    ]
    response = await client.chat.completions.create(model='gpt-4o', messages=messages)
    return response.choices[0].message.content

This function encapsulates the core RAG process: it converts the user's question to a vector, performs a similarity search in our Postgres vector database and includes the retrieved information as context for the GPT-4o model to generate an informed response.

To use this in your application, you might do something like this:

question = "What are the main challenges in renewable energy adoption?"
answer = await answer_question_with_rag(question)
print(answer)

You can now apply all the code we've seen to your own applications. You can embed your own documents and use the combination of PostgreSQL with pgvector and GPT-4o (or any other LLM of your choice) to answer questions based on those documents.

And you can use these ideas to build more advanced applications like AI chatbots and assistants, with a simple architecture that takes advantage of PostgreSQL's power and versatility and keeps all your data integrated in a single place.