PostgreSQL with pgvector as a Vector Database for RAG
Implementing vector search and Retrieval-Augmented Generation using PostgreSQL
October 7, 2024
We have explored vector search and Retrieval-Augmented Generation (RAG) in previous blog posts like RAG from Scratch and Building an AI Chatbot Powered by Your Data. In the last one, we actually built a full-stack RAG chatbot application using Redis as the vector database.
But there is another very well-known database that many developers already have in their toolkit: PostgreSQL. Wouldn't it be great if we could just use it as a vector database too?
PostgreSQL is a powerful and widely used open-source relational database. It's also incredibly versatile, allowing you to store and manipulate JSON data (similar to NoSQL and document databases) and providing a rich set of extensions with added functionalities, such as PostGIS for geospatial data or pgcron for job scheduling.
Thanks to the pgvector extension, Postgres can now also perform efficient similarity searches on vector embeddings. This opens up many possibilities for RAG and AI applications, with the added benefit of using a familiar database you might already have in your stack. It also means that you can combine relational data, JSON data and vector embeddings in a single system, enabling complex queries that involve both structured data and vector searches.
There are, of course, specialized vector databases like Qdrant, Pinecone or Weaviate that offer optimized performance for very large datasets and more advanced functionalities. However, Postgres with pgvector is a very compelling alternative if you want to keep all your application data more deeply integrated and minimize the number of database systems in your stack, reducing costs and complexity.
In this post, we'll explore how to set up and use Postgres as a vector database and how it can be used for vector search and Retrieval-Augmented Generation applications in Python.
#Setting Up PostgreSQL and pgvector
Before we get started, we need to install PostgreSQL, pgvector and the Python libraries we will be using:
- Download and install PostgreSQL following the instructions in the official documentation for your operating system.
- Install the pgvector extension, following the installation notes in the pgvector repository.
- Install the required Python packages. As well as the pgvector Python library, we will be using SQLAlchemy as the ORM and the asyncpg driver to connect to Postgres asynchronously using asyncio (see the install command after this list).
- Create a new database and enable the pgvector extension (see the SQL after this list).
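For steps 3 and 4, something along these lines should work. The database name (`vectordb`) is just an example for this post:

```bash
pip install pgvector sqlalchemy asyncpg
```

Then, connected to your Postgres server with psql (or pgAdmin), create the database and enable the extension:

```sql
CREATE DATABASE vectordb;
\c vectordb
CREATE EXTENSION IF NOT EXISTS vector;
```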
#Creating a Vector Database with PostgreSQL
Now that the database is set up, let's create a SQLAlchemy model to represent our vector data:
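A minimal sketch of such a model, assuming SQLAlchemy 2.0 and the pgvector SQLAlchemy integration (whose `Vector` column type is aliased here to avoid clashing with our model name):

```python
from sqlalchemy import Text
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
from pgvector.sqlalchemy import Vector as VECTOR  # pgvector column type


class Base(DeclarativeBase):
    pass


class Vector(Base):
    __tablename__ = "vectors"

    # Unique identifier and primary key for each vector entry
    id: Mapped[int] = mapped_column(primary_key=True)
    # The vector embedding; dimensions must match the embedding model (1024 here)
    vector: Mapped[list[float]] = mapped_column(VECTOR(1024))
    # Original text that the embedding represents
    text: Mapped[str] = mapped_column(Text)
    # Extra properties as JSON; the attribute is "metadata_" but the column is "metadata"
    metadata_: Mapped[dict] = mapped_column("metadata", JSONB, default=dict)
```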
We're using SQLAlchemy 2.0 syntax, which allows us to use Python's type hints with `Mapped` and `mapped_column` to conveniently derive database column types from the type annotations.
The `Vector` model defines the following columns in the `vectors` table:
- `id`: A unique identifier (and primary key) for each vector entry.
- `vector`: Stores the actual vector embedding. We specify the embedding dimensions (1024 in this case) to match our chosen embedding model.
- `text`: Contains the original text that the vector embedding represents. This allows us to easily retrieve the corresponding text when performing vector similarity searches.
- `metadata`: Stores additional properties for each vector entry as JSON. This can include information like the source document name, chunk index within the document, or any other relevant metadata useful for vector searches or filtering.
⚠️ Note that we use "metadata_" as the attribute name in the model because "metadata" is a reserved keyword in SQLAlchemy models, but the actual column name in the database will be "metadata".
To create the database table defined in our model, we can use the following SQLAlchemy code:
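Here is a sketch of the table creation step, assuming a local database called `vectordb` (the user, password and host in the URL are placeholders to adjust to your setup):

```python
import asyncio

from sqlalchemy.ext.asyncio import create_async_engine

# Example connection URL; credentials and database name are placeholders
DATABASE_URL = "postgresql+asyncpg://postgres:password@localhost:5432/vectordb"


async def create_tables() -> None:
    engine = create_async_engine(DATABASE_URL)
    async with engine.begin() as conn:
        # Create the "vectors" table defined by the Vector model
        await conn.run_sync(Base.metadata.create_all)
    await engine.dispose()


asyncio.run(create_tables())
```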
The `postgresql+asyncpg` prefix in the database URL is necessary because we are using the asyncpg driver to enable asynchronous connections with asyncio.
Once the `vectors` table has been created, you can use pgAdmin to explore the table structure and run queries with a user-friendly interface.
#Embedding Documents into PostgreSQL
The next step is to process, embed and store information in our Postgres vector database. If you are familiar with RAG systems or have read our previous posts, this process involves the following steps:
- Extracting text from the source documents.
- Splitting the text into smaller chunks.
- Converting these chunks into vector embeddings that capture their meanings.
- Storing the vector embeddings together with the original texts and any relevant metadata in our vector database.
Let's take a look at an example function that carries out these steps:
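The sketch below shows one way to write that function. The `extract_text`, `TextSplitter` and `create_embeddings` helpers are assumed here (they are discussed below), and the engine reuses the `DATABASE_URL` defined earlier:

```python
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

engine = create_async_engine(DATABASE_URL)
async_session = async_sessionmaker(engine, expire_on_commit=False)


async def embed_document(file_path: str) -> None:
    # 1. Extract the raw text from the source document
    text = extract_text(file_path)

    # 2. Split the text into chunks of roughly 512 tokens
    splitter = TextSplitter(chunk_size=512)
    chunks = splitter.split(text)

    # 3. Convert the chunks into vector embeddings
    embeddings = create_embeddings(chunks)

    # 4. Store each embedding with its text and metadata in Postgres
    async with async_session() as session:
        for index, (chunk, embedding) in enumerate(zip(chunks, embeddings)):
            session.add(
                Vector(
                    vector=embedding,
                    text=chunk,
                    metadata_={"source": file_path, "chunk_index": index},
                )
            )
        await session.commit()
```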
In the code above:
- The `extract_text` function is responsible for extracting text from the document. You can use libraries like pdfminer or pypdf to extract text from PDF files, or docx2txt for Word documents.
- We use a `TextSplitter` to break the document into smaller chunks of text. In this example, we are creating chunks of 512 tokens. You can read more about the chunking functionality and different strategies in our RAG from Scratch post.
- The `create_embeddings` function converts our text chunks into vector embeddings.
- Finally, we create a database session and use the previously defined `Vector` model to add each chunk's vector embedding, text and metadata to our Postgres database.
To create the vector embeddings, you can use any embedding model of your choice. Just make sure that the embedding model's dimensions match the dimensions of the vector column, as defined in the `Vector` model (1024 in our example).
Here's an example implementation using OpenAI's text-embedding-3-large embeddings:
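A possible implementation, assuming the official openai Python client (which reads the OPENAI_API_KEY environment variable) and shortening the embeddings to 1024 dimensions to match our table:

```python
from openai import OpenAI

openai_client = OpenAI()


def create_embeddings(texts: list[str]) -> list[list[float]]:
    # text-embedding-3-large lets us shorten embeddings via the "dimensions"
    # parameter; 1024 matches the vector column defined in the Vector model
    response = openai_client.embeddings.create(
        model="text-embedding-3-large",
        input=texts,
        dimensions=1024,
    )
    return [item.embedding for item in response.data]
```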
#Vector Search with PostgreSQL
Now that we have our documents embedded and stored in our PostgreSQL database, we can perform vector similarity searches to retrieve the most relevant information for our queries. This is a key step in building a Retrieval-Augmented Generation (RAG) system.
The following function shows how we can implement a basic similarity search in PostgreSQL with pgvector:
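A sketch of that function, reusing the `async_session` factory from the embedding step:

```python
from sqlalchemy import select


async def similarity_search(query_vector: list[float], top_k: int = 5) -> list[dict]:
    async with async_session() as session:
        # Cosine distance between the query vector and every stored vector
        distance = Vector.vector.cosine_distance(query_vector).label("distance")

        results = await session.execute(
            select(Vector.text, Vector.metadata_, distance)
            .order_by(distance)  # smaller distance = more similar
            .limit(top_k)
        )

        return [
            {
                "text": text,
                "metadata": metadata,
                "score": 1 - dist,  # turn the distance into a similarity score
            }
            for text, metadata, dist in results.all()
        ]
```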
Let's break down this function and the search process:
- The `query_vector` parameter is the vector embedding of our search query. It's generated using the same embedding model we used previously for the document chunks.
- We use the `cosine_distance` function provided by pgvector to calculate the distance between our query vector and each vector in our database. Cosine distance is a measure of dissimilarity: the smaller the cosine distance, the greater the similarity to the user query.
- We sort the results by distance (ascending) and retrieve the `top_k` most similar chunks to our query.
- For each of the `top_k` most similar chunks, we return the chunk text, metadata and a similarity score, calculated as `1 - distance`.
It's also important to highlight that pgvector performs exact nearest neighbor search by default, which provides perfect recall (all of the nearest neighbors are found), but it can be slower with large datasets.
In those cases, you can also create an index to speed up searches and trade a bit of accuracy for much faster search times. The supported index types are IVFFlat (Inverted File Flat) and HNSW (Hierarchical Navigable Small World). You can read more about indexing in pgvector here.
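For example, an HNSW index using cosine distance can be created with a statement along these lines (run once against the `vectors` table):

```sql
CREATE INDEX ON vectors USING hnsw (vector vector_cosine_ops);
```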
#RAG in Action
With all the vector search functionality in place, it's now time to put it all together and create a basic Retrieval-Augmented Generation example, using OpenAI's GPT-4o model, that can answer questions about the documents we embedded in the PostgreSQL database.
First, let's take a look at the prompts that we are using:
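The exact wording is up to you; the prompts below are illustrative stand-ins that instruct the model to answer only from the retrieved context:

```python
SYSTEM_PROMPT = (
    "You are a helpful assistant. Answer the user's question using only the "
    "provided context. If the answer is not in the context, say you don't know."
)

USER_PROMPT_TEMPLATE = """Context:
{context}

Question:
{question}"""
```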
And this is how we can implement the basic RAG system:
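A sketch of the RAG function, reusing the `create_embeddings` and `similarity_search` helpers from above together with OpenAI's async client:

```python
from openai import AsyncOpenAI

async_openai_client = AsyncOpenAI()


async def answer_question(question: str, top_k: int = 5) -> str:
    # 1. Embed the user's question with the same embedding model
    query_vector = create_embeddings([question])[0]

    # 2. Retrieve the most relevant chunks from the Postgres vector database
    results = await similarity_search(query_vector, top_k=top_k)
    context = "\n\n".join(result["text"] for result in results)

    # 3. Ask GPT-4o to answer the question using the retrieved context
    response = await async_openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {
                "role": "user",
                "content": USER_PROMPT_TEMPLATE.format(
                    context=context, question=question
                ),
            },
        ],
    )
    return response.choices[0].message.content
```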
This function encapsulates the core RAG process: it converts the user's question to a vector, performs a similarity search in our Postgres vector database and includes the retrieved information as context for the GPT-4o model to generate an informed response.
To use this in your application, you might do something like this:
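For example (the document path and question below are just placeholders):

```python
import asyncio


async def main() -> None:
    # Embed a document once, then ask questions about it
    await embed_document("annual_report.pdf")
    answer = await answer_question("What were the main findings of the report?")
    print(answer)


asyncio.run(main())
```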
You can now apply all the code we've seen to your own applications. You can embed your own documents and use the combination of PostgreSQL with pgvector and GPT-4o (or any other LLM of your choice) to answer questions based on those documents.
And you can use these ideas to build more advanced applications like AI chatbots and assistants, with a simple architecture that takes advantage of PostgreSQL's power and versatility and keeps all your data integrated in a single place.