
---
title: PostgreSQL for User Data
description: PostgreSQL is the user-data store for DocsGPT. This page covers fresh installs and the one-shot migration from legacy MongoDB deployments.
---
import { Callout } from 'nextra/components'
# PostgreSQL for User Data
DocsGPT uses **PostgreSQL** as the user-data store for conversations,
agents, prompts, sources, attachments, workflows, logs, token usage,
and the rest of the application's structured state. MongoDB is no
longer required for a default install.
<Callout type="info" emoji="">
Vector stores are independent from user-data storage. `VECTOR_STORE`
can still be `pgvector`, `faiss`, `qdrant`, `milvus`, `elasticsearch`,
or `mongodb` (Mongo Atlas Vector Search) — your choice there does not
affect this page.
</Callout>
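As an illustration of that independence, a `.env` might pair the Postgres user-data store with any vector store; the values below are placeholders, with `pgvector` picked purely as an example:

```shell
# User-data store (this page)
POSTGRES_URI=postgresql://docsgpt:docsgpt@localhost:5432/docsgpt
# Vector store — an independent choice; pgvector shown here as one example
VECTOR_STORE=pgvector
```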
## Fresh install
1. **Run Postgres 13+.** Native install, Docker, or managed (Neon, RDS,
Supabase, Cloud SQL…) — all work. The default Docker Compose file
ships a `postgres` service plus a one-shot `postgres-init` migrator
that applies the schema automatically.
2. **Create a database and role** (skip if your managed provider gave
you these, or if you're using the bundled compose `postgres`
service):
```sql
CREATE ROLE docsgpt LOGIN PASSWORD 'docsgpt';
CREATE DATABASE docsgpt OWNER docsgpt;
```
3. **Set `POSTGRES_URI` in `.env`.** Any standard Postgres URI works —
DocsGPT normalizes it internally to the SQLAlchemy `psycopg` (v3)
dialect.
```bash
POSTGRES_URI=postgresql://docsgpt:docsgpt@localhost:5432/docsgpt
# Append ?sslmode=require for managed providers that enforce SSL.
```
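The normalization mentioned above can be sketched in a few lines. This is an illustration of the idea (rewriting the scheme to SQLAlchemy's `postgresql+psycopg://` dialect), not DocsGPT's actual implementation:

```python
def normalize_postgres_uri(uri: str) -> str:
    """Rewrite a standard Postgres URI to the SQLAlchemy psycopg (v3) dialect.

    Illustrative sketch only — DocsGPT's internal normalization may differ.
    Query parameters such as ?sslmode=require pass through untouched.
    """
    # Check the already-normalized form first so it is returned unchanged.
    for prefix in ("postgresql+psycopg://", "postgres://", "postgresql://"):
        if uri.startswith(prefix):
            return "postgresql+psycopg://" + uri[len(prefix):]
    return uri
```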
4. **Apply the schema** (idempotent — safe to re-run). The bundled
`postgres-init` compose service does this for you; if you're running
the backend outside compose, run it manually:
```bash
python scripts/db/init_postgres.py
# or equivalently:
alembic -c application/alembic.ini upgrade head
```
That's it — the backend will come up against Postgres.
## Migrating from a legacy MongoDB install
If you are upgrading from an older DocsGPT deployment that stored user
data in MongoDB, a one-shot migration tool copies every collection into
Postgres. The tool is run **once**, offline, with the app stopped.
1. **Install the optional Mongo client libraries.** `pymongo` and
`dnspython` are no longer part of the default backend install; they
live in an optional requirements file:
```bash
pip install -r application/requirements.txt -r application/requirements-mongo.txt
```
2. **Provision Postgres** following the [Fresh install](#fresh-install)
steps above, so `POSTGRES_URI` is set and the schema is applied.
3. **Point the backfill at both databases.** Set `MONGO_URI` in the
environment alongside `POSTGRES_URI` for the duration of the
migration:
```bash
export MONGO_URI="mongodb://user:pass@host:27017/docsgpt"
export POSTGRES_URI="postgresql://docsgpt:docsgpt@localhost:5432/docsgpt"
```
4. **Run the backfill.** Idempotent — re-run any time to re-sync
drifted rows. Without arguments, backfills every registered table;
pass `--tables` to limit.
```bash
python scripts/db/backfill.py --dry-run # preview everything
python scripts/db/backfill.py # real run, everything
python scripts/db/backfill.py --tables users # only specific tables
```
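The idempotency claimed above typically comes from upserting rows keyed on the source document's identifier, so a second run overwrites rather than duplicates. A toy sketch of that pattern with in-memory dicts (not the real backfill code, which writes to Postgres):

```python
def backfill(source_docs: list[dict], target: dict) -> dict:
    """Upsert each source document into `target`, keyed by its `_id`.

    Running this twice over the same input leaves `target` unchanged,
    which is what makes re-running a backfill safe. Toy sketch only.
    """
    for doc in source_docs:
        # Key on _id: a re-run replaces the row instead of inserting a duplicate.
        target[doc["_id"]] = {k: v for k, v in doc.items() if k != "_id"}
    return target
```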
5. **Restart the app against Postgres only.** Unset `MONGO_URI` (or
leave it unset — it is `Optional[str] = None` in settings) and start
the backend. Nothing in the default code path consults MongoDB
anymore.
<Callout type="warning" emoji="⚠️">
The backfill is a one-shot tool. There is no dual-write window and no
runtime feature flag — once you're on the current version, Postgres
is the only user-data store the backend reads from or writes to.
</Callout>
<Callout type="info" emoji="">
Keep your MongoDB instance online until you have verified the
Postgres data is complete. You can re-run `backfill.py` at any time
to re-sync. Once you're satisfied, decommission MongoDB — unless you
also use it as your vector store (`VECTOR_STORE=mongodb`), in which
case keep it for that purpose.
</Callout>
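One way to verify completeness before decommissioning is to compare per-collection document counts against per-table row counts. A minimal helper for that comparison (the counts themselves would come from `mongosh`/`psql` or your drivers; this sketch only diffs them):

```python
def diff_counts(mongo_counts: dict, pg_counts: dict) -> dict:
    """Return {name: (mongo_count, postgres_count)} for every mismatch.

    An empty result means every compared collection/table pair agrees.
    """
    names = set(mongo_counts) | set(pg_counts)
    return {
        n: (mongo_counts.get(n, 0), pg_counts.get(n, 0))
        for n in names
        if mongo_counts.get(n, 0) != pg_counts.get(n, 0)
    }
```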
## Troubleshooting
- **`relation "..." does not exist`** — run `python scripts/db/init_postgres.py`
(or `alembic -c application/alembic.ini upgrade head`).
- **`FATAL: role "docsgpt" does not exist`** — run the `CREATE ROLE` /
`CREATE DATABASE` statements from step 2 of the fresh install as a
Postgres superuser.
- **SSL errors on a managed provider** — append `?sslmode=require` to
`POSTGRES_URI`.
- **`ModuleNotFoundError: pymongo` when running `backfill.py`** —
install the optional Mongo requirements:
`pip install -r application/requirements-mongo.txt`.