mirror of
https://github.com/arc53/DocsGPT.git
synced 2026-05-10 12:31:21 +00:00
---
title: PostgreSQL for User Data
description: PostgreSQL is the user-data store for DocsGPT. This page covers fresh installs and the one-shot migration from legacy MongoDB deployments.
---

import { Callout } from 'nextra/components'

# PostgreSQL for User Data

DocsGPT uses **PostgreSQL** as the user-data store for conversations,
agents, prompts, sources, attachments, workflows, logs, token usage,
and the rest of the application's structured state. MongoDB is no
longer required for a default install.

<Callout type="info" emoji="ℹ️">
Vector stores are independent from user-data storage. `VECTOR_STORE`
can still be `pgvector`, `faiss`, `qdrant`, `milvus`, `elasticsearch`,
or `mongodb` (Mongo Atlas Vector Search) — your choice there does not
affect this page.
</Callout>

## Fresh install

1. **Run Postgres 13+.** Native install, Docker, or managed (Neon, RDS,
   Supabase, Cloud SQL…) — all work. The default Docker Compose file
   ships a `postgres` service plus a one-shot `postgres-init` migrator
   that applies the schema automatically.

2. **Create a database and role** (skip if your managed provider gave
   you these, or if you're using the bundled compose `postgres`
   service):

   ```sql
   CREATE ROLE docsgpt LOGIN PASSWORD 'docsgpt';
   CREATE DATABASE docsgpt OWNER docsgpt;
   ```

3. **Set `POSTGRES_URI` in `.env`.** Any standard Postgres URI works —
   DocsGPT normalizes it internally to the SQLAlchemy `psycopg` (v3)
   dialect.

   ```bash
   POSTGRES_URI=postgresql://docsgpt:docsgpt@localhost:5432/docsgpt
   # Append ?sslmode=require for managed providers that enforce SSL.
   ```

4. **Apply the schema** (idempotent — safe to re-run). The bundled
   `postgres-init` compose service does this for you; if you're running
   the backend outside compose, run it manually:

   ```bash
   python scripts/db/init_postgres.py
   # or equivalently:
   alembic -c application/alembic.ini upgrade head
   ```

That's it — the backend will come up against Postgres.
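
The normalization mentioned in step 3 can be pictured roughly like this. This is an illustrative sketch only: the function name and exact rewrite rules are assumptions, not DocsGPT's actual code.

```python
# Illustrative sketch of the idea behind step 3: map any standard Postgres
# URI scheme onto SQLAlchemy's psycopg (v3) dialect. The function name and
# exact rules here are assumptions, not DocsGPT code.

def normalize_postgres_uri(uri: str) -> str:
    """Rewrite common Postgres URI schemes to `postgresql+psycopg://`."""
    if uri.startswith("postgresql+psycopg://"):
        return uri  # already on the psycopg v3 dialect
    for prefix in ("postgres://", "postgresql://"):
        if uri.startswith(prefix):
            return "postgresql+psycopg://" + uri[len(prefix):]
    return uri

print(normalize_postgres_uri("postgresql://docsgpt:docsgpt@localhost:5432/docsgpt"))
# postgresql+psycopg://docsgpt:docsgpt@localhost:5432/docsgpt
```

The practical upshot: you never need to write the `+psycopg` dialect suffix in `.env` yourself; a plain `postgresql://` URI is enough.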

## Migrating from a legacy MongoDB install

If you are upgrading from an older DocsGPT deployment that stored user
data in MongoDB, a one-shot migration tool copies every collection into
Postgres. The tool is run **once**, offline, with the app stopped.

1. **Install the optional Mongo client libraries.** `pymongo` and
   `dnspython` are no longer part of the default backend install; they
   live in an optional requirements file:

   ```bash
   pip install -r application/requirements.txt -r application/requirements-mongo.txt
   ```

2. **Provision Postgres** following the [Fresh install](#fresh-install)
   steps above, so `POSTGRES_URI` is set and the schema is applied.

3. **Point the backfill at both databases.** Set `MONGO_URI` in the
   environment alongside `POSTGRES_URI` for the duration of the
   migration:

   ```bash
   export MONGO_URI="mongodb://user:pass@host:27017/docsgpt"
   export POSTGRES_URI="postgresql://docsgpt:docsgpt@localhost:5432/docsgpt"
   ```

4. **Run the backfill.** It is idempotent — re-run it at any time to
   re-sync drifted rows. Without arguments it backfills every
   registered table; pass `--tables` to limit the run:

   ```bash
   python scripts/db/backfill.py --dry-run        # preview everything
   python scripts/db/backfill.py                  # real run, everything
   python scripts/db/backfill.py --tables users   # only specific tables
   ```

5. **Restart the app against Postgres only.** Unset `MONGO_URI` (or
   leave it unset — it is `Optional[str] = None` in settings) and start
   the backend. Nothing in the default code path consults MongoDB
   anymore.

<Callout type="warning" emoji="⚠️">
The backfill is a one-shot tool. There is no dual-write window and no
runtime feature flag — once you're on the current version, Postgres
is the only user-data store the backend reads from or writes to.
</Callout>

<Callout type="info" emoji="ℹ️">
Keep your MongoDB instance online until you have verified the
Postgres data is complete. You can re-run `backfill.py` at any time
to re-sync. Once you're satisfied, decommission MongoDB — unless you
also use it as your vector store (`VECTOR_STORE=mongodb`), in which
case keep it for that purpose.
</Callout>

## Troubleshooting

- **`relation "..." does not exist`** — run `python scripts/db/init_postgres.py`
  (or `alembic -c application/alembic.ini upgrade head`).
- **`FATAL: role "docsgpt" does not exist`** — run the `CREATE ROLE` /
  `CREATE DATABASE` statements from step 2 of the fresh install as a
  Postgres superuser.
- **SSL errors on a managed provider** — append `?sslmode=require` to
  `POSTGRES_URI`.
- **`ModuleNotFoundError: pymongo` when running `backfill.py`** —
  install the optional Mongo requirements:
  `pip install -r application/requirements-mongo.txt`.