
---
title: PostgreSQL for User Data
description: PostgreSQL is the user-data store for DocsGPT. Covers auto-bootstrap, production hardening, and the one-shot migration from legacy MongoDB deployments.
---
import { Callout } from 'nextra/components'
# PostgreSQL for User Data
DocsGPT stores conversations, agents, prompts, sources, attachments,
workflows, logs, and token usage in **PostgreSQL**. MongoDB is no longer
required.
<Callout type="info" emoji="">
Vector stores are independent — `VECTOR_STORE` can still be `pgvector`,
`faiss`, `qdrant`, `milvus`, `elasticsearch`, or `mongodb`.
</Callout>
## Quickstart
Three common paths. Each assumes Postgres 13+ and the default env vars
`AUTO_MIGRATE=true` / `AUTO_CREATE_DB=true` (both ship enabled).
### Docker Compose
The bundled compose file ships a `postgres` service. App boot handles the
rest — no sidecar, no init job.
```bash
cd deployment && docker compose up
```
### Managed Postgres (Neon, RDS, Supabase, Cloud SQL)
Point `POSTGRES_URI` at the provider-given URI. The app applies the
schema on first boot.
```bash
export POSTGRES_URI="postgresql://user:pass@host/docsgpt?sslmode=require"
flask --app application/app.py run --host=0.0.0.0 --port=7091
```
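If you're not sure whether a provider URI already carries `sslmode`, a small helper can normalize it before export. This is an illustrative sketch, not part of DocsGPT — the `ensure_sslmode` name is ours:

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit

def ensure_sslmode(uri: str, mode: str = "require") -> str:
    """Append sslmode=<mode> to a Postgres URI unless one is already set."""
    parts = urlsplit(uri)
    if "sslmode" in parse_qs(parts.query):
        return uri  # respect an explicit choice
    sep = "&" if parts.query else ""
    return urlunsplit(parts._replace(query=f"{parts.query}{sep}sslmode={mode}"))

print(ensure_sslmode("postgresql://user:pass@host/docsgpt"))
# → postgresql://user:pass@host/docsgpt?sslmode=require
```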
### Bare-metal Postgres
Run Postgres locally and point `POSTGRES_URI` at the default superuser.
First boot creates both the database and the schema.
```bash
export POSTGRES_URI="postgresql://postgres@localhost/docsgpt"
flask --app application/app.py run --host=0.0.0.0 --port=7091
```
Prefer a dedicated non-superuser role? Create it once as superuser — the
app never creates roles.
```sql
CREATE ROLE docsgpt LOGIN PASSWORD 'docsgpt' CREATEDB;
-- Then: POSTGRES_URI=postgresql://docsgpt:docsgpt@localhost/docsgpt
```
## How auto-bootstrap works
Two env vars control startup behavior. Both default to `true` in the
app, and both operations are idempotent.
| Setting | Effect | Requires |
| --- | --- | --- |
| `AUTO_CREATE_DB` | If the target database is missing, connects to the server's `postgres` maintenance DB and issues `CREATE DATABASE`. | `CREATEDB` privilege (or superuser) |
| `AUTO_MIGRATE` | Runs `alembic upgrade head` against the target database. | Table-owner or superuser on the target DB |
Concurrent workers serialize through `alembic_version`, so rolling
restarts are safe. If the role lacks the required privilege, startup
fails fast with a clear error rather than silently skipping.
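The startup decision tree described above can be sketched as pure logic. The function and action names here are illustrative, not DocsGPT internals:

```python
def plan_bootstrap(db_exists: bool, auto_create_db: bool, auto_migrate: bool) -> list:
    """Return the startup actions implied by the two flags (illustrative sketch)."""
    actions = []
    if not db_exists:
        if not auto_create_db:
            # fail fast rather than silently skipping
            raise RuntimeError("target database missing and AUTO_CREATE_DB is false")
        actions.append("CREATE DATABASE via the 'postgres' maintenance DB")
    if auto_migrate:
        actions.append("alembic upgrade head")
    return actions

print(plan_bootstrap(db_exists=False, auto_create_db=True, auto_migrate=True))
```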
<Callout type="info" emoji="">
Convenient in dev. In production, disable both and run migrations as
an explicit step — see [Production hardening](#production-hardening).
</Callout>
## Production hardening
Set both flags to `false` in prod and run migrations as a gated,
auditable step before rolling out the app.
```env
AUTO_MIGRATE=false
AUTO_CREATE_DB=false
```
Run migrations from your CI/CD pipeline, a Kubernetes `Job`, or an
init-container ahead of the app rollout:
```bash
python scripts/db/init_postgres.py
# equivalently:
alembic -c application/alembic.ini upgrade head
```
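As one concrete shape, a pre-rollout Kubernetes `Job` might look like the following. The image name, secret name, and keys are placeholders, not shipped manifests:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: docsgpt-migrate
spec:
  backoffLimit: 1
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: your-registry/docsgpt-backend:latest   # placeholder image
          command: ["python", "scripts/db/init_postgres.py"]
          env:
            - name: POSTGRES_URI
              valueFrom:
                secretKeyRef:
                  name: docsgpt-secrets                 # placeholder secret
                  key: postgres-uri
```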
The reasoning: the app's runtime role shouldn't carry DDL privileges,
migrations should gate each rollout, and an explicit step is
auditable — implicit first-boot bootstrap is fine for dev but muddies
prod deploys.
<Callout type="warning" emoji="⚠️">
Migrations are not reversible by the app. Always back up production
Postgres before running `alembic upgrade head` on a new release.
</Callout>
## Migrating from MongoDB
One-shot, offline, app stopped. The app itself will create the
Postgres schema when it boots — you only need to run the data copy.
```bash
pip install -r application/requirements.txt
pip install 'pymongo>=4.6'
export POSTGRES_URI="postgresql://docsgpt:docsgpt@localhost:5432/docsgpt"
export MONGO_URI="mongodb://user:pass@host:27017/docsgpt"
python scripts/db/backfill.py --dry-run # preview
python scripts/db/backfill.py # real run
# or: python scripts/db/backfill.py --tables users,agents
```
Then unset `MONGO_URI` and start the backend — nothing consults Mongo
in the default path anymore. The backfill is idempotent (per-table
`ON CONFLICT` upserts, event-log tables deduped via `mongo_id`), so
re-running is safe and re-syncs any drifted rows. Keep Mongo online
until you've verified Postgres is complete; decommission afterwards
unless you still use it as a vector store.
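The upsert pattern behind that idempotency looks roughly like this — sketched against SQLite as a stand-in, since its `ON CONFLICT` clause mirrors Postgres; the table and column names are illustrative, not the backfill's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conversations (mongo_id TEXT PRIMARY KEY, title TEXT)")

def upsert(mongo_id: str, title: str) -> None:
    # Re-running the backfill re-issues the same upsert: the row is
    # updated in place instead of duplicated or erroring out.
    conn.execute(
        "INSERT INTO conversations (mongo_id, title) VALUES (?, ?) "
        "ON CONFLICT (mongo_id) DO UPDATE SET title = excluded.title",
        (mongo_id, title),
    )

upsert("abc123", "first import")
upsert("abc123", "re-synced title")  # second run: update, not duplicate
rows = conn.execute("SELECT mongo_id, title FROM conversations").fetchall()
print(rows)  # → [('abc123', 're-synced title')]
```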
<Callout type="warning" emoji="⚠️">
No dual-write window and no runtime flag — on the current version,
Postgres is the only user-data store the backend reads or writes.
</Callout>
## Troubleshooting
- **`relation "..." does not exist`** — schema not applied. Either let
the app bootstrap it (`AUTO_MIGRATE=true`) or run
`python scripts/db/init_postgres.py`.
- **`permission denied to create database`** — the role lacks
`CREATEDB`. As superuser: `ALTER ROLE <name> CREATEDB;`. Or create
the database manually and set `AUTO_CREATE_DB=false`.
- **`role "docsgpt" does not exist`** — roles are never auto-created.
As superuser: `CREATE ROLE docsgpt LOGIN PASSWORD '...';`.
- **SSL errors on a managed provider** — append `?sslmode=require` to
`POSTGRES_URI`.
- **`ModuleNotFoundError: pymongo`** — `pip install 'pymongo>=4.6'`
(only needed for the one-shot Mongo backfill).