mirror of
https://github.com/arc53/DocsGPT.git
synced 2026-05-07 06:30:03 +00:00
152 lines
5.2 KiB
Plaintext
152 lines
5.2 KiB
Plaintext
---
|
||
title: PostgreSQL for User Data
|
||
description: PostgreSQL is the user-data store for DocsGPT. Covers auto-bootstrap, production hardening, and the one-shot migration from legacy MongoDB deployments.
|
||
---
|
||
|
||
import { Callout } from 'nextra/components'
|
||
|
||
# PostgreSQL for User Data
|
||
|
||
DocsGPT stores conversations, agents, prompts, sources, attachments,
|
||
workflows, logs, and token usage in **PostgreSQL**. MongoDB is no longer
|
||
required.
|
||
|
||
<Callout type="info" emoji="ℹ️">
|
||
Vector stores are independent — `VECTOR_STORE` can still be `pgvector`,
|
||
`faiss`, `qdrant`, `milvus`, `elasticsearch`, or `mongodb`.
|
||
</Callout>
|
||
|
||
## Quickstart
|
||
|
||
Three common paths. Each assumes Postgres 13+ and the default env vars
|
||
`AUTO_MIGRATE=true` / `AUTO_CREATE_DB=true` (both ship enabled).
|
||
|
||
### Docker Compose
|
||
|
||
The bundled compose file ships a `postgres` service. App boot handles the
|
||
rest — no sidecar, no init job.
|
||
|
||
```bash
|
||
cd deployment && docker compose up
|
||
```
|
||
|
||
### Managed Postgres (Neon, RDS, Supabase, Cloud SQL)
|
||
|
||
Point `POSTGRES_URI` at the provider-given URI. The app applies the
|
||
schema on first boot.
|
||
|
||
```bash
|
||
export POSTGRES_URI="postgresql://user:pass@host/docsgpt?sslmode=require"
|
||
flask --app application/app.py run --host=0.0.0.0 --port=7091
|
||
```
|
||
|
||
### Bare-metal Postgres
|
||
|
||
Run Postgres locally and point `POSTGRES_URI` at the default superuser.
|
||
First boot creates both the database and the schema.
|
||
|
||
```bash
|
||
export POSTGRES_URI="postgresql://postgres@localhost/docsgpt"
|
||
flask --app application/app.py run --host=0.0.0.0 --port=7091
|
||
```
|
||
|
||
Prefer a dedicated non-superuser role? Create it once as superuser — the
|
||
app never creates roles.
|
||
|
||
```sql
|
||
CREATE ROLE docsgpt LOGIN PASSWORD 'docsgpt' CREATEDB;
|
||
-- Then: POSTGRES_URI=postgresql://docsgpt:docsgpt@localhost/docsgpt
|
||
```
|
||
|
||
## How auto-bootstrap works
|
||
|
||
Two env vars control startup behavior. Both default to `true` in the
|
||
app and are idempotent.
|
||
|
||
| Setting | Effect | Requires |
|
||
| --- | --- | --- |
|
||
| `AUTO_CREATE_DB` | If the target database is missing, connects to the server's `postgres` maintenance DB and issues `CREATE DATABASE`. | `CREATEDB` privilege (or superuser) |
|
||
| `AUTO_MIGRATE` | Runs `alembic upgrade head` against the target database. | Table-owner or superuser on the target DB |
|
||
|
||
Concurrent workers serialize through `alembic_version`, so rolling
|
||
restarts are safe. If the role lacks the required privilege, startup
|
||
fails fast with a clear error rather than silently skipping.
|
||
|
||
<Callout type="info" emoji="ℹ️">
|
||
Convenient in dev. In production, disable both and run migrations as
|
||
an explicit step — see [Production hardening](#production-hardening).
|
||
</Callout>
|
||
|
||
## Production hardening
|
||
|
||
Set both flags to `false` in prod and run migrations as a gated,
|
||
auditable step before rolling out the app.
|
||
|
||
```env
|
||
AUTO_MIGRATE=false
|
||
AUTO_CREATE_DB=false
|
||
```
|
||
|
||
Run migrations from your CI/CD pipeline, a Kubernetes `Job`, or an
|
||
init-container ahead of the app rollout:
|
||
|
||
```bash
|
||
python scripts/db/init_postgres.py
|
||
# equivalently:
|
||
alembic -c application/alembic.ini upgrade head
|
||
```
|
||
|
||
The reasoning: the app's runtime role shouldn't carry DDL privileges,
|
||
migrations should gate each rollout, and an explicit step is
|
||
auditable — implicit first-boot bootstrap is fine for dev but muddies
|
||
prod deploys.
|
||
|
||
<Callout type="warning" emoji="⚠️">
|
||
Migrations are not reversible by the app. Always back up production
|
||
Postgres before running `alembic upgrade head` on a new release.
|
||
</Callout>
|
||
|
||
## Migrating from MongoDB
|
||
|
||
One-shot, offline, app stopped. The app itself will create the
|
||
Postgres schema when it boots — you only need to run the data copy.
|
||
|
||
```bash
|
||
pip install -r application/requirements.txt
|
||
pip install 'pymongo>=4.6'
|
||
|
||
export POSTGRES_URI="postgresql://docsgpt:docsgpt@localhost:5432/docsgpt"
|
||
export MONGO_URI="mongodb://user:pass@host:27017/docsgpt"
|
||
|
||
python scripts/db/backfill.py --dry-run # preview
|
||
python scripts/db/backfill.py # real run
|
||
# or: python scripts/db/backfill.py --tables users,agents
|
||
```
|
||
|
||
Then unset `MONGO_URI` and start the backend — nothing consults Mongo
|
||
in the default path anymore. The backfill is idempotent (per-table
|
||
`ON CONFLICT` upserts, event-log tables deduped via `mongo_id`), so
|
||
re-running is safe and re-syncs any drifted rows. Keep Mongo online
|
||
until you've verified Postgres is complete; decommission afterwards
|
||
unless you still use it as a vector store.
|
||
|
||
<Callout type="warning" emoji="⚠️">
|
||
No dual-write window and no runtime flag — on the current version,
|
||
Postgres is the only user-data store the backend reads or writes.
|
||
</Callout>
|
||
|
||
## Troubleshooting
|
||
|
||
- **`relation "..." does not exist`** — schema not applied. Either let
|
||
the app bootstrap it (`AUTO_MIGRATE=true`) or run
|
||
`python scripts/db/init_postgres.py`.
|
||
- **`permission denied to create database`** — the role lacks
|
||
`CREATEDB`. As superuser: `ALTER ROLE <name> CREATEDB;`. Or create
|
||
the database manually and set `AUTO_CREATE_DB=false`.
|
||
- **`role "docsgpt" does not exist`** — roles are never auto-created.
|
||
As superuser: `CREATE ROLE docsgpt LOGIN PASSWORD '...';`.
|
||
- **SSL errors on a managed provider** — append `?sslmode=require` to
|
||
`POSTGRES_URI`.
|
||
- **`ModuleNotFoundError: pymongo`** — `pip install 'pymongo>=4.6'`
|
||
(only needed for the one-shot Mongo backfill).
|