Compare commits

...

102 Commits

Author SHA1 Message Date
Alex
26e2e7d353 fix: tests 2026-05-04 23:00:21 +01:00
Alex
42c33f4e0d fix: better json validation 2026-05-04 18:09:40 +01:00
Alex
073f9fc003 fix: mini issues 2026-05-04 17:51:09 +01:00
Alex
9b974af210 fix: tests 2026-05-04 17:17:15 +01:00
Alex
fe1edc6b79 feat: more durable frontend 2026-05-04 16:32:28 +01:00
Alex
e550b11f39 feat: durability and idempotency keys 2026-05-03 18:36:02 +01:00
Manish Madan
d23679dd93 Merge pull request #2437 from arc53/dependabot/npm_and_yarn/frontend/react-router-dom-7.14.2
chore(deps): bump react-router-dom from 7.14.1 to 7.14.2 in /frontend
2026-04-29 09:33:36 +05:30
dependabot[bot]
1b2239e54b chore(deps): bump react-router-dom from 7.14.1 to 7.14.2 in /frontend
Bumps [react-router-dom](https://github.com/remix-run/react-router/tree/HEAD/packages/react-router-dom) from 7.14.1 to 7.14.2.
- [Release notes](https://github.com/remix-run/react-router/releases)
- [Changelog](https://github.com/remix-run/react-router/blob/main/packages/react-router-dom/CHANGELOG.md)
- [Commits](https://github.com/remix-run/react-router/commits/react-router-dom@7.14.2/packages/react-router-dom)

---
updated-dependencies:
- dependency-name: react-router-dom
  dependency-version: 7.14.2
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-29 04:02:16 +00:00
Manish Madan
5ceb99f946 Merge pull request #2436 from arc53/dependabot/npm_and_yarn/frontend/vite-8.0.10
chore(deps-dev): bump vite from 8.0.8 to 8.0.10 in /frontend
2026-04-29 09:30:44 +05:30
dependabot[bot]
892908cef5 chore(deps-dev): bump vite from 8.0.8 to 8.0.10 in /frontend
Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) from 8.0.8 to 8.0.10.
- [Release notes](https://github.com/vitejs/vite/releases)
- [Changelog](https://github.com/vitejs/vite/blob/main/packages/vite/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite/commits/v8.0.10/packages/vite)

---
updated-dependencies:
- dependency-name: vite
  dependency-version: 8.0.10
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-29 03:58:13 +00:00
Manish Madan
99ffe439c7 Merge pull request #2438 from arc53/dependabot/npm_and_yarn/frontend/typescript-eslint/eslint-plugin-8.59.1
chore(deps-dev): bump @typescript-eslint/eslint-plugin from 8.58.2 to 8.59.1 in /frontend
2026-04-29 09:26:52 +05:30
dependabot[bot]
ed87972ca6 chore(deps-dev): bump @typescript-eslint/eslint-plugin in /frontend
Bumps [@typescript-eslint/eslint-plugin](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/eslint-plugin) from 8.58.2 to 8.59.1.
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases)
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/eslint-plugin/CHANGELOG.md)
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v8.59.1/packages/eslint-plugin)

---
updated-dependencies:
- dependency-name: "@typescript-eslint/eslint-plugin"
  dependency-version: 8.59.1
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-29 03:52:58 +00:00
Manish Madan
6ad9022dd3 Merge pull request #2408 from arc53/dependabot/npm_and_yarn/frontend/typescript-6.0.3
chore(deps-dev): bump typescript from 5.9.3 to 6.0.3 in /frontend
2026-04-29 09:05:55 +05:30
ManishMadan2882
9b8fe2d5d0 fix(frontend): migrate tsconfig off TS 6.0 deprecated options
- esModuleInterop: false -> true (modern default)
- moduleResolution: Node -> Bundler (recommended for Vite)
- remove baseUrl; paths resolves relative to tsconfig
2026-04-29 08:57:32 +05:30
ManishMadan2882
d1dc8de27c fix(frontend): silence TS 6.0 deprecation errors in tsconfig 2026-04-29 08:55:14 +05:30
Manish Madan
a29fa44b51 Merge pull request #2426 from arc53/dependabot/npm_and_yarn/docs/npm_and_yarn-707cc257f8
chore(deps): bump @xmldom/xmldom from 0.9.9 to 0.9.10 in /docs in the npm_and_yarn group across 1 directory
2026-04-28 19:50:12 +05:30
dependabot[bot]
026371d024 chore(deps): bump @xmldom/xmldom
Bumps the npm_and_yarn group with 1 update in the /docs directory: [@xmldom/xmldom](https://github.com/xmldom/xmldom).


Updates `@xmldom/xmldom` from 0.9.9 to 0.9.10
- [Release notes](https://github.com/xmldom/xmldom/releases)
- [Changelog](https://github.com/xmldom/xmldom/blob/master/CHANGELOG.md)
- [Commits](https://github.com/xmldom/xmldom/compare/0.9.9...0.9.10)

---
updated-dependencies:
- dependency-name: "@xmldom/xmldom"
  dependency-version: 0.9.10
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-28 14:17:21 +00:00
Manish Madan
b0df2a479b Merge pull request #2434 from arc53/dependabot/npm_and_yarn/extensions/react-widget/typescript-eslint/eslint-plugin-8.59.1
chore(deps-dev): bump @typescript-eslint/eslint-plugin from 8.59.0 to 8.59.1 in /extensions/react-widget
2026-04-28 19:44:18 +05:30
dependabot[bot]
5eae83af1b chore(deps-dev): bump @typescript-eslint/eslint-plugin
Bumps [@typescript-eslint/eslint-plugin](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/eslint-plugin) from 8.59.0 to 8.59.1.
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases)
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/eslint-plugin/CHANGELOG.md)
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v8.59.1/packages/eslint-plugin)

---
updated-dependencies:
- dependency-name: "@typescript-eslint/eslint-plugin"
  dependency-version: 8.59.1
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-28 14:11:59 +00:00
Manish Madan
9c875c83c2 Merge pull request #2431 from arc53/dependabot/npm_and_yarn/extensions/react-widget/babel/preset-react-7.28.5
chore(deps-dev): bump @babel/preset-react from 7.24.6 to 7.28.5 in /extensions/react-widget
2026-04-28 19:38:57 +05:30
dependabot[bot]
e6e671faf1 chore(deps-dev): bump @babel/preset-react in /extensions/react-widget
Bumps [@babel/preset-react](https://github.com/babel/babel/tree/HEAD/packages/babel-preset-react) from 7.24.6 to 7.28.5.
- [Release notes](https://github.com/babel/babel/releases)
- [Changelog](https://github.com/babel/babel/blob/main/CHANGELOG.md)
- [Commits](https://github.com/babel/babel/commits/v7.28.5/packages/babel-preset-react)

---
updated-dependencies:
- dependency-name: "@babel/preset-react"
  dependency-version: 7.28.5
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-28 14:03:26 +00:00
Manish Madan
a31ec97bd7 Merge pull request #2435 from arc53/dependabot/npm_and_yarn/frontend/npm_and_yarn-5f44a83626
chore(deps-dev): bump postcss from 8.5.8 to 8.5.12 in /frontend in the npm_and_yarn group across 1 directory
2026-04-28 19:30:19 +05:30
dependabot[bot]
ebe752d103 chore(deps-dev): bump postcss
Bumps the npm_and_yarn group with 1 update in the /frontend directory: [postcss](https://github.com/postcss/postcss).


Updates `postcss` from 8.5.8 to 8.5.12
- [Release notes](https://github.com/postcss/postcss/releases)
- [Changelog](https://github.com/postcss/postcss/blob/main/CHANGELOG.md)
- [Commits](https://github.com/postcss/postcss/compare/8.5.8...8.5.12)

---
updated-dependencies:
- dependency-name: postcss
  dependency-version: 8.5.12
  dependency-type: direct:development
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-28 13:53:25 +00:00
Manish Madan
8c30c1c880 Merge pull request #2430 from arc53/dependabot/npm_and_yarn/extensions/react-widget/flow-bin-0.311.0
chore(deps): bump flow-bin from 0.309.0 to 0.311.0 in /extensions/react-widget
2026-04-28 19:20:48 +05:30
dependabot[bot]
4a598e062c chore(deps): bump flow-bin in /extensions/react-widget
Bumps [flow-bin](https://github.com/flowtype/flow-bin) from 0.309.0 to 0.311.0.
- [Release notes](https://github.com/flowtype/flow-bin/releases)
- [Commits](https://github.com/flowtype/flow-bin/commits)

---
updated-dependencies:
- dependency-name: flow-bin
  dependency-version: 0.311.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-28 13:49:15 +00:00
Manish Madan
e285b47170 Merge pull request #2429 from arc53/dependabot/npm_and_yarn/extensions/react-widget/styled-components-6.4.1
chore(deps): bump styled-components from 6.4.0 to 6.4.1 in /extensions/react-widget
2026-04-28 19:13:52 +05:30
dependabot[bot]
2d884a3df1 chore(deps): bump styled-components in /extensions/react-widget
Bumps [styled-components](https://github.com/styled-components/styled-components) from 6.4.0 to 6.4.1.
- [Release notes](https://github.com/styled-components/styled-components/releases)
- [Commits](https://github.com/styled-components/styled-components/compare/styled-components@6.4.0...styled-components@6.4.1)

---
updated-dependencies:
- dependency-name: styled-components
  dependency-version: 6.4.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-28 13:42:42 +00:00
Manish Madan
b9920731e0 Merge pull request #2428 from arc53/dependabot/npm_and_yarn/extensions/react-widget/globals-17.5.0
chore(deps-dev): bump globals from 15.15.0 to 17.5.0 in /extensions/react-widget
2026-04-28 18:37:13 +05:30
Manish Madan
f5f4c07e59 Merge pull request #2417 from arc53/dependabot/npm_and_yarn/frontend/react-dropzone-15.0.0
chore(deps): bump react-dropzone from 14.3.8 to 15.0.0 in /frontend
2026-04-28 17:12:19 +05:30
dependabot[bot]
e87dc42ad0 chore(deps): bump react-dropzone from 14.3.8 to 15.0.0 in /frontend
Bumps [react-dropzone](https://github.com/react-dropzone/react-dropzone) from 14.3.8 to 15.0.0.
- [Release notes](https://github.com/react-dropzone/react-dropzone/releases)
- [Commits](https://github.com/react-dropzone/react-dropzone/compare/v14.3.8...v15.0.0)

---
updated-dependencies:
- dependency-name: react-dropzone
  dependency-version: 15.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-28 11:27:14 +00:00
Manish Madan
40a30054bc Merge pull request #2412 from arc53/dependabot/npm_and_yarn/frontend/npm_and_yarn-7d3393088f
chore(deps): bump lodash-es from 4.17.23 to 4.18.1 in /frontend in the npm_and_yarn group across 1 directory
2026-04-28 16:54:41 +05:30
dependabot[bot]
707e782ac8 chore(deps): bump lodash-es
Bumps the npm_and_yarn group with 1 update in the /frontend directory: [lodash-es](https://github.com/lodash/lodash).


Updates `lodash-es` from 4.17.23 to 4.18.1
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/compare/4.17.23...4.18.1)

---
updated-dependencies:
- dependency-name: lodash-es
  dependency-version: 4.18.1
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-28 10:19:29 +00:00
dependabot[bot]
2bc0b6946b chore(deps-dev): bump typescript from 5.9.3 to 6.0.3 in /frontend
Bumps [typescript](https://github.com/microsoft/TypeScript) from 5.9.3 to 6.0.3.
- [Release notes](https://github.com/microsoft/TypeScript/releases)
- [Commits](https://github.com/microsoft/TypeScript/compare/v5.9.3...v6.0.3)

---
updated-dependencies:
- dependency-name: typescript
  dependency-version: 6.0.3
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-28 10:19:22 +00:00
Manish Madan
fbd686b725 Merge pull request #2415 from arc53/dependabot/npm_and_yarn/frontend/react-i18next-17.0.4
chore(deps): bump react-i18next from 17.0.2 to 17.0.6 in /frontend
2026-04-28 15:47:59 +05:30
dependabot[bot]
29320eb9fd chore(deps): bump react-i18next from 17.0.2 to 17.0.6 in /frontend
Bumps [react-i18next](https://github.com/i18next/react-i18next) from 17.0.2 to 17.0.6.
- [Changelog](https://github.com/i18next/react-i18next/blob/master/CHANGELOG.md)
- [Commits](https://github.com/i18next/react-i18next/compare/v17.0.2...v17.0.6)

---
updated-dependencies:
- dependency-name: react-i18next
  dependency-version: 17.0.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-28 09:54:11 +00:00
Alex
0d2a8e11f4 feat: better token serialiser 2026-04-28 02:36:40 +01:00
Alex
f0c39dec23 feat: more logs on stream finish 2026-04-28 02:27:02 +01:00
Alex
552bfe016a fix: better token counting and fixes cache 2026-04-28 01:47:53 +01:00
Alex
a6a5db631b chore: updated roadmap 2026-04-28 01:03:52 +01:00
Alex
8e9f661efc fix: attachments 2026-04-28 00:38:27 +01:00
Alex
82c71be819 feat: better logging 2026-04-28 00:14:43 +01:00
Alex
318de18d43 feat: BYOM (#2433) 2026-04-27 22:09:33 +01:00
Alex
af618de13d Feat models (#2432)
* feat: simplified model structure

* fix: test

* fix: mini docstring stuff
2026-04-26 00:58:29 +01:00
Alex
ef976eeb06 feat: make version check periodic 2026-04-25 14:57:37 +01:00
Alex
9c8ae9d540 feat: redbeat 2026-04-25 14:38:24 +01:00
Alex
7ca33b2b72 feat: OTEL 2026-04-25 13:38:03 +01:00
dependabot[bot]
fb24f9cf5e chore(deps-dev): bump globals in /extensions/react-widget
Bumps [globals](https://github.com/sindresorhus/globals) from 15.15.0 to 17.5.0.
- [Release notes](https://github.com/sindresorhus/globals/releases)
- [Commits](https://github.com/sindresorhus/globals/compare/v15.15.0...v17.5.0)

---
updated-dependencies:
- dependency-name: globals
  dependency-version: 17.5.0
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-24 20:53:35 +00:00
Manish Madan
d1b9798f62 Merge pull request #2422 from arc53/dependabot/npm_and_yarn/extensions/react-widget/babel/plugin-transform-flow-strip-types-7.27.1
chore(deps): bump @babel/plugin-transform-flow-strip-types from 7.24.6 to 7.27.1 in /extensions/react-widget
2026-04-24 05:13:00 +05:30
dependabot[bot]
ddc3adf3ab chore(deps): bump @babel/plugin-transform-flow-strip-types
Bumps [@babel/plugin-transform-flow-strip-types](https://github.com/babel/babel/tree/HEAD/packages/babel-plugin-transform-flow-strip-types) from 7.24.6 to 7.27.1.
- [Release notes](https://github.com/babel/babel/releases)
- [Changelog](https://github.com/babel/babel/blob/main/CHANGELOG.md)
- [Commits](https://github.com/babel/babel/commits/v7.27.1/packages/babel-plugin-transform-flow-strip-types)

---
updated-dependencies:
- dependency-name: "@babel/plugin-transform-flow-strip-types"
  dependency-version: 7.27.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-23 23:41:46 +00:00
Manish Madan
a4991d01ac Merge pull request #2421 from arc53/dependabot/npm_and_yarn/extensions/react-widget/typescript-6.0.3
chore(deps-dev): bump typescript from 5.9.3 to 6.0.3 in /extensions/react-widget
2026-04-24 05:09:58 +05:30
ManishMadan2882
87fd1bd359 chore(deps-dev): bump @typescript-eslint/{eslint-plugin,parser} to ^8.58.0 for TS 6 support 2026-04-24 05:08:17 +05:30
dependabot[bot]
c71e986d34 chore(deps-dev): bump typescript in /extensions/react-widget
Bumps [typescript](https://github.com/microsoft/TypeScript) from 5.9.3 to 6.0.3.
- [Release notes](https://github.com/microsoft/TypeScript/releases)
- [Commits](https://github.com/microsoft/TypeScript/compare/v5.9.3...v6.0.3)

---
updated-dependencies:
- dependency-name: typescript
  dependency-version: 6.0.3
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-23 23:18:29 +00:00
Manish Madan
a2a06c569e Merge pull request #2419 from arc53/dependabot/npm_and_yarn/extensions/react-widget/prettier-3.8.3
chore(deps-dev): bump prettier from 3.8.1 to 3.8.3 in /extensions/react-widget
2026-04-24 04:34:37 +05:30
Alex
c5f00a1d1b fix: remove old extension in external repo's 2026-04-23 23:27:38 +01:00
Alex
2a15bb0102 chore: delete old extensions 2026-04-23 22:55:03 +01:00
Alex
c06888bc86 feat: asgi and search service (#2424)
* feat: asgi and search service

* feat: asgi and mcp tool server

* fix: asgi issues

* fix: mini cors hardening
2026-04-23 12:21:39 +01:00
Alex
d4b1c1fd81 chore: 0.17.0 version 2026-04-21 16:16:11 +01:00
Alex
2de84acf81 fix: mini callout 2026-04-21 16:14:08 +01:00
Alex
2702750861 docs: upgrading guide 2026-04-21 15:04:17 +01:00
Alex
2b5f20d0ec fix: safer version 2026-04-21 14:22:32 +01:00
Alex
619b41dc5b fix: better version fetch 2026-04-21 14:07:26 +01:00
Alex
76d8f49ccb feat: security version check 2026-04-21 09:16:52 +01:00
dependabot[bot]
65460b0c03 chore(deps-dev): bump prettier in /extensions/react-widget
Bumps [prettier](https://github.com/prettier/prettier) from 3.8.1 to 3.8.3.
- [Release notes](https://github.com/prettier/prettier/releases)
- [Changelog](https://github.com/prettier/prettier/blob/main/CHANGELOG.md)
- [Commits](https://github.com/prettier/prettier/compare/3.8.1...3.8.3)

---
updated-dependencies:
- dependency-name: prettier
  dependency-version: 3.8.3
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-20 22:35:52 +00:00
Manish Madan
9fe96fb50f Merge pull request #2397 from arc53/dependabot/npm_and_yarn/extensions/react-widget/multi-0193e73c84
chore(deps): bump react and @types/react in /extensions/react-widget
2026-04-21 01:35:29 +05:30
Alex
08822c3379 feat: lazy pymongo 2026-04-20 15:58:02 +01:00
ManishMadan2882
68ca8ff9ea (chore) update react-dom 2026-04-20 19:30:14 +05:30
dependabot[bot]
81be3cdccc chore(deps): bump react and @types/react in /extensions/react-widget
Bumps [react](https://github.com/facebook/react/tree/HEAD/packages/react) and [@types/react](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react). These dependencies needed to be updated together.

Updates `react` from 18.3.1 to 19.2.5
- [Release notes](https://github.com/facebook/react/releases)
- [Changelog](https://github.com/facebook/react/blob/main/CHANGELOG.md)
- [Commits](https://github.com/facebook/react/commits/v19.2.5/packages/react)

Updates `@types/react` from 18.3.3 to 19.2.14
- [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases)
- [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react)

---
updated-dependencies:
- dependency-name: react
  dependency-version: 19.2.5
  dependency-type: direct:production
  update-type: version-update:semver-major
- dependency-name: "@types/react"
  dependency-version: 19.2.14
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-20 13:20:46 +00:00
Manish Madan
3ceabed8ad Merge pull request #2406 from arc53/dependabot/npm_and_yarn/extensions/react-widget/npm_and_yarn-2a73d1bbcf
chore(deps): bump dompurify from 3.3.3 to 3.4.0 in /extensions/react-widget in the npm_and_yarn group across 1 directory
2026-04-20 14:45:32 +05:30
dependabot[bot]
422a4b139e chore(deps): bump dompurify
Bumps the npm_and_yarn group with 1 update in the /extensions/react-widget directory: [dompurify](https://github.com/cure53/DOMPurify).


Updates `dompurify` from 3.3.3 to 3.4.0
- [Release notes](https://github.com/cure53/DOMPurify/releases)
- [Commits](https://github.com/cure53/DOMPurify/compare/3.3.3...3.4.0)

---
updated-dependencies:
- dependency-name: dompurify
  dependency-version: 3.4.0
  dependency-type: direct:production
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-20 08:58:53 +00:00
Manish Madan
e85935eed0 Merge pull request #2402 from arc53/dependabot/npm_and_yarn/extensions/react-widget/flow-bin-0.309.0
chore(deps): bump flow-bin from 0.306.0 to 0.309.0 in /extensions/react-widget
2026-04-20 14:27:30 +05:30
dependabot[bot]
6a69b8aca0 chore(deps): bump flow-bin in /extensions/react-widget
Bumps [flow-bin](https://github.com/flowtype/flow-bin) from 0.306.0 to 0.309.0.
- [Release notes](https://github.com/flowtype/flow-bin/releases)
- [Commits](https://github.com/flowtype/flow-bin/commits)

---
updated-dependencies:
- dependency-name: flow-bin
  dependency-version: 0.309.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-20 08:55:07 +00:00
Manish Madan
33c2cc9660 Merge pull request #2400 from arc53/dependabot/npm_and_yarn/extensions/react-widget/styled-components-6.4.0
chore(deps): bump styled-components from 6.3.12 to 6.4.0 in /extensions/react-widget
2026-04-20 13:30:30 +05:30
dependabot[bot]
175d4d5a68 chore(deps): bump styled-components in /extensions/react-widget
Bumps [styled-components](https://github.com/styled-components/styled-components) from 6.3.12 to 6.4.0.
- [Release notes](https://github.com/styled-components/styled-components/releases)
- [Commits](https://github.com/styled-components/styled-components/compare/styled-components@6.3.12...styled-components@6.4.0)

---
updated-dependencies:
- dependency-name: styled-components
  dependency-version: 6.4.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-20 07:59:02 +00:00
Manish Madan
6c3ead1071 Merge pull request #2413 from arc53/dependabot/npm_and_yarn/docs/npm_and_yarn-a087653e68
chore(deps): bump the npm_and_yarn group across 1 directory with 2 updates
2026-04-20 00:53:14 +05:30
dependabot[bot]
d23f88f825 chore(deps): bump the npm_and_yarn group across 1 directory with 2 updates
Bumps the npm_and_yarn group with 2 updates in the /docs directory: [@xmldom/xmldom](https://github.com/xmldom/xmldom) and [lodash-es](https://github.com/lodash/lodash).


Updates `@xmldom/xmldom` from 0.9.8 to 0.9.9
- [Release notes](https://github.com/xmldom/xmldom/releases)
- [Changelog](https://github.com/xmldom/xmldom/blob/master/CHANGELOG.md)
- [Commits](https://github.com/xmldom/xmldom/compare/0.9.8...0.9.9)

Updates `lodash-es` from 4.17.23 to 4.18.1
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/compare/4.17.23...4.18.1)

---
updated-dependencies:
- dependency-name: "@xmldom/xmldom"
  dependency-version: 0.9.9
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: lodash-es
  dependency-version: 4.18.1
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-19 17:55:14 +00:00
Manish Madan
da1df515f7 Merge pull request #2411 from arc53/dependabot/npm_and_yarn/docs/npm_and_yarn-bfabbafa3d
chore(deps): bump the npm_and_yarn group across 3 directories with 10 updates
2026-04-19 23:23:34 +05:30
dependabot[bot]
671a9d75ad chore(deps): bump the npm_and_yarn group across 3 directories with 10 updates
Bumps the npm_and_yarn group with 6 updates in the /docs directory:

| Package | From | To |
| --- | --- | --- |
| [next](https://github.com/vercel/next.js) | `15.5.9` | `15.5.15` |
| [brace-expansion](https://github.com/juliangruber/brace-expansion) | `5.0.2` | `5.0.5` |
| [dompurify](https://github.com/cure53/DOMPurify) | `3.3.1` | `3.4.0` |
| [markdown-it](https://github.com/markdown-it/markdown-it) | `14.1.0` | `14.1.1` |
| [minimatch](https://github.com/isaacs/minimatch) | `10.2.1` | `10.2.5` |
| [yaml](https://github.com/eemeli/yaml) | `1.10.2` | `1.10.3` |

Bumps the npm_and_yarn group with 2 updates in the /extensions/chrome directory: [yaml](https://github.com/eemeli/yaml) and [picomatch](https://github.com/micromatch/picomatch).
Bumps the npm_and_yarn group with 4 updates in the /extensions/web-widget directory: [brace-expansion](https://github.com/juliangruber/brace-expansion), [minimatch](https://github.com/isaacs/minimatch), [yaml](https://github.com/eemeli/yaml) and [picomatch](https://github.com/micromatch/picomatch).


Updates `next` from 15.5.9 to 15.5.15
- [Release notes](https://github.com/vercel/next.js/releases)
- [Changelog](https://github.com/vercel/next.js/blob/canary/release.js)
- [Commits](https://github.com/vercel/next.js/compare/v15.5.9...v15.5.15)

Updates `brace-expansion` from 5.0.2 to 5.0.5
- [Release notes](https://github.com/juliangruber/brace-expansion/releases)
- [Commits](https://github.com/juliangruber/brace-expansion/compare/v5.0.2...v5.0.5)

Updates `dompurify` from 3.3.1 to 3.4.0
- [Release notes](https://github.com/cure53/DOMPurify/releases)
- [Commits](https://github.com/cure53/DOMPurify/compare/3.3.1...3.4.0)

Updates `markdown-it` from 14.1.0 to 14.1.1
- [Changelog](https://github.com/markdown-it/markdown-it/blob/master/CHANGELOG.md)
- [Commits](https://github.com/markdown-it/markdown-it/compare/14.1.0...14.1.1)

Updates `minimatch` from 10.2.1 to 10.2.5
- [Changelog](https://github.com/isaacs/minimatch/blob/main/changelog.md)
- [Commits](https://github.com/isaacs/minimatch/compare/v10.2.1...v10.2.5)

Updates `yaml` from 1.10.2 to 1.10.3
- [Release notes](https://github.com/eemeli/yaml/releases)
- [Commits](https://github.com/eemeli/yaml/compare/v1.10.2...v1.10.3)

Updates `diff` from 5.2.0 to 5.2.2
- [Changelog](https://github.com/kpdecker/jsdiff/blob/master/release-notes.md)
- [Commits](https://github.com/kpdecker/jsdiff/compare/v5.2.0...v5.2.2)

Updates `glob` from 10.3.12 to 10.5.0
- [Changelog](https://github.com/isaacs/node-glob/blob/main/changelog.md)
- [Commits](https://github.com/isaacs/node-glob/compare/v10.3.12...v10.5.0)

Updates `tar` from 6.2.1 to 7.5.11
- [Release notes](https://github.com/isaacs/node-tar/releases)
- [Changelog](https://github.com/isaacs/node-tar/blob/main/CHANGELOG.md)
- [Commits](https://github.com/isaacs/node-tar/compare/v6.2.1...v7.5.11)

Updates `picomatch` from 2.3.1 to 2.3.2
- [Release notes](https://github.com/micromatch/picomatch/releases)
- [Changelog](https://github.com/micromatch/picomatch/blob/master/CHANGELOG.md)
- [Commits](https://github.com/micromatch/picomatch/compare/2.3.1...2.3.2)

Updates `picomatch` from 2.3.1 to 2.3.2
- [Release notes](https://github.com/micromatch/picomatch/releases)
- [Changelog](https://github.com/micromatch/picomatch/blob/master/CHANGELOG.md)
- [Commits](https://github.com/micromatch/picomatch/compare/2.3.1...2.3.2)

Updates `yaml` from 1.10.2 to 1.10.3
- [Release notes](https://github.com/eemeli/yaml/releases)
- [Commits](https://github.com/eemeli/yaml/compare/v1.10.2...v1.10.3)

Updates `brace-expansion` from 2.0.1 to 2.0.2
- [Release notes](https://github.com/juliangruber/brace-expansion/releases)
- [Commits](https://github.com/juliangruber/brace-expansion/compare/v5.0.2...v5.0.5)

Updates `glob` from 10.3.12 to 10.5.0
- [Changelog](https://github.com/isaacs/node-glob/blob/main/changelog.md)
- [Commits](https://github.com/isaacs/node-glob/compare/v10.3.12...v10.5.0)

Updates `minimatch` from 9.0.4 to 9.0.9
- [Changelog](https://github.com/isaacs/minimatch/blob/main/changelog.md)
- [Commits](https://github.com/isaacs/minimatch/compare/v10.2.1...v10.2.5)

Updates `picomatch` from 2.3.1 to 2.3.2
- [Release notes](https://github.com/micromatch/picomatch/releases)
- [Changelog](https://github.com/micromatch/picomatch/blob/master/CHANGELOG.md)
- [Commits](https://github.com/micromatch/picomatch/compare/2.3.1...2.3.2)

Updates `yaml` from 1.10.2 to 1.10.3
- [Release notes](https://github.com/eemeli/yaml/releases)
- [Commits](https://github.com/eemeli/yaml/compare/v1.10.2...v1.10.3)

Updates `yaml` from 1.10.2 to 1.10.3
- [Release notes](https://github.com/eemeli/yaml/releases)
- [Commits](https://github.com/eemeli/yaml/compare/v1.10.2...v1.10.3)

Updates `picomatch` from 2.3.1 to 2.3.2
- [Release notes](https://github.com/micromatch/picomatch/releases)
- [Changelog](https://github.com/micromatch/picomatch/blob/master/CHANGELOG.md)
- [Commits](https://github.com/micromatch/picomatch/compare/2.3.1...2.3.2)

Updates `picomatch` from 2.3.1 to 2.3.2
- [Release notes](https://github.com/micromatch/picomatch/releases)
- [Changelog](https://github.com/micromatch/picomatch/blob/master/CHANGELOG.md)
- [Commits](https://github.com/micromatch/picomatch/compare/2.3.1...2.3.2)

Updates `yaml` from 1.10.2 to 1.10.3
- [Release notes](https://github.com/eemeli/yaml/releases)
- [Commits](https://github.com/eemeli/yaml/compare/v1.10.2...v1.10.3)

Updates `picomatch` from 2.3.1 to 2.3.2
- [Release notes](https://github.com/micromatch/picomatch/releases)
- [Changelog](https://github.com/micromatch/picomatch/blob/master/CHANGELOG.md)
- [Commits](https://github.com/micromatch/picomatch/compare/2.3.1...2.3.2)

Updates `yaml` from 1.10.2 to 1.10.3
- [Release notes](https://github.com/eemeli/yaml/releases)
- [Commits](https://github.com/eemeli/yaml/compare/v1.10.2...v1.10.3)

Updates `brace-expansion` from 1.1.11 to 1.1.14
- [Release notes](https://github.com/juliangruber/brace-expansion/releases)
- [Commits](https://github.com/juliangruber/brace-expansion/compare/v5.0.2...v5.0.5)

Updates `minimatch` from 3.1.2 to 3.1.5
- [Changelog](https://github.com/isaacs/minimatch/blob/main/changelog.md)
- [Commits](https://github.com/isaacs/minimatch/compare/v10.2.1...v10.2.5)

Updates `yaml` from 1.10.2 to 1.10.3
- [Release notes](https://github.com/eemeli/yaml/releases)
- [Commits](https://github.com/eemeli/yaml/compare/v1.10.2...v1.10.3)

Updates `picomatch` from 2.3.1 to 2.3.2
- [Release notes](https://github.com/micromatch/picomatch/releases)
- [Changelog](https://github.com/micromatch/picomatch/blob/master/CHANGELOG.md)
- [Commits](https://github.com/micromatch/picomatch/compare/2.3.1...2.3.2)

Updates `picomatch` from 2.3.1 to 2.3.2
- [Release notes](https://github.com/micromatch/picomatch/releases)
- [Changelog](https://github.com/micromatch/picomatch/blob/master/CHANGELOG.md)
- [Commits](https://github.com/micromatch/picomatch/compare/2.3.1...2.3.2)

Updates `yaml` from 1.10.2 to 1.10.3
- [Release notes](https://github.com/eemeli/yaml/releases)
- [Commits](https://github.com/eemeli/yaml/compare/v1.10.2...v1.10.3)

Updates `brace-expansion` from 1.1.11 to 1.1.14
- [Release notes](https://github.com/juliangruber/brace-expansion/releases)
- [Commits](https://github.com/juliangruber/brace-expansion/compare/v5.0.2...v5.0.5)

Updates `minimatch` from 3.1.2 to 3.1.5
- [Changelog](https://github.com/isaacs/minimatch/blob/main/changelog.md)
- [Commits](https://github.com/isaacs/minimatch/compare/v10.2.1...v10.2.5)

Updates `picomatch` from 2.3.1 to 2.3.2
- [Release notes](https://github.com/micromatch/picomatch/releases)
- [Changelog](https://github.com/micromatch/picomatch/blob/master/CHANGELOG.md)
- [Commits](https://github.com/micromatch/picomatch/compare/2.3.1...2.3.2)

Updates `yaml` from 1.10.2 to 1.10.3
- [Release notes](https://github.com/eemeli/yaml/releases)
- [Commits](https://github.com/eemeli/yaml/compare/v1.10.2...v1.10.3)

---
updated-dependencies:
- dependency-name: next
  dependency-version: 15.5.15
  dependency-type: direct:production
  dependency-group: npm_and_yarn
- dependency-name: brace-expansion
  dependency-version: 5.0.5
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: dompurify
  dependency-version: 3.4.0
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: markdown-it
  dependency-version: 14.1.1
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: minimatch
  dependency-version: 10.2.5
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: yaml
  dependency-version: 1.10.3
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: diff
  dependency-version: 5.2.2
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: glob
  dependency-version: 10.5.0
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: tar
  dependency-version: 7.5.11
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: picomatch
  dependency-version: 2.3.2
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: picomatch
  dependency-version: 2.3.2
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: yaml
  dependency-version: 1.10.3
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: brace-expansion
  dependency-version: 2.0.2
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: glob
  dependency-version: 10.5.0
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: minimatch
  dependency-version: 9.0.9
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: picomatch
  dependency-version: 2.3.2
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: yaml
  dependency-version: 1.10.3
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: yaml
  dependency-version: 1.10.3
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: picomatch
  dependency-version: 2.3.2
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: picomatch
  dependency-version: 2.3.2
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: yaml
  dependency-version: 1.10.3
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: picomatch
  dependency-version: 2.3.2
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: yaml
  dependency-version: 1.10.3
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: brace-expansion
  dependency-version: 1.1.14
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: minimatch
  dependency-version: 3.1.5
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: yaml
  dependency-version: 1.10.3
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: picomatch
  dependency-version: 2.3.2
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: picomatch
  dependency-version: 2.3.2
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: yaml
  dependency-version: 1.10.3
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: brace-expansion
  dependency-version: 1.1.14
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: minimatch
  dependency-version: 3.1.5
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: picomatch
  dependency-version: 2.3.2
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: yaml
  dependency-version: 1.10.3
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-19 17:47:08 +00:00
Manish Madan
1c829667ff Merge pull request #2326 from arc53/dependabot/npm_and_yarn/extensions/react-widget/class-variance-authority-0.7.1
chore(deps): bump class-variance-authority from 0.7.0 to 0.7.1 in /extensions/react-widget
2026-04-19 23:13:02 +05:30
Manish Madan
3ab0ebb16d Merge pull request #2398 from arc53/dependabot/pip/application/openai-2.31.0
chore(deps): bump openai from 2.30.0 to 2.31.0 in /application
2026-04-19 23:10:36 +05:30
dependabot[bot]
988c4a5a15 chore(deps): bump openai from 2.30.0 to 2.31.0 in /application
Bumps [openai](https://github.com/openai/openai-python) from 2.30.0 to 2.31.0.
- [Release notes](https://github.com/openai/openai-python/releases)
- [Changelog](https://github.com/openai/openai-python/blob/main/CHANGELOG.md)
- [Commits](https://github.com/openai/openai-python/compare/v2.30.0...v2.31.0)

---
updated-dependencies:
- dependency-name: openai
  dependency-version: 2.31.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-19 17:14:43 +00:00
Manish Madan
01db8b2c41 Merge pull request #2395 from arc53/dependabot/pip/application/google-genai-1.73.1
chore(deps): bump google-genai from 1.69.0 to 1.73.1 in /application
2026-04-19 22:43:14 +05:30
Alex
ef19da9516 fix: pg bouncer compatible 2026-04-19 17:53:22 +01:00
dependabot[bot]
cc1275c3f9 chore(deps): bump google-genai from 1.69.0 to 1.73.1 in /application
Bumps [google-genai](https://github.com/googleapis/python-genai) from 1.69.0 to 1.73.1.
- [Release notes](https://github.com/googleapis/python-genai/releases)
- [Changelog](https://github.com/googleapis/python-genai/blob/main/CHANGELOG.md)
- [Commits](https://github.com/googleapis/python-genai/compare/v1.69.0...v1.73.1)

---
updated-dependencies:
- dependency-name: google-genai
  dependency-version: 1.73.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-19 15:30:12 +00:00
Manish Madan
14c2f4890f Merge pull request #2394 from arc53/dependabot/pip/application/fastmcp-3.2.4
chore(deps): bump fastmcp from 3.2.0 to 3.2.4 in /application
2026-04-19 20:59:04 +05:30
dependabot[bot]
b3aec36aa2 chore(deps): bump fastmcp from 3.2.0 to 3.2.4 in /application
Bumps [fastmcp](https://github.com/PrefectHQ/fastmcp) from 3.2.0 to 3.2.4.
- [Release notes](https://github.com/PrefectHQ/fastmcp/releases)
- [Changelog](https://github.com/PrefectHQ/fastmcp/blob/main/docs/changelog.mdx)
- [Commits](https://github.com/PrefectHQ/fastmcp/compare/v3.2.0...v3.2.4)

---
updated-dependencies:
- dependency-name: fastmcp
  dependency-version: 3.2.4
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-19 15:13:45 +00:00
Manish Madan
50f62beaeb Merge pull request #2392 from arc53/dependabot/pip/application/elevenlabs-2.43.0
chore(deps): bump elevenlabs from 2.41.0 to 2.43.0 in /application
2026-04-19 20:42:32 +05:30
dependabot[bot]
423b4c6494 chore(deps): bump elevenlabs from 2.41.0 to 2.43.0 in /application
Bumps [elevenlabs](https://github.com/elevenlabs/elevenlabs-python) from 2.41.0 to 2.43.0.
- [Release notes](https://github.com/elevenlabs/elevenlabs-python/releases)
- [Commits](https://github.com/elevenlabs/elevenlabs-python/compare/v2.41.0...v2.43.0)

---
updated-dependencies:
- dependency-name: elevenlabs
  dependency-version: 2.43.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-19 14:49:32 +00:00
Manish Madan
54f615c59d Merge pull request #2407 from arc53/dependabot/npm_and_yarn/frontend/npm_and_yarn-f0540bac9f
chore(deps): bump the npm_and_yarn group across 1 directory with 4 updates
2026-04-19 20:17:07 +05:30
dependabot[bot]
223b3de66e chore(deps): bump the npm_and_yarn group across 1 directory with 4 updates
Bumps the npm_and_yarn group with 3 updates in the /frontend directory: [lodash](https://github.com/lodash/lodash), [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite) and [dompurify](https://github.com/cure53/DOMPurify).


Updates `lodash` from 4.17.23 to 4.18.1
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](https://github.com/lodash/lodash/compare/4.17.23...4.18.1)

Updates `vite` from 8.0.0 to 8.0.8
- [Release notes](https://github.com/vitejs/vite/releases)
- [Changelog](https://github.com/vitejs/vite/blob/main/packages/vite/CHANGELOG.md)
- [Commits](https://github.com/vitejs/vite/commits/v8.0.8/packages/vite)

Updates `picomatch` from 4.0.3 to 4.0.4
- [Release notes](https://github.com/micromatch/picomatch/releases)
- [Changelog](https://github.com/micromatch/picomatch/blob/master/CHANGELOG.md)
- [Commits](https://github.com/micromatch/picomatch/compare/4.0.3...4.0.4)

Updates `dompurify` from 3.3.3 to 3.4.0
- [Release notes](https://github.com/cure53/DOMPurify/releases)
- [Commits](https://github.com/cure53/DOMPurify/compare/3.3.3...3.4.0)

---
updated-dependencies:
- dependency-name: lodash
  dependency-version: 4.18.1
  dependency-type: direct:production
  dependency-group: npm_and_yarn
- dependency-name: vite
  dependency-version: 8.0.8
  dependency-type: direct:development
  dependency-group: npm_and_yarn
- dependency-name: picomatch
  dependency-version: 4.0.4
  dependency-type: indirect
  dependency-group: npm_and_yarn
- dependency-name: dompurify
  dependency-version: 3.4.0
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-19 14:06:01 +00:00
Manish Madan
4db9622ef5 Merge pull request #2403 from arc53/dependabot/npm_and_yarn/frontend/types/react-19.2.14
chore(deps-dev): bump @types/react from 19.2.2 to 19.2.14 in /frontend
2026-04-19 19:34:04 +05:30
dependabot[bot]
e8d1bbfb68 chore(deps-dev): bump @types/react from 19.2.2 to 19.2.14 in /frontend
Bumps [@types/react](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react) from 19.2.2 to 19.2.14.
- [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases)
- [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react)

---
updated-dependencies:
- dependency-name: "@types/react"
  dependency-version: 19.2.14
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-19 13:54:47 +00:00
Manish Madan
aff1345ae4 Merge pull request #2399 from arc53/dependabot/npm_and_yarn/frontend/react-router-dom-7.14.1
chore(deps): bump react-router-dom from 7.13.1 to 7.14.1 in /frontend
2026-04-19 04:45:57 +05:30
Alex
ee430aff1e fix: tests 2026-04-18 13:28:03 +01:00
Alex
81b6ee5daa Pg 4 (#2390)
* feat: postgres tests

* feat: mongo cutoff

* feat: mongo cutoff

* feat: adjust docs and compose files

* fix: mini code mongo removals

* fix: tests and k8s mongo stuff

* feat: test fixes

* fix: ruff

* fix: vale

* Potential fix for pull request finding 'CodeQL / Clear-text logging of sensitive information'

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* fix: mini suggestions

* vale lint fix 2

* fix: codeql columns thing

* fix: test mongo

* fix: tests coverage

* feat: better tests 4

* feat: more tests

* feat: decent coverage

* fix: ruff fixes

* fix: remove mongo mock

* feat: enhance workflow engine and API routes; add document retrieval and source handling

* feat: e2e tests

* fix: mcp, mongo and more

* fix: mini codeql warning

* fix: agent chunk view

* fix: mini issues

* fix: more pg fixes

* feat: postgres prep on start

* feat: qa tests

* fix: mini improvements

* fix: tests

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Siddhant Rai <siddhant.rai.5686@gmail.com>
2026-04-18 13:13:57 +01:00
dependabot[bot]
ebb7938d1b chore(deps): bump react-router-dom from 7.13.1 to 7.14.1 in /frontend
Bumps [react-router-dom](https://github.com/remix-run/react-router/tree/HEAD/packages/react-router-dom) from 7.13.1 to 7.14.1.
- [Release notes](https://github.com/remix-run/react-router/releases)
- [Changelog](https://github.com/remix-run/react-router/blob/main/packages/react-router-dom/CHANGELOG.md)
- [Commits](https://github.com/remix-run/react-router/commits/react-router-dom@7.14.1/packages/react-router-dom)

---
updated-dependencies:
- dependency-name: react-router-dom
  dependency-version: 7.14.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-17 08:55:15 +00:00
Manish Madan
7f6360b4ff Merge pull request #2396 from arc53/dependabot/npm_and_yarn/frontend/lucide-react-1.8.0
chore(deps): bump lucide-react from 0.562.0 to 1.8.0 in /frontend
2026-04-17 14:23:19 +05:30
dependabot[bot]
c68f18a0ae chore(deps): bump lucide-react from 0.562.0 to 1.8.0 in /frontend
Bumps [lucide-react](https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react) from 0.562.0 to 1.8.0.
- [Release notes](https://github.com/lucide-icons/lucide/releases)
- [Commits](https://github.com/lucide-icons/lucide/commits/1.8.0/packages/lucide-react)

---
updated-dependencies:
- dependency-name: lucide-react
  dependency-version: 1.8.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-17 08:50:18 +00:00
Manish Madan
684b29e73c Merge pull request #2393 from arc53/dependabot/npm_and_yarn/frontend/eslint-plugin-prettier-5.5.5
chore(deps-dev): bump eslint-plugin-prettier from 5.5.4 to 5.5.5 in /frontend
2026-04-17 14:18:50 +05:30
Alex
a1efea81d0 patch: sitemap and web loader 2026-04-15 23:39:44 +01:00
dependabot[bot]
9eb34262e0 chore(deps-dev): bump eslint-plugin-prettier in /frontend
Bumps [eslint-plugin-prettier](https://github.com/prettier/eslint-plugin-prettier) from 5.5.4 to 5.5.5.
- [Release notes](https://github.com/prettier/eslint-plugin-prettier/releases)
- [Changelog](https://github.com/prettier/eslint-plugin-prettier/blob/main/CHANGELOG.md)
- [Commits](https://github.com/prettier/eslint-plugin-prettier/compare/v5.5.4...v5.5.5)

---
updated-dependencies:
- dependency-name: eslint-plugin-prettier
  dependency-version: 5.5.5
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-15 20:53:23 +00:00
Alex
951bdb8365 Merge pull request #2391 from arc53/codex/draft-threat-model-for-docsgpt
Add public threat model document (.github/THREAT_MODEL.md)
2026-04-15 18:38:55 +01:00
dependabot[bot]
cdb71a54f0 chore(deps): bump class-variance-authority in /extensions/react-widget
Bumps [class-variance-authority](https://github.com/joe-bell/cva) from 0.7.0 to 0.7.1.
- [Release notes](https://github.com/joe-bell/cva/releases)
- [Commits](https://github.com/joe-bell/cva/compare/v0.7.0...v0.7.1)

---
updated-dependencies:
- dependency-name: class-variance-authority
  dependency-version: 0.7.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-03-24 20:53:18 +00:00
534 changed files with 792,374 additions and 42,939 deletions

View File

@@ -35,8 +35,5 @@ MICROSOFT_TENANT_ID=your-azure-ad-tenant-id
#Alternatively, use "https://login.microsoftonline.com/common" for multi-tenant app.
MICROSOFT_AUTHORITY=https://{tenantId}.ciamlogin.com/{tenantId}
# User-data Postgres DB (Phase 0 of the MongoDB→Postgres migration).
# Standard Postgres URI — `postgres://` and `postgresql://` both work.
# Leave unset while the migration is still being rolled out; the app will
# fall back to MongoDB for user data until POSTGRES_URI is configured.
# POSTGRES_URI=postgresql://docsgpt:docsgpt@localhost:5432/docsgpt

View File

@@ -11,7 +11,6 @@ on:
permissions:
contents: read
pull-requests: write
jobs:
vale:
@@ -20,11 +19,16 @@ jobs:
- name: Checkout code
uses: actions/checkout@v4
- name: Vale linter
uses: errata-ai/vale-action@v2
with:
files: docs
fail_on_error: false
version: 3.0.5
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Install Vale
run: |
curl -fsSL -o vale.tar.gz \
https://github.com/errata-ai/vale/releases/download/v3.0.5/vale_3.0.5_Linux_64-bit.tar.gz
tar -xzf vale.tar.gz
sudo mv vale /usr/local/bin/vale
vale --version
- name: Sync Vale packages
run: vale sync
- name: Run Vale
run: vale --minAlertLevel=error docs

.gitignore (vendored): 8 changed lines
View File

@@ -186,3 +186,11 @@ node_modules/
.vscode/sftp.json
/models/
model/
# E2E test artifacts
.e2e-tmp/
/tmp/docsgpt-e2e/
tests/e2e/node_modules/
tests/e2e/playwright-report/
tests/e2e/test-results/
tests/e2e/.e2e-last-run.json

View File

@@ -10,9 +10,15 @@
For feature work, do **not** assume the environment needs to be recreated.
- Check whether the user already has a Python virtual environment such as `venv/` or `.venv/`.
- Check whether MongoDB is already running.
- Check whether Postgres is already running and reachable via `POSTGRES_URI` (the canonical user-data store).
- Check whether Redis is already running.
- Reuse what is already working. Do not stop or recreate MongoDB, Redis, or the Python environment unless the task is environment setup or troubleshooting.
- Reuse what is already working. Do not stop or recreate Postgres, Redis, or the Python environment unless the task is environment setup or troubleshooting.
> MongoDB is **not** required for the default install. It is only needed if
> the user opts into the Mongo vector-store backend (`VECTOR_STORE=mongodb`)
> or is running the one-shot `scripts/db/backfill.py` to migrate existing
> user data from the legacy Mongo-based install. In those cases, `pymongo`
> is available as an optional extra, not a core dependency.
## Normal local development commands
@@ -31,6 +37,22 @@ Run the Flask API (if needed):
flask --app application/app.py run --host=0.0.0.0 --port=7091
```
That's the fast inner-loop option — quick startup, the Werkzeug interactive
debugger still works, and it hot-reloads on source changes. It serves the
Flask routes only (`/api/*`, `/stream`, etc.).
If you need to exercise the full ASGI stack — the `/mcp` FastMCP endpoint,
or to match the production runtime exactly — run the ASGI composition under
uvicorn instead:
```bash
uvicorn application.asgi:asgi_app --host 0.0.0.0 --port 7091 --reload
```
Production uses `gunicorn -k uvicorn_worker.UvicornWorker` against the same
`application.asgi:asgi_app` target; see `application/Dockerfile` for the
full flag set.
Run the Celery worker in a separate terminal (if needed):
```bash
@@ -93,7 +115,7 @@ vale .
- `frontend/`: Vite + React + TypeScript application.
- `frontend/src/`: main UI code, including `components`, `conversation`, `hooks`, `locale`, `settings`, `upload`, and Redux store wiring in `store.ts`.
- `docs/`: separate documentation site built with Next.js/Nextra.
- `extensions/`: integrations and widgets such as Chatwoot, Chrome, Discord, React widget, Slack bot, and web widget.
- `extensions/`: integrations and widgets — currently the Chatwoot webhook bridge and the React widget (published to npm as `docsgpt`). The Discord bot, Slack bot, and Chrome extension have been moved to their own repos under `arc53/`.
- `deployment/`: Docker Compose variants and Kubernetes manifests.
## Coding rules

View File

@@ -47,11 +47,13 @@
</ul>
## Roadmap
- [x] Add OAuth 2.0 authentication for MCP ( September 2025 )
- [x] Deep Agents ( October 2025 )
- [x] Prompt Templating ( October 2025 )
- [x] Full api tooling ( Dec 2025 )
- [ ] Agent scheduling ( Jan 2026 )
- [x] Agent Workflow Builder with conditional nodes ( February 2026 )
- [x] SharePoint & Confluence connectors ( March April 2026 )
- [x] Research mode ( March 2026 )
- [x] Postgres migration for user data ( April 2026 )
- [x] OpenTelemetry observability ( April 2026 )
- [x] Bring Your Own Model (BYOM) ( April 2026 )
- [ ] Agent scheduling (RedBeat-backed) ( Q2 2026 )
You can find our full roadmap [here](https://github.com/orgs/arc53/projects/2). Please don't hesitate to contribute or create issues, it helps us improve DocsGPT!

View File

@@ -88,5 +88,15 @@ EXPOSE 7091
# Switch to non-root user
USER appuser
# Start Gunicorn
CMD ["gunicorn", "-w", "1", "--timeout", "120", "--bind", "0.0.0.0:7091", "--preload", "application.wsgi:app"]
CMD ["gunicorn", \
"-w", "1", \
"-k", "uvicorn_worker.UvicornWorker", \
"--bind", "0.0.0.0:7091", \
"--timeout", "180", \
"--graceful-timeout", "120", \
"--keep-alive", "5", \
"--worker-tmp-dir", "/dev/shm", \
"--max-requests", "1000", \
"--max-requests-jitter", "100", \
"--config", "application/gunicorn_conf.py", \
"application.asgi:asgi_app"]

View File

@@ -42,6 +42,7 @@ class BaseAgent(ABC):
llm_handler=None,
tool_executor: Optional[ToolExecutor] = None,
backup_models: Optional[List[str]] = None,
model_user_id: Optional[str] = None,
):
self.endpoint = endpoint
self.llm_name = llm_name
@@ -52,10 +53,13 @@ class BaseAgent(ABC):
self.prompt = prompt
self.decoded_token = decoded_token or {}
self.user: str = self.decoded_token.get("sub")
# BYOM-resolution scope: owner for shared agents, caller for
# caller-owned BYOM, None for built-ins. Falls back to self.user
# for worker/legacy callers that don't thread model_user_id.
self.model_user_id = model_user_id
self.tools: List[Dict] = []
self.chat_history: List[Dict] = chat_history if chat_history is not None else []
# Dependency injection for LLM — fall back to creating if not provided
if llm is not None:
self.llm = llm
else:
@@ -67,8 +71,16 @@ class BaseAgent(ABC):
model_id=model_id,
agent_id=agent_id,
backup_models=backup_models,
model_user_id=model_user_id,
)
# For BYOM, registry id (UUID) differs from upstream model id
# (e.g. ``mistral-large-latest``). LLMCreator resolved this onto
# the LLM instance; cache it for subsequent gen calls.
self.upstream_model_id = (
getattr(self.llm, "model_id", None) or model_id
)
self.retrieved_docs = retrieved_docs or []
if llm_handler is not None:
@@ -306,7 +318,9 @@ class BaseAgent(ABC):
try:
current_tokens = self._calculate_current_context_tokens(messages)
self.current_token_count = current_tokens
context_limit = get_token_limit(self.model_id)
context_limit = get_token_limit(
self.model_id, user_id=self.model_user_id or self.user
)
threshold = int(context_limit * settings.COMPRESSION_THRESHOLD_PERCENTAGE)
if current_tokens >= threshold:
@@ -325,7 +339,9 @@ class BaseAgent(ABC):
current_tokens = self._calculate_current_context_tokens(messages)
self.current_token_count = current_tokens
context_limit = get_token_limit(self.model_id)
context_limit = get_token_limit(
self.model_id, user_id=self.model_user_id or self.user
)
percentage = (current_tokens / context_limit) * 100
if current_tokens >= context_limit:
@@ -387,7 +403,9 @@ class BaseAgent(ABC):
)
system_prompt = system_prompt + compression_context
context_limit = get_token_limit(self.model_id)
context_limit = get_token_limit(
self.model_id, user_id=self.model_user_id or self.user
)
system_tokens = num_tokens_from_string(system_prompt)
safety_buffer = int(context_limit * 0.1)
@@ -497,7 +515,10 @@ class BaseAgent(ABC):
def _llm_gen(self, messages: List[Dict], log_context: Optional[LogContext] = None):
self._validate_context_size(messages)
gen_kwargs = {"model": self.model_id, "messages": messages}
# Use the upstream id resolved by LLMCreator (see __init__).
# Built-in models: same as self.model_id. BYOM: the user's
# typed model name, not the internal UUID.
gen_kwargs = {"model": self.upstream_model_id, "messages": messages}
if self.attachments:
gen_kwargs["_usage_attachments"] = self.attachments

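The BaseAgent hunks above split two identifiers: the registry model id (which stays on `self.model_id` and drives token accounting) and the upstream id actually sent to the provider. A minimal sketch of that resolution, assuming, as the diff comments state, that LLMCreator attaches the resolved upstream id to the LLM instance as `model_id`:

```python
from typing import Optional


def resolve_upstream_model_id(llm: object, registry_model_id: str) -> str:
    """Editor's sketch of the id split described in the hunks above.

    Built-in models: registry id and upstream id are the same string.
    BYOM entries: the registry id is an internal UUID, while the upstream
    id (e.g. "mistral-large-latest") is what the provider API expects.
    Falls back to the registry id when the LLM instance carries nothing.
    """
    return getattr(llm, "model_id", None) or registry_model_id


def token_limit_scope(model_user_id: Optional[str], user: Optional[str]) -> Optional[str]:
    """Which user's model registry get_token_limit should consult:
    the owner of a shared/BYOM agent when model_user_id is threaded,
    otherwise the caller (the worker/legacy fallback)."""
    return model_user_id or user
```

Generation calls then use the upstream id (`gen_kwargs = {"model": self.upstream_model_id, ...}`) while the context-limit checks keep the registry id plus the user scope, as shown in the hunks.
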
View File

@@ -312,7 +312,7 @@ class ResearchAgent(BaseAgent):
try:
response = self.llm.gen(
model=self.model_id,
model=self.upstream_model_id,
messages=messages,
tools=None,
response_format={"type": "json_object"},
@@ -390,7 +390,7 @@ class ResearchAgent(BaseAgent):
try:
response = self.llm.gen(
model=self.model_id,
model=self.upstream_model_id,
messages=messages,
tools=None,
response_format={"type": "json_object"},
@@ -506,7 +506,7 @@ class ResearchAgent(BaseAgent):
try:
response = self.llm.gen(
model=self.model_id,
model=self.upstream_model_id,
messages=messages,
tools=self.tools if self.tools else None,
)
@@ -537,7 +537,7 @@ class ResearchAgent(BaseAgent):
)
try:
response = self.llm.gen(
model=self.model_id, messages=messages, tools=None
model=self.upstream_model_id, messages=messages, tools=None
)
self._track_tokens(self._snapshot_llm_tokens())
text = self._extract_text(response)
@@ -664,7 +664,7 @@ class ResearchAgent(BaseAgent):
]
llm_response = self.llm.gen_stream(
model=self.model_id, messages=messages, tools=None
model=self.upstream_model_id, messages=messages, tools=None
)
if log_context:

View File

@@ -1,19 +1,107 @@
import logging
import uuid
from collections import Counter
from typing import Dict, List, Optional, Tuple
from bson.objectid import ObjectId
from typing import Any, Dict, List, Optional, Tuple
from application.agents.tools.tool_action_parser import ToolActionParser
from application.agents.tools.tool_manager import ToolManager
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.security.encryption import decrypt_credentials
from application.storage.db.base_repository import looks_like_uuid
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.repositories.tool_call_attempts import (
ToolCallAttemptsRepository,
)
from application.storage.db.repositories.user_tools import UserToolsRepository
from application.storage.db.session import db_readonly, db_session
logger = logging.getLogger(__name__)
def _record_proposed(
call_id: str,
tool_name: str,
action_name: str,
arguments: Any,
*,
tool_id: Optional[str] = None,
) -> bool:
"""Insert a ``proposed`` row; swallow infra failures so tool calls
still run when the journal is unreachable. Returns True iff the row
is now journaled (newly created or already present).
"""
try:
with db_session() as conn:
inserted = ToolCallAttemptsRepository(conn).record_proposed(
call_id,
tool_name,
action_name,
arguments,
tool_id=tool_id if tool_id and looks_like_uuid(tool_id) else None,
)
if not inserted:
logger.warning(
"tool_call_attempts duplicate call_id=%s; existing row left in place",
call_id,
extra={"alert": "tool_call_id_collision", "call_id": call_id},
)
return True
except Exception:
logger.exception("tool_call_attempts proposed write failed for %s", call_id)
return False
def _mark_executed(
call_id: str,
result: Any,
*,
message_id: Optional[str] = None,
artifact_id: Optional[str] = None,
proposed_ok: bool = True,
tool_name: Optional[str] = None,
action_name: Optional[str] = None,
arguments: Any = None,
tool_id: Optional[str] = None,
) -> None:
"""Flip the row to ``executed``. If ``proposed_ok`` is False (the
proposed write failed earlier), upsert a fresh row in ``executed`` so
the reconciler can still see the attempt — without this, the side
effect would be invisible to the journal.
"""
try:
with db_session() as conn:
repo = ToolCallAttemptsRepository(conn)
if proposed_ok:
updated = repo.mark_executed(
call_id,
result,
message_id=message_id,
artifact_id=artifact_id,
)
if updated:
return
# Fallback synthesizes the row so the journal isn't lost.
repo.upsert_executed(
call_id,
tool_name=tool_name or "unknown",
action_name=action_name or "",
arguments=arguments if arguments is not None else {},
result=result,
tool_id=tool_id if tool_id and looks_like_uuid(tool_id) else None,
message_id=message_id,
artifact_id=artifact_id,
)
except Exception:
logger.exception("tool_call_attempts executed write failed for %s", call_id)
def _mark_failed(call_id: str, error: str) -> None:
try:
with db_session() as conn:
ToolCallAttemptsRepository(conn).mark_failed(call_id, error)
except Exception:
logger.exception("tool_call_attempts failed-write failed for %s", call_id)
class ToolExecutor:
"""Handles tool discovery, preparation, and execution.
@@ -32,6 +120,7 @@ class ToolExecutor:
self.tool_calls: List[Dict] = []
self._loaded_tools: Dict[str, object] = {}
self.conversation_id: Optional[str] = None
self.message_id: Optional[str] = None
self.client_tools: Optional[List[Dict]] = None
self._name_to_tool: Dict[str, Tuple[str, str]] = {}
self._tool_to_name: Dict[Tuple[str, str], str] = {}
@@ -51,30 +140,28 @@ class ToolExecutor:
return tools
def _get_tools_by_api_key(self, api_key: str) -> Dict[str, Dict]:
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
agents_collection = db["agents"]
tools_collection = db["user_tools"]
agent_data = agents_collection.find_one({"key": api_key})
tool_ids = agent_data.get("tools", []) if agent_data else []
tools = (
tools_collection.find(
{"_id": {"$in": [ObjectId(tool_id) for tool_id in tool_ids]}}
)
if tool_ids
else []
)
tools = list(tools)
return {str(tool["_id"]): tool for tool in tools} if tools else {}
# Per-operation session: the answer pipeline spans a long-lived
# generator; wrapping it in a single connection would pin a PG
# conn for the whole stream. Open, fetch, close.
with db_readonly() as conn:
agent_data = AgentsRepository(conn).find_by_key(api_key)
tool_ids = agent_data.get("tools", []) if agent_data else []
if not tool_ids:
return {}
tools_repo = UserToolsRepository(conn)
tools: List[Dict] = []
owner = (agent_data.get("user_id") or agent_data.get("user")) if agent_data else None
for tid in tool_ids:
row = None
if owner:
row = tools_repo.get_any(str(tid), owner)
if row is not None:
tools.append(row)
return {str(tool["id"]): tool for tool in tools} if tools else {}
def _get_user_tools(self, user: str = "local") -> Dict[str, Dict]:
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
user_tools_collection = db["user_tools"]
user_tools = user_tools_collection.find({"user": user, "status": True})
user_tools = list(user_tools)
with db_readonly() as conn:
user_tools = UserToolsRepository(conn).list_active_for_user(user)
return {str(i): tool for i, tool in enumerate(user_tools)}
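Editor's note: the "open, fetch, close" pattern described in the comment above can be summarised in a few lines; ``db_readonly_sketch`` here is a stand-in for the project's real helper, and the pool is assumed to expose a ``connect()`` method.

    from contextlib import contextmanager

    @contextmanager
    def db_readonly_sketch(pool):
        conn = pool.connect()
        try:
            yield conn
        finally:
            conn.close()  # released before the caller resumes streaming

    def stream_answer(pool, fetch_tools):
        # Fetch what is needed up front, then hold no connection for the
        # whole duration of the (potentially long-lived) token stream.
        with db_readonly_sketch(pool) as conn:
            tools = fetch_tools(conn)
        for token in ("answer", "tokens"):  # placeholder stream
            yield token, tools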
def merge_client_tools(
@@ -277,7 +364,14 @@ class ToolExecutor:
if tool_id is None or action_name is None:
error_message = f"Error: Failed to parse LLM tool call. Tool name: {llm_name}"
logger.error(error_message)
logger.error(
"tool_call_parse_failed",
extra={
"llm_class_name": llm_class_name,
"llm_tool_name": llm_name,
"call_id": call_id,
},
)
tool_call_data = {
"tool_name": "unknown",
@@ -292,7 +386,15 @@ class ToolExecutor:
if tool_id not in tools_dict:
error_message = f"Error: Tool ID '{tool_id}' extracted from LLM call not found in available tools_dict. Available IDs: {list(tools_dict.keys())}"
logger.error(error_message)
logger.error(
"tool_id_not_found",
extra={
"tool_id": tool_id,
"llm_tool_name": llm_name,
"call_id": call_id,
"available_tool_count": len(tools_dict),
},
)
tool_call_data = {
"tool_name": "unknown",
@@ -311,9 +413,36 @@ class ToolExecutor:
"action_name": llm_name,
"arguments": call_args,
}
yield {"type": "tool_call", "data": {**tool_call_data, "status": "pending"}}
tool_data = tools_dict[tool_id]
# Journal first so the reconciler sees malformed calls and any
# subsequent ``_mark_failed`` actually updates a real row.
proposed_ok = _record_proposed(
call_id,
tool_data["name"],
action_name,
call_args if isinstance(call_args, dict) else {},
tool_id=tool_data.get("id"),
)
# Defensive guard: a non-dict ``call_args`` (e.g. malformed
# JSON on the resume path) would crash the param walk below
# with AttributeError on ``.items()``. Surface a clean error
# event and flip the journal row to ``failed`` instead of
# killing the stream.
if not isinstance(call_args, dict):
error_message = (
f"Tool call arguments must be a JSON object, got "
f"{type(call_args).__name__}."
)
tool_call_data["result"] = error_message
tool_call_data["arguments"] = {}
_mark_failed(call_id, error_message)
yield {
"type": "tool_call",
"data": {**tool_call_data, "status": "error"},
}
self.tool_calls.append(tool_call_data)
return error_message, call_id
yield {"type": "tool_call", "data": {**tool_call_data, "status": "pending"}}
action_data = (
tool_data["config"]["actions"][action_name]
if tool_data["name"] == "api_tool"
@@ -354,19 +483,43 @@ class ToolExecutor:
headers=headers, query_params=query_params,
)
if tool is None:
error_message = (
f"Failed to load tool '{tool_data.get('name')}' (tool_id key={tool_id}): "
"missing 'id' on tool row."
)
logger.error(
"tool_load_failed",
extra={
"tool_name": tool_data.get("name"),
"tool_id": tool_id,
"action_name": action_name,
"call_id": call_id,
},
)
tool_call_data["result"] = error_message
_mark_failed(call_id, error_message)
yield {"type": "tool_call", "data": {**tool_call_data, "status": "error"}}
self.tool_calls.append(tool_call_data)
return error_message, call_id
resolved_arguments = (
{"query_params": query_params, "headers": headers, "body": body}
if tool_data["name"] == "api_tool"
else parameters
)
if tool_data["name"] == "api_tool":
logger.debug(
f"Executing api: {action_name} with query_params: {query_params}, headers: {headers}, body: {body}"
)
result = tool.execute_action(action_name, **body)
else:
logger.debug(f"Executing tool: {action_name} with args: {call_args}")
result = tool.execute_action(action_name, **parameters)
try:
if tool_data["name"] == "api_tool":
logger.debug(
f"Executing api: {action_name} with query_params: {query_params}, headers: {headers}, body: {body}"
)
result = tool.execute_action(action_name, **body)
else:
logger.debug(f"Executing tool: {action_name} with args: {call_args}")
result = tool.execute_action(action_name, **parameters)
except Exception as exc:
_mark_failed(call_id, str(exc))
raise
get_artifact_id = (
getattr(tool, "get_artifact_id", None)
@@ -395,6 +548,22 @@ class ToolExecutor:
f"{result_full[:50]}..." if len(result_full) > 50 else result_full
)
# Tool side effect has run; flip the journal row so the
# message-finalize path can later confirm it. If the proposed
# write failed (DB outage), upsert a fresh row in ``executed`` so
# the reconciler still sees the side effect.
_mark_executed(
call_id,
result_full,
message_id=self.message_id,
artifact_id=artifact_id or None,
proposed_ok=proposed_ok,
tool_name=tool_data["name"],
action_name=action_name,
arguments=call_args,
tool_id=tool_data.get("id"),
)
stream_tool_call_data = {
key: value
for key, value in tool_call_data.items()
@@ -440,7 +609,18 @@ class ToolExecutor:
tool_config.update(decrypted)
tool_config["auth_credentials"] = decrypted
tool_config.pop("encrypted_credentials", None)
tool_config["tool_id"] = str(tool_data.get("_id", tool_id))
row_id = tool_data.get("id")
if not row_id:
logger.error(
"tool_missing_row_id",
extra={
"tool_name": tool_data.get("name"),
"tool_id": tool_id,
"action_name": action_name,
},
)
return None
tool_config["tool_id"] = str(row_id)
if self.conversation_id:
tool_config["conversation_id"] = self.conversation_id
if tool_data["name"] == "mcp_tool":

View File

@@ -39,6 +39,7 @@ class InternalSearchTool(Tool):
chunks=int(self.config.get("chunks", 2)),
doc_token_limit=int(self.config.get("doc_token_limit", 50000)),
model_id=self.config.get("model_id", "docsgpt-local"),
model_user_id=self.config.get("model_user_id"),
user_api_key=self.config.get("user_api_key"),
agent_id=self.config.get("agent_id"),
llm_name=self.config.get("llm_name", settings.LLM_PROVIDER),
@@ -48,7 +49,7 @@ class InternalSearchTool(Tool):
return self._retriever
def _get_directory_structure(self) -> Optional[Dict]:
"""Load directory structure from MongoDB for the configured sources."""
"""Load directory structure from Postgres for the configured sources."""
if self._dir_structure_loaded:
return self._directory_structure
@@ -59,35 +60,39 @@ class InternalSearchTool(Tool):
return None
try:
from bson.objectid import ObjectId
from application.core.mongo_db import MongoDB
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
sources_collection = db["sources"]
# Per-operation session: this tool runs inside the answer
# generator hot path, so we open a short-lived read
# connection for the batch lookup and release immediately.
from application.storage.db.repositories.sources import (
SourcesRepository,
)
from application.storage.db.session import db_readonly
if isinstance(active_docs, str):
active_docs = [active_docs]
decoded_token = self.config.get("decoded_token") or {}
user_id = decoded_token.get("sub") if decoded_token else None
merged_structure = {}
for doc_id in active_docs:
try:
source_doc = sources_collection.find_one(
{"_id": ObjectId(doc_id)}
)
if not source_doc:
continue
dir_str = source_doc.get("directory_structure")
if dir_str:
if isinstance(dir_str, str):
dir_str = json.loads(dir_str)
source_name = source_doc.get("name", doc_id)
if len(active_docs) > 1:
merged_structure[source_name] = dir_str
else:
merged_structure = dir_str
except Exception as e:
logger.debug(f"Could not load dir structure for {doc_id}: {e}")
with db_readonly() as conn:
repo = SourcesRepository(conn)
for doc_id in active_docs:
try:
source_doc = repo.get_any(str(doc_id), user_id) if user_id else None
if not source_doc:
continue
dir_str = source_doc.get("directory_structure")
if dir_str:
if isinstance(dir_str, str):
dir_str = json.loads(dir_str)
source_name = source_doc.get("name", doc_id)
if len(active_docs) > 1:
merged_structure[source_name] = dir_str
else:
merged_structure = dir_str
except Exception as e:
logger.debug(f"Could not load dir structure for {doc_id}: {e}")
self._directory_structure = merged_structure if merged_structure else None
except Exception as e:
@@ -357,32 +362,48 @@ INTERNAL_TOOL_ENTRY = build_internal_tool_entry(has_directory_structure=False)
def sources_have_directory_structure(source: Dict) -> bool:
"""Check if any of the active sources have directory_structure in MongoDB."""
"""Check if any of the active sources have a ``directory_structure`` row."""
active_docs = source.get("active_docs", [])
if not active_docs:
return False
try:
from bson.objectid import ObjectId
from application.core.mongo_db import MongoDB
# TODO(pg-cutover): SourcesRepository.get_any requires ``user_id``
# scoping, but callers in the agent build path don't always
# thread the decoded token through here. Use a direct
# short-lived SQL lookup instead of the repo until the call
# sites are updated to propagate user context.
from sqlalchemy import text as _text
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
sources_collection = db["sources"]
from application.storage.db.session import db_readonly
if isinstance(active_docs, str):
active_docs = [active_docs]
for doc_id in active_docs:
try:
source_doc = sources_collection.find_one(
{"_id": ObjectId(doc_id)},
{"directory_structure": 1},
)
if source_doc and source_doc.get("directory_structure"):
return True
except Exception:
continue
with db_readonly() as conn:
for doc_id in active_docs:
try:
value = str(doc_id)
if len(value) == 36 and "-" in value:
row = conn.execute(
_text(
"SELECT directory_structure FROM sources "
"WHERE id = CAST(:id AS uuid)"
),
{"id": value},
).fetchone()
else:
row = conn.execute(
_text(
"SELECT directory_structure FROM sources "
"WHERE legacy_mongo_id = :lid"
),
{"lid": value},
).fetchone()
if row is not None and row[0]:
return True
except Exception:
continue
except Exception as e:
logger.debug(f"Could not check directory structure: {e}")
@@ -415,6 +436,7 @@ def build_internal_tool_config(
chunks: int = 2,
doc_token_limit: int = 50000,
model_id: str = "docsgpt-local",
model_user_id: Optional[str] = None,
user_api_key: Optional[str] = None,
agent_id: Optional[str] = None,
llm_name: str = None,
@@ -429,6 +451,7 @@ def build_internal_tool_config(
"chunks": chunks,
"doc_token_limit": doc_token_limit,
"model_id": model_id,
"model_user_id": model_user_id,
"user_api_key": user_api_key,
"agent_id": agent_id,
"llm_name": llm_name or settings.LLM_PROVIDER,

View File

@@ -22,16 +22,12 @@ from redis import Redis
from application.agents.tools.base import Tool
from application.api.user.tasks import mcp_oauth_status_task, mcp_oauth_task
from application.cache import get_redis_instance
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.core.url_validation import SSRFError, validate_url
from application.security.encryption import decrypt_credentials
logger = logging.getLogger(__name__)
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
_mcp_clients_cache = {}
@@ -161,7 +157,6 @@ class MCPTool(Tool):
scopes=self.oauth_scopes,
redis_client=redis_client,
redirect_uri=self.redirect_uri,
db=db,
user_id=self.user_id,
)
else:
@@ -171,7 +166,6 @@ class MCPTool(Tool):
redis_client=redis_client,
redirect_uri=self.redirect_uri,
task_id=self.oauth_task_id,
db=db,
user_id=self.user_id,
)
elif self.auth_type == "bearer":
@@ -491,7 +485,7 @@ class MCPTool(Tool):
def _test_oauth_connection(self) -> Dict:
storage = DBTokenStorage(
server_url=self.server_url, user_id=self.user_id, db_client=db
server_url=self.server_url, user_id=self.user_id,
)
loop = asyncio.new_event_loop()
try:
@@ -683,7 +677,6 @@ class DocsGPTOAuth(OAuthClientProvider):
scopes: str | list[str] | None = None,
client_name: str = "DocsGPT-MCP",
user_id=None,
db=None,
additional_client_metadata: dict[str, Any] | None = None,
skip_redirect_validation: bool = False,
):
@@ -692,7 +685,6 @@ class DocsGPTOAuth(OAuthClientProvider):
self.redis_prefix = redis_prefix
self.task_id = task_id
self.user_id = user_id
self.db = db
parsed_url = urlparse(mcp_url)
self.server_base_url = f"{parsed_url.scheme}://{parsed_url.netloc}"
@@ -711,7 +703,6 @@ class DocsGPTOAuth(OAuthClientProvider):
storage = DBTokenStorage(
server_url=self.server_base_url,
user_id=self.user_id,
db_client=self.db,
expected_redirect_uri=None if skip_redirect_validation else redirect_uri,
)
@@ -853,54 +844,95 @@ class DBTokenStorage(TokenStorage):
self,
server_url: str,
user_id: str,
db_client,
expected_redirect_uri: Optional[str] = None,
):
self.server_url = server_url
self.user_id = user_id
self.db_client = db_client
self.expected_redirect_uri = expected_redirect_uri
self.collection = db_client["connector_sessions"]
@staticmethod
def get_base_url(url: str) -> str:
parsed = urlparse(url)
return f"{parsed.scheme}://{parsed.netloc}"
def get_db_key(self) -> dict:
return {
"server_url": self.get_base_url(self.server_url),
"user_id": self.user_id,
}
def _pg_provider(self) -> str:
return f"mcp:{self.get_base_url(self.server_url)}"
def _fetch_session_data(self) -> dict:
"""Read the JSONB ``session_data`` blob for this MCP server row."""
from application.storage.db.repositories.connector_sessions import (
ConnectorSessionsRepository,
)
from application.storage.db.session import db_readonly
base_url = self.get_base_url(self.server_url)
with db_readonly() as conn:
row = ConnectorSessionsRepository(conn).get_by_user_and_server_url(
self.user_id, base_url,
)
if not row:
return {}
data = row.get("session_data") or {}
if isinstance(data, str):
try:
data = json.loads(data)
except ValueError:
return {}
return data if isinstance(data, dict) else {}
async def get_tokens(self) -> OAuthToken | None:
doc = await asyncio.to_thread(self.collection.find_one, self.get_db_key())
if not doc or "tokens" not in doc:
data = await asyncio.to_thread(self._fetch_session_data)
if not data or "tokens" not in data:
return None
try:
return OAuthToken.model_validate(doc["tokens"])
return OAuthToken.model_validate(data["tokens"])
except ValidationError as e:
logger.error("Could not load tokens: %s", e)
return None
async def set_tokens(self, tokens: OAuthToken) -> None:
await asyncio.to_thread(
self.collection.update_one,
self.get_db_key(),
{"$set": {"tokens": tokens.model_dump()}},
True,
def _merge(self, patch: dict) -> None:
"""Shallow-merge ``patch`` into this row's ``session_data``.
Threads ``server_url`` through to the repository so it lands in
the scalar column — ``get_by_user_and_server_url`` needs that to
resolve the row (``NULL = 'https://...'`` is UNKNOWN in SQL).
"""
from application.storage.db.repositories.connector_sessions import (
ConnectorSessionsRepository,
)
logger.info("Saved tokens for %s", self.get_base_url(self.server_url))
from application.storage.db.session import db_session
base_url = self.get_base_url(self.server_url)
with db_session() as conn:
ConnectorSessionsRepository(conn).merge_session_data(
self.user_id, self._pg_provider(), base_url, patch,
)
def _delete(self) -> None:
from application.storage.db.repositories.connector_sessions import (
ConnectorSessionsRepository,
)
from application.storage.db.session import db_session
with db_session() as conn:
ConnectorSessionsRepository(conn).delete(
self.user_id, self._pg_provider(),
)
async def set_tokens(self, tokens: OAuthToken) -> None:
base_url = self.get_base_url(self.server_url)
token_dump = tokens.model_dump()
await asyncio.to_thread(self._merge, {"tokens": token_dump})
logger.info("Saved tokens for %s", base_url)
async def get_client_info(self) -> OAuthClientInformationFull | None:
doc = await asyncio.to_thread(self.collection.find_one, self.get_db_key())
if not doc or "client_info" not in doc:
logger.debug(
"No client_info in DB for %s", self.get_base_url(self.server_url)
)
data = await asyncio.to_thread(self._fetch_session_data)
base_url = self.get_base_url(self.server_url)
if not data or "client_info" not in data:
logger.debug("No client_info in DB for %s", base_url)
return None
try:
client_info = OAuthClientInformationFull.model_validate(doc["client_info"])
client_info = OAuthClientInformationFull.model_validate(data["client_info"])
if self.expected_redirect_uri:
stored_uris = [
str(uri).rstrip("/") for uri in client_info.redirect_uris
@@ -909,14 +941,16 @@ class DBTokenStorage(TokenStorage):
if expected_uri not in stored_uris:
logger.warning(
"Redirect URI mismatch for %s: expected=%s stored=%s — clearing.",
self.get_base_url(self.server_url),
base_url,
expected_uri,
stored_uris,
)
# Drop ``tokens`` and ``client_info`` from the JSONB
# blob via merge_session_data's ``None``-drops-key
# semantics — preserves the row + any other keys.
await asyncio.to_thread(
self.collection.update_one,
self.get_db_key(),
{"$unset": {"client_info": "", "tokens": ""}},
self._merge,
{"tokens": None, "client_info": None},
)
return None
return client_info
@@ -931,22 +965,37 @@ class DBTokenStorage(TokenStorage):
async def set_client_info(self, client_info: OAuthClientInformationFull) -> None:
serialized_info = self._serialize_client_info(client_info.model_dump())
base_url = self.get_base_url(self.server_url)
await asyncio.to_thread(
self.collection.update_one,
self.get_db_key(),
{"$set": {"client_info": serialized_info}},
True,
self._merge, {"client_info": serialized_info},
)
logger.info("Saved client info for %s", self.get_base_url(self.server_url))
logger.info("Saved client info for %s", base_url)
async def clear(self) -> None:
await asyncio.to_thread(self.collection.delete_one, self.get_db_key())
await asyncio.to_thread(self._delete)
logger.info("Cleared OAuth cache for %s", self.get_base_url(self.server_url))
@classmethod
async def clear_all(cls, db_client) -> None:
collection = db_client["connector_sessions"]
await asyncio.to_thread(collection.delete_many, {})
async def clear_all(cls, db_client=None) -> None:
"""Delete every MCP-tagged connector session row.
``db_client`` is retained for call-site compatibility but unused —
storage is Postgres-only now.
"""
from sqlalchemy import text
from application.storage.db.session import db_session
def _delete_all() -> None:
with db_session() as conn:
conn.execute(
text(
"DELETE FROM connector_sessions "
"WHERE provider LIKE 'mcp:%'"
)
)
await asyncio.to_thread(_delete_all)
logger.info("Cleared all OAuth client cache data.")

View File

@@ -1,12 +1,14 @@
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List, Optional
import re
import logging
import uuid
from .base import Tool
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.storage.db.repositories.memories import MemoriesRepository
from application.storage.db.session import db_readonly, db_session
logger = logging.getLogger(__name__)
class MemoryTool(Tool):
@@ -27,7 +29,7 @@ class MemoryTool(Tool):
self.user_id: Optional[str] = user_id
# Get tool_id from configuration (passed from user_tools._id in production)
# In production, tool_id is the MongoDB ObjectId string from user_tools collection
# In production, tool_id is the UUID string from user_tools.id.
if tool_config and "tool_id" in tool_config:
self.tool_id = tool_config["tool_id"]
elif user_id:
@@ -37,8 +39,35 @@ class MemoryTool(Tool):
# Last resort fallback (shouldn't happen in normal use)
self.tool_id = str(uuid.uuid4())
db = MongoDB.get_client()[settings.MONGO_DB_NAME]
self.collection = db["memories"]
def _pg_enabled(self) -> bool:
"""Return True if this MemoryTool's tool_id is a real ``user_tools.id``.
The ``memories`` PG table has a UUID foreign key to ``user_tools``.
The sentinel ``default_{uid}`` fallback tool_id is not a UUID and
has no row in ``user_tools``, so any storage operation would fail
the foreign-key check. After the Postgres cutover Postgres is the
only store, so for the sentinel case there is nowhere to read or
write — operations become no-ops and the tool returns an
explanatory error to the caller.
"""
tool_id = getattr(self, "tool_id", None)
if not tool_id or not isinstance(tool_id, str):
return False
if tool_id.startswith("default_"):
logger.debug(
"Skipping Postgres operation for MemoryTool with sentinel tool_id=%s",
tool_id,
)
return False
from application.storage.db.base_repository import looks_like_uuid
if not looks_like_uuid(tool_id):
logger.debug(
"Skipping Postgres operation for MemoryTool with non-UUID tool_id=%s",
tool_id,
)
return False
return True
# -----------------------------
# Action implementations
@@ -56,6 +85,12 @@ class MemoryTool(Tool):
if not self.user_id:
return "Error: MemoryTool requires a valid user_id."
if not self._pg_enabled():
return (
"Error: MemoryTool is not configured with a persistent tool_id; "
"memory storage is unavailable for this session."
)
if action_name == "view":
return self._view(
kwargs.get("path", "/"),
@@ -282,14 +317,10 @@ class MemoryTool(Tool):
# Ensure path ends with / for proper prefix matching
search_path = path if path.endswith("/") else path + "/"
# Find all files that start with this directory path
query = {
"user_id": self.user_id,
"tool_id": self.tool_id,
"path": {"$regex": f"^{re.escape(search_path)}"}
}
docs = list(self.collection.find(query, {"path": 1}))
with db_readonly() as conn:
docs = MemoriesRepository(conn).list_by_prefix(
self.user_id, self.tool_id, search_path
)
if not docs:
return f"Directory: {path}\n(empty)"
@@ -310,7 +341,10 @@ class MemoryTool(Tool):
def _view_file(self, path: str, view_range: Optional[List[int]] = None) -> str:
"""View file contents with optional line range."""
doc = self.collection.find_one({"user_id": self.user_id, "tool_id": self.tool_id, "path": path})
with db_readonly() as conn:
doc = MemoriesRepository(conn).get_by_path(
self.user_id, self.tool_id, path
)
if not doc or not doc.get("content"):
return f"Error: File not found: {path}"
@@ -344,16 +378,10 @@ class MemoryTool(Tool):
if validated_path == "/" or validated_path.endswith("/"):
return "Error: Cannot create a file at directory path."
self.collection.update_one(
{"user_id": self.user_id, "tool_id": self.tool_id, "path": validated_path},
{
"$set": {
"content": file_text,
"updated_at": datetime.now()
}
},
upsert=True
)
with db_session() as conn:
MemoriesRepository(conn).upsert(
self.user_id, self.tool_id, validated_path, file_text
)
return f"File created: {validated_path}"
@@ -366,30 +394,29 @@ class MemoryTool(Tool):
if not old_str:
return "Error: old_str is required."
doc = self.collection.find_one({"user_id": self.user_id, "tool_id": self.tool_id, "path": validated_path})
with db_session() as conn:
repo = MemoriesRepository(conn)
doc = repo.get_by_path(self.user_id, self.tool_id, validated_path)
if not doc or not doc.get("content"):
return f"Error: File not found: {validated_path}"
if not doc or not doc.get("content"):
return f"Error: File not found: {validated_path}"
current_content = str(doc["content"])
current_content = str(doc["content"])
# Check if old_str exists (case-insensitive)
if old_str.lower() not in current_content.lower():
return f"Error: String '{old_str}' not found in file."
# Check if old_str exists (case-insensitive)
if old_str.lower() not in current_content.lower():
return f"Error: String '{old_str}' not found in file."
# Replace the string (case-insensitive)
import re as regex_module
updated_content = regex_module.sub(regex_module.escape(old_str), new_str, current_content, flags=regex_module.IGNORECASE)
# Case-insensitive replace
import re as regex_module
updated_content = regex_module.sub(
regex_module.escape(old_str),
new_str,
current_content,
flags=regex_module.IGNORECASE,
)
self.collection.update_one(
{"user_id": self.user_id, "tool_id": self.tool_id, "path": validated_path},
{
"$set": {
"content": updated_content,
"updated_at": datetime.now()
}
}
)
repo.upsert(self.user_id, self.tool_id, validated_path, updated_content)
return f"File updated: {validated_path}"
@@ -402,31 +429,25 @@ class MemoryTool(Tool):
if not insert_text:
return "Error: insert_text is required."
doc = self.collection.find_one({"user_id": self.user_id, "tool_id": self.tool_id, "path": validated_path})
with db_session() as conn:
repo = MemoriesRepository(conn)
doc = repo.get_by_path(self.user_id, self.tool_id, validated_path)
if not doc or not doc.get("content"):
return f"Error: File not found: {validated_path}"
if not doc or not doc.get("content"):
return f"Error: File not found: {validated_path}"
current_content = str(doc["content"])
lines = current_content.split("\n")
current_content = str(doc["content"])
lines = current_content.split("\n")
# Convert to 0-indexed
index = insert_line - 1
if index < 0 or index > len(lines):
return f"Error: Invalid line number. File has {len(lines)} lines."
# Convert to 0-indexed
index = insert_line - 1
if index < 0 or index > len(lines):
return f"Error: Invalid line number. File has {len(lines)} lines."
lines.insert(index, insert_text)
updated_content = "\n".join(lines)
lines.insert(index, insert_text)
updated_content = "\n".join(lines)
self.collection.update_one(
{"user_id": self.user_id, "tool_id": self.tool_id, "path": validated_path},
{
"$set": {
"content": updated_content,
"updated_at": datetime.now()
}
}
)
repo.upsert(self.user_id, self.tool_id, validated_path, updated_content)
return f"Text inserted at line {insert_line} in {validated_path}"
@@ -438,39 +459,36 @@ class MemoryTool(Tool):
if validated_path == "/":
# Delete all files for this user and tool
result = self.collection.delete_many({"user_id": self.user_id, "tool_id": self.tool_id})
return f"Deleted {result.deleted_count} file(s) from memory."
with db_session() as conn:
deleted = MemoriesRepository(conn).delete_all(
self.user_id, self.tool_id
)
return f"Deleted {deleted} file(s) from memory."
# Check if it's a directory (ends with /)
if validated_path.endswith("/"):
# Delete all files in directory
result = self.collection.delete_many({
"user_id": self.user_id,
"tool_id": self.tool_id,
"path": {"$regex": f"^{re.escape(validated_path)}"}
})
return f"Deleted directory and {result.deleted_count} file(s)."
with db_session() as conn:
deleted = MemoriesRepository(conn).delete_by_prefix(
self.user_id, self.tool_id, validated_path
)
return f"Deleted directory and {deleted} file(s)."
# Try to delete as directory first (without trailing slash)
# Check if any files start with this path + /
# Try as directory first (without trailing slash)
search_path = validated_path + "/"
directory_result = self.collection.delete_many({
"user_id": self.user_id,
"tool_id": self.tool_id,
"path": {"$regex": f"^{re.escape(search_path)}"}
})
with db_session() as conn:
repo = MemoriesRepository(conn)
directory_deleted = repo.delete_by_prefix(
self.user_id, self.tool_id, search_path
)
if directory_deleted > 0:
return f"Deleted directory and {directory_deleted} file(s)."
if directory_result.deleted_count > 0:
return f"Deleted directory and {directory_result.deleted_count} file(s)."
# Otherwise delete a single file
file_deleted = repo.delete_by_path(
self.user_id, self.tool_id, validated_path
)
# Delete single file
result = self.collection.delete_one({
"user_id": self.user_id,
"tool_id": self.tool_id,
"path": validated_path
})
if result.deleted_count:
if file_deleted:
return f"Deleted: {validated_path}"
return f"Error: File not found: {validated_path}"
@@ -485,62 +503,46 @@ class MemoryTool(Tool):
if validated_old == "/" or validated_new == "/":
return "Error: Cannot rename root directory."
# Check if renaming a directory
# Directory rename: do all path updates inside one transaction so
# the rename is atomic from the caller's perspective.
if validated_old.endswith("/"):
# Ensure validated_new also ends with / for proper path replacement
if not validated_new.endswith("/"):
validated_new = validated_new + "/"
# Find all files in the old directory
docs = list(self.collection.find({
"user_id": self.user_id,
"tool_id": self.tool_id,
"path": {"$regex": f"^{re.escape(validated_old)}"}
}))
if not docs:
return f"Error: Directory not found: {validated_old}"
# Update paths for all files
for doc in docs:
old_file_path = doc["path"]
new_file_path = old_file_path.replace(validated_old, validated_new, 1)
self.collection.update_one(
{"_id": doc["_id"]},
{"$set": {"path": new_file_path, "updated_at": datetime.now()}}
with db_session() as conn:
repo = MemoriesRepository(conn)
docs = repo.list_by_prefix(
self.user_id, self.tool_id, validated_old
)
if not docs:
return f"Error: Directory not found: {validated_old}"
for doc in docs:
old_file_path = doc["path"]
new_file_path = old_file_path.replace(
validated_old, validated_new, 1
)
repo.update_path(
self.user_id, self.tool_id, old_file_path, new_file_path
)
return f"Renamed directory: {validated_old} -> {validated_new} ({len(docs)} files)"
# Rename single file
doc = self.collection.find_one({
"user_id": self.user_id,
"tool_id": self.tool_id,
"path": validated_old
})
# Single-file rename: lookup, collision check, and update in one txn.
with db_session() as conn:
repo = MemoriesRepository(conn)
doc = repo.get_by_path(self.user_id, self.tool_id, validated_old)
if not doc:
return f"Error: File not found: {validated_old}"
if not doc:
return f"Error: File not found: {validated_old}"
existing = repo.get_by_path(self.user_id, self.tool_id, validated_new)
if existing:
return f"Error: File already exists at {validated_new}"
# Check if new path already exists
existing = self.collection.find_one({
"user_id": self.user_id,
"tool_id": self.tool_id,
"path": validated_new
})
if existing:
return f"Error: File already exists at {validated_new}"
# Delete the old document and create a new one with the new path
self.collection.delete_one({"user_id": self.user_id, "tool_id": self.tool_id, "path": validated_old})
self.collection.insert_one({
"user_id": self.user_id,
"tool_id": self.tool_id,
"path": validated_new,
"content": doc.get("content", ""),
"updated_at": datetime.now()
})
repo.update_path(
self.user_id, self.tool_id, validated_old, validated_new
)
return f"Renamed: {validated_old} -> {validated_new}"

View File

@@ -1,10 +1,16 @@
from datetime import datetime
from typing import Any, Dict, List, Optional
import uuid
from .base import Tool
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.storage.db.repositories.notes import NotesRepository
from application.storage.db.session import db_readonly, db_session
# Stable synthetic title used in the Postgres ``notes.title`` column.
# The notes tool stores one note per (user_id, tool_id); there is no
# user-facing title. PG requires ``title`` NOT NULL, so we write a stable
# constant alongside the actual note body in ``content``.
_NOTE_TITLE = "note"
class NotesTool(Tool):
@@ -25,7 +31,6 @@ class NotesTool(Tool):
self.user_id: Optional[str] = user_id
# Get tool_id from configuration (passed from user_tools._id in production)
# In production, tool_id is the MongoDB ObjectId string from user_tools collection
if tool_config and "tool_id" in tool_config:
self.tool_id = tool_config["tool_id"]
elif user_id:
@@ -35,11 +40,25 @@ class NotesTool(Tool):
# Last resort fallback (shouldn't happen in normal use)
self.tool_id = str(uuid.uuid4())
db = MongoDB.get_client()[settings.MONGO_DB_NAME]
self.collection = db["notes"]
self._last_artifact_id: Optional[str] = None
def _pg_enabled(self) -> bool:
"""Return True only when ``tool_id`` is a real ``user_tools.id`` UUID.
``notes.tool_id`` is a UUID FK to ``user_tools``; repo queries
``CAST(:tool_id AS uuid)``. The sentinel ``default_{uid}``
fallback is neither a UUID nor a ``user_tools`` row, so any DB
operation would crash. Mirror MemoryTool's guard and no-op.
"""
tool_id = getattr(self, "tool_id", None)
if not tool_id or not isinstance(tool_id, str):
return False
if tool_id.startswith("default_"):
return False
from application.storage.db.base_repository import looks_like_uuid
return looks_like_uuid(tool_id)
# -----------------------------
# Action implementations
# -----------------------------
@@ -54,7 +73,13 @@ class NotesTool(Tool):
A human-readable string result.
"""
if not self.user_id:
return "Error: NotesTool requires a valid user_id."
return "Error: NotesTool requires a valid user_id."
if not self._pg_enabled():
return (
"Error: NotesTool is not configured with a persistent "
"tool_id; note storage is unavailable for this session."
)
self._last_artifact_id = None
@@ -135,37 +160,45 @@ class NotesTool(Tool):
# -----------------------------
# Internal helpers (single-note)
# -----------------------------
def _fetch_note(self) -> Optional[dict]:
"""Read the note row for this (user, tool) from Postgres."""
with db_readonly() as conn:
return NotesRepository(conn).get_for_user_tool(self.user_id, self.tool_id)
def _get_note(self) -> str:
doc = self.collection.find_one({"user_id": self.user_id, "tool_id": self.tool_id})
if not doc or not doc.get("note"):
doc = self._fetch_note()
# ``content`` is the PG column; expose as ``note`` to callers via the
# textual return value. Frontends that read the artifact via the
# repo dict get ``content`` (PG-native) plus the artifact id below.
body = (doc or {}).get("content")
if not doc or not body:
return "No note found."
if doc.get("_id") is not None:
self._last_artifact_id = str(doc.get("_id"))
return str(doc["note"])
if doc.get("id") is not None:
self._last_artifact_id = str(doc.get("id"))
return str(body)
def _overwrite_note(self, content: str) -> str:
content = (content or "").strip()
if not content:
return "Note content required."
result = self.collection.find_one_and_update(
{"user_id": self.user_id, "tool_id": self.tool_id},
{"$set": {"note": content, "updated_at": datetime.utcnow()}},
upsert=True,
return_document=True,
)
if result and result.get("_id") is not None:
self._last_artifact_id = str(result.get("_id"))
with db_session() as conn:
row = NotesRepository(conn).upsert(
self.user_id, self.tool_id, _NOTE_TITLE, content
)
if row and row.get("id") is not None:
self._last_artifact_id = str(row.get("id"))
return "Note saved."
def _str_replace(self, old_str: str, new_str: str) -> str:
if not old_str:
return "old_str is required."
doc = self.collection.find_one({"user_id": self.user_id, "tool_id": self.tool_id})
if not doc or not doc.get("note"):
doc = self._fetch_note()
existing = (doc or {}).get("content")
if not doc or not existing:
return "No note found."
current_note = str(doc["note"])
current_note = str(existing)
# Case-insensitive search
if old_str.lower() not in current_note.lower():
@@ -175,24 +208,24 @@ class NotesTool(Tool):
import re
updated_note = re.sub(re.escape(old_str), new_str, current_note, flags=re.IGNORECASE)
result = self.collection.find_one_and_update(
{"user_id": self.user_id, "tool_id": self.tool_id},
{"$set": {"note": updated_note, "updated_at": datetime.utcnow()}},
return_document=True,
)
if result and result.get("_id") is not None:
self._last_artifact_id = str(result.get("_id"))
with db_session() as conn:
row = NotesRepository(conn).upsert(
self.user_id, self.tool_id, _NOTE_TITLE, updated_note
)
if row and row.get("id") is not None:
self._last_artifact_id = str(row.get("id"))
return "Note updated."
def _insert(self, line_number: int, text: str) -> str:
if not text:
return "Text is required."
doc = self.collection.find_one({"user_id": self.user_id, "tool_id": self.tool_id})
if not doc or not doc.get("note"):
doc = self._fetch_note()
existing = (doc or {}).get("content")
if not doc or not existing:
return "No note found."
current_note = str(doc["note"])
current_note = str(existing)
lines = current_note.split("\n")
# Convert to 0-indexed and validate
@@ -203,21 +236,23 @@ class NotesTool(Tool):
lines.insert(index, text)
updated_note = "\n".join(lines)
result = self.collection.find_one_and_update(
{"user_id": self.user_id, "tool_id": self.tool_id},
{"$set": {"note": updated_note, "updated_at": datetime.utcnow()}},
return_document=True,
)
if result and result.get("_id") is not None:
self._last_artifact_id = str(result.get("_id"))
with db_session() as conn:
row = NotesRepository(conn).upsert(
self.user_id, self.tool_id, _NOTE_TITLE, updated_note
)
if row and row.get("id") is not None:
self._last_artifact_id = str(row.get("id"))
return "Text inserted."
def _delete_note(self) -> str:
doc = self.collection.find_one_and_delete(
{"user_id": self.user_id, "tool_id": self.tool_id}
)
if not doc:
# Capture the id (for artifact tracking) before deleting.
existing = self._fetch_note()
if not existing:
return "No note found to delete."
if doc.get("_id") is not None:
self._last_artifact_id = str(doc.get("_id"))
with db_session() as conn:
deleted = NotesRepository(conn).delete(self.user_id, self.tool_id)
if not deleted:
return "No note found to delete."
if existing.get("id") is not None:
self._last_artifact_id = str(existing.get("id"))
return "Note deleted."

View File

@@ -177,3 +177,4 @@ class PostgresTool(Tool):
"order": 1,
},
}

View File

@@ -1,10 +1,19 @@
from datetime import datetime
from typing import Any, Dict, List, Optional
import uuid
from .base import Tool
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.storage.db.repositories.todos import TodosRepository
from application.storage.db.session import db_readonly, db_session
def _status_from_completed(completed: Any) -> str:
"""Translate the PG ``completed`` boolean to the legacy status string.
The frontend (and prior LLM-facing tool output) expects
``"open"`` / ``"completed"``. Keeping that contract at the tool
boundary insulates callers from the schema change.
"""
return "completed" if bool(completed) else "open"
class TodoListTool(Tool):
@@ -25,7 +34,6 @@ class TodoListTool(Tool):
self.user_id: Optional[str] = user_id
# Get tool_id from configuration (passed from user_tools._id in production)
# In production, tool_id is the MongoDB ObjectId string from user_tools collection
if tool_config and "tool_id" in tool_config:
self.tool_id = tool_config["tool_id"]
elif user_id:
@@ -35,11 +43,27 @@ class TodoListTool(Tool):
# Last resort fallback (shouldn't happen in normal use)
self.tool_id = str(uuid.uuid4())
db = MongoDB.get_client()[settings.MONGO_DB_NAME]
self.collection = db["todos"]
self._last_artifact_id: Optional[str] = None
def _pg_enabled(self) -> bool:
"""Return True only when ``tool_id`` is a real ``user_tools.id`` UUID.
The ``todos`` PG table has a UUID foreign key to ``user_tools`` and
the repo queries ``CAST(:tool_id AS uuid)``. The sentinel
``default_{uid}`` fallback is neither a UUID nor a row in
``user_tools`` — binding it would fail with ``invalid input syntax for
type uuid`` and, even if it didn't, the FK would reject it. Mirror
the MemoryTool guard and no-op in that case.
"""
tool_id = getattr(self, "tool_id", None)
if not tool_id or not isinstance(tool_id, str):
return False
if tool_id.startswith("default_"):
return False
from application.storage.db.base_repository import looks_like_uuid
return looks_like_uuid(tool_id)
# -----------------------------
# Action implementations
# -----------------------------
@@ -56,6 +80,12 @@ class TodoListTool(Tool):
if not self.user_id:
return "Error: TodoListTool requires a valid user_id."
if not self._pg_enabled():
return (
"Error: TodoListTool is not configured with a persistent "
"tool_id; todo storage is unavailable for this session."
)
self._last_artifact_id = None
if action_name == "list":
@@ -191,28 +221,10 @@ class TodoListTool(Tool):
return None
def _get_next_todo_id(self) -> int:
"""Get the next sequential todo_id for this user and tool.
Returns a simple integer (1, 2, 3, ...) scoped to this user/tool.
With 5-10 todos max, scanning is negligible.
"""
query = {"user_id": self.user_id, "tool_id": self.tool_id}
todos = list(self.collection.find(query, {"todo_id": 1}))
# Find the maximum todo_id
max_id = 0
for todo in todos:
todo_id = self._coerce_todo_id(todo.get("todo_id"))
if todo_id is not None:
max_id = max(max_id, todo_id)
return max_id + 1
def _list(self) -> str:
"""List all todos for the user."""
query = {"user_id": self.user_id, "tool_id": self.tool_id}
todos = list(self.collection.find(query))
with db_readonly() as conn:
todos = TodosRepository(conn).list_for_tool(self.user_id, self.tool_id)
if not todos:
return "No todos found."
@@ -221,7 +233,7 @@ class TodoListTool(Tool):
for doc in todos:
todo_id = doc.get("todo_id")
title = doc.get("title", "Untitled")
status = doc.get("status", "open")
status = _status_from_completed(doc.get("completed"))
line = f"[{todo_id}] {title} ({status})"
result_lines.append(line)
@@ -229,27 +241,23 @@ class TodoListTool(Tool):
return "\n".join(result_lines)
def _create(self, title: str) -> str:
"""Create a new todo item."""
"""Create a new todo item.
``TodosRepository.create`` allocates the per-tool monotonic
``todo_id`` inside the same transaction (``COALESCE(MAX(todo_id),0)+1``
scoped to ``tool_id``), so we no longer need a separate read-then-
write step here.
"""
title = (title or "").strip()
if not title:
return "Error: Title is required."
now = datetime.now()
todo_id = self._get_next_todo_id()
with db_session() as conn:
row = TodosRepository(conn).create(self.user_id, self.tool_id, title)
doc = {
"todo_id": todo_id,
"user_id": self.user_id,
"tool_id": self.tool_id,
"title": title,
"status": "open",
"created_at": now,
"updated_at": now,
}
insert_result = self.collection.insert_one(doc)
inserted_id = getattr(insert_result, "inserted_id", None) or doc.get("_id")
if inserted_id is not None:
self._last_artifact_id = str(inserted_id)
todo_id = row.get("todo_id")
if row.get("id") is not None:
self._last_artifact_id = str(row.get("id"))
return f"Todo created with ID {todo_id}: {title}"
def _get(self, todo_id: Optional[Any]) -> str:
@@ -258,21 +266,21 @@ class TodoListTool(Tool):
if parsed_todo_id is None:
return "Error: todo_id must be a positive integer."
query = {"user_id": self.user_id, "tool_id": self.tool_id, "todo_id": parsed_todo_id}
doc = self.collection.find_one(query)
with db_readonly() as conn:
doc = TodosRepository(conn).get_by_tool_and_todo_id(
self.user_id, self.tool_id, parsed_todo_id
)
if not doc:
return f"Error: Todo with ID {parsed_todo_id} not found."
if doc.get("_id") is not None:
self._last_artifact_id = str(doc.get("_id"))
if doc.get("id") is not None:
self._last_artifact_id = str(doc.get("id"))
title = doc.get("title", "Untitled")
status = doc.get("status", "open")
status = _status_from_completed(doc.get("completed"))
result = f"Todo [{parsed_todo_id}]:\nTitle: {title}\nStatus: {status}"
return result
return f"Todo [{parsed_todo_id}]:\nTitle: {title}\nStatus: {status}"
def _update(self, todo_id: Optional[Any], title: str) -> str:
"""Update a todo's title by ID."""
@@ -284,16 +292,19 @@ class TodoListTool(Tool):
if not title:
return "Error: Title is required."
query = {"user_id": self.user_id, "tool_id": self.tool_id, "todo_id": parsed_todo_id}
doc = self.collection.find_one_and_update(
query,
{"$set": {"title": title, "updated_at": datetime.now()}},
)
if not doc:
return f"Error: Todo with ID {parsed_todo_id} not found."
with db_session() as conn:
repo = TodosRepository(conn)
existing = repo.get_by_tool_and_todo_id(
self.user_id, self.tool_id, parsed_todo_id
)
if not existing:
return f"Error: Todo with ID {parsed_todo_id} not found."
repo.update_title_by_tool_and_todo_id(
self.user_id, self.tool_id, parsed_todo_id, title
)
if doc.get("_id") is not None:
self._last_artifact_id = str(doc.get("_id"))
if existing.get("id") is not None:
self._last_artifact_id = str(existing.get("id"))
return f"Todo {parsed_todo_id} updated to: {title}"
@@ -303,16 +314,17 @@ class TodoListTool(Tool):
if parsed_todo_id is None:
return "Error: todo_id must be a positive integer."
query = {"user_id": self.user_id, "tool_id": self.tool_id, "todo_id": parsed_todo_id}
doc = self.collection.find_one_and_update(
query,
{"$set": {"status": "completed", "updated_at": datetime.now()}},
)
if not doc:
return f"Error: Todo with ID {parsed_todo_id} not found."
with db_session() as conn:
repo = TodosRepository(conn)
existing = repo.get_by_tool_and_todo_id(
self.user_id, self.tool_id, parsed_todo_id
)
if not existing:
return f"Error: Todo with ID {parsed_todo_id} not found."
repo.set_completed(self.user_id, self.tool_id, parsed_todo_id, True)
if doc.get("_id") is not None:
self._last_artifact_id = str(doc.get("_id"))
if existing.get("id") is not None:
self._last_artifact_id = str(existing.get("id"))
return f"Todo {parsed_todo_id} marked as completed."
@@ -322,12 +334,18 @@ class TodoListTool(Tool):
if parsed_todo_id is None:
return "Error: todo_id must be a positive integer."
query = {"user_id": self.user_id, "tool_id": self.tool_id, "todo_id": parsed_todo_id}
doc = self.collection.find_one_and_delete(query)
if not doc:
return f"Error: Todo with ID {parsed_todo_id} not found."
with db_session() as conn:
repo = TodosRepository(conn)
existing = repo.get_by_tool_and_todo_id(
self.user_id, self.tool_id, parsed_todo_id
)
if not existing:
return f"Error: Todo with ID {parsed_todo_id} not found."
repo.delete_by_tool_and_todo_id(
self.user_id, self.tool_id, parsed_todo_id
)
if doc.get("_id") is not None:
self._last_artifact_id = str(doc.get("_id"))
if existing.get("id") is not None:
self._last_artifact_id = str(existing.get("id"))
return f"Todo {parsed_todo_id} deleted."

View File

@@ -57,6 +57,29 @@ class ToolActionParser:
def _parse_google_llm(self, call):
try:
call_args = call.arguments
# Gemini's SDK natively returns ``args`` as a dict, but the
# resume path (``gen_continuation``) stringifies it for the
# assistant message. Coerce a JSON string back into a dict;
# fall back to an empty dict on malformed input so downstream
# ``call_args.items()`` doesn't crash the stream.
if isinstance(call_args, str):
try:
call_args = json.loads(call_args)
except (json.JSONDecodeError, TypeError):
logger.warning(
"Google call.arguments was not valid JSON; "
"falling back to empty args for %s",
getattr(call, "name", "<unknown>"),
)
call_args = {}
if not isinstance(call_args, dict):
logger.warning(
"Google call.arguments has unexpected type %s; "
"falling back to empty args for %s",
type(call_args).__name__,
getattr(call, "name", "<unknown>"),
)
call_args = {}
resolved = self._resolve_via_mapping(call.name)
if resolved:
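Editor's note: the coercion added here is easy to exercise in isolation. A minimal stand-alone version covering the two fallback cases (a string that parses to an object, and anything else collapsing to an empty dict):

    import json
    from typing import Any, Dict

    def coerce_call_args(raw: Any) -> Dict[str, Any]:
        """Accept a dict, a JSON-object string, or garbage; always return a dict."""
        if isinstance(raw, dict):
            return raw
        if isinstance(raw, str):
            try:
                parsed = json.loads(raw)
            except (json.JSONDecodeError, TypeError):
                return {}
            return parsed if isinstance(parsed, dict) else {}
        return {}

    assert coerce_call_args({"q": "docs"}) == {"q": "docs"}
    assert coerce_call_args('{"q": "docs"}') == {"q": "docs"}
    assert coerce_call_args("not json") == {}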

View File

@@ -12,12 +12,13 @@ from application.agents.workflows.schemas import (
WorkflowRun,
)
from application.agents.workflows.workflow_engine import WorkflowEngine
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.logging import log_activity, LogContext
from application.storage.db.dual_write import dual_write
from application.storage.db.base_repository import looks_like_uuid
from application.storage.db.repositories.workflow_edges import WorkflowEdgesRepository
from application.storage.db.repositories.workflow_nodes import WorkflowNodesRepository
from application.storage.db.repositories.workflow_runs import WorkflowRunsRepository
from application.storage.db.repositories.workflows import WorkflowsRepository
from application.storage.db.session import db_readonly, db_session
logger = logging.getLogger(__name__)
@@ -106,10 +107,8 @@ class WorkflowAgent(BaseAgent):
def _load_from_database(self) -> Optional[WorkflowGraph]:
try:
from bson.objectid import ObjectId
if not self.workflow_id or not ObjectId.is_valid(self.workflow_id):
logger.error(f"Invalid workflow ID: {self.workflow_id}")
if not self.workflow_id:
logger.error("Missing workflow ID for load")
return None
owner_id = self.workflow_owner
if not owner_id and isinstance(self.decoded_token, dict):
@@ -120,61 +119,61 @@ class WorkflowAgent(BaseAgent):
)
return None
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
workflows_coll = db["workflows"]
workflow_nodes_coll = db["workflow_nodes"]
workflow_edges_coll = db["workflow_edges"]
workflow_doc = workflows_coll.find_one(
{"_id": ObjectId(self.workflow_id), "user": owner_id}
)
if not workflow_doc:
logger.error(
f"Workflow {self.workflow_id} not found or inaccessible for user {owner_id}"
)
return None
workflow = Workflow(**workflow_doc)
graph_version = workflow_doc.get("current_graph_version", 1)
try:
graph_version = int(graph_version)
if graph_version <= 0:
with db_readonly() as conn:
wf_repo = WorkflowsRepository(conn)
if looks_like_uuid(self.workflow_id):
workflow_row = wf_repo.get(self.workflow_id, owner_id)
else:
workflow_row = wf_repo.get_by_legacy_id(self.workflow_id, owner_id)
if workflow_row is None:
logger.error(
f"Workflow {self.workflow_id} not found or inaccessible "
f"for user {owner_id}"
)
return None
pg_workflow_id = str(workflow_row["id"])
graph_version = workflow_row.get("current_graph_version", 1)
try:
graph_version = int(graph_version)
if graph_version <= 0:
graph_version = 1
except (ValueError, TypeError):
graph_version = 1
except (ValueError, TypeError):
graph_version = 1
nodes_docs = list(
workflow_nodes_coll.find(
{"workflow_id": self.workflow_id, "graph_version": graph_version}
node_rows = WorkflowNodesRepository(conn).find_by_version(
pg_workflow_id, graph_version,
)
)
if not nodes_docs and graph_version == 1:
nodes_docs = list(
workflow_nodes_coll.find(
{
"workflow_id": self.workflow_id,
"graph_version": {"$exists": False},
}
)
edge_rows = WorkflowEdgesRepository(conn).find_by_version(
pg_workflow_id, graph_version,
)
nodes = [WorkflowNode(**doc) for doc in nodes_docs]
edges_docs = list(
workflow_edges_coll.find(
{"workflow_id": self.workflow_id, "graph_version": graph_version}
)
workflow = Workflow(
name=workflow_row.get("name"),
description=workflow_row.get("description"),
)
if not edges_docs and graph_version == 1:
edges_docs = list(
workflow_edges_coll.find(
{
"workflow_id": self.workflow_id,
"graph_version": {"$exists": False},
}
)
nodes = [
WorkflowNode(
id=n["node_id"],
workflow_id=pg_workflow_id,
type=n["node_type"],
title=n.get("title") or "Node",
description=n.get("description"),
position=n.get("position") or {"x": 0, "y": 0},
config=n.get("config") or {},
)
edges = [WorkflowEdge(**doc) for doc in edges_docs]
for n in node_rows
]
edges = [
WorkflowEdge(
id=e["edge_id"],
workflow_id=pg_workflow_id,
source=e.get("source_id"),
target=e.get("target_id"),
sourceHandle=e.get("source_handle"),
targetHandle=e.get("target_handle"),
)
for e in edge_rows
]
return WorkflowGraph(workflow=workflow, nodes=nodes, edges=edges)
except Exception as e:
@@ -188,10 +187,6 @@ class WorkflowAgent(BaseAgent):
if not owner_id and isinstance(self.decoded_token, dict):
owner_id = self.decoded_token.get("sub")
try:
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
workflow_runs_coll = db["workflow_runs"]
run = WorkflowRun(
workflow_id=self.workflow_id or "unknown",
user=owner_id,
@@ -203,23 +198,20 @@ class WorkflowAgent(BaseAgent):
completed_at=datetime.now(timezone.utc),
)
result = workflow_runs_coll.insert_one(run.to_mongo_doc())
legacy_mongo_id = (
str(result.inserted_id)
if getattr(result, "inserted_id", None) is not None
else None
)
def _pg_write(repo: WorkflowRunsRepository) -> None:
if not self.workflow_id or not owner_id or not legacy_mongo_id:
if not self.workflow_id or not owner_id:
return
with db_session() as conn:
wf_repo = WorkflowsRepository(conn)
if looks_like_uuid(self.workflow_id):
workflow_row = wf_repo.get(self.workflow_id, owner_id)
else:
workflow_row = wf_repo.get_by_legacy_id(
self.workflow_id, owner_id,
)
if workflow_row is None:
return
workflow = WorkflowsRepository(repo._conn).get_by_legacy_id(
self.workflow_id, owner_id,
)
if workflow is None:
return
repo.create(
workflow["id"],
WorkflowRunsRepository(conn).create(
str(workflow_row["id"]),
owner_id,
run.status.value,
inputs=run.inputs,
@@ -227,10 +219,7 @@ class WorkflowAgent(BaseAgent):
steps=[step.model_dump(mode="json") for step in run.steps],
started_at=run.created_at,
ended_at=run.completed_at,
legacy_mongo_id=legacy_mongo_id,
)
dual_write(WorkflowRunsRepository, _pg_write)
except Exception as e:
logger.error(f"Failed to save workflow run: {e}")

View File

@@ -2,7 +2,6 @@ from datetime import datetime, timezone
from enum import Enum
from typing import Any, Dict, List, Literal, Optional, Union
from bson import ObjectId
from pydantic import BaseModel, ConfigDict, Field, field_validator
@@ -81,24 +80,7 @@ class WorkflowEdgeCreate(BaseModel):
class WorkflowEdge(WorkflowEdgeCreate):
mongo_id: Optional[str] = Field(None, alias="_id")
@field_validator("mongo_id", mode="before")
@classmethod
def convert_objectid(cls, v: Any) -> Optional[str]:
if isinstance(v, ObjectId):
return str(v)
return v
def to_mongo_doc(self) -> Dict[str, Any]:
return {
"id": self.id,
"workflow_id": self.workflow_id,
"source_id": self.source_id,
"target_id": self.target_id,
"source_handle": self.source_handle,
"target_handle": self.target_handle,
}
pass
class WorkflowNodeCreate(BaseModel):
@@ -120,25 +102,7 @@ class WorkflowNodeCreate(BaseModel):
class WorkflowNode(WorkflowNodeCreate):
mongo_id: Optional[str] = Field(None, alias="_id")
@field_validator("mongo_id", mode="before")
@classmethod
def convert_objectid(cls, v: Any) -> Optional[str]:
if isinstance(v, ObjectId):
return str(v)
return v
def to_mongo_doc(self) -> Dict[str, Any]:
return {
"id": self.id,
"workflow_id": self.workflow_id,
"type": self.type.value,
"title": self.title,
"description": self.description,
"position": self.position.model_dump(),
"config": self.config,
}
pass
class WorkflowCreate(BaseModel):
@@ -149,26 +113,10 @@ class WorkflowCreate(BaseModel):
class Workflow(WorkflowCreate):
id: Optional[str] = Field(None, alias="_id")
id: Optional[str] = None
created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
updated_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
@field_validator("id", mode="before")
@classmethod
def convert_objectid(cls, v: Any) -> Optional[str]:
if isinstance(v, ObjectId):
return str(v)
return v
def to_mongo_doc(self) -> Dict[str, Any]:
return {
"name": self.name,
"description": self.description,
"user": self.user,
"created_at": self.created_at,
"updated_at": self.updated_at,
}
class WorkflowGraph(BaseModel):
workflow: Workflow
@@ -209,7 +157,7 @@ class WorkflowRunCreate(BaseModel):
class WorkflowRun(BaseModel):
model_config = ConfigDict(extra="allow")
id: Optional[str] = Field(None, alias="_id")
id: Optional[str] = None
workflow_id: str
user: Optional[str] = None
status: ExecutionStatus = ExecutionStatus.PENDING
@@ -218,25 +166,3 @@ class WorkflowRun(BaseModel):
steps: List[NodeExecutionLog] = Field(default_factory=list)
created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
completed_at: Optional[datetime] = None
@field_validator("id", mode="before")
@classmethod
def convert_objectid(cls, v: Any) -> Optional[str]:
if isinstance(v, ObjectId):
return str(v)
return v
def to_mongo_doc(self) -> Dict[str, Any]:
doc = {
"workflow_id": self.workflow_id,
"status": self.status.value,
"inputs": self.inputs,
"outputs": self.outputs,
"steps": [step.model_dump() for step in self.steps],
"created_at": self.created_at,
"completed_at": self.completed_at,
}
if self.user:
doc["user"] = self.user
doc["user_id"] = self.user
return doc

View File

@@ -200,6 +200,9 @@ class WorkflowEngine:
node_config = AgentNodeConfig(**node.config.get("config", node.config))
if node_config.sources:
self._retrieve_node_sources(node_config)
if node_config.prompt_template:
formatted_prompt = self._format_template(node_config.prompt_template)
else:
@@ -208,15 +211,26 @@ class WorkflowEngine:
node_config.json_schema, node.title
)
node_model_id = node_config.model_id or self.agent.model_id
# Inherit BYOM scope from parent agent so owner-stored BYOM
# resolves on shared workflows.
node_user_id = getattr(self.agent, "model_user_id", None) or (
self.agent.decoded_token.get("sub")
if isinstance(self.agent.decoded_token, dict)
else None
)
node_llm_name = (
node_config.llm_name
or get_provider_from_model_id(node_model_id or "")
or get_provider_from_model_id(
node_model_id or "", user_id=node_user_id
)
or self.agent.llm_name
)
node_api_key = get_api_key_for_provider(node_llm_name) or self.agent.api_key
if node_json_schema and node_model_id:
model_capabilities = get_model_capabilities(node_model_id)
model_capabilities = get_model_capabilities(
node_model_id, user_id=node_user_id
)
if model_capabilities and not model_capabilities.get(
"supports_structured_output", False
):
@@ -229,6 +243,7 @@ class WorkflowEngine:
"endpoint": self.agent.endpoint,
"llm_name": node_llm_name,
"model_id": node_model_id,
"model_user_id": getattr(self.agent, "model_user_id", None),
"api_key": node_api_key,
"tool_ids": node_config.tools,
"prompt": node_config.system_prompt,
@@ -455,6 +470,29 @@ class WorkflowEngine:
docs_together = "\n\n".join(docs_together_parts) if docs_together_parts else None
return docs, docs_together
def _retrieve_node_sources(self, node_config: AgentNodeConfig) -> None:
"""Retrieve documents from the node's sources for template resolution."""
from application.retriever.retriever_creator import RetrieverCreator
query = self.state.get("query", "")
if not query:
return
try:
retriever = RetrieverCreator.create_retriever(
node_config.retriever or "classic",
source={"active_docs": node_config.sources},
chat_history=[],
prompt="",
chunks=int(node_config.chunks) if node_config.chunks else 2,
decoded_token=self.agent.decoded_token,
)
docs = retriever.search(query)
if docs:
self.agent.retrieved_docs = docs
except Exception:
logger.exception("Failed to retrieve docs for workflow node")
def get_execution_summary(self) -> List[NodeExecutionLog]:
return [
NodeExecutionLog(

View File

@@ -167,14 +167,19 @@ def upgrade() -> None:
op.execute(
"""
CREATE TABLE user_tools (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
name TEXT NOT NULL,
custom_name TEXT,
display_name TEXT,
config JSONB NOT NULL DEFAULT '{}'::jsonb,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
name TEXT NOT NULL,
custom_name TEXT,
display_name TEXT,
description TEXT,
config JSONB NOT NULL DEFAULT '{}'::jsonb,
config_requirements JSONB NOT NULL DEFAULT '{}'::jsonb,
actions JSONB NOT NULL DEFAULT '[]'::jsonb,
status BOOLEAN NOT NULL DEFAULT true,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
legacy_mongo_id TEXT
);
"""
)
@@ -188,7 +193,8 @@ def upgrade() -> None:
agent_id UUID,
prompt_tokens INTEGER NOT NULL DEFAULT 0,
generated_tokens INTEGER NOT NULL DEFAULT 0,
timestamp TIMESTAMPTZ NOT NULL DEFAULT now()
timestamp TIMESTAMPTZ NOT NULL DEFAULT now(),
mongo_id TEXT
);
"""
)
@@ -204,7 +210,8 @@ def upgrade() -> None:
user_id TEXT,
endpoint TEXT,
timestamp TIMESTAMPTZ NOT NULL DEFAULT now(),
data JSONB
data JSONB,
mongo_id TEXT
);
"""
)
@@ -220,7 +227,8 @@ def upgrade() -> None:
api_key TEXT,
query TEXT,
stacks JSONB NOT NULL DEFAULT '[]'::jsonb,
timestamp TIMESTAMPTZ NOT NULL DEFAULT now()
timestamp TIMESTAMPTZ NOT NULL DEFAULT now(),
mongo_id TEXT
);
"""
)
@@ -228,12 +236,14 @@ def upgrade() -> None:
op.execute(
"""
CREATE TABLE agent_folders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
name TEXT NOT NULL,
description TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
name TEXT NOT NULL,
description TEXT,
parent_id UUID REFERENCES agent_folders(id) ON DELETE SET NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
legacy_mongo_id TEXT
);
"""
)
@@ -241,13 +251,24 @@ def upgrade() -> None:
op.execute(
"""
CREATE TABLE sources (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
name TEXT NOT NULL,
type TEXT,
metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
name TEXT NOT NULL,
language TEXT,
date TIMESTAMPTZ NOT NULL DEFAULT now(),
model TEXT,
type TEXT,
metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
retriever TEXT,
sync_frequency TEXT,
tokens TEXT,
file_path TEXT,
remote_data JSONB,
directory_structure JSONB,
file_name_map JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
legacy_mongo_id TEXT
);
"""
)
@@ -255,33 +276,38 @@ def upgrade() -> None:
op.execute(
"""
CREATE TABLE agents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
name TEXT NOT NULL,
description TEXT,
agent_type TEXT,
status TEXT NOT NULL,
key CITEXT UNIQUE,
source_id UUID REFERENCES sources(id) ON DELETE SET NULL,
extra_source_ids UUID[] NOT NULL DEFAULT '{}',
chunks INTEGER,
retriever TEXT,
prompt_id UUID REFERENCES prompts(id) ON DELETE SET NULL,
tools JSONB NOT NULL DEFAULT '[]'::jsonb,
json_schema JSONB,
models JSONB,
default_model_id TEXT,
folder_id UUID REFERENCES agent_folders(id) ON DELETE SET NULL,
limited_token_mode BOOLEAN NOT NULL DEFAULT false,
token_limit INTEGER,
limited_request_mode BOOLEAN NOT NULL DEFAULT false,
request_limit INTEGER,
shared BOOLEAN NOT NULL DEFAULT false,
incoming_webhook_token CITEXT UNIQUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
last_used_at TIMESTAMPTZ,
legacy_mongo_id TEXT
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
name TEXT NOT NULL,
description TEXT,
agent_type TEXT,
status TEXT NOT NULL,
key CITEXT UNIQUE,
image TEXT,
source_id UUID REFERENCES sources(id) ON DELETE SET NULL,
extra_source_ids UUID[] NOT NULL DEFAULT '{}',
chunks INTEGER,
retriever TEXT,
prompt_id UUID REFERENCES prompts(id) ON DELETE SET NULL,
tools JSONB NOT NULL DEFAULT '[]'::jsonb,
json_schema JSONB,
models JSONB,
default_model_id TEXT,
folder_id UUID REFERENCES agent_folders(id) ON DELETE SET NULL,
workflow_id UUID,
limited_token_mode BOOLEAN NOT NULL DEFAULT false,
token_limit INTEGER,
limited_request_mode BOOLEAN NOT NULL DEFAULT false,
request_limit INTEGER,
allow_system_prompt_override BOOLEAN NOT NULL DEFAULT false,
shared BOOLEAN NOT NULL DEFAULT false,
shared_token CITEXT UNIQUE,
shared_metadata JSONB,
incoming_webhook_token CITEXT UNIQUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
last_used_at TIMESTAMPTZ,
legacy_mongo_id TEXT
);
"""
)
@@ -299,6 +325,11 @@ def upgrade() -> None:
upload_path TEXT NOT NULL,
mime_type TEXT,
size BIGINT,
content TEXT,
token_count INTEGER,
openai_file_id TEXT,
google_file_uri TEXT,
metadata JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
legacy_mongo_id TEXT
);
@@ -313,6 +344,7 @@ def upgrade() -> None:
tool_id UUID REFERENCES user_tools(id) ON DELETE CASCADE,
path TEXT NOT NULL,
content TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
"""
@@ -321,13 +353,16 @@ def upgrade() -> None:
op.execute(
"""
CREATE TABLE todos (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
tool_id UUID REFERENCES user_tools(id) ON DELETE CASCADE,
title TEXT NOT NULL,
completed BOOLEAN NOT NULL DEFAULT false,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
tool_id UUID REFERENCES user_tools(id) ON DELETE CASCADE,
todo_id INTEGER,
title TEXT NOT NULL,
completed BOOLEAN NOT NULL DEFAULT false,
metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
legacy_mongo_id TEXT
);
"""
)
@@ -335,13 +370,15 @@ def upgrade() -> None:
op.execute(
"""
CREATE TABLE notes (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
tool_id UUID REFERENCES user_tools(id) ON DELETE CASCADE,
title TEXT NOT NULL,
content TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
tool_id UUID REFERENCES user_tools(id) ON DELETE CASCADE,
title TEXT NOT NULL,
content TEXT NOT NULL,
metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
legacy_mongo_id TEXT
);
"""
)
@@ -349,12 +386,18 @@ def upgrade() -> None:
op.execute(
"""
CREATE TABLE connector_sessions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
provider TEXT NOT NULL,
session_data JSONB NOT NULL,
expires_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
provider TEXT NOT NULL,
server_url TEXT,
session_token TEXT UNIQUE,
user_email TEXT,
status TEXT,
token_info JSONB,
session_data JSONB NOT NULL DEFAULT '{}'::jsonb,
expires_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
legacy_mongo_id TEXT
);
"""
)
@@ -454,6 +497,14 @@ def upgrade() -> None:
);
"""
)
# Backfill the agents.workflow_id FK now that workflows exists.
# The column was created without a FK (forward reference to a table
# that hadn't been declared yet); add the constraint here so deleting
# a workflow still nulls out the referencing agents.workflow_id.
op.execute(
"ALTER TABLE agents ADD CONSTRAINT agents_workflow_fk "
"FOREIGN KEY (workflow_id) REFERENCES workflows(id) ON DELETE SET NULL;"
)
op.execute(
"""
@@ -539,13 +590,26 @@ def upgrade() -> None:
)
op.execute(
"CREATE UNIQUE INDEX connector_sessions_user_provider_uidx "
"ON connector_sessions (user_id, provider);"
# MCP and OAuth connectors share the ``provider`` slot, so the
# dedup key is ``(user_id, server_url, provider)``: MCP rows
# differentiate by server_url (one per MCP server), OAuth rows
# have server_url = NULL and differentiate by provider alone.
# COALESCE lets NULL server_url participate in the constraint.
"CREATE UNIQUE INDEX connector_sessions_user_endpoint_uidx "
"ON connector_sessions (user_id, COALESCE(server_url, ''), provider);"
)
op.execute(
"CREATE INDEX connector_sessions_expiry_idx "
"ON connector_sessions (expires_at) WHERE expires_at IS NOT NULL;"
)
op.execute(
"CREATE INDEX connector_sessions_server_url_idx "
"ON connector_sessions (server_url) WHERE server_url IS NOT NULL;"
)
op.execute(
"CREATE UNIQUE INDEX connector_sessions_legacy_mongo_id_uidx "
"ON connector_sessions (legacy_mongo_id) WHERE legacy_mongo_id IS NOT NULL;"
)
op.execute(
"CREATE UNIQUE INDEX conversation_messages_conv_pos_uidx "
@@ -587,6 +651,10 @@ def upgrade() -> None:
op.execute("CREATE UNIQUE INDEX notes_user_tool_uidx ON notes (user_id, tool_id);")
op.execute("CREATE INDEX notes_tool_id_idx ON notes (tool_id);")
op.execute(
"CREATE UNIQUE INDEX notes_legacy_mongo_id_uidx "
"ON notes (legacy_mongo_id) WHERE legacy_mongo_id IS NOT NULL;"
)
op.execute(
"CREATE UNIQUE INDEX pending_tool_state_conv_user_uidx "
@@ -616,20 +684,54 @@ def upgrade() -> None:
)
op.execute("CREATE INDEX sources_user_idx ON sources (user_id);")
op.execute(
"CREATE UNIQUE INDEX sources_legacy_mongo_id_uidx "
"ON sources (legacy_mongo_id) WHERE legacy_mongo_id IS NOT NULL;"
)
op.execute(
"CREATE UNIQUE INDEX user_tools_legacy_mongo_id_uidx "
"ON user_tools (legacy_mongo_id) WHERE legacy_mongo_id IS NOT NULL;"
)
op.execute(
"CREATE UNIQUE INDEX agent_folders_legacy_mongo_id_uidx "
"ON agent_folders (legacy_mongo_id) WHERE legacy_mongo_id IS NOT NULL;"
)
op.execute("CREATE INDEX agent_folders_parent_idx ON agent_folders (parent_id);")
op.execute("CREATE INDEX agents_workflow_idx ON agents (workflow_id);")
op.execute('CREATE INDEX stack_logs_timestamp_idx ON stack_logs ("timestamp" DESC);')
op.execute('CREATE INDEX stack_logs_user_ts_idx ON stack_logs (user_id, "timestamp" DESC);')
op.execute('CREATE INDEX stack_logs_level_ts_idx ON stack_logs (level, "timestamp" DESC);')
op.execute("CREATE INDEX stack_logs_activity_idx ON stack_logs (activity_id);")
op.execute(
"CREATE UNIQUE INDEX stack_logs_mongo_id_uidx "
"ON stack_logs (mongo_id) WHERE mongo_id IS NOT NULL;"
)
op.execute("CREATE INDEX todos_user_tool_idx ON todos (user_id, tool_id);")
op.execute("CREATE INDEX todos_tool_id_idx ON todos (tool_id);")
op.execute(
"CREATE UNIQUE INDEX todos_legacy_mongo_id_uidx "
"ON todos (legacy_mongo_id) WHERE legacy_mongo_id IS NOT NULL;"
)
op.execute(
"CREATE UNIQUE INDEX todos_tool_todo_id_uidx "
"ON todos (tool_id, todo_id) WHERE todo_id IS NOT NULL;"
)
op.execute('CREATE INDEX token_usage_user_ts_idx ON token_usage (user_id, "timestamp" DESC);')
op.execute('CREATE INDEX token_usage_key_ts_idx ON token_usage (api_key, "timestamp" DESC);')
op.execute('CREATE INDEX token_usage_agent_ts_idx ON token_usage (agent_id, "timestamp" DESC);')
op.execute(
"CREATE UNIQUE INDEX token_usage_mongo_id_uidx "
"ON token_usage (mongo_id) WHERE mongo_id IS NOT NULL;"
)
op.execute('CREATE INDEX user_logs_user_ts_idx ON user_logs (user_id, "timestamp" DESC);')
op.execute(
"CREATE UNIQUE INDEX user_logs_mongo_id_uidx "
"ON user_logs (mongo_id) WHERE mongo_id IS NOT NULL;"
)
op.execute("CREATE INDEX user_tools_user_id_idx ON user_tools (user_id);")

View File

@@ -0,0 +1,37 @@
"""0002 app_metadata — singleton key/value table for instance-wide state.
Used by the startup version-check client to persist the anonymous
instance UUID and a one-shot "notice shown" flag. Both values are tiny
plain-text strings; this is a deliberate generic-config table rather
than dedicated columns so future one-off settings (telemetry opt-in
timestamps, feature-flag overrides, etc.) don't each need their own
migration.
Revision ID: 0002_app_metadata
Revises: 0001_initial
"""
from typing import Sequence, Union
from alembic import op
revision: str = "0002_app_metadata"
down_revision: Union[str, None] = "0001_initial"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.execute(
"""
CREATE TABLE app_metadata (
key TEXT PRIMARY KEY,
value TEXT NOT NULL
);
"""
)
def downgrade() -> None:
op.execute("DROP TABLE IF EXISTS app_metadata;")

View File

@@ -0,0 +1,65 @@
"""0003 user_custom_models — per-user OpenAI-compatible model registrations.
Revision ID: 0003_user_custom_models
Revises: 0002_app_metadata
"""
from typing import Sequence, Union
from alembic import op
revision: str = "0003_user_custom_models"
down_revision: Union[str, None] = "0002_app_metadata"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.execute(
"""
CREATE TABLE user_custom_models (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
upstream_model_id TEXT NOT NULL,
display_name TEXT NOT NULL,
description TEXT NOT NULL DEFAULT '',
base_url TEXT NOT NULL,
api_key_encrypted TEXT NOT NULL,
capabilities JSONB NOT NULL DEFAULT '{}'::jsonb,
enabled BOOLEAN NOT NULL DEFAULT true,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
"""
)
op.execute(
"CREATE INDEX user_custom_models_user_id_idx "
"ON user_custom_models (user_id);"
)
# Mirror the project-wide invariants set up in 0001_initial:
# * user_id FK with ON DELETE RESTRICT (deferrable),
# * ensure_user_exists() trigger so the parent users row is created automatically,
# * set_updated_at() trigger.
op.execute(
"ALTER TABLE user_custom_models "
"ADD CONSTRAINT user_custom_models_user_id_fk "
"FOREIGN KEY (user_id) REFERENCES users(user_id) "
"ON DELETE RESTRICT DEFERRABLE INITIALLY IMMEDIATE;"
)
op.execute(
"CREATE TRIGGER user_custom_models_ensure_user "
"BEFORE INSERT OR UPDATE OF user_id ON user_custom_models "
"FOR EACH ROW EXECUTE FUNCTION ensure_user_exists();"
)
op.execute(
"CREATE TRIGGER user_custom_models_set_updated_at "
"BEFORE UPDATE ON user_custom_models "
"FOR EACH ROW WHEN (OLD.* IS DISTINCT FROM NEW.*) "
"EXECUTE FUNCTION set_updated_at();"
)
def downgrade() -> None:
op.execute("DROP TABLE IF EXISTS user_custom_models;")

View File

@@ -0,0 +1,217 @@
"""0004 durability foundation — idempotency, tool-call log, ingest checkpoint.
Adds ``task_dedup``, ``webhook_dedup``, ``tool_call_attempts``,
``ingest_chunk_progress``, and per-row status flags on
``conversation_messages`` and ``pending_tool_state``. Also adds
``token_usage.source`` and ``token_usage.request_id`` so per-channel
cost attribution (``agent_stream`` / ``title`` / ``compression`` /
``rag_condense`` / ``fallback``) is queryable and multi-call agent runs
can be DISTINCT-collapsed into a single user request for rate limiting.
Revision ID: 0004_durability_foundation
Revises: 0003_user_custom_models
"""
from typing import Sequence, Union
from alembic import op
revision: str = "0004_durability_foundation"
down_revision: Union[str, None] = "0003_user_custom_models"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# ------------------------------------------------------------------
# New tables
# ------------------------------------------------------------------
# ``attempt_count`` bounds the per-Celery-task idempotency wrapper's
# retry loop so a poison message can't run forever; default 0 means
# existing rows behave as if no attempts have run yet.
op.execute(
"""
CREATE TABLE task_dedup (
idempotency_key TEXT PRIMARY KEY,
task_name TEXT NOT NULL,
task_id TEXT NOT NULL,
result_json JSONB,
status TEXT NOT NULL
CHECK (status IN ('pending', 'completed', 'failed')),
attempt_count INT NOT NULL DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
"""
)
op.execute(
"""
CREATE TABLE webhook_dedup (
idempotency_key TEXT PRIMARY KEY,
agent_id UUID NOT NULL,
task_id TEXT NOT NULL,
response_json JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
"""
)
# FK on ``message_id`` uses ``ON DELETE SET NULL`` so the journal row
# survives parent-message deletion (compliance / cost-attribution).
op.execute(
"""
CREATE TABLE tool_call_attempts (
call_id TEXT PRIMARY KEY,
message_id UUID
REFERENCES conversation_messages (id)
ON DELETE SET NULL,
tool_id UUID,
tool_name TEXT NOT NULL,
action_name TEXT NOT NULL,
arguments JSONB NOT NULL,
result JSONB,
error TEXT,
status TEXT NOT NULL
CHECK (status IN (
'proposed', 'executed', 'confirmed',
'compensated', 'failed'
)),
attempted_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
"""
)
op.execute(
"""
CREATE TABLE ingest_chunk_progress (
source_id UUID PRIMARY KEY,
total_chunks INT NOT NULL,
embedded_chunks INT NOT NULL DEFAULT 0,
last_index INT NOT NULL DEFAULT -1,
last_updated TIMESTAMPTZ NOT NULL DEFAULT now()
);
"""
)
# ------------------------------------------------------------------
# Column additions on existing tables
# ------------------------------------------------------------------
# DEFAULT 'complete' backfills existing rows — they're already done.
op.execute(
"""
ALTER TABLE conversation_messages
ADD COLUMN status TEXT NOT NULL DEFAULT 'complete'
CHECK (status IN ('pending', 'streaming', 'complete', 'failed')),
ADD COLUMN request_id TEXT;
"""
)
op.execute(
"""
ALTER TABLE pending_tool_state
ADD COLUMN status TEXT NOT NULL DEFAULT 'pending'
CHECK (status IN ('pending', 'resuming')),
ADD COLUMN resumed_at TIMESTAMPTZ;
"""
)
# Default ``agent_stream`` backfills historical rows under the
# assumption they were written from the primary path; before this fix
# the only write path was the error branch reading agent.llm.
# ``request_id`` is the stream-scoped UUID stamped by the route on
# ``agent.llm`` so multi-tool agent runs (which produce N rows)
# collapse to one request via DISTINCT in ``count_in_range``.
# Side-channel sources (``title`` / ``compression`` / ``rag_condense``
# / ``fallback``) leave it NULL and are excluded from the request
# count by source filter.
op.execute(
"""
ALTER TABLE token_usage
ADD COLUMN source TEXT NOT NULL DEFAULT 'agent_stream',
ADD COLUMN request_id TEXT;
"""
)
# ------------------------------------------------------------------
# Indexes — partial where the predicate selects only non-terminal rows
# ------------------------------------------------------------------
op.execute(
"CREATE INDEX conversation_messages_pending_ts_idx "
"ON conversation_messages (timestamp) "
"WHERE status IN ('pending', 'streaming');"
)
op.execute(
"CREATE INDEX tool_call_attempts_pending_ts_idx "
"ON tool_call_attempts (attempted_at) "
"WHERE status IN ('proposed', 'executed');"
)
op.execute(
"CREATE INDEX tool_call_attempts_message_idx "
"ON tool_call_attempts (message_id) "
"WHERE message_id IS NOT NULL;"
)
op.execute(
"CREATE INDEX pending_tool_state_resuming_ts_idx "
"ON pending_tool_state (resumed_at) "
"WHERE status = 'resuming';"
)
op.execute(
"CREATE INDEX webhook_dedup_agent_idx "
"ON webhook_dedup (agent_id);"
)
op.execute(
"CREATE INDEX task_dedup_pending_attempts_idx "
"ON task_dedup (attempt_count) WHERE status = 'pending';"
)
# Cost-attribution dashboards filter ``token_usage`` by
# ``(timestamp, source)``; index the same shape so they stay cheap.
op.execute(
"CREATE INDEX token_usage_source_ts_idx "
"ON token_usage (source, timestamp);"
)
# Partial index — only rows with a stamped request_id participate
# in the DISTINCT count. NULL rows fall through to the COUNT(*)
# branch in the repository query.
op.execute(
"CREATE INDEX token_usage_request_id_idx "
"ON token_usage (request_id) "
"WHERE request_id IS NOT NULL;"
)
op.execute(
"CREATE TRIGGER tool_call_attempts_set_updated_at "
"BEFORE UPDATE ON tool_call_attempts "
"FOR EACH ROW WHEN (OLD.* IS DISTINCT FROM NEW.*) "
"EXECUTE FUNCTION set_updated_at();"
)
def downgrade() -> None:
# CASCADE so the downgrade stays safe if later migrations FK into these.
for table in (
"ingest_chunk_progress",
"tool_call_attempts",
"webhook_dedup",
"task_dedup",
):
op.execute(f"DROP TABLE IF EXISTS {table} CASCADE;")
op.execute(
"ALTER TABLE conversation_messages "
"DROP COLUMN IF EXISTS request_id, "
"DROP COLUMN IF EXISTS status;"
)
op.execute(
"ALTER TABLE pending_tool_state "
"DROP COLUMN IF EXISTS resumed_at, "
"DROP COLUMN IF EXISTS status;"
)
op.execute("DROP INDEX IF EXISTS token_usage_request_id_idx;")
op.execute("DROP INDEX IF EXISTS token_usage_source_ts_idx;")
op.execute(
"ALTER TABLE token_usage "
"DROP COLUMN IF EXISTS request_id, "
"DROP COLUMN IF EXISTS source;"
)
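
A sketch of the query shapes these columns and partial indexes are meant to serve; the table and column names come from this migration, but the exact repository queries may differ:

    -- Per-channel cost attribution over the last 24 hours.
    SELECT source, SUM(prompt_tokens + generated_tokens) AS total_tokens
    FROM token_usage
    WHERE "timestamp" >= now() - interval '24 hours'
    GROUP BY source;

    -- Request-rate check: rows sharing a request_id collapse to one request,
    -- rows without one (legacy / side-channel) fall back to a plain count.
    SELECT COUNT(DISTINCT request_id) FILTER (WHERE request_id IS NOT NULL)
         + COUNT(*) FILTER (WHERE request_id IS NULL) AS requests
    FROM token_usage
    WHERE api_key = '<agent key>'
      AND source = 'agent_stream'
      AND "timestamp" >= now() - interval '24 hours';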

View File

@@ -0,0 +1,44 @@
"""0005 ingest_chunk_progress.attempt_id — per-attempt resume scoping.
Without this column, a completed checkpoint row poisoned every later
embed call on the same ``source_id``: a sync after an upload finished
read the upload's terminal ``last_index`` and either embedded zero
chunks (if new ``total_docs <= last_index + 1``) or stacked new chunks
on top of the old vectors (if ``total_docs > last_index + 1``).
``attempt_id`` is stamped from ``self.request.id`` (Celery's stable
task id, which survives ``acks_late`` retries of the same task but
differs across separate task invocations). The repository's
``init_progress`` upsert resets ``last_index`` / ``embedded_chunks``
when the incoming ``attempt_id`` differs from the stored one — so a
fresh sync starts from chunk 0 while a retry of the same task resumes
from the last checkpointed chunk.
Revision ID: 0005_ingest_attempt_id
Revises: 0004_durability_foundation
"""
from typing import Sequence, Union
from alembic import op
revision: str = "0005_ingest_attempt_id"
down_revision: Union[str, None] = "0004_durability_foundation"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.execute(
"""
ALTER TABLE ingest_chunk_progress
ADD COLUMN attempt_id TEXT;
"""
)
def downgrade() -> None:
op.execute(
"ALTER TABLE ingest_chunk_progress DROP COLUMN IF EXISTS attempt_id;"
)
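
A hedged sketch of the init_progress upsert the docstring describes (the real repository implementation may differ in detail): a new attempt_id resets the checkpoint, while the same attempt_id resumes it.

    INSERT INTO ingest_chunk_progress
        (source_id, total_chunks, embedded_chunks, last_index, attempt_id)
    VALUES ('<source uuid>', 120, 0, -1, '<celery task id>')
    ON CONFLICT (source_id) DO UPDATE SET
        total_chunks    = EXCLUDED.total_chunks,
        embedded_chunks = CASE WHEN ingest_chunk_progress.attempt_id
                                    IS DISTINCT FROM EXCLUDED.attempt_id
                               THEN 0 ELSE ingest_chunk_progress.embedded_chunks END,
        last_index      = CASE WHEN ingest_chunk_progress.attempt_id
                                    IS DISTINCT FROM EXCLUDED.attempt_id
                               THEN -1 ELSE ingest_chunk_progress.last_index END,
        attempt_id      = EXCLUDED.attempt_id,
        last_updated    = now();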

View File

@@ -0,0 +1,57 @@
"""0006 task_dedup lease columns — running-lease for in-flight tasks.
Without these, ``with_idempotency`` only short-circuits *completed*
rows. A late-ack redelivery (Redis ``visibility_timeout`` exceeded by a
long ingest, or a hung-but-alive worker) hands the same message to a
second worker; ``_claim_or_bump`` only bumped the attempt counter and
both workers ran the task body in parallel — duplicate vector writes,
duplicate token spend, duplicate webhook side effects.
``lease_owner_id`` + ``lease_expires_at`` turn that into an atomic
compare-and-swap. The wrapper claims a lease at entry, refreshes it via
a 30 s heartbeat thread, and finalises (which makes the lease moot via
``status='completed'``). A second worker hitting the same key sees a
fresh lease and calls ``self.retry(countdown=LEASE_TTL)`` instead of running.
A crashed worker's lease expires after ``LEASE_TTL`` seconds and the
next retry can claim it.
Revision ID: 0006_idempotency_lease
Revises: 0005_ingest_attempt_id
"""
from typing import Sequence, Union
from alembic import op
revision: str = "0006_idempotency_lease"
down_revision: Union[str, None] = "0005_ingest_attempt_id"
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
op.execute(
"""
ALTER TABLE task_dedup
ADD COLUMN lease_owner_id TEXT,
ADD COLUMN lease_expires_at TIMESTAMPTZ;
"""
)
# Reconciler's stuck-pending sweep filters by
# ``(status='pending', lease_expires_at < now() - 60s, attempt_count >= 5)``.
# Partial index keeps the scan small even under heavy task throughput.
op.execute(
"CREATE INDEX task_dedup_pending_lease_idx "
"ON task_dedup (lease_expires_at) "
"WHERE status = 'pending';"
)
def downgrade() -> None:
op.execute("DROP INDEX IF EXISTS task_dedup_pending_lease_idx;")
op.execute(
"ALTER TABLE task_dedup "
"DROP COLUMN IF EXISTS lease_expires_at, "
"DROP COLUMN IF EXISTS lease_owner_id;"
)
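
A rough sketch of the lease claim the docstring describes, expressed as a single compare-and-swap statement; the TTL and placeholder values are assumptions, and the actual _claim_or_bump logic lives in the idempotency wrapper:

    -- Returns a row only if the key is new, the lease has expired, or we
    -- already own it; otherwise the caller retries after the lease TTL.
    INSERT INTO task_dedup (idempotency_key, task_name, task_id, status,
                            lease_owner_id, lease_expires_at)
    VALUES ('<key>', 'ingest_source', '<celery task id>', 'pending',
            '<worker id>', now() + interval '120 seconds')
    ON CONFLICT (idempotency_key) DO UPDATE SET
        lease_owner_id   = EXCLUDED.lease_owner_id,
        lease_expires_at = EXCLUDED.lease_expires_at,
        attempt_count    = task_dedup.attempt_count + 1
    WHERE task_dedup.status = 'pending'
      AND (task_dedup.lease_expires_at IS NULL
           OR task_dedup.lease_expires_at < now()
           OR task_dedup.lease_owner_id = EXCLUDED.lease_owner_id)
    RETURNING idempotency_key;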

View File

@@ -102,6 +102,8 @@ class AnswerResource(Resource, BaseAnswerResource):
"tools_dict": tools_dict,
"pending_tool_calls": pending_tool_calls,
"tool_actions": tool_actions,
"reserved_message_id": processor.reserved_message_id,
"request_id": processor.request_id,
},
)
else:

View File

@@ -1,23 +1,31 @@
import datetime
import json
import logging
import time
import uuid
from typing import Any, Dict, Generator, List, Optional
from flask import jsonify, make_response, Response
from flask_restx import Namespace
from application.api.answer.services.continuation_service import ContinuationService
from application.api.answer.services.conversation_service import ConversationService
from application.api.answer.services.conversation_service import (
ConversationService,
TERMINATED_RESPONSE_PLACEHOLDER,
)
from application.core.model_utils import (
get_api_key_for_provider,
get_default_model_id,
get_provider_from_model_id,
)
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.error import sanitize_api_error
from application.llm.llm_creator import LLMCreator
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.repositories.token_usage import TokenUsageRepository
from application.storage.db.repositories.user_logs import UserLogsRepository
from application.storage.db.session import db_readonly, db_session
from application.utils import check_required_fields
logger = logging.getLogger(__name__)
@@ -30,10 +38,6 @@ class BaseAnswerResource:
"""Shared base class for answer endpoints"""
def __init__(self):
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
self.db = db
self.user_logs_collection = db["user_logs"]
self.default_model_id = get_default_model_id()
self.conversation_service = ConversationService()
@@ -91,8 +95,8 @@ class BaseAnswerResource:
api_key = agent_config.get("user_api_key")
if not api_key:
return None
agents_collection = self.db["agents"]
agent = agents_collection.find_one({"key": api_key})
with db_readonly() as conn:
agent = AgentsRepository(conn).find_by_key(api_key)
if not agent:
return make_response(
@@ -113,41 +117,32 @@ class BaseAnswerResource:
)
token_limit = int(
agent.get("token_limit", settings.DEFAULT_AGENT_LIMITS["token_limit"])
agent.get("token_limit") or settings.DEFAULT_AGENT_LIMITS["token_limit"]
)
request_limit = int(
agent.get("request_limit", settings.DEFAULT_AGENT_LIMITS["request_limit"])
agent.get("request_limit") or settings.DEFAULT_AGENT_LIMITS["request_limit"]
)
token_usage_collection = self.db["token_usage"]
end_date = datetime.datetime.now()
end_date = datetime.datetime.now(datetime.timezone.utc)
start_date = end_date - datetime.timedelta(hours=24)
match_query = {
"timestamp": {"$gte": start_date, "$lte": end_date},
"api_key": api_key,
}
if limited_token_mode:
token_pipeline = [
{"$match": match_query},
{
"$group": {
"_id": None,
"total_tokens": {
"$sum": {"$add": ["$prompt_tokens", "$generated_tokens"]}
},
}
},
]
token_result = list(token_usage_collection.aggregate(token_pipeline))
daily_token_usage = token_result[0]["total_tokens"] if token_result else 0
if limited_token_mode or limited_request_mode:
with db_readonly() as conn:
token_repo = TokenUsageRepository(conn)
if limited_token_mode:
daily_token_usage = token_repo.sum_tokens_in_range(
start=start_date, end=end_date, api_key=api_key,
)
else:
daily_token_usage = 0
if limited_request_mode:
daily_request_usage = token_repo.count_in_range(
start=start_date, end=end_date, api_key=api_key,
)
else:
daily_request_usage = 0
else:
daily_token_usage = 0
if limited_request_mode:
daily_request_usage = token_usage_collection.count_documents(match_query)
else:
daily_request_usage = 0
if not limited_token_mode and not limited_request_mode:
return None
@@ -187,6 +182,7 @@ class BaseAnswerResource:
is_shared_usage: bool = False,
shared_token: Optional[str] = None,
model_id: Optional[str] = None,
model_user_id: Optional[str] = None,
_continuation: Optional[Dict] = None,
) -> Generator[str, None, None]:
"""
@@ -212,13 +208,118 @@ class BaseAnswerResource:
Yields:
Server-sent event strings
"""
response_full, thought, source_log_docs, tool_calls = "", "", [], []
is_structured = False
schema_info = None
structured_chunks = []
query_metadata: Dict[str, Any] = {}
paused = False
# One id shared across the WAL row, primary LLM (token_usage
# attribution), the SSE event, and resumed continuations.
request_id = (
_continuation.get("request_id") if _continuation else None
) or str(uuid.uuid4())
# Reserve the placeholder row before the LLM call so a crash
# mid-stream still leaves the question queryable. Continuations
# reuse the original placeholder.
reserved_message_id: Optional[str] = None
wal_eligible = should_save_conversation and not _continuation
if wal_eligible:
try:
reservation = self.conversation_service.save_user_question(
conversation_id=conversation_id,
question=question,
decoded_token=decoded_token,
attachment_ids=attachment_ids,
api_key=user_api_key,
agent_id=agent_id,
is_shared_usage=is_shared_usage,
shared_token=shared_token,
model_id=model_id or self.default_model_id,
request_id=request_id,
index=index,
)
conversation_id = reservation["conversation_id"]
reserved_message_id = reservation["message_id"]
except Exception as e:
logger.error(
f"Failed to reserve message row before stream: {e}",
exc_info=True,
)
elif _continuation and _continuation.get("reserved_message_id"):
reserved_message_id = _continuation["reserved_message_id"]
primary_llm = getattr(agent, "llm", None)
if primary_llm is not None:
primary_llm._request_id = request_id
# Flipped to ``streaming`` on first chunk; reconciler uses this
# to tell "never started" from "in flight".
streaming_marked = False
# Heartbeat goes into ``metadata.last_heartbeat_at`` (not
# ``updated_at``, which reconciler-side writes share) and uses
# ``time.monotonic`` so a blocked event loop can't fake a fresh heartbeat.
STREAM_HEARTBEAT_INTERVAL = 60
last_heartbeat_at = time.monotonic()
def _mark_streaming_once() -> None:
nonlocal streaming_marked, last_heartbeat_at
if streaming_marked or not reserved_message_id:
return
try:
self.conversation_service.update_message_status(
reserved_message_id, "streaming",
)
except Exception:
logger.exception(
"update_message_status streaming failed for %s",
reserved_message_id,
)
streaming_marked = True
last_heartbeat_at = time.monotonic()
def _heartbeat_streaming() -> None:
nonlocal last_heartbeat_at
if not reserved_message_id or not streaming_marked:
return
now_mono = time.monotonic()
if now_mono - last_heartbeat_at < STREAM_HEARTBEAT_INTERVAL:
return
try:
self.conversation_service.heartbeat_message(
reserved_message_id,
)
except Exception:
logger.exception(
"stream heartbeat update failed for %s",
reserved_message_id,
)
last_heartbeat_at = now_mono
# Correlates tool_call_attempts rows with this message.
if reserved_message_id and getattr(agent, "tool_executor", None):
try:
agent.tool_executor.message_id = reserved_message_id
except Exception:
pass
try:
response_full, thought, source_log_docs, tool_calls = "", "", [], []
is_structured = False
schema_info = None
structured_chunks = []
query_metadata = {}
paused = False
# Surface the placeholder id before any LLM tokens so a
# mid-handshake disconnect still has a row to tail-poll.
if reserved_message_id:
early_event = json.dumps(
{
"type": "message_id",
"message_id": reserved_message_id,
"conversation_id": (
str(conversation_id) if conversation_id else None
),
"request_id": request_id,
}
)
yield f"data: {early_event}\n\n"
if _continuation:
gen_iter = agent.gen_continuation(
@@ -231,9 +332,13 @@ class BaseAnswerResource:
gen_iter = agent.gen(query=question)
for line in gen_iter:
# Cheap closure check that only hits the DB when the
# heartbeat interval has elapsed.
_heartbeat_streaming()
if "metadata" in line:
query_metadata.update(line["metadata"])
elif "answer" in line:
_mark_streaming_once()
response_full += str(line["answer"])
if line.get("structured"):
is_structured = True
@@ -243,6 +348,7 @@ class BaseAnswerResource:
data = json.dumps({"type": "answer", "answer": line["answer"]})
yield f"data: {data}\n\n"
elif "sources" in line:
_mark_streaming_once()
truncated_sources = []
source_log_docs = line["sources"]
for source in line["sources"]:
@@ -295,12 +401,19 @@ class BaseAnswerResource:
if paused:
continuation = getattr(agent, "_pending_continuation", None)
if continuation:
# Ensure we have a conversation_id — create a partial
# conversation if this is the first turn.
# First-turn pause needs a conversation row to attach to.
if not conversation_id and should_save_conversation:
try:
provider = (
get_provider_from_model_id(model_id)
get_provider_from_model_id(
model_id,
user_id=model_user_id
or (
decoded_token.get("sub")
if decoded_token
else None
),
)
if model_id
else settings.LLM_PROVIDER
)
@@ -314,6 +427,7 @@ class BaseAnswerResource:
decoded_token=decoded_token,
model_id=model_id,
agent_id=agent_id,
model_user_id=model_user_id,
)
conversation_id = (
self.conversation_service.save_conversation(
@@ -350,6 +464,9 @@ class BaseAnswerResource:
tool_schemas=getattr(agent, "tools", []),
agent_config={
"model_id": model_id or self.default_model_id,
# BYOM scope; without it resume falls
# back to caller's layer.
"model_user_id": model_user_id,
"llm_name": getattr(agent, "llm_name", settings.LLM_PROVIDER),
"api_key": getattr(agent, "api_key", None),
"user_api_key": user_api_key,
@@ -358,6 +475,11 @@ class BaseAnswerResource:
"prompt": getattr(agent, "prompt", ""),
"json_schema": getattr(agent, "json_schema", None),
"retriever_config": getattr(agent, "retriever_config", None),
# Reused on resume so the same WAL row
# is finalised and request_id stays
# consistent across token_usage rows.
"reserved_message_id": reserved_message_id,
"request_id": request_id,
},
client_tools=getattr(
agent.tool_executor, "client_tools", None
@@ -380,8 +502,13 @@ class BaseAnswerResource:
if isNoneDoc:
for doc in source_log_docs:
doc["source"] = "None"
# Model-owner scope so title-gen uses owner's BYOM key.
provider = (
get_provider_from_model_id(model_id)
get_provider_from_model_id(
model_id,
user_id=model_user_id
or (decoded_token.get("sub") if decoded_token else None),
)
if model_id
else settings.LLM_PROVIDER
)
@@ -394,27 +521,51 @@ class BaseAnswerResource:
decoded_token=decoded_token,
model_id=model_id,
agent_id=agent_id,
model_user_id=model_user_id,
)
# Title-gen only; agent stream tokens live on ``agent.llm``.
llm._token_usage_source = "title"
if should_save_conversation:
conversation_id = self.conversation_service.save_conversation(
conversation_id,
question,
response_full,
thought,
source_log_docs,
tool_calls,
llm,
model_id or self.default_model_id,
decoded_token,
index=index,
api_key=user_api_key,
agent_id=agent_id,
is_shared_usage=is_shared_usage,
shared_token=shared_token,
attachment_ids=attachment_ids,
metadata=query_metadata if query_metadata else None,
)
if reserved_message_id is not None:
self.conversation_service.finalize_message(
reserved_message_id,
response_full,
thought=thought,
sources=source_log_docs,
tool_calls=tool_calls,
model_id=model_id or self.default_model_id,
metadata=query_metadata if query_metadata else None,
status="complete",
title_inputs={
"llm": llm,
"question": question,
"response": response_full,
"model_id": model_id or self.default_model_id,
"fallback_name": (
question[:50] if question else "New Conversation"
),
},
)
else:
conversation_id = self.conversation_service.save_conversation(
conversation_id,
question,
response_full,
thought,
source_log_docs,
tool_calls,
llm,
model_id or self.default_model_id,
decoded_token,
index=index,
api_key=user_api_key,
agent_id=agent_id,
is_shared_usage=is_shared_usage,
shared_token=shared_token,
attachment_ids=attachment_ids,
metadata=query_metadata if query_metadata else None,
)
# Persist compression metadata/summary if it exists and wasn't saved mid-execution
compression_meta = getattr(agent, "compression_metadata", None)
compression_saved = getattr(agent, "compression_saved", False)
@@ -437,6 +588,21 @@ class BaseAnswerResource:
)
else:
conversation_id = None
# Resume finished cleanly; drop the continuation row.
# Crash-paths leave it ``resuming`` for the janitor to revert.
if _continuation and conversation_id:
try:
cont_service = ContinuationService()
cont_service.delete_state(
str(conversation_id),
decoded_token.get("sub", "local"),
)
except Exception as e:
logger.error(
f"Failed to delete continuation state on resume "
f"completion: {e}",
exc_info=True,
)
id_data = {"type": "id", "id": str(conversation_id)}
data = json.dumps(id_data)
yield f"data: {data}\n\n"
@@ -467,19 +633,18 @@ class BaseAnswerResource:
for key, value in log_data.items():
if isinstance(value, str) and len(value) > 10000:
log_data[key] = value[:10000]
self.user_logs_collection.insert_one(log_data)
from application.storage.db.dual_write import dual_write
from application.storage.db.repositories.user_logs import UserLogsRepository
dual_write(
UserLogsRepository,
lambda repo, d=log_data: repo.insert(
user_id=d.get("user"),
endpoint="stream_answer",
data=d,
),
)
try:
with db_session() as conn:
UserLogsRepository(conn).insert(
user_id=log_data.get("user"),
endpoint="stream_answer",
data=log_data,
)
except Exception as log_err:
logger.error(
f"Failed to persist stream_answer user log: {log_err}",
exc_info=True,
)
data = json.dumps({"type": "end"})
yield f"data: {data}\n\n"
@@ -492,31 +657,73 @@ class BaseAnswerResource:
if isNoneDoc:
for doc in source_log_docs:
doc["source"] = "None"
# Resolve under model-owner scope so shared-agent
# title-gen uses owner BYOM, not deployment default.
provider = (
get_provider_from_model_id(
model_id,
user_id=model_user_id
or (
decoded_token.get("sub")
if decoded_token
else None
),
)
if model_id
else settings.LLM_PROVIDER
)
sys_api_key = get_api_key_for_provider(
provider or settings.LLM_PROVIDER
)
llm = LLMCreator.create_llm(
settings.LLM_PROVIDER,
api_key=settings.API_KEY,
provider or settings.LLM_PROVIDER,
api_key=sys_api_key,
user_api_key=user_api_key,
decoded_token=decoded_token,
model_id=model_id,
agent_id=agent_id,
model_user_id=model_user_id,
)
self.conversation_service.save_conversation(
conversation_id,
question,
response_full,
thought,
source_log_docs,
tool_calls,
llm,
model_id or self.default_model_id,
decoded_token,
index=index,
api_key=user_api_key,
agent_id=agent_id,
is_shared_usage=is_shared_usage,
shared_token=shared_token,
attachment_ids=attachment_ids,
metadata=query_metadata if query_metadata else None,
)
llm._token_usage_source = "title"
if reserved_message_id is not None:
self.conversation_service.finalize_message(
reserved_message_id,
response_full,
thought=thought,
sources=source_log_docs,
tool_calls=tool_calls,
model_id=model_id or self.default_model_id,
metadata=query_metadata if query_metadata else None,
status="complete",
title_inputs={
"llm": llm,
"question": question,
"response": response_full,
"model_id": model_id or self.default_model_id,
"fallback_name": (
question[:50] if question else "New Conversation"
),
},
)
else:
self.conversation_service.save_conversation(
conversation_id,
question,
response_full,
thought,
source_log_docs,
tool_calls,
llm,
model_id or self.default_model_id,
decoded_token,
index=index,
api_key=user_api_key,
agent_id=agent_id,
is_shared_usage=is_shared_usage,
shared_token=shared_token,
attachment_ids=attachment_ids,
metadata=query_metadata if query_metadata else None,
)
compression_meta = getattr(agent, "compression_metadata", None)
compression_saved = getattr(agent, "compression_saved", False)
if conversation_id and compression_meta and not compression_saved:
@@ -543,6 +750,24 @@ class BaseAnswerResource:
raise
except Exception as e:
logger.error(f"Error in stream: {str(e)}", exc_info=True)
if reserved_message_id is not None:
try:
self.conversation_service.finalize_message(
reserved_message_id,
response_full or TERMINATED_RESPONSE_PLACEHOLDER,
thought=thought,
sources=source_log_docs,
tool_calls=tool_calls,
model_id=model_id or self.default_model_id,
metadata=query_metadata if query_metadata else None,
status="failed",
error=e,
)
except Exception as fin_err:
logger.error(
f"Failed to finalize errored message: {fin_err}",
exc_info=True,
)
data = json.dumps(
{
"type": "error",

View File

@@ -1,28 +1,21 @@
import logging
from typing import Any, Dict, List
from flask import make_response, request
from flask_restx import fields, Resource
from bson.dbref import DBRef
from application.api.answer.routes.base import answer_ns
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.vectorstore.vector_creator import VectorCreator
from application.services.search_service import (
InvalidAPIKey,
SearchFailed,
search,
)
logger = logging.getLogger(__name__)
@answer_ns.route("/api/search")
class SearchResource(Resource):
"""Fast search endpoint for retrieving relevant documents"""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
mongo = MongoDB.get_client()
self.db = mongo[settings.MONGO_DB_NAME]
self.agents_collection = self.db["agents"]
"""Fast search endpoint for retrieving relevant documents."""
search_model = answer_ns.model(
"SearchModel",
@@ -39,116 +32,10 @@ class SearchResource(Resource):
},
)
def _get_sources_from_api_key(self, api_key: str) -> List[str]:
"""Get source IDs connected to the API key/agent.
"""
agent_data = self.agents_collection.find_one({"key": api_key})
if not agent_data:
return []
source_ids = []
# Handle multiple sources (only if non-empty)
sources = agent_data.get("sources", [])
if sources and isinstance(sources, list) and len(sources) > 0:
for source_ref in sources:
# Skip "default" - it's a placeholder, not an actual vectorstore
if source_ref == "default":
continue
elif isinstance(source_ref, DBRef):
source_doc = self.db.dereference(source_ref)
if source_doc:
source_ids.append(str(source_doc["_id"]))
# Handle single source (legacy) - check if sources was empty or didn't yield results
if not source_ids:
source = agent_data.get("source")
if isinstance(source, DBRef):
source_doc = self.db.dereference(source)
if source_doc:
source_ids.append(str(source_doc["_id"]))
# Skip "default" - it's a placeholder, not an actual vectorstore
elif source and source != "default":
source_ids.append(source)
return source_ids
def _search_vectorstores(
self, query: str, source_ids: List[str], chunks: int
) -> List[Dict[str, Any]]:
"""Search across vectorstores and return results"""
if not source_ids:
return []
results = []
chunks_per_source = max(1, chunks // len(source_ids))
seen_texts = set()
for source_id in source_ids:
if not source_id or not source_id.strip():
continue
try:
docsearch = VectorCreator.create_vectorstore(
settings.VECTOR_STORE, source_id, settings.EMBEDDINGS_KEY
)
docs = docsearch.search(query, k=chunks_per_source * 2)
for doc in docs:
if len(results) >= chunks:
break
if hasattr(doc, "page_content") and hasattr(doc, "metadata"):
page_content = doc.page_content
metadata = doc.metadata
else:
page_content = doc.get("text", doc.get("page_content", ""))
metadata = doc.get("metadata", {})
# Skip duplicates
text_hash = hash(page_content[:200])
if text_hash in seen_texts:
continue
seen_texts.add(text_hash)
title = metadata.get(
"title", metadata.get("post_title", "")
)
if not isinstance(title, str):
title = str(title) if title else ""
# Clean up title
if title:
title = title.split("/")[-1]
else:
# Use filename or first part of content as title
title = metadata.get("filename", page_content[:50] + "...")
source = metadata.get("source", source_id)
results.append({
"text": page_content,
"title": title,
"source": source,
})
if len(results) >= chunks:
break
except Exception as e:
logger.error(
f"Error searching vectorstore {source_id}: {e}",
exc_info=True,
)
continue
return results[:chunks]
@answer_ns.expect(search_model)
@answer_ns.doc(description="Search for relevant documents based on query")
def post(self):
data = request.get_json()
data = request.get_json() or {}
question = data.get("question")
api_key = data.get("api_key")
@@ -156,31 +43,13 @@ class SearchResource(Resource):
if not question:
return make_response({"error": "question is required"}, 400)
if not api_key:
return make_response({"error": "api_key is required"}, 400)
# Validate API key
agent = self.agents_collection.find_one({"key": api_key})
if not agent:
return make_response({"error": "Invalid API key"}, 401)
try:
# Get sources connected to this API key
source_ids = self._get_sources_from_api_key(api_key)
if not source_ids:
return make_response([], 200)
# Perform search
results = self._search_vectorstores(question, source_ids, chunks)
return make_response(results, 200)
except Exception as e:
logger.error(
f"/api/search - error: {str(e)}",
extra={"error": str(e)},
exc_info=True,
)
return make_response(search(api_key, question, chunks), 200)
except InvalidAPIKey:
return make_response({"error": "Invalid API key"}, 401)
except SearchFailed:
logger.exception("/api/search failed")
return make_response({"error": "Search failed"}, 500)

View File

@@ -109,11 +109,14 @@ class StreamResource(Resource, BaseAnswerResource):
decoded_token=processor.decoded_token,
agent_id=processor.agent_id,
model_id=processor.model_id,
model_user_id=processor.model_user_id,
_continuation={
"messages": messages,
"tools_dict": tools_dict,
"pending_tool_calls": pending_tool_calls,
"tool_actions": tool_actions,
"reserved_message_id": processor.reserved_message_id,
"request_id": processor.request_id,
},
),
mimetype="text/event-stream",
@@ -145,6 +148,7 @@ class StreamResource(Resource, BaseAnswerResource):
is_shared_usage=processor.is_shared_usage,
shared_token=processor.shared_token,
model_id=processor.model_id,
model_user_id=processor.model_user_id,
),
mimetype="text/event-stream",
)

View File

@@ -49,6 +49,7 @@ class CompressionOrchestrator:
model_id: str,
decoded_token: Dict[str, Any],
current_query_tokens: int = 500,
model_user_id: Optional[str] = None,
) -> CompressionResult:
"""
Check if compression is needed and perform it if so.
@@ -57,16 +58,18 @@ class CompressionOrchestrator:
Args:
conversation_id: Conversation ID
user_id: User ID
user_id: Caller's user id — used for conversation access checks
model_id: Model being used for conversation
decoded_token: User's decoded JWT token
current_query_tokens: Estimated tokens for current query
model_user_id: BYOM-resolution scope (model owner); defaults
to ``user_id`` for built-in / caller-owned models.
Returns:
CompressionResult with summary and recent queries
"""
try:
# Load conversation
# Conversation row is owned by the caller, not the model owner.
conversation = self.conversation_service.get_conversation(
conversation_id, user_id
)
@@ -77,9 +80,14 @@ class CompressionOrchestrator:
)
return CompressionResult.failure("Conversation not found")
# Check if compression is needed
# Use model-owner scope so per-user BYOM context windows
# (e.g. 8k) compute the threshold against the right limit.
registry_user_id = model_user_id or user_id
if not self.threshold_checker.should_compress(
conversation, model_id, current_query_tokens
conversation,
model_id,
current_query_tokens,
user_id=registry_user_id,
):
# No compression needed, return full history
queries = conversation.get("queries", [])
@@ -87,7 +95,12 @@ class CompressionOrchestrator:
# Perform compression
return self._perform_compression(
conversation_id, conversation, model_id, decoded_token
conversation_id,
conversation,
model_id,
decoded_token,
user_id=user_id,
model_user_id=model_user_id,
)
except Exception as e:
@@ -102,6 +115,8 @@ class CompressionOrchestrator:
conversation: Dict[str, Any],
model_id: str,
decoded_token: Dict[str, Any],
user_id: Optional[str] = None,
model_user_id: Optional[str] = None,
) -> CompressionResult:
"""
Perform the actual compression operation.
@@ -111,6 +126,8 @@ class CompressionOrchestrator:
conversation: Conversation document
model_id: Model ID for conversation
decoded_token: User token
user_id: Caller's id (for conversation reload after compression)
model_user_id: BYOM-resolution scope (model owner)
Returns:
CompressionResult
@@ -123,11 +140,17 @@ class CompressionOrchestrator:
else model_id
)
# Get provider and API key for compression model
provider = get_provider_from_model_id(compression_model)
# Use model-owner scope so provider/api_key resolves to the
# owner's BYOM record (shared-agent dispatch).
caller_user_id = user_id
if caller_user_id is None and isinstance(decoded_token, dict):
caller_user_id = decoded_token.get("sub")
registry_user_id = model_user_id or caller_user_id
provider = get_provider_from_model_id(
compression_model, user_id=registry_user_id
)
api_key = get_api_key_for_provider(provider)
# Create compression LLM
compression_llm = LLMCreator.create_llm(
provider,
api_key=api_key,
@@ -135,7 +158,11 @@ class CompressionOrchestrator:
decoded_token=decoded_token,
model_id=compression_model,
agent_id=conversation.get("agent_id"),
model_user_id=registry_user_id,
)
# Side-channel LLM tag — distinguishes compression rows
# from primary stream rows for cost-attribution dashboards.
compression_llm._token_usage_source = "compression"
# Create compression service with DB update capability
compression_service = CompressionService(
@@ -167,9 +194,12 @@ class CompressionOrchestrator:
f"saved {metadata.original_token_count - metadata.compressed_token_count} tokens"
)
# Reload conversation with updated metadata
# Reload under caller (conversation is owned by caller).
reload_user_id = caller_user_id
if reload_user_id is None and isinstance(decoded_token, dict):
reload_user_id = decoded_token.get("sub")
conversation = self.conversation_service.get_conversation(
conversation_id, user_id=decoded_token.get("sub")
conversation_id, user_id=reload_user_id
)
# Get compressed context
@@ -192,16 +222,21 @@ class CompressionOrchestrator:
model_id: str,
decoded_token: Dict[str, Any],
current_conversation: Optional[Dict[str, Any]] = None,
model_user_id: Optional[str] = None,
) -> CompressionResult:
"""
Perform compression during tool execution.
Args:
conversation_id: Conversation ID
user_id: User ID
user_id: Caller's user id — used for conversation access checks
model_id: Model ID
decoded_token: User token
current_conversation: Pre-loaded conversation (optional)
model_user_id: BYOM-resolution scope (model owner). For
shared-agent dispatch this is the agent owner; defaults
to ``user_id`` so built-in / caller-owned models are
unaffected.
Returns:
CompressionResult
@@ -223,7 +258,12 @@ class CompressionOrchestrator:
# Perform compression
return self._perform_compression(
conversation_id, conversation, model_id, decoded_token
conversation_id,
conversation,
model_id,
decoded_token,
user_id=user_id,
model_user_id=model_user_id,
)
except Exception as e:

View File

@@ -106,8 +106,13 @@ class CompressionService:
f"using model {self.model_id}"
)
# See note in conversation_service.py: ``self.model_id`` is
# the registry id (UUID for BYOM); the LLM's own model_id is
# what the provider's API actually expects.
response = self.llm.gen(
model=self.model_id, messages=messages, max_tokens=4000
model=getattr(self.llm, "model_id", None) or self.model_id,
messages=messages,
max_tokens=4000,
)
# Extract summary from response

View File

@@ -30,6 +30,7 @@ class CompressionThresholdChecker:
conversation: Dict[str, Any],
model_id: str,
current_query_tokens: int = 500,
user_id: str | None = None,
) -> bool:
"""
Determine if compression is needed.
@@ -38,6 +39,8 @@ class CompressionThresholdChecker:
conversation: Full conversation document
model_id: Target model for this request
current_query_tokens: Estimated tokens for current query
user_id: Owner — needed so per-user BYOM custom-model UUIDs
resolve when looking up the context window.
Returns:
True if tokens >= threshold% of context window
@@ -48,7 +51,7 @@ class CompressionThresholdChecker:
total_tokens += current_query_tokens
# Get context window limit for model
context_limit = get_token_limit(model_id)
context_limit = get_token_limit(model_id, user_id=user_id)
# Calculate threshold
threshold = int(context_limit * self.threshold_percentage)
@@ -73,20 +76,24 @@ class CompressionThresholdChecker:
logger.error(f"Error checking compression need: {str(e)}", exc_info=True)
return False
def check_message_tokens(self, messages: list, model_id: str) -> bool:
def check_message_tokens(
self, messages: list, model_id: str, user_id: str | None = None
) -> bool:
"""
Check if message list exceeds threshold.
Args:
messages: List of message dicts
model_id: Target model
user_id: Owner — needed so per-user BYOM custom-model UUIDs
resolve when looking up the context window.
Returns:
True if at or above threshold
"""
try:
current_tokens = TokenCounter.count_message_tokens(messages)
context_limit = get_token_limit(model_id)
context_limit = get_token_limit(model_id, user_id=user_id)
threshold = int(context_limit * self.threshold_percentage)
if current_tokens >= threshold:
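Worked example of the threshold arithmetic above, with illustrative numbers (the real percentage comes from the checker's configuration): an 8,000-token context at a 0.8 threshold compresses once the running total reaches 6,400 tokens.

def needs_compression(total_tokens: int, context_limit: int, threshold_percentage: float) -> bool:
    threshold = int(context_limit * threshold_percentage)
    return total_tokens >= threshold


assert needs_compression(6500, 8000, 0.8) is True   # 6500 >= 6400
assert needs_compression(6000, 8000, 0.8) is False  # 6000 <  6400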

View File

@@ -12,6 +12,12 @@ logger = logging.getLogger(__name__)
class TokenCounter:
"""Centralized token counting for conversations and messages."""
# Per-image token estimate. Provider tokenizers vary widely
# (Gemini ~258, GPT-4o 85-1500, Claude ~1500) and the actual cost
# depends on resolution/detail we can't see here. Errs slightly high
# so the threshold check stays conservative.
_IMAGE_PART_TOKEN_ESTIMATE = 1500
@staticmethod
def count_message_tokens(messages: List[Dict]) -> int:
"""
@@ -29,12 +35,36 @@ class TokenCounter:
if isinstance(content, str):
total_tokens += num_tokens_from_string(content)
elif isinstance(content, list):
# Handle structured content (tool calls, etc.)
# Handle structured content (tool calls, image parts, etc.)
for item in content:
if isinstance(item, dict):
total_tokens += num_tokens_from_string(str(item))
total_tokens += TokenCounter._count_content_part(item)
return total_tokens
@staticmethod
def _count_content_part(item: Dict) -> int:
# Image/file attachments are billed by the provider per image,
# not proportional to the inline bytes/base64 string.
# ``str(item)`` on a 1MB image inflates the count by ~10000x,
# which trips spurious compression and overflows downstream
# input limits.
item_type = item.get("type")
if "files" in item:
files = item.get("files")
count = len(files) if isinstance(files, list) and files else 1
return TokenCounter._IMAGE_PART_TOKEN_ESTIMATE * count
if "image_url" in item or item_type in {
"image",
"image_url",
"input_image",
"file",
}:
return TokenCounter._IMAGE_PART_TOKEN_ESTIMATE
return num_tokens_from_string(str(item))
@staticmethod
def count_query_tokens(
queries: List[Dict[str, Any]], include_tool_calls: bool = True
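A self-contained sketch of the content-part accounting above, with a naive ~4-chars-per-token stand-in for ``num_tokens_from_string``: image parts get the flat estimate, so a large base64 payload no longer dominates the total.

IMAGE_PART_TOKEN_ESTIMATE = 1500


def _estimate_text_tokens(text: str) -> int:
    # Stand-in tokenizer for this sketch only.
    return max(1, len(text) // 4)


def count_part(item: dict) -> int:
    if "files" in item:
        files = item.get("files")
        n = len(files) if isinstance(files, list) and files else 1
        return IMAGE_PART_TOKEN_ESTIMATE * n
    if "image_url" in item or item.get("type") in {"image", "image_url", "input_image", "file"}:
        return IMAGE_PART_TOKEN_ESTIMATE
    return _estimate_text_tokens(str(item))


parts = [
    {"type": "text", "text": "hello there"},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
]
print(sum(count_part(p) for p in parts))  # small text count + flat 1500 for the image part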

View File

@@ -1,63 +1,39 @@
"""Service for saving and restoring tool-call continuation state.
When a stream pauses (tool needs approval or client-side execution),
the full execution state is persisted to MongoDB so the client can
the full execution state is persisted to Postgres so the client can
resume later by sending tool_actions.
"""
import datetime
import logging
from typing import Any, Dict, List, Optional
from bson import ObjectId
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.storage.db.dual_write import dual_write
from application.storage.db.base_repository import looks_like_uuid
from application.storage.db.repositories.conversations import ConversationsRepository
from application.storage.db.repositories.pending_tool_state import (
PendingToolStateRepository,
)
from application.storage.db.serialization import coerce_pg_native as _make_serializable
from application.storage.db.session import db_readonly, db_session
logger = logging.getLogger(__name__)
# TTL for pending states — auto-cleaned after this period
PENDING_STATE_TTL_SECONDS = 30 * 60 # 30 minutes
def _make_serializable(obj: Any) -> Any:
"""Recursively convert MongoDB ObjectIds and other non-JSON types."""
if isinstance(obj, ObjectId):
return str(obj)
if isinstance(obj, dict):
return {str(k): _make_serializable(v) for k, v in obj.items()}
if isinstance(obj, list):
return [_make_serializable(v) for v in obj]
if isinstance(obj, bytes):
return obj.decode("utf-8", errors="replace")
return obj
# Re-export so the existing tests at tests/api/answer/services/test_continuation_service_pg.py
# can keep importing ``_make_serializable`` from here.
__all__ = ["_make_serializable", "ContinuationService", "PENDING_STATE_TTL_SECONDS"]
class ContinuationService:
"""Manages pending tool-call state in MongoDB."""
"""Manages pending tool-call state in Postgres."""
def __init__(self):
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
self.collection = db["pending_tool_state"]
self._ensure_indexes()
def _ensure_indexes(self):
try:
self.collection.create_index(
"expires_at", expireAfterSeconds=0
)
self.collection.create_index(
[("conversation_id", 1), ("user", 1)], unique=True
)
except Exception:
# Indexes may already exist or mongomock doesn't support TTL
pass
# No-op constructor retained for call-site compatibility. State
# lives in Postgres now; each operation opens its own short-lived
# session rather than holding a connection on the service.
pass
def save_state(
self,
@@ -72,6 +48,10 @@ class ContinuationService:
) -> str:
"""Save execution state for later continuation.
``conversation_id`` may be a Postgres UUID or the legacy Mongo
``ObjectId`` string — the latter is resolved via
``conversations.legacy_mongo_id`` to find the matching row.
Args:
conversation_id: The conversation this state belongs to.
user: Owner user ID.
@@ -83,45 +63,26 @@ class ContinuationService:
client_tools: Client-provided tool schemas for client-side execution.
Returns:
The string ID of the saved state document.
The string ID (conversation_id as provided) of the saved state.
"""
now = datetime.datetime.now(datetime.timezone.utc)
expires_at = now + datetime.timedelta(seconds=PENDING_STATE_TTL_SECONDS)
doc = {
"conversation_id": conversation_id,
"user": user,
"messages": _make_serializable(messages),
"pending_tool_calls": _make_serializable(pending_tool_calls),
"tools_dict": _make_serializable(tools_dict),
"tool_schemas": _make_serializable(tool_schemas),
"agent_config": _make_serializable(agent_config),
"client_tools": _make_serializable(client_tools) if client_tools else None,
"created_at": now,
"expires_at": expires_at,
}
# Upsert — only one pending state per conversation per user
result = self.collection.replace_one(
{"conversation_id": conversation_id, "user": user},
doc,
upsert=True,
)
state_id = str(result.upserted_id) if result.upserted_id else conversation_id
logger.info(
f"Saved continuation state for conversation {conversation_id} "
f"with {len(pending_tool_calls)} pending tool call(s)"
)
# Dual-write to Postgres — upsert against the same Mongo conversation
# by resolving its UUID via conversations.legacy_mongo_id.
def _pg_save(_: PendingToolStateRepository) -> None:
conn = _._conn # reuse the existing transaction
with db_session() as conn:
conv = ConversationsRepository(conn).get_by_legacy_id(conversation_id)
if conv is None:
return
_.save_state(
conv["id"],
if conv is not None:
pg_conv_id = conv["id"]
elif looks_like_uuid(conversation_id):
pg_conv_id = conversation_id
else:
# Unresolvable legacy ObjectId — downstream ``CAST AS uuid``
# would raise and poison the save. Surface the mismatch so
# the caller can decide (the stream loop in routes/base.py
# already wraps this in try/except).
raise ValueError(
f"Cannot save continuation state: conversation_id "
f"{conversation_id!r} is neither a PG UUID nor a "
f"backfilled legacy Mongo id."
)
PendingToolStateRepository(conn).save_state(
pg_conv_id,
user,
messages=_make_serializable(messages),
pending_tool_calls=_make_serializable(pending_tool_calls),
@@ -131,8 +92,11 @@ class ContinuationService:
client_tools=_make_serializable(client_tools) if client_tools else None,
)
dual_write(PendingToolStateRepository, _pg_save)
return state_id
logger.info(
f"Saved continuation state for conversation {conversation_id} "
f"with {len(pending_tool_calls)} pending tool call(s)"
)
return conversation_id
def load_state(
self, conversation_id: str, user: str
@@ -142,34 +106,58 @@ class ContinuationService:
Returns:
The state dict, or None if no pending state exists.
"""
doc = self.collection.find_one(
{"conversation_id": conversation_id, "user": user}
)
with db_readonly() as conn:
conv = ConversationsRepository(conn).get_by_legacy_id(conversation_id)
if conv is not None:
pg_conv_id = conv["id"]
elif looks_like_uuid(conversation_id):
pg_conv_id = conversation_id
else:
# Unresolvable legacy ObjectId → no state can exist for it.
return None
doc = PendingToolStateRepository(conn).load_state(pg_conv_id, user)
if not doc:
return None
doc["_id"] = str(doc["_id"])
return doc
def delete_state(self, conversation_id: str, user: str) -> bool:
"""Delete pending state after successful resumption.
Returns:
True if a document was deleted.
True if a row was deleted.
"""
result = self.collection.delete_one(
{"conversation_id": conversation_id, "user": user}
)
if result.deleted_count:
with db_session() as conn:
conv = ConversationsRepository(conn).get_by_legacy_id(conversation_id)
if conv is not None:
pg_conv_id = conv["id"]
elif looks_like_uuid(conversation_id):
pg_conv_id = conversation_id
else:
# Unresolvable legacy ObjectId → nothing to delete.
return False
deleted = PendingToolStateRepository(conn).delete_state(pg_conv_id, user)
if deleted:
logger.info(
f"Deleted continuation state for conversation {conversation_id}"
)
return deleted
# Dual-write to Postgres — delete the same row.
def _pg_delete(repo: PendingToolStateRepository) -> None:
conv = ConversationsRepository(repo._conn).get_by_legacy_id(conversation_id)
if conv is None:
return
repo.delete_state(conv["id"], user)
dual_write(PendingToolStateRepository, _pg_delete)
return result.deleted_count > 0
def mark_resuming(self, conversation_id: str, user: str) -> bool:
"""Flip the pending row to ``resuming`` so a crashed resume can be retried."""
with db_session() as conn:
conv = ConversationsRepository(conn).get_by_legacy_id(conversation_id)
if conv is not None:
pg_conv_id = conv["id"]
elif looks_like_uuid(conversation_id):
pg_conv_id = conversation_id
else:
return False
flipped = PendingToolStateRepository(conn).mark_resuming(
pg_conv_id, user
)
if flipped:
logger.info(
f"Marked continuation state as resuming for conversation "
f"{conversation_id}"
)
return flipped
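The same UUID-or-legacy-id resolution is repeated in save_state / load_state / delete_state / mark_resuming above; one possible shared helper (not part of this diff) could factor it out, reusing imports already present in the module:

from typing import Optional

from application.storage.db.base_repository import looks_like_uuid
from application.storage.db.repositories.conversations import ConversationsRepository


def resolve_pg_conversation_id(conn, conversation_id: str) -> Optional[str]:
    # Legacy Mongo ObjectId -> backfilled row; raw UUID -> use as-is;
    # anything else -> None so callers decide whether to raise or no-op.
    conv = ConversationsRepository(conn).get_by_legacy_id(conversation_id)
    if conv is not None:
        return str(conv["id"])
    if looks_like_uuid(conversation_id):
        return conversation_id
    return None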

View File

@@ -1,46 +1,58 @@
"""Conversation persistence service backed by Postgres.
Handles create / append / update / compression for conversations during
the answer-streaming path. Connections are opened per-operation rather
than held for the duration of a stream.
"""
import logging
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional
from application.core.mongo_db import MongoDB
from sqlalchemy import text as sql_text
from application.core.settings import settings
from application.storage.db.dual_write import dual_write
from application.storage.db.base_repository import looks_like_uuid
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.repositories.conversations import ConversationsRepository
from bson import ObjectId
from application.storage.db.session import db_readonly, db_session
logger = logging.getLogger(__name__)
class ConversationService:
def __init__(self):
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
self.conversations_collection = db["conversations"]
self.agents_collection = db["agents"]
# Shown to the user if the worker dies mid-stream and the response is never finalised.
TERMINATED_RESPONSE_PLACEHOLDER = (
"Response was terminated prior to completion, try regenerating."
)
class ConversationService:
def get_conversation(
self, conversation_id: str, user_id: str
) -> Optional[Dict[str, Any]]:
"""Retrieve a conversation with proper access control"""
"""Retrieve a conversation with owner-or-shared access control.
Returns a dict in the legacy Mongo shape — ``queries`` is a list
of message dicts (prompt/response/...) — for compatibility with
the streaming pipeline that consumes this shape.
"""
if not conversation_id or not user_id:
return None
try:
conversation = self.conversations_collection.find_one(
{
"_id": ObjectId(conversation_id),
"$or": [{"user": user_id}, {"shared_with": user_id}],
}
)
if not conversation:
logger.warning(
f"Conversation not found or unauthorized - ID: {conversation_id}, User: {user_id}"
)
return None
conversation["_id"] = str(conversation["_id"])
return conversation
with db_readonly() as conn:
repo = ConversationsRepository(conn)
conv = repo.get_any(conversation_id, user_id)
if conv is None:
logger.warning(
f"Conversation not found or unauthorized - ID: {conversation_id}, User: {user_id}"
)
return None
messages = repo.get_messages(str(conv["id"]))
conv["queries"] = messages
conv["_id"] = str(conv["id"])
return conv
except Exception as e:
logger.error(f"Error fetching conversation: {str(e)}", exc_info=True)
return None
@@ -64,7 +76,11 @@ class ConversationService:
attachment_ids: Optional[List[str]] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> str:
"""Save or update a conversation in the database"""
"""Save or update a conversation in Postgres.
Returns the string conversation id (PG UUID as string, or the
caller-provided id if it was already a UUID).
"""
if decoded_token is None:
raise ValueError("Invalid or missing authentication token")
user_id = decoded_token.get("sub")
@@ -72,117 +88,47 @@ class ConversationService:
raise ValueError("User ID not found in token")
current_time = datetime.now(timezone.utc)
# clean up in sources array such that we save max 1k characters for text part
# Trim huge inline source text to a reasonable max before persist.
for source in sources:
if "text" in source and isinstance(source["text"], str):
source["text"] = source["text"][:1000]
message_payload = {
"prompt": question,
"response": response,
"thought": thought,
"sources": sources,
"tool_calls": tool_calls,
"attachments": attachment_ids,
"model_id": model_id,
"timestamp": current_time,
}
if metadata:
message_payload["metadata"] = metadata
if conversation_id is not None and index is not None:
# Update existing conversation with new query
result = self.conversations_collection.update_one(
{
"_id": ObjectId(conversation_id),
"user": user_id,
f"queries.{index}": {"$exists": True},
},
{
"$set": {
f"queries.{index}.prompt": question,
f"queries.{index}.response": response,
f"queries.{index}.thought": thought,
f"queries.{index}.sources": sources,
f"queries.{index}.tool_calls": tool_calls,
f"queries.{index}.timestamp": current_time,
f"queries.{index}.attachments": attachment_ids,
f"queries.{index}.model_id": model_id,
**(
{f"queries.{index}.metadata": metadata}
if metadata
else {}
),
}
},
)
if result.matched_count == 0:
raise ValueError("Conversation not found or unauthorized")
self.conversations_collection.update_one(
{
"_id": ObjectId(conversation_id),
"user": user_id,
f"queries.{index}": {"$exists": True},
},
{"$push": {"queries": {"$each": [], "$slice": index + 1}}},
)
# Dual-write to Postgres: update the message at :index and
# truncate anything after it, mirroring Mongo's $set+$slice.
def _pg_update_at_index(repo: ConversationsRepository) -> None:
conv = repo.get_by_legacy_id(conversation_id)
with db_session() as conn:
repo = ConversationsRepository(conn)
conv = repo.get_any(conversation_id, user_id)
if conv is None:
return
repo.update_message_at(conv["id"], index, {
"prompt": question,
"response": response,
"thought": thought,
"sources": sources,
"tool_calls": tool_calls,
"attachments": attachment_ids,
"model_id": model_id,
"timestamp": current_time,
**({"metadata": metadata} if metadata else {}),
})
repo.truncate_after(conv["id"], index)
dual_write(ConversationsRepository, _pg_update_at_index)
raise ValueError("Conversation not found or unauthorized")
conv_pg_id = str(conv["id"])
repo.update_message_at(conv_pg_id, index, message_payload)
repo.truncate_after(conv_pg_id, index)
return conversation_id
elif conversation_id:
# Append new message to existing conversation
result = self.conversations_collection.update_one(
{"_id": ObjectId(conversation_id), "user": user_id},
{
"$push": {
"queries": {
"prompt": question,
"response": response,
"thought": thought,
"sources": sources,
"tool_calls": tool_calls,
"timestamp": current_time,
"attachments": attachment_ids,
"model_id": model_id,
**({"metadata": metadata} if metadata else {}),
}
}
},
)
if result.matched_count == 0:
raise ValueError("Conversation not found or unauthorized")
# Dual-write to Postgres: append the same message.
def _pg_append(repo: ConversationsRepository) -> None:
conv = repo.get_by_legacy_id(conversation_id)
with db_session() as conn:
repo = ConversationsRepository(conn)
conv = repo.get_any(conversation_id, user_id)
if conv is None:
return
repo.append_message(conv["id"], {
"prompt": question,
"response": response,
"thought": thought,
"sources": sources,
"tool_calls": tool_calls,
"attachments": attachment_ids,
"model_id": model_id,
"timestamp": current_time,
"metadata": metadata or {},
})
dual_write(ConversationsRepository, _pg_append)
raise ValueError("Conversation not found or unauthorized")
conv_pg_id = str(conv["id"])
# append_message expects a 'metadata' key even when none was provided; normalise.
append_payload = dict(message_payload)
append_payload.setdefault("metadata", metadata or {})
repo.append_message(conv_pg_id, append_payload)
return conversation_id
else:
# Create new conversation
messages_summary = [
{
"role": "system",
@@ -197,125 +143,310 @@ class ConversationService:
},
]
# ``model_id`` here is the registry id (a UUID for BYOM
# records). The LLM's own ``model_id`` is the upstream name
# LLMCreator resolved at construction time — that's what
# the provider's API expects. Built-ins are unaffected.
completion = llm.gen(
model=model_id, messages=messages_summary, max_tokens=500
model=getattr(llm, "model_id", None) or model_id,
messages=messages_summary,
max_tokens=500,
)
if not completion or not completion.strip():
completion = question[:50] if question else "New Conversation"
query_doc = {
"prompt": question,
"response": response,
"thought": thought,
"sources": sources,
"tool_calls": tool_calls,
"timestamp": current_time,
"attachments": attachment_ids,
"model_id": model_id,
}
if metadata:
query_doc["metadata"] = metadata
conversation_data = {
"user": user_id,
"date": current_time,
"name": completion,
"queries": [query_doc],
}
resolved_api_key: Optional[str] = None
resolved_agent_id: Optional[str] = None
if api_key:
if agent_id:
conversation_data["agent_id"] = agent_id
if is_shared_usage:
conversation_data["is_shared_usage"] = is_shared_usage
conversation_data["shared_token"] = shared_token
agent = self.agents_collection.find_one({"key": api_key})
with db_readonly() as conn:
agent = AgentsRepository(conn).find_by_key(api_key)
if agent:
conversation_data["api_key"] = agent["key"]
result = self.conversations_collection.insert_one(conversation_data)
inserted_id = str(result.inserted_id)
resolved_api_key = agent.get("key")
if agent_id:
resolved_agent_id = agent_id
# Dual-write to Postgres: create the conversation row with
# legacy_mongo_id and append the first message.
def _pg_create(repo: ConversationsRepository) -> None:
with db_session() as conn:
repo = ConversationsRepository(conn)
conv = repo.create(
user_id,
completion,
agent_id=conversation_data.get("agent_id"),
api_key=conversation_data.get("api_key"),
is_shared_usage=conversation_data.get("is_shared_usage", False),
shared_token=conversation_data.get("shared_token"),
legacy_mongo_id=inserted_id,
agent_id=resolved_agent_id,
api_key=resolved_api_key,
is_shared_usage=bool(resolved_agent_id and is_shared_usage),
shared_token=(
shared_token
if (resolved_agent_id and is_shared_usage)
else None
),
)
repo.append_message(conv["id"], {
"prompt": question,
"response": response,
"thought": thought,
"sources": sources,
"tool_calls": tool_calls,
"attachments": attachment_ids,
"model_id": model_id,
"timestamp": current_time,
"metadata": metadata or {},
})
conv_pg_id = str(conv["id"])
append_payload = dict(message_payload)
append_payload.setdefault("metadata", metadata or {})
repo.append_message(conv_pg_id, append_payload)
return conv_pg_id
dual_write(ConversationsRepository, _pg_create)
return inserted_id
def save_user_question(
self,
conversation_id: Optional[str],
question: str,
decoded_token: Dict[str, Any],
*,
attachment_ids: Optional[List[str]] = None,
api_key: Optional[str] = None,
agent_id: Optional[str] = None,
is_shared_usage: bool = False,
shared_token: Optional[str] = None,
model_id: Optional[str] = None,
request_id: Optional[str] = None,
status: str = "pending",
index: Optional[int] = None,
) -> Dict[str, str]:
"""Reserve the placeholder message row before the LLM call.
``index`` triggers regenerate semantics: messages at
``position >= index`` are truncated so the new placeholder
lands at ``position = index`` rather than appending.
Returns ``{"conversation_id", "message_id", "request_id"}``.
"""
if decoded_token is None:
raise ValueError("Invalid or missing authentication token")
user_id = decoded_token.get("sub")
if not user_id:
raise ValueError("User ID not found in token")
request_id = request_id or str(uuid.uuid4())
resolved_api_key: Optional[str] = None
resolved_agent_id: Optional[str] = None
if api_key and not conversation_id:
with db_readonly() as conn:
agent = AgentsRepository(conn).find_by_key(api_key)
if agent:
resolved_api_key = agent.get("key")
if agent_id:
resolved_agent_id = agent_id
with db_session() as conn:
repo = ConversationsRepository(conn)
if conversation_id:
conv = repo.get_any(conversation_id, user_id)
if conv is None:
raise ValueError("Conversation not found or unauthorized")
conv_pg_id = str(conv["id"])
# Regenerate / edit-prior-question: drop the message at
# ``index`` and everything after it so the new
# ``reserve_message`` lands at ``position=index`` rather
# than appending at the end of the conversation.
if isinstance(index, int) and index >= 0:
repo.truncate_after(conv_pg_id, keep_up_to=index - 1)
else:
fallback_name = (question[:50] if question else "New Conversation")
conv = repo.create(
user_id,
fallback_name,
agent_id=resolved_agent_id,
api_key=resolved_api_key,
is_shared_usage=bool(resolved_agent_id and is_shared_usage),
shared_token=(
shared_token
if (resolved_agent_id and is_shared_usage)
else None
),
)
conv_pg_id = str(conv["id"])
row = repo.reserve_message(
conv_pg_id,
prompt=question,
placeholder_response=TERMINATED_RESPONSE_PLACEHOLDER,
request_id=request_id,
status=status,
attachments=attachment_ids,
model_id=model_id,
)
message_id = str(row["id"])
return {
"conversation_id": conv_pg_id,
"message_id": message_id,
"request_id": request_id,
}
def update_message_status(self, message_id: str, status: str) -> bool:
"""Cheap status-only transition (e.g. ``pending → streaming``)."""
if not message_id:
return False
with db_session() as conn:
return ConversationsRepository(conn).update_message_status(
message_id, status,
)
def heartbeat_message(self, message_id: str) -> bool:
"""Bump ``message_metadata.last_heartbeat_at`` so the reconciler's
staleness sweep counts the row as alive. No-ops on terminal rows.
"""
if not message_id:
return False
with db_session() as conn:
return ConversationsRepository(conn).heartbeat_message(message_id)
def finalize_message(
self,
message_id: str,
response: str,
*,
thought: str = "",
sources: Optional[List[Dict[str, Any]]] = None,
tool_calls: Optional[List[Dict[str, Any]]] = None,
model_id: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
status: str = "complete",
error: Optional[BaseException] = None,
title_inputs: Optional[Dict[str, Any]] = None,
) -> bool:
"""Commit the response and tool_call confirms in one transaction."""
if not message_id:
return False
sources = sources or []
for source in sources:
if "text" in source and isinstance(source["text"], str):
source["text"] = source["text"][:1000]
merged_metadata: Dict[str, Any] = dict(metadata or {})
if status == "failed" and error is not None:
merged_metadata.setdefault(
"error", f"{type(error).__name__}: {str(error)}"
)
update_fields: Dict[str, Any] = {
"response": response,
"status": status,
"thought": thought,
"sources": sources,
"tool_calls": tool_calls or [],
"metadata": merged_metadata,
}
if model_id is not None:
update_fields["model_id"] = model_id
# Atomic message update + tool_call_attempts confirm; the
# ``only_if_non_terminal`` guard prevents a late stream from
# retracting a row the reconciler already escalated.
with db_session() as conn:
repo = ConversationsRepository(conn)
ok = repo.update_message_by_id(
message_id, update_fields,
only_if_non_terminal=True,
)
if not ok:
logger.warning(
f"finalize_message: no row updated for message_id={message_id} "
f"(possibly already terminal — reconciler may have escalated)"
)
return False
repo.confirm_executed_tool_calls(message_id)
# Outside the txn — title-gen is a multi-second LLM round trip.
if title_inputs and status == "complete":
try:
with db_session() as conn:
self._maybe_generate_title(conn, message_id, title_inputs)
except Exception as e:
logger.error(
f"finalize_message title generation failed: {e}",
exc_info=True,
)
return True
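Putting the new write-ahead methods together, a rough sketch of the call order the streaming route follows against the ConversationService defined above (ids and model name illustrative; error handling and the title/LLM plumbing elided):

svc = ConversationService()

reserved = svc.save_user_question(
    conversation_id=None,                       # new conversation
    question="What changed in the billing export?",
    decoded_token={"sub": "user-123"},
    model_id="gpt-4o-mini",
)
svc.update_message_status(reserved["message_id"], "streaming")  # pending -> streaming
svc.heartbeat_message(reserved["message_id"])                   # keeps the staleness sweep away
svc.finalize_message(
    reserved["message_id"],
    response="The export now includes per-seat totals.",
    status="complete",
)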
def _maybe_generate_title(
self,
conn,
message_id: str,
title_inputs: Dict[str, Any],
) -> None:
"""Generate an LLM-summarised conversation name if one isn't set yet."""
llm = title_inputs.get("llm")
question = title_inputs.get("question") or ""
response = title_inputs.get("response") or ""
fallback_name = title_inputs.get("fallback_name") or question[:50]
if llm is None:
return
row = conn.execute(
sql_text(
"SELECT c.id, c.name FROM conversation_messages m "
"JOIN conversations c ON c.id = m.conversation_id "
"WHERE m.id = CAST(:mid AS uuid)"
),
{"mid": message_id},
).fetchone()
if row is None:
return
conv_id, current_name = str(row[0]), row[1]
if current_name and current_name != fallback_name:
return
messages_summary = [
{
"role": "system",
"content": "You are a helpful assistant that creates concise conversation titles. "
"Summarize conversations in 3 words or less using the same language as the user.",
},
{
"role": "user",
"content": "Summarise following conversation in no more than 3 words, "
"respond ONLY with the summary, use the same language as the "
"user query \n\nUser: " + question + "\n\n" + "AI: " + response,
},
]
completion = llm.gen(
model=getattr(llm, "model_id", None) or title_inputs.get("model_id"),
messages=messages_summary,
max_tokens=500,
)
if not completion or not completion.strip():
completion = fallback_name or "New Conversation"
conn.execute(
sql_text(
"UPDATE conversations SET name = :name, updated_at = now() "
"WHERE id = CAST(:id AS uuid)"
),
{"id": conv_id, "name": completion.strip()},
)
def update_compression_metadata(
self, conversation_id: str, compression_metadata: Dict[str, Any]
) -> None:
"""
Update conversation with compression metadata.
"""Persist compression flags and append a compression point.
Uses $push with $slice to keep only the most recent compression points,
preventing unbounded array growth. Since each compression incorporates
previous compressions, older points become redundant.
Args:
conversation_id: Conversation ID
compression_metadata: Compression point data
Mirrors the Mongo-era ``$set`` + ``$push $slice`` on
``compression_metadata`` but goes through the PG repo API.
"""
try:
self.conversations_collection.update_one(
{"_id": ObjectId(conversation_id)},
{
"$set": {
"compression_metadata.is_compressed": True,
"compression_metadata.last_compression_at": compression_metadata.get(
"timestamp"
),
},
"$push": {
"compression_metadata.compression_points": {
"$each": [compression_metadata],
"$slice": -settings.COMPRESSION_MAX_HISTORY_POINTS,
}
},
},
)
logger.info(
f"Updated compression metadata for conversation {conversation_id}"
)
# Dual-write to Postgres: mirror $set + $push $slice.
def _pg_compression(repo: ConversationsRepository) -> None:
with db_session() as conn:
repo = ConversationsRepository(conn)
# conversation_id here comes from the streaming pipeline
# which has already resolved it; accept either UUID or
# legacy id for safety.
conv = repo.get_by_legacy_id(conversation_id)
if conv is None:
return
conv_pg_id = (
str(conv["id"]) if conv is not None else conversation_id
)
repo.set_compression_flags(
conv["id"],
conv_pg_id,
is_compressed=True,
last_compression_at=compression_metadata.get("timestamp"),
)
repo.append_compression_point(
conv["id"],
conv_pg_id,
compression_metadata,
max_points=settings.COMPRESSION_MAX_HISTORY_POINTS,
)
dual_write(ConversationsRepository, _pg_compression)
logger.info(
f"Updated compression metadata for conversation {conversation_id}"
)
except Exception as e:
logger.error(
f"Error updating compression metadata: {str(e)}", exc_info=True
@@ -325,39 +456,22 @@ class ConversationService:
def append_compression_message(
self, conversation_id: str, compression_metadata: Dict[str, Any]
) -> None:
"""
Append a synthetic compression summary entry into the conversation history.
This makes the summary visible in the DB alongside normal queries.
"""
"""Append a synthetic compression summary message to the conversation."""
try:
summary = compression_metadata.get("compressed_summary", "")
if not summary:
return
timestamp = compression_metadata.get("timestamp", datetime.now(timezone.utc))
self.conversations_collection.update_one(
{"_id": ObjectId(conversation_id)},
{
"$push": {
"queries": {
"prompt": "[Context Compression Summary]",
"response": summary,
"thought": "",
"sources": [],
"tool_calls": [],
"timestamp": timestamp,
"attachments": [],
"model_id": compression_metadata.get("model_used"),
}
}
},
timestamp = compression_metadata.get(
"timestamp", datetime.now(timezone.utc)
)
def _pg_append_summary(repo: ConversationsRepository) -> None:
with db_session() as conn:
repo = ConversationsRepository(conn)
conv = repo.get_by_legacy_id(conversation_id)
if conv is None:
return
repo.append_message(conv["id"], {
conv_pg_id = (
str(conv["id"]) if conv is not None else conversation_id
)
repo.append_message(conv_pg_id, {
"prompt": "[Context Compression Summary]",
"response": summary,
"thought": "",
@@ -367,9 +481,9 @@ class ConversationService:
"model_id": compression_metadata.get("model_used"),
"timestamp": timestamp,
})
dual_write(ConversationsRepository, _pg_append_summary)
logger.info(f"Appended compression summary to conversation {conversation_id}")
logger.info(
f"Appended compression summary to conversation {conversation_id}"
)
except Exception as e:
logger.error(
f"Error appending compression summary: {str(e)}", exc_info=True
@@ -378,20 +492,30 @@ class ConversationService:
def get_compression_metadata(
self, conversation_id: str
) -> Optional[Dict[str, Any]]:
"""
Get compression metadata for a conversation.
Args:
conversation_id: Conversation ID
Returns:
Compression metadata dict or None
"""
"""Fetch the stored compression metadata JSONB blob for a conversation."""
try:
conversation = self.conversations_collection.find_one(
{"_id": ObjectId(conversation_id)}, {"compression_metadata": 1}
)
return conversation.get("compression_metadata") if conversation else None
with db_readonly() as conn:
repo = ConversationsRepository(conn)
conv = repo.get_by_legacy_id(conversation_id)
if conv is None:
# Fallback to UUID lookup without user scoping — the
# caller already holds an authenticated conversation
# id from the streaming path. Gate on id shape so a
# non-UUID (legacy ObjectId that wasn't backfilled)
# doesn't reach CAST — the cast raises and spams the
# logs with a stack trace on every call.
if not looks_like_uuid(conversation_id):
return None
result = conn.execute(
sql_text(
"SELECT compression_metadata FROM conversations "
"WHERE id = CAST(:id AS uuid)"
),
{"id": conversation_id},
)
row = result.fetchone()
return row[0] if row is not None else None
return conv.get("compression_metadata") if conv else None
except Exception as e:
logger.error(
f"Error getting compression metadata: {str(e)}", exc_info=True

View File

@@ -5,10 +5,6 @@ import os
from pathlib import Path
from typing import Any, Dict, Optional, Set
from bson.dbref import DBRef
from bson.objectid import ObjectId
from application.agents.agent_creator import AgentCreator
from application.api.answer.services.compression import CompressionOrchestrator
from application.api.answer.services.compression.token_counter import TokenCounter
@@ -20,8 +16,16 @@ from application.core.model_utils import (
get_provider_from_model_id,
validate_model_id,
)
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from sqlalchemy import text as sql_text
from application.storage.db.base_repository import looks_like_uuid, row_to_dict
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.repositories.attachments import AttachmentsRepository
from application.storage.db.repositories.prompts import PromptsRepository
from application.storage.db.repositories.sources import SourcesRepository
from application.storage.db.repositories.user_tools import UserToolsRepository
from application.storage.db.session import db_readonly, db_session
from application.retriever.retriever_creator import RetrieverCreator
from application.utils import (
calculate_doc_token_budget,
@@ -32,28 +36,41 @@ logger = logging.getLogger(__name__)
def get_prompt(prompt_id: str, prompts_collection=None) -> str:
"""Get a prompt by preset name or Postgres ID (UUID or legacy ObjectId).
The ``prompts_collection`` parameter is retained for backwards
compatibility with call sites that still pass it positionally; it is
ignored post-cutover.
"""
Get a prompt by preset name or MongoDB ID
"""
del prompts_collection # unused — retained for call-site compatibility
# Callers may pass a ``uuid.UUID`` (from a PG ``prompt_id`` column) or a
# plain string ("default"/"creative"/legacy ObjectId). Normalise to str
# so both the preset lookup and the UUID-vs-legacy branching work.
# ``None`` / empty means "use the default prompt" — agents that never
# set a custom prompt land here (PG ``agents.prompt_id`` is NULL).
if prompt_id is None or prompt_id == "":
prompt_id = "default"
elif not isinstance(prompt_id, str):
prompt_id = str(prompt_id)
current_dir = Path(__file__).resolve().parents[3]
prompts_dir = current_dir / "prompts"
# Maps for classic agent types
CLASSIC_PRESETS = {
"default": "chat_combine_default.txt",
"creative": "chat_combine_creative.txt",
"strict": "chat_combine_strict.txt",
"reduce": "chat_reduce_prompt.txt",
}
# Agentic counterparts — same styles, but with search tool instructions
AGENTIC_PRESETS = {
"default": "agentic/default.txt",
"creative": "agentic/creative.txt",
"strict": "agentic/strict.txt",
}
preset_mapping = {**CLASSIC_PRESETS, **{f"agentic_{k}": v for k, v in AGENTIC_PRESETS.items()}}
preset_mapping = {
**CLASSIC_PRESETS,
**{f"agentic_{k}": v for k, v in AGENTIC_PRESETS.items()},
}
if prompt_id in preset_mapping:
file_path = os.path.join(prompts_dir, preset_mapping[prompt_id])
@@ -63,14 +80,18 @@ def get_prompt(prompt_id: str, prompts_collection=None) -> str:
except FileNotFoundError:
raise FileNotFoundError(f"Prompt file not found: {file_path}")
try:
if prompts_collection is None:
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
prompts_collection = db["prompts"]
prompt_doc = prompts_collection.find_one({"_id": ObjectId(prompt_id)})
with db_readonly() as conn:
repo = PromptsRepository(conn)
prompt_doc = None
if looks_like_uuid(prompt_id):
prompt_doc = repo.get_for_rendering(prompt_id)
if prompt_doc is None:
prompt_doc = repo.get_by_legacy_id(prompt_id)
if not prompt_doc:
raise ValueError(f"Prompt with ID {prompt_id} not found")
return prompt_doc["content"]
except ValueError:
raise
except Exception as e:
raise ValueError(f"Invalid prompt ID: {prompt_id}") from e
@@ -79,12 +100,9 @@ class StreamProcessor:
def __init__(
self, request_data: Dict[str, Any], decoded_token: Optional[Dict[str, Any]]
):
mongo = MongoDB.get_client()
self.db = mongo[settings.MONGO_DB_NAME]
self.agents_collection = self.db["agents"]
self.attachments_collection = self.db["attachments"]
self.prompts_collection = self.db["prompts"]
# Legacy attribute retained as None for any external callers that
# introspect the processor; all DB access uses per-op connections.
self.prompts_collection = None
self.data = request_data
self.decoded_token = decoded_token
self.initial_user_id = (
@@ -103,6 +121,12 @@ class StreamProcessor:
self.agent_id = self.data.get("agent_id")
self.agent_key = None
self.model_id: Optional[str] = None
# BYOM-resolution scope, set by _validate_and_set_model.
self.model_user_id: Optional[str] = None
# WAL placeholder id pulled from continuation state on resume.
self.reserved_message_id: Optional[str] = None
# Carried through resumes so multi-pause runs keep one request_id.
self.request_id: Optional[str] = None
self.conversation_service = ConversationService()
self.compression_orchestrator = CompressionOrchestrator(
self.conversation_service
@@ -173,16 +197,23 @@ class StreamProcessor:
for query in conversation.get("queries", [])
]
else:
# model_user_id keeps history trim aligned with the BYOM's
# actual context window instead of the default 128k.
self.history = limit_chat_history(
json.loads(self.data.get("history", "[]")), model_id=self.model_id
json.loads(self.data.get("history", "[]")),
model_id=self.model_id,
user_id=self.model_user_id,
)
def _handle_compression(self, conversation: Dict[str, Any]):
"""Handle conversation compression logic using orchestrator."""
try:
# initial_user_id for conversation access; model_user_id
# for BYOM context-window / provider lookups.
result = self.compression_orchestrator.compress_if_needed(
conversation_id=self.conversation_id,
user_id=self.initial_user_id,
model_user_id=self.model_user_id,
model_id=self.model_id,
decoded_token=self.decoded_token,
)
@@ -244,17 +275,21 @@ class StreamProcessor:
if not attachment_ids:
return []
attachments = []
for attachment_id in attachment_ids:
try:
attachment_doc = self.attachments_collection.find_one(
{"_id": ObjectId(attachment_id), "user": user_id}
)
if attachment_doc:
attachments.append(attachment_doc)
except Exception as e:
logger.error(
f"Error retrieving attachment {attachment_id}: {e}", exc_info=True
)
try:
with db_readonly() as conn:
repo = AttachmentsRepository(conn)
for attachment_id in attachment_ids:
try:
attachment_doc = repo.get_any(str(attachment_id), user_id)
if attachment_doc:
attachments.append(attachment_doc)
except Exception as e:
logger.error(
f"Error retrieving attachment {attachment_id}: {e}",
exc_info=True,
)
except Exception as e:
logger.error(f"Error opening attachments connection: {e}", exc_info=True)
return attachments
def _validate_and_set_model(self):
@@ -262,11 +297,18 @@ class StreamProcessor:
from application.core.model_settings import ModelRegistry
requested_model = self.data.get("model_id")
# Caller picks from their own BYOM layer; agent defaults resolve
# under the owner's layer (shared agents have caller != owner).
caller_user_id = self.initial_user_id
owner_user_id = self.agent_config.get("user_id") or caller_user_id
if requested_model:
if not validate_model_id(requested_model):
if not validate_model_id(requested_model, user_id=caller_user_id):
registry = ModelRegistry.get_instance()
available_models = [m.id for m in registry.get_enabled_models()]
available_models = [
m.id
for m in registry.get_enabled_models(user_id=caller_user_id)
]
raise ValueError(
f"Invalid model_id '{requested_model}'. "
f"Available models: {', '.join(available_models[:5])}"
@@ -277,86 +319,114 @@ class StreamProcessor:
)
)
self.model_id = requested_model
self.model_user_id = caller_user_id
else:
agent_default_model = self.agent_config.get("default_model_id", "")
if agent_default_model and validate_model_id(agent_default_model):
if agent_default_model and validate_model_id(
agent_default_model, user_id=owner_user_id
):
self.model_id = agent_default_model
self.model_user_id = owner_user_id
else:
self.model_id = get_default_model_id()
self.model_user_id = None
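Condensed sketch of the scoping decision _validate_and_set_model makes above (stand-in validator, illustrative ids): a caller-requested model resolves under the caller's BYOM layer, an agent default under the owner's, and the built-in fallback carries no per-user scope.

def resolve_model_scope(requested, agent_default, caller_id, owner_id, is_valid):
    if requested:
        if not is_valid(requested, caller_id):
            raise ValueError(f"Invalid model_id {requested!r}")
        return requested, caller_id
    if agent_default and is_valid(agent_default, owner_id):
        return agent_default, owner_id
    return "built-in-default", None  # stand-in for get_default_model_id()


model_id, model_user_id = resolve_model_scope(
    requested=None,
    agent_default="owner-byom-uuid",
    caller_id="caller-1",
    owner_id="owner-9",
    is_valid=lambda m, uid: m == "owner-byom-uuid" and uid == "owner-9",
)
print(model_id, model_user_id)  # owner-byom-uuid owner-9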
def _get_agent_key(self, agent_id: Optional[str], user_id: Optional[str]) -> tuple:
"""Get API key for agent with access control"""
"""Get API key for agent with access control."""
if not agent_id:
return None, False, None
try:
agent = self.agents_collection.find_one({"_id": ObjectId(agent_id)})
with db_readonly() as conn:
# Lookup without user scoping — access control is done
# against ``user_id`` / ``shared_with`` / ``shared`` flags
# right below, matching the legacy Mongo semantics.
repo = AgentsRepository(conn)
agent = None
if looks_like_uuid(str(agent_id)):
result = conn.execute(
sql_text(
"SELECT * FROM agents WHERE id = CAST(:id AS uuid)"
),
{"id": str(agent_id)},
)
row = result.fetchone()
if row is not None:
agent = row_to_dict(row)
if agent is None:
agent = repo.get_by_legacy_id(str(agent_id))
if agent is None:
raise Exception("Agent not found")
is_owner = agent.get("user") == user_id
is_shared_with_user = agent.get(
"shared_publicly", False
) or user_id in agent.get("shared_with", [])
agent_owner = agent.get("user_id")
is_owner = agent_owner == user_id
is_shared_with_user = bool(agent.get("shared", False))
if not (is_owner or is_shared_with_user):
raise Exception("Unauthorized access to the agent")
if is_owner:
self.agents_collection.update_one(
{"_id": ObjectId(agent_id)},
{
"$set": {
"lastUsedAt": datetime.datetime.now(datetime.timezone.utc)
}
},
)
return str(agent["key"]), not is_owner, agent.get("shared_token")
now = datetime.datetime.now(datetime.timezone.utc)
try:
with db_session() as conn:
AgentsRepository(conn).update(
str(agent["id"]), agent_owner,
{"last_used_at": now},
)
except Exception:
logger.warning(
"Failed to update last_used_at for agent",
exc_info=True,
)
return (
str(agent["key"]) if agent.get("key") else None,
not is_owner,
agent.get("shared_token"),
)
except Exception as e:
logger.error(f"Error in get_agent_key: {str(e)}", exc_info=True)
raise
def _get_data_from_api_key(self, api_key: str) -> Dict[str, Any]:
data = self.agents_collection.find_one({"key": api_key})
if not data:
raise Exception("Invalid API Key, please generate a new key", 401)
source = data.get("source")
if isinstance(source, DBRef):
source_doc = self.db.dereference(source)
if source_doc:
data["source"] = str(source_doc["_id"])
data["retriever"] = source_doc.get("retriever", data.get("retriever"))
data["chunks"] = source_doc.get("chunks", data.get("chunks"))
with db_readonly() as conn:
agent = AgentsRepository(conn).find_by_key(api_key)
if not agent:
raise Exception("Invalid API Key, please generate a new key", 401)
sources_repo = SourcesRepository(conn)
# The repo dict uses "user_id" — the streaming path expects
# a "user" key (legacy Mongo shape) for identity propagation.
data: Dict[str, Any] = dict(agent)
data["user"] = agent.get("user_id")
# Resolve the primary source row (if any) for retriever/chunks.
source_id = agent.get("source_id")
if source_id:
source_doc = sources_repo.get(str(source_id), agent.get("user_id"))
if source_doc:
data["source"] = str(source_doc["id"])
data["retriever"] = source_doc.get(
"retriever", data.get("retriever")
)
data["chunks"] = source_doc.get("chunks", data.get("chunks"))
else:
data["source"] = None
else:
data["source"] = None
elif source == "default":
data["source"] = "default"
else:
data["source"] = None
sources = data.get("sources", [])
if sources and isinstance(sources, list):
sources_list = []
for i, source_ref in enumerate(sources):
if source_ref == "default":
processed_source = {
"id": "default",
"retriever": "classic",
"chunks": data.get("chunks", "2"),
}
sources_list.append(processed_source)
elif isinstance(source_ref, DBRef):
source_doc = self.db.dereference(source_ref)
extra = agent.get("extra_source_ids") or []
if extra:
for sid in extra:
source_doc = sources_repo.get(str(sid), agent.get("user_id"))
if source_doc:
processed_source = {
"id": str(source_doc["_id"]),
"retriever": source_doc.get("retriever", "classic"),
"chunks": source_doc.get("chunks", data.get("chunks", "2")),
}
sources_list.append(processed_source)
data["sources"] = sources_list
else:
data["sources"] = []
sources_list.append(
{
"id": str(source_doc["id"]),
"retriever": source_doc.get("retriever", "classic"),
"chunks": source_doc.get(
"chunks", data.get("chunks", "2")
),
}
)
data["sources"] = sources_list
data["default_model_id"] = data.get("default_model_id", "")
return data
def _configure_source(self):
@@ -469,6 +539,10 @@ class StreamProcessor:
"allow_system_prompt_override": self._agent_data.get(
"allow_system_prompt_override", False
),
# Owner identity — _validate_and_set_model reads this to
# resolve owner-stored BYOM default_model_id against the
# owner's per-user model layer rather than the caller's.
"user_id": self._agent_data.get("user"),
}
)
@@ -484,8 +558,14 @@ class StreamProcessor:
# Owner using their own agent
self.decoded_token = {"sub": self._agent_data.get("user")}
if self._agent_data.get("workflow"):
self.agent_config["workflow"] = self._agent_data["workflow"]
# PG row exposes the workflow as ``workflow_id`` (UUID column);
# legacy Mongo shape used the key ``workflow``. Accept either so
# API-key-invoked workflow agents bind correctly downstream.
wf_ref = self._agent_data.get("workflow") or self._agent_data.get(
"workflow_id"
)
if wf_ref:
self.agent_config["workflow"] = str(wf_ref)
self.agent_config["workflow_owner"] = self._agent_data.get("user")
else:
# No API key — default/workflow configuration
@@ -510,7 +590,13 @@ class StreamProcessor:
def _configure_retriever(self):
"""Assemble retriever config with precedence: request > agent > default."""
doc_token_limit = calculate_doc_token_budget(model_id=self.model_id)
# BYOM scope: owner for shared-agent BYOM, caller for own BYOM,
# None for built-ins. Without ``user_id`` here, the doc budget
# falls back to settings.DEFAULT_LLM_TOKEN_LIMIT and overfills
# the upstream context window for any small (e.g. 8k/32k) BYOM.
doc_token_limit = calculate_doc_token_budget(
model_id=self.model_id, user_id=self.model_user_id
)
# Start with defaults
retriever_name = "classic"
@@ -561,6 +647,7 @@ class StreamProcessor:
chunks=self.retriever_config["chunks"],
doc_token_limit=self.retriever_config.get("doc_token_limit", 50000),
model_id=self.model_id,
model_user_id=self.model_user_id,
user_api_key=self.agent_config["user_api_key"],
agent_id=self.agent_id,
decoded_token=self.decoded_token,
@@ -620,12 +707,9 @@ class StreamProcessor:
filtering_enabled = required_tool_actions is not None
try:
user_tools_collection = self.db["user_tools"]
user_id = self.initial_user_id or "local"
user_tools = list(
user_tools_collection.find({"user": user_id, "status": True})
)
with db_readonly() as conn:
user_tools = UserToolsRepository(conn).list_active_for_user(user_id)
if not user_tools:
return None
@@ -848,6 +932,20 @@ class StreamProcessor:
if not state:
raise ValueError("No pending tool state found for this conversation")
# Claim the resume up-front. ``mark_resuming`` only flips ``pending``
# → ``resuming``; if it returns False, another resume already
# claimed this row (status='resuming') — bail before any further
# LLM/tool work to avoid double-execution. The cleanup janitor
# reverts a stale ``resuming`` claim back to ``pending`` after the
# 10-minute grace window so the user can retry.
if not cont_service.mark_resuming(
conversation_id, self.initial_user_id,
):
raise ValueError(
"Resume already in progress for this conversation; "
"retry after the grace window if it stalls."
)
messages = state["messages"]
pending_tool_calls = state["pending_tool_calls"]
tools_dict = state["tools_dict"]
@@ -855,6 +953,11 @@ class StreamProcessor:
agent_config = state["agent_config"]
model_id = agent_config.get("model_id")
# BYOM scope captured at initial dispatch. None for built-ins or
# caller-owned BYOM where decoded_token['sub'] is already the
# right scope; non-None for shared-agent owner BYOM where the
# caller's identity differs from the model owner's.
model_user_id = agent_config.get("model_user_id")
llm_name = agent_config.get("llm_name", settings.LLM_PROVIDER)
api_key = agent_config.get("api_key")
user_api_key = agent_config.get("user_api_key")
@@ -872,6 +975,7 @@ class StreamProcessor:
decoded_token=self.decoded_token,
model_id=model_id,
agent_id=agent_id,
model_user_id=model_user_id,
)
llm_handler = LLMHandlerCreator.create_handler(llm_name or "default")
tool_executor = ToolExecutor(
@@ -901,6 +1005,7 @@ class StreamProcessor:
"endpoint": "stream",
"llm_name": llm_name,
"model_id": model_id,
"model_user_id": model_user_id,
"api_key": system_api_key,
"agent_id": agent_id,
"user_api_key": user_api_key,
@@ -923,12 +1028,22 @@ class StreamProcessor:
# Store config for the route layer
self.model_id = model_id
# Mirror ``model_user_id`` back onto the processor so the route
# layer (StreamResource) reads the owner scope captured at
# initial dispatch. Without this, ``processor.model_user_id``
# stays at the __init__ default (None) and complete_stream
# falls back to the caller's sub: the post-resume title-LLM
# save misses the owner's BYOM layer, and any second tool
# pause persists ``model_user_id=None`` — losing owner scope
# for every subsequent resume of this conversation.
self.model_user_id = model_user_id
self.agent_id = agent_id
self.agent_config["user_api_key"] = user_api_key
self.conversation_id = conversation_id
# Delete state so it can't be replayed
cont_service.delete_state(conversation_id, self.initial_user_id)
# Reused on resume so the same WAL row gets finalised and
# request_id stays consistent across token_usage rows.
self.reserved_message_id = agent_config.get("reserved_message_id")
self.request_id = agent_config.get("request_id")
return agent, messages, tools_dict, pending_tool_calls, tool_actions
@@ -974,8 +1089,11 @@ class StreamProcessor:
tools_data=tools_data,
)
# Use the user_id that resolved the model so owner-scoped BYOM
# records dispatch correctly on shared-agent requests.
model_user_id = getattr(self, "model_user_id", self.initial_user_id)
provider = (
get_provider_from_model_id(self.model_id)
get_provider_from_model_id(self.model_id, user_id=model_user_id)
if self.model_id
else settings.LLM_PROVIDER
)
@@ -986,8 +1104,10 @@ class StreamProcessor:
from application.llm.handlers.handler_creator import LLMHandlerCreator
from application.agents.tool_executor import ToolExecutor
# Compute backup models: agent's configured models minus the active one
agent_models = self.agent_config.get("models", [])
# Compute backup models: agent's configured models minus the active one.
# PG agents may carry an explicit ``models: NULL`` (not absent), so
# ``.get("models", [])`` isn't enough — coerce None → [].
agent_models = self.agent_config.get("models") or []
backup_models = [m for m in agent_models if m != self.model_id]
llm = LLMCreator.create_llm(
@@ -998,6 +1118,8 @@ class StreamProcessor:
model_id=self.model_id,
agent_id=self.agent_id,
backup_models=backup_models,
# Owner-scope on shared-agent BYOM dispatch.
model_user_id=model_user_id,
)
llm_handler = LLMHandlerCreator.create_handler(
provider if provider else "default"
@@ -1020,6 +1142,7 @@ class StreamProcessor:
"endpoint": "stream",
"llm_name": provider or settings.LLM_PROVIDER,
"model_id": self.model_id,
"model_user_id": self.model_user_id,
"api_key": system_api_key,
"agent_id": self.agent_id,
"user_api_key": self.agent_config["user_api_key"],
@@ -1047,6 +1170,7 @@ class StreamProcessor:
"doc_token_limit", 50000
),
"model_id": self.model_id,
"model_user_id": self.model_user_id,
"user_api_key": self.agent_config["user_api_key"],
"agent_id": self.agent_id,
"llm_name": provider or settings.LLM_PROVIDER,

View File

@@ -1,12 +1,10 @@
import base64
import datetime
import html
import json
import uuid
from urllib.parse import urlencode
from bson.objectid import ObjectId
from flask import (
Blueprint,
current_app,
@@ -17,22 +15,18 @@ from flask import (
from flask_restx import fields, Namespace, Resource
from application.api import api
from application.api.user.tasks import (
ingest_connector_task,
)
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.api import api
from application.parser.connectors.connector_creator import ConnectorCreator
from application.storage.db.repositories.connector_sessions import (
ConnectorSessionsRepository,
)
from application.storage.db.repositories.sources import SourcesRepository
from application.storage.db.session import db_readonly, db_session
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
sources_collection = db["sources"]
sessions_collection = db["connector_sessions"]
connector = Blueprint("connector", __name__)
connectors_ns = Namespace("connectors", description="Connector operations", path="/")
api.add_namespace(connectors_ns)
@@ -68,16 +62,14 @@ class ConnectorAuth(Resource):
return make_response(jsonify({"success": False, "error": "Unauthorized"}), 401)
user_id = decoded_token.get('sub')
now = datetime.datetime.now(datetime.timezone.utc)
result = sessions_collection.insert_one({
"provider": provider,
"user": user_id,
"status": "pending",
"created_at": now
})
with db_session() as conn:
session_row = ConnectorSessionsRepository(conn).upsert(
user_id, provider, status="pending",
)
session_pg_id = str(session_row["id"])
state_dict = {
"provider": provider,
"object_id": str(result.inserted_id)
"object_id": session_pg_id,
}
state = base64.urlsafe_b64encode(json.dumps(state_dict).encode()).decode()
@@ -160,17 +152,25 @@ class ConnectorsCallback(Resource):
sanitized_token_info = auth.sanitize_token_info(token_info)
sessions_collection.find_one_and_update(
{"_id": ObjectId(state_object_id), "provider": provider},
{
"$set": {
"session_token": session_token,
"token_info": sanitized_token_info,
"user_email": user_email,
"status": "authorized"
}
}
)
# ``object_id`` in the OAuth state is the PG session row
# UUID (new flow) or a legacy Mongo ObjectId from state
# issued before the cutover. Try the UUID update first;
# fall back to the legacy-id path.
patch = {
"session_token": session_token,
"token_info": sanitized_token_info,
"user_email": user_email,
"status": "authorized",
}
with db_session() as conn:
repo = ConnectorSessionsRepository(conn)
if state_object_id:
value = str(state_object_id)
updated = False
if len(value) == 36 and "-" in value:
updated = repo.update(value, patch)
if not updated:
repo.update_by_legacy_id(value, patch)
# Redirect to success page with session token and user email
return redirect(build_callback_redirect({
@@ -222,8 +222,11 @@ class ConnectorFiles(Resource):
if not decoded_token:
return make_response(jsonify({"success": False, "error": "Unauthorized"}), 401)
user = decoded_token.get('sub')
session = sessions_collection.find_one({"session_token": session_token, "user": user})
if not session:
with db_readonly() as conn:
session = ConnectorSessionsRepository(conn).get_by_session_token(
session_token,
)
if not session or session.get("user_id") != user:
return make_response(jsonify({"success": False, "error": "Invalid or unauthorized session"}), 401)
loader = ConnectorCreator.create_connector(provider, session_token)
@@ -288,8 +291,11 @@ class ConnectorValidateSession(Resource):
return make_response(jsonify({"success": False, "error": "Unauthorized"}), 401)
user = decoded_token.get('sub')
session = sessions_collection.find_one({"session_token": session_token, "user": user})
if not session or "token_info" not in session:
with db_readonly() as conn:
session = ConnectorSessionsRepository(conn).get_by_session_token(
session_token,
)
if not session or session.get("user_id") != user or not session.get("token_info"):
return make_response(jsonify({"success": False, "error": "Invalid or expired session"}), 401)
token_info = session["token_info"]
@@ -300,10 +306,11 @@ class ConnectorValidateSession(Resource):
try:
refreshed_token_info = auth.refresh_access_token(token_info.get('refresh_token'))
sanitized_token_info = auth.sanitize_token_info(refreshed_token_info)
sessions_collection.update_one(
{"session_token": session_token},
{"$set": {"token_info": sanitized_token_info}}
)
with db_session() as conn:
repo = ConnectorSessionsRepository(conn)
row = repo.get_by_session_token(session_token)
if row:
repo.update(str(row["id"]), {"token_info": sanitized_token_info})
token_info = sanitized_token_info
is_expired = False
except Exception as refresh_error:
@@ -347,8 +354,11 @@ class ConnectorDisconnect(Resource):
if session_token:
sessions_collection.delete_one({"session_token": session_token})
with db_session() as conn:
ConnectorSessionsRepository(conn).delete_by_session_token(
session_token,
)
return make_response(jsonify({"success": True}), 200)
except Exception as e:
current_app.logger.error(f"Error disconnecting connector session: {e}", exc_info=True)
@@ -385,32 +395,28 @@ class ConnectorSync(Resource):
}),
400
)
source = sources_collection.find_one({"_id": ObjectId(source_id)})
user_id = decoded_token.get('sub')
with db_readonly() as conn:
source = SourcesRepository(conn).get_any(source_id, user_id)
if not source:
return make_response(
jsonify({
"success": False,
"error": "Source not found"
}),
}),
404
)
if source.get('user') != decoded_token.get('sub'):
return make_response(
jsonify({
"success": False,
"error": "Unauthorized access to source"
}),
403
)
# ``get_any`` already scopes by ``user_id``; an extra guard
# here would be dead code.
remote_data = {}
try:
if source.get('remote_data'):
remote_data = json.loads(source.get('remote_data'))
except json.JSONDecodeError:
current_app.logger.error(f"Invalid remote_data format for source {source_id}")
remote_data = {}
remote_data = source.get('remote_data') or {}
if isinstance(remote_data, str):
try:
remote_data = json.loads(remote_data)
except json.JSONDecodeError:
current_app.logger.error(f"Invalid remote_data format for source {source_id}")
remote_data = {}
source_type = remote_data.get('provider')
if not source_type:
@@ -438,7 +444,7 @@ class ConnectorSync(Resource):
recursive=recursive,
retriever=source.get('retriever', 'classic'),
operation_mode="sync",
doc_id=source_id,
doc_id=str(source.get('id') or source_id),
sync_frequency=source.get('sync_frequency', 'never')
)

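The callback and sync handlers above both have to accept two id shapes during the migration: new PG rows keyed by UUID and legacy Mongo ObjectIds. A minimal, self-contained sketch of that shape check follows; the helper name and the ObjectId regex are illustrative and not taken from the repository (the diff's own inline check is the simpler `len(value) == 36 and "-" in value`, plus the repo-level `looks_like_uuid`).

import re
import uuid

_OBJECTID_RE = re.compile(r"^[0-9a-fA-F]{24}$")

def classify_state_id(value):
    """Classify an id as 'uuid' (new PG row), 'legacy' (Mongo ObjectId) or 'unknown'."""
    try:
        uuid.UUID(str(value))
        return "uuid"
    except (ValueError, TypeError):
        pass
    if _OBJECTID_RE.match(str(value)):
        return "legacy"
    return "unknown"

# 36-char dashed string parses as a UUID; 24-hex string is the legacy ObjectId shape.
assert classify_state_id("2b9f7c1e-9d3a-4f6b-8a1c-0d2e4f6a8b9c") == "uuid"
assert classify_state_id("507f1f77bcf86cd799439011") == "legacy"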

@@ -3,18 +3,16 @@ import datetime
import json
from flask import Blueprint, request, send_from_directory, jsonify
from werkzeug.utils import secure_filename
from bson.objectid import ObjectId
import logging
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.storage.db.base_repository import looks_like_uuid
from application.storage.db.repositories.sources import SourcesRepository
from application.storage.db.session import db_session
from application.storage.storage_creator import StorageCreator
logger = logging.getLogger(__name__)
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
conversations_collection = db["conversations"]
sources_collection = db["sources"]
current_dir = os.path.dirname(
os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
@@ -56,21 +54,21 @@ def upload_index_files():
"""Upload two files(index.faiss, index.pkl) to the user's folder."""
if "user" not in request.form:
return {"status": "no user"}
user = request.form["user"]
user = request.form["user"]
if "name" not in request.form:
return {"status": "no name"}
job_name = request.form["name"]
tokens = request.form["tokens"]
retriever = request.form["retriever"]
id = request.form["id"]
source_id = request.form["id"]
type = request.form["type"]
remote_data = request.form["remote_data"] if "remote_data" in request.form else None
sync_frequency = request.form["sync_frequency"] if "sync_frequency" in request.form else None
file_path = request.form.get("file_path")
directory_structure = request.form.get("directory_structure")
file_name_map = request.form.get("file_name_map")
if directory_structure:
try:
directory_structure = json.loads(directory_structure)
@@ -89,8 +87,8 @@ def upload_index_files():
file_name_map = None
storage = StorageCreator.get_storage()
index_base_path = f"indexes/{id}"
index_base_path = f"indexes/{source_id}"
if settings.VECTOR_STORE == "faiss":
if "file_faiss" not in request.files:
logger.error("No file_faiss part")
@@ -111,46 +109,48 @@ def upload_index_files():
storage.save_file(file_faiss, faiss_storage_path)
storage.save_file(file_pkl, pkl_storage_path)
now = datetime.datetime.now(datetime.timezone.utc)
update_fields = {
"name": job_name,
"type": type,
"language": job_name,
"date": now,
"model": settings.EMBEDDINGS_NAME,
"tokens": tokens,
"retriever": retriever,
"remote_data": remote_data,
"sync_frequency": sync_frequency,
"file_path": file_path,
"directory_structure": directory_structure,
}
if file_name_map is not None:
update_fields["file_name_map"] = file_name_map
existing_entry = sources_collection.find_one({"_id": ObjectId(id)})
if existing_entry:
update_fields = {
"user": user,
"name": job_name,
"language": job_name,
"date": datetime.datetime.now(),
"model": settings.EMBEDDINGS_NAME,
"type": type,
"tokens": tokens,
"retriever": retriever,
"remote_data": remote_data,
"sync_frequency": sync_frequency,
"file_path": file_path,
"directory_structure": directory_structure,
}
if file_name_map is not None:
update_fields["file_name_map"] = file_name_map
sources_collection.update_one(
{"_id": ObjectId(id)},
{"$set": update_fields},
)
else:
insert_doc = {
"_id": ObjectId(id),
"user": user,
"name": job_name,
"language": job_name,
"date": datetime.datetime.now(),
"model": settings.EMBEDDINGS_NAME,
"type": type,
"tokens": tokens,
"retriever": retriever,
"remote_data": remote_data,
"sync_frequency": sync_frequency,
"file_path": file_path,
"directory_structure": directory_structure,
}
if file_name_map is not None:
insert_doc["file_name_map"] = file_name_map
sources_collection.insert_one(insert_doc)
with db_session() as conn:
repo = SourcesRepository(conn)
existing = None
if looks_like_uuid(source_id):
existing = repo.get(source_id, user)
if existing is None:
existing = repo.get_by_legacy_id(source_id, user)
if existing is not None:
repo.update(str(existing["id"]), user, update_fields)
else:
repo.create(
job_name,
source_id=source_id if looks_like_uuid(source_id) else None,
user_id=user,
type=type,
tokens=tokens,
retriever=retriever,
remote_data=remote_data,
sync_frequency=sync_frequency,
file_path=file_path,
directory_structure=directory_structure,
file_name_map=file_name_map,
language=job_name,
model=settings.EMBEDDINGS_NAME,
date=now,
legacy_mongo_id=None if looks_like_uuid(source_id) else str(source_id),
)
return {"status": "ok"}


@@ -3,29 +3,50 @@ Agent folders management routes.
Provides virtual folder organization for agents (Google Drive-like structure).
"""
import datetime
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, request
from flask_restx import Namespace, Resource, fields
from sqlalchemy import text as _sql_text
from application.api import api
from application.api.user.base import (
agent_folders_collection,
agents_collection,
)
from application.storage.db.dual_write import dual_write
from application.storage.db.base_repository import looks_like_uuid
from application.storage.db.repositories.agent_folders import AgentFoldersRepository
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.session import db_readonly, db_session
agents_folders_ns = Namespace(
"agents_folders", description="Agent folder management", path="/api/agents/folders"
)
def _resolve_folder_id(repo: AgentFoldersRepository, folder_id: str, user: str):
"""Resolve a folder id that may be either a UUID or legacy Mongo ObjectId."""
if not folder_id:
return None
if looks_like_uuid(folder_id):
row = repo.get(folder_id, user)
if row is not None:
return row
return repo.get_by_legacy_id(folder_id, user)
def _folder_error_response(message: str, err: Exception):
current_app.logger.error(f"{message}: {err}", exc_info=True)
return make_response(jsonify({"success": False, "message": message}), 400)
def _serialize_folder(f: dict) -> dict:
created_at = f.get("created_at")
updated_at = f.get("updated_at")
return {
"id": str(f["id"]),
"name": f.get("name"),
"parent_id": str(f["parent_id"]) if f.get("parent_id") else None,
"created_at": created_at.isoformat() if hasattr(created_at, "isoformat") else created_at,
"updated_at": updated_at.isoformat() if hasattr(updated_at, "isoformat") else updated_at,
}
@agents_folders_ns.route("/")
class AgentFolders(Resource):
@api.doc(description="Get all folders for the user")
@@ -35,17 +56,9 @@ class AgentFolders(Resource):
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
try:
folders = list(agent_folders_collection.find({"user": user}))
result = [
{
"id": str(f["_id"]),
"name": f["name"],
"parent_id": f.get("parent_id"),
"created_at": f.get("created_at", "").isoformat() if f.get("created_at") else None,
"updated_at": f.get("updated_at", "").isoformat() if f.get("updated_at") else None,
}
for f in folders
]
with db_readonly() as conn:
folders = AgentFoldersRepository(conn).list_for_user(user)
result = [_serialize_folder(f) for f in folders]
return make_response(jsonify({"folders": result}), 200)
except Exception as err:
return _folder_error_response("Failed to fetch folders", err)
@@ -69,28 +82,34 @@ class AgentFolders(Resource):
if not data or not data.get("name"):
return make_response(jsonify({"success": False, "message": "Folder name is required"}), 400)
parent_id = data.get("parent_id")
if parent_id:
parent = agent_folders_collection.find_one({"_id": ObjectId(parent_id), "user": user})
if not parent:
return make_response(jsonify({"success": False, "message": "Parent folder not found"}), 404)
parent_id_input = data.get("parent_id")
description = data.get("description")
try:
now = datetime.datetime.now(datetime.timezone.utc)
folder = {
"user": user,
"name": data["name"],
"parent_id": parent_id,
"created_at": now,
"updated_at": now,
}
result = agent_folders_collection.insert_one(folder)
dual_write(
AgentFoldersRepository,
lambda repo, u=user, n=data["name"]: repo.create(u, n),
)
with db_session() as conn:
repo = AgentFoldersRepository(conn)
pg_parent_id = None
if parent_id_input:
parent = _resolve_folder_id(repo, parent_id_input, user)
if not parent:
return make_response(
jsonify({"success": False, "message": "Parent folder not found"}),
404,
)
pg_parent_id = str(parent["id"])
folder = repo.create(
user, data["name"],
description=description,
parent_id=pg_parent_id,
)
return make_response(
jsonify({"id": str(result.inserted_id), "name": data["name"], "parent_id": parent_id}),
jsonify(
{
"id": str(folder["id"]),
"name": folder["name"],
"parent_id": pg_parent_id,
}
),
201,
)
except Exception as err:
@@ -106,26 +125,51 @@ class AgentFolder(Resource):
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
try:
folder = agent_folders_collection.find_one({"_id": ObjectId(folder_id), "user": user})
if not folder:
return make_response(jsonify({"success": False, "message": "Folder not found"}), 404)
agents = list(agents_collection.find({"user": user, "folder_id": folder_id}))
agents_list = [
{"id": str(a["_id"]), "name": a["name"], "description": a.get("description", "")}
for a in agents
]
subfolders = list(agent_folders_collection.find({"user": user, "parent_id": folder_id}))
subfolders_list = [{"id": str(sf["_id"]), "name": sf["name"]} for sf in subfolders]
with db_readonly() as conn:
folders_repo = AgentFoldersRepository(conn)
folder = _resolve_folder_id(folders_repo, folder_id, user)
if not folder:
return make_response(
jsonify({"success": False, "message": "Folder not found"}),
404,
)
pg_folder_id = str(folder["id"])
agents_rows = conn.execute(
_sql_text(
"SELECT id, name, description FROM agents "
"WHERE user_id = :user_id AND folder_id = CAST(:fid AS uuid) "
"ORDER BY created_at DESC"
),
{"user_id": user, "fid": pg_folder_id},
).fetchall()
agents_list = [
{
"id": str(row._mapping["id"]),
"name": row._mapping["name"],
"description": row._mapping.get("description", "") or "",
}
for row in agents_rows
]
subfolders = folders_repo.list_children(pg_folder_id, user)
subfolders_list = [
{"id": str(sf["id"]), "name": sf["name"]}
for sf in subfolders
]
return make_response(
jsonify({
"id": str(folder["_id"]),
"name": folder["name"],
"parent_id": folder.get("parent_id"),
"agents": agents_list,
"subfolders": subfolders_list,
}),
jsonify(
{
"id": pg_folder_id,
"name": folder["name"],
"parent_id": (
str(folder["parent_id"]) if folder.get("parent_id") else None
),
"agents": agents_list,
"subfolders": subfolders_list,
}
),
200,
)
except Exception as err:
@@ -142,19 +186,57 @@ class AgentFolder(Resource):
return make_response(jsonify({"success": False, "message": "No data provided"}), 400)
try:
update_fields = {"updated_at": datetime.datetime.now(datetime.timezone.utc)}
if "name" in data:
update_fields["name"] = data["name"]
if "parent_id" in data:
if data["parent_id"] == folder_id:
return make_response(jsonify({"success": False, "message": "Cannot set folder as its own parent"}), 400)
update_fields["parent_id"] = data["parent_id"]
with db_session() as conn:
repo = AgentFoldersRepository(conn)
folder = _resolve_folder_id(repo, folder_id, user)
if not folder:
return make_response(
jsonify({"success": False, "message": "Folder not found"}),
404,
)
pg_folder_id = str(folder["id"])
update_fields: dict = {}
if "name" in data:
update_fields["name"] = data["name"]
if "description" in data:
update_fields["description"] = data["description"]
if "parent_id" in data:
parent_input = data.get("parent_id")
if parent_input:
if parent_input == folder_id or parent_input == pg_folder_id:
return make_response(
jsonify(
{
"success": False,
"message": "Cannot set folder as its own parent",
}
),
400,
)
parent = _resolve_folder_id(repo, parent_input, user)
if not parent:
return make_response(
jsonify({"success": False, "message": "Parent folder not found"}),
404,
)
if str(parent["id"]) == pg_folder_id:
return make_response(
jsonify(
{
"success": False,
"message": "Cannot set folder as its own parent",
}
),
400,
)
update_fields["parent_id"] = str(parent["id"])
else:
update_fields["parent_id"] = None
if update_fields:
repo.update(pg_folder_id, user, update_fields)
result = agent_folders_collection.update_one(
{"_id": ObjectId(folder_id), "user": user}, {"$set": update_fields}
)
if result.matched_count == 0:
return make_response(jsonify({"success": False, "message": "Folder not found"}), 404)
return make_response(jsonify({"success": True}), 200)
except Exception as err:
return _folder_error_response("Failed to update folder", err)
@@ -166,19 +248,24 @@ class AgentFolder(Resource):
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
try:
agents_collection.update_many(
{"user": user, "folder_id": folder_id}, {"$unset": {"folder_id": ""}}
)
agent_folders_collection.update_many(
{"user": user, "parent_id": folder_id}, {"$unset": {"parent_id": ""}}
)
result = agent_folders_collection.delete_one({"_id": ObjectId(folder_id), "user": user})
dual_write(
AgentFoldersRepository,
lambda repo, fid=folder_id, u=user: repo.delete(fid, u),
)
if result.deleted_count == 0:
return make_response(jsonify({"success": False, "message": "Folder not found"}), 404)
with db_session() as conn:
repo = AgentFoldersRepository(conn)
folder = _resolve_folder_id(repo, folder_id, user)
if not folder:
return make_response(
jsonify({"success": False, "message": "Folder not found"}),
404,
)
pg_folder_id = str(folder["id"])
# Clear folder assignments from agents; self-FK
# ``ON DELETE SET NULL`` handles child folders.
AgentsRepository(conn).clear_folder_for_all(pg_folder_id, user)
deleted = repo.delete(pg_folder_id, user)
if not deleted:
return make_response(
jsonify({"success": False, "message": "Folder not found"}),
404,
)
return make_response(jsonify({"success": True}), 200)
except Exception as err:
return _folder_error_response("Failed to delete folder", err)
@@ -205,26 +292,29 @@ class MoveAgentToFolder(Resource):
if not data or not data.get("agent_id"):
return make_response(jsonify({"success": False, "message": "Agent ID is required"}), 400)
agent_id = data["agent_id"]
folder_id = data.get("folder_id")
agent_id_input = data["agent_id"]
folder_id_input = data.get("folder_id")
try:
agent = agents_collection.find_one({"_id": ObjectId(agent_id), "user": user})
if not agent:
return make_response(jsonify({"success": False, "message": "Agent not found"}), 404)
if folder_id:
folder = agent_folders_collection.find_one({"_id": ObjectId(folder_id), "user": user})
if not folder:
return make_response(jsonify({"success": False, "message": "Folder not found"}), 404)
agents_collection.update_one(
{"_id": ObjectId(agent_id)}, {"$set": {"folder_id": folder_id}}
)
else:
agents_collection.update_one(
{"_id": ObjectId(agent_id)}, {"$unset": {"folder_id": ""}}
)
with db_session() as conn:
agents_repo = AgentsRepository(conn)
agent = agents_repo.get_any(agent_id_input, user)
if not agent:
return make_response(
jsonify({"success": False, "message": "Agent not found"}),
404,
)
pg_folder_id = None
if folder_id_input:
folders_repo = AgentFoldersRepository(conn)
folder = _resolve_folder_id(folders_repo, folder_id_input, user)
if not folder:
return make_response(
jsonify({"success": False, "message": "Folder not found"}),
404,
)
pg_folder_id = str(folder["id"])
agents_repo.set_folder(str(agent["id"]), user, pg_folder_id)
return make_response(jsonify({"success": True}), 200)
except Exception as err:
return _folder_error_response("Failed to move agent", err)
@@ -252,25 +342,25 @@ class BulkMoveAgents(Resource):
return make_response(jsonify({"success": False, "message": "Agent IDs are required"}), 400)
agent_ids = data["agent_ids"]
folder_id = data.get("folder_id")
folder_id_input = data.get("folder_id")
try:
if folder_id:
folder = agent_folders_collection.find_one({"_id": ObjectId(folder_id), "user": user})
if not folder:
return make_response(jsonify({"success": False, "message": "Folder not found"}), 404)
object_ids = [ObjectId(aid) for aid in agent_ids]
if folder_id:
agents_collection.update_many(
{"_id": {"$in": object_ids}, "user": user},
{"$set": {"folder_id": folder_id}},
)
else:
agents_collection.update_many(
{"_id": {"$in": object_ids}, "user": user},
{"$unset": {"folder_id": ""}},
)
with db_session() as conn:
agents_repo = AgentsRepository(conn)
pg_folder_id = None
if folder_id_input:
folders_repo = AgentFoldersRepository(conn)
folder = _resolve_folder_id(folders_repo, folder_id_input, user)
if not folder:
return make_response(
jsonify({"success": False, "message": "Folder not found"}),
404,
)
pg_folder_id = str(folder["id"])
for agent_id_input in agent_ids:
agent = agents_repo.get_any(agent_id_input, user)
if agent is not None:
agents_repo.set_folder(str(agent["id"]), user, pg_folder_id)
return make_response(jsonify({"success": True}), 200)
except Exception as err:
return _folder_error_response("Failed to move agents", err)

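A hedged usage sketch of the folder-create route above (namespace path /api/agents/folders, route "/"). The host, port and bearer-token header are placeholders and not taken from this diff; only the request fields and the 201 response shape mirror the handler.

import requests

resp = requests.post(
    "http://localhost:7091/api/agents/folders/",   # host/port are placeholders
    json={"name": "Research", "parent_id": None, "description": "agents used for research"},
    headers={"Authorization": "Bearer <jwt>"},     # auth header shape is an assumption
    timeout=10,
)
print(resp.status_code)  # 201 on success
print(resp.json())       # {"id": "<uuid>", "name": "Research", "parent_id": None}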
File diff suppressed because it is too large


@@ -3,23 +3,17 @@
import datetime
import secrets
from bson import DBRef
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, request
from flask_restx import fields, Namespace, Resource
from sqlalchemy import text as _sql_text
from application.api import api
from application.core.settings import settings
from application.api.user.base import (
agents_collection,
db,
ensure_user_doc,
resolve_tool_details,
user_tools_collection,
users_collection,
)
from application.storage.db.dual_write import dual_write
from application.api.user.base import resolve_tool_details
from application.storage.db.base_repository import looks_like_uuid
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.repositories.users import UsersRepository
from application.storage.db.session import db_readonly, db_session
from application.utils import generate_image_url
agents_sharing_ns = Namespace(
@@ -27,6 +21,38 @@ agents_sharing_ns = Namespace(
)
def _serialize_agent_basic(agent: dict) -> dict:
"""Shape a PG agent row into the API response dict."""
source_id = agent.get("source_id")
return {
"id": str(agent["id"]),
"user": agent.get("user_id", ""),
"name": agent.get("name", ""),
"image": (
generate_image_url(agent["image"]) if agent.get("image") else ""
),
"description": agent.get("description", ""),
"source": str(source_id) if source_id else "",
"chunks": str(agent["chunks"]) if agent.get("chunks") is not None else "0",
"retriever": agent.get("retriever", "classic") or "classic",
"prompt_id": str(agent["prompt_id"]) if agent.get("prompt_id") else "default",
"tools": agent.get("tools", []) or [],
"tool_details": resolve_tool_details(agent.get("tools", []) or []),
"agent_type": agent.get("agent_type", "") or "",
"status": agent.get("status", "") or "",
"json_schema": agent.get("json_schema"),
"limited_token_mode": agent.get("limited_token_mode", False),
"token_limit": agent.get("token_limit") or settings.DEFAULT_AGENT_LIMITS["token_limit"],
"limited_request_mode": agent.get("limited_request_mode", False),
"request_limit": agent.get("request_limit") or settings.DEFAULT_AGENT_LIMITS["request_limit"],
"created_at": agent.get("created_at", ""),
"updated_at": agent.get("updated_at", ""),
"shared": bool(agent.get("shared", False)),
"shared_token": agent.get("shared_token", "") or "",
"shared_metadata": agent.get("shared_metadata", {}) or {},
}
@agents_sharing_ns.route("/shared_agent")
class SharedAgent(Resource):
@api.doc(
@@ -43,73 +69,33 @@ class SharedAgent(Resource):
jsonify({"success": False, "message": "Token or ID is required"}), 400
)
try:
query = {
"shared_publicly": True,
"shared_token": shared_token,
}
shared_agent = agents_collection.find_one(query)
with db_readonly() as conn:
shared_agent = AgentsRepository(conn).find_by_shared_token(
shared_token,
)
if not shared_agent:
return make_response(
jsonify({"success": False, "message": "Shared agent not found"}),
404,
)
agent_id = str(shared_agent["_id"])
data = {
"id": agent_id,
"user": shared_agent.get("user", ""),
"name": shared_agent.get("name", ""),
"image": (
generate_image_url(shared_agent["image"])
if shared_agent.get("image")
else ""
),
"description": shared_agent.get("description", ""),
"source": (
str(source_doc["_id"])
if isinstance(shared_agent.get("source"), DBRef)
and (source_doc := db.dereference(shared_agent.get("source")))
else ""
),
"chunks": shared_agent.get("chunks", "0"),
"retriever": shared_agent.get("retriever", "classic"),
"prompt_id": shared_agent.get("prompt_id", "default"),
"tools": shared_agent.get("tools", []),
"tool_details": resolve_tool_details(shared_agent.get("tools", [])),
"agent_type": shared_agent.get("agent_type", ""),
"status": shared_agent.get("status", ""),
"json_schema": shared_agent.get("json_schema"),
"limited_token_mode": shared_agent.get("limited_token_mode", False),
"token_limit": shared_agent.get("token_limit", settings.DEFAULT_AGENT_LIMITS["token_limit"]),
"limited_request_mode": shared_agent.get("limited_request_mode", False),
"request_limit": shared_agent.get("request_limit", settings.DEFAULT_AGENT_LIMITS["request_limit"]),
"created_at": shared_agent.get("createdAt", ""),
"updated_at": shared_agent.get("updatedAt", ""),
"shared": shared_agent.get("shared_publicly", False),
"shared_token": shared_agent.get("shared_token", ""),
"shared_metadata": shared_agent.get("shared_metadata", {}),
}
agent_id = str(shared_agent["id"])
data = _serialize_agent_basic(shared_agent)
if data["tools"]:
enriched_tools = []
for tool in data["tools"]:
tool_data = user_tools_collection.find_one({"_id": ObjectId(tool)})
if tool_data:
enriched_tools.append(tool_data.get("name", ""))
for detail in data["tool_details"]:
enriched_tools.append(detail.get("name", ""))
data["tools"] = enriched_tools
decoded_token = getattr(request, "decoded_token", None)
if decoded_token:
user_id = decoded_token.get("sub")
owner_id = shared_agent.get("user")
owner_id = shared_agent.get("user_id")
if user_id != owner_id:
ensure_user_doc(user_id)
users_collection.update_one(
{"user_id": user_id},
{"$addToSet": {"agent_preferences.shared_with_me": agent_id}},
)
dual_write(UsersRepository,
lambda repo, uid=user_id, aid=agent_id: repo.add_shared(uid, aid)
)
with db_session() as conn:
users_repo = UsersRepository(conn)
users_repo.upsert(user_id)
users_repo.add_shared(user_id, agent_id)
return make_response(jsonify(data), 200)
except Exception as err:
current_app.logger.error(f"Error retrieving shared agent: {err}")
@@ -126,55 +112,73 @@ class SharedAgents(Resource):
return make_response(jsonify({"success": False}), 401)
user_id = decoded_token.get("sub")
user_doc = ensure_user_doc(user_id)
shared_with_ids = user_doc.get("agent_preferences", {}).get(
"shared_with_me", []
)
shared_object_ids = [ObjectId(id) for id in shared_with_ids]
shared_agents_cursor = agents_collection.find(
{"_id": {"$in": shared_object_ids}, "shared_publicly": True}
)
shared_agents = list(shared_agents_cursor)
found_ids_set = {str(agent["_id"]) for agent in shared_agents}
stale_ids = [id for id in shared_with_ids if id not in found_ids_set]
if stale_ids:
users_collection.update_one(
{"user_id": user_id},
{"$pullAll": {"agent_preferences.shared_with_me": stale_ids}},
with db_session() as conn:
users_repo = UsersRepository(conn)
user_doc = users_repo.upsert(user_id)
shared_with_ids = (
user_doc.get("agent_preferences", {}).get("shared_with_me", [])
if isinstance(user_doc.get("agent_preferences"), dict)
else []
)
dual_write(UsersRepository,
lambda repo, uid=user_id, ids=stale_ids: repo.remove_shared_bulk(uid, ids)
)
pinned_ids = set(user_doc.get("agent_preferences", {}).get("pinned", []))
# Keep only UUID-shaped ids; ObjectId leftovers are stripped below.
uuid_ids = [sid for sid in shared_with_ids if looks_like_uuid(sid)]
non_uuid_ids = [sid for sid in shared_with_ids if not looks_like_uuid(sid)]
list_shared_agents = [
{
"id": str(agent["_id"]),
"name": agent.get("name", ""),
"description": agent.get("description", ""),
"image": (
generate_image_url(agent["image"]) if agent.get("image") else ""
),
"tools": agent.get("tools", []),
"tool_details": resolve_tool_details(agent.get("tools", [])),
"agent_type": agent.get("agent_type", ""),
"status": agent.get("status", ""),
"json_schema": agent.get("json_schema"),
"limited_token_mode": agent.get("limited_token_mode", False),
"token_limit": agent.get("token_limit", settings.DEFAULT_AGENT_LIMITS["token_limit"]),
"limited_request_mode": agent.get("limited_request_mode", False),
"request_limit": agent.get("request_limit", settings.DEFAULT_AGENT_LIMITS["request_limit"]),
"created_at": agent.get("createdAt", ""),
"updated_at": agent.get("updatedAt", ""),
"pinned": str(agent["_id"]) in pinned_ids,
"shared": agent.get("shared_publicly", False),
"shared_token": agent.get("shared_token", ""),
"shared_metadata": agent.get("shared_metadata", {}),
}
for agent in shared_agents
]
if uuid_ids:
result = conn.execute(
_sql_text(
"SELECT * FROM agents "
"WHERE id = ANY(CAST(:ids AS uuid[])) "
"AND shared = true"
),
{"ids": uuid_ids},
)
shared_agents = [dict(row._mapping) for row in result.fetchall()]
else:
shared_agents = []
found_ids_set = {str(agent["id"]) for agent in shared_agents}
stale_ids = [sid for sid in uuid_ids if sid not in found_ids_set]
stale_ids.extend(non_uuid_ids)
if stale_ids:
users_repo.remove_shared_bulk(user_id, stale_ids)
pinned_ids = set(
user_doc.get("agent_preferences", {}).get("pinned", [])
if isinstance(user_doc.get("agent_preferences"), dict)
else []
)
list_shared_agents = []
for agent in shared_agents:
agent_id_str = str(agent["id"])
list_shared_agents.append(
{
"id": agent_id_str,
"name": agent.get("name", ""),
"description": agent.get("description", ""),
"image": (
generate_image_url(agent["image"]) if agent.get("image") else ""
),
"tools": agent.get("tools", []) or [],
"tool_details": resolve_tool_details(
agent.get("tools", []) or []
),
"agent_type": agent.get("agent_type", "") or "",
"status": agent.get("status", "") or "",
"json_schema": agent.get("json_schema"),
"limited_token_mode": agent.get("limited_token_mode", False),
"token_limit": agent.get("token_limit") or settings.DEFAULT_AGENT_LIMITS["token_limit"],
"limited_request_mode": agent.get("limited_request_mode", False),
"request_limit": agent.get("request_limit") or settings.DEFAULT_AGENT_LIMITS["request_limit"],
"created_at": agent.get("created_at", ""),
"updated_at": agent.get("updated_at", ""),
"pinned": agent_id_str in pinned_ids,
"shared": bool(agent.get("shared", False)),
"shared_token": agent.get("shared_token", "") or "",
"shared_metadata": agent.get("shared_metadata", {}) or {},
}
)
return make_response(jsonify(list_shared_agents), 200)
except Exception as err:
@@ -228,44 +232,43 @@ class ShareAgent(Resource):
),
400,
)
shared_token = None
try:
try:
agent_oid = ObjectId(agent_id)
except Exception:
return make_response(
jsonify({"success": False, "message": "Invalid agent ID"}), 400
)
agent = agents_collection.find_one({"_id": agent_oid, "user": user})
if not agent:
return make_response(
jsonify({"success": False, "message": "Agent not found"}), 404
)
if shared:
shared_metadata = {
"shared_by": username,
"shared_at": datetime.datetime.now(datetime.timezone.utc),
}
shared_token = secrets.token_urlsafe(32)
agents_collection.update_one(
{"_id": agent_oid, "user": user},
{
"$set": {
"shared_publicly": shared,
"shared_metadata": shared_metadata,
with db_session() as conn:
repo = AgentsRepository(conn)
agent = repo.get_any(agent_id, user)
if not agent:
return make_response(
jsonify({"success": False, "message": "Agent not found"}), 404
)
if shared:
shared_metadata = {
"shared_by": username,
"shared_at": datetime.datetime.now(
datetime.timezone.utc
).isoformat(),
}
shared_token = secrets.token_urlsafe(32)
repo.update(
str(agent["id"]), user,
{
"shared": True,
"shared_token": shared_token,
}
},
)
else:
agents_collection.update_one(
{"_id": agent_oid, "user": user},
{"$set": {"shared_publicly": shared, "shared_token": None}},
{"$unset": {"shared_metadata": ""}},
)
"shared_metadata": shared_metadata,
},
)
else:
repo.update(
str(agent["id"]), user,
{
"shared": False,
"shared_token": None,
"shared_metadata": None,
},
)
except Exception as err:
current_app.logger.error(f"Error sharing/unsharing agent: {err}", exc_info=True)
return make_response(jsonify({"success": False, "error": "Failed to update agent sharing status"}), 400)
shared_token = shared_token if shared else None
return make_response(
jsonify({"success": True, "shared_token": shared_token}), 200
)

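The SharedAgents query above binds a Python list straight into `id = ANY(CAST(:ids AS uuid[]))`. A small standalone sketch of that bind pattern with SQLAlchemy; the DSN is a placeholder, the table and filter follow the query above, and the selected columns are trimmed for the example.

from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:pass@localhost/docsgpt")  # placeholder DSN

uuid_ids = [
    "2b9f7c1e-9d3a-4f6b-8a1c-0d2e4f6a8b9c",
    "7f1d2c3b-4a5e-4f6a-9b8c-1d2e3f4a5b6c",
]
with engine.connect() as conn:
    rows = conn.execute(
        text(
            "SELECT id, name FROM agents "
            "WHERE id = ANY(CAST(:ids AS uuid[])) AND shared = true"
        ),
        {"ids": uuid_ids},  # the driver adapts the list to an array literal; CAST makes it uuid[]
    ).fetchall()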

@@ -1,15 +1,20 @@
"""Agent management webhook handlers."""
import secrets
import uuid
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, request
from flask_restx import Namespace, Resource
from sqlalchemy import text as sql_text
from application.api import api
from application.api.user.base import agents_collection, require_agent
from application.api.user.base import require_agent
from application.api.user.tasks import process_agent_webhook
from application.core.settings import settings
from application.storage.db.base_repository import looks_like_uuid
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.repositories.idempotency import IdempotencyRepository
from application.storage.db.session import db_readonly, db_session
agents_webhooks_ns = Namespace(
@@ -17,6 +22,37 @@ agents_webhooks_ns = Namespace(
)
_IDEMPOTENCY_KEY_MAX_LEN = 256
def _read_idempotency_key():
"""Return (key, error_response). Empty header → (None, None); oversized → (None, 400)."""
key = request.headers.get("Idempotency-Key")
if not key:
return None, None
if len(key) > _IDEMPOTENCY_KEY_MAX_LEN:
return None, make_response(
jsonify(
{
"success": False,
"message": (
f"Idempotency-Key exceeds maximum length of "
f"{_IDEMPOTENCY_KEY_MAX_LEN} characters"
),
}
),
400,
)
return key, None
def _scoped_idempotency_key(idempotency_key, scope):
"""``{scope}:{key}`` so different agents can't collide on the same key."""
if not idempotency_key or not scope:
return None
return f"{scope}:{idempotency_key}"
@agents_webhooks_ns.route("/agent_webhook")
class AgentWebhook(Resource):
@api.doc(
@@ -34,9 +70,8 @@ class AgentWebhook(Resource):
jsonify({"success": False, "message": "ID is required"}), 400
)
try:
agent = agents_collection.find_one(
{"_id": ObjectId(agent_id), "user": user}
)
with db_readonly() as conn:
agent = AgentsRepository(conn).get_any(agent_id, user)
if not agent:
return make_response(
jsonify({"success": False, "message": "Agent not found"}), 404
@@ -44,10 +79,11 @@ class AgentWebhook(Resource):
webhook_token = agent.get("incoming_webhook_token")
if not webhook_token:
webhook_token = secrets.token_urlsafe(32)
agents_collection.update_one(
{"_id": ObjectId(agent_id), "user": user},
{"$set": {"incoming_webhook_token": webhook_token}},
)
with db_session() as conn:
AgentsRepository(conn).update(
str(agent["id"]), user,
{"incoming_webhook_token": webhook_token},
)
base_url = settings.API_URL.rstrip("/")
full_webhook_url = f"{base_url}/api/webhooks/agents/{webhook_token}"
except Exception as err:
@@ -67,7 +103,7 @@ class AgentWebhook(Resource):
class AgentWebhookListener(Resource):
method_decorators = [require_agent]
def _enqueue_webhook_task(self, agent_id_str, payload, source_method):
def _enqueue_webhook_task(self, agent_id_str, payload, source_method, agent=None):
if not payload:
current_app.logger.warning(
f"Webhook ({source_method}) received for agent {agent_id_str} with empty payload."
@@ -76,26 +112,94 @@ class AgentWebhookListener(Resource):
f"Incoming {source_method} webhook for agent {agent_id_str}. Enqueuing task with payload: {payload}"
)
try:
task = process_agent_webhook.delay(
agent_id=agent_id_str,
payload=payload,
idempotency_key, key_error = _read_idempotency_key()
if key_error is not None:
return key_error
# Resolve to PG UUID first so dedup writes don't crash on legacy ids.
agent_uuid = None
if agent is not None:
candidate = str(agent.get("id") or "")
if looks_like_uuid(candidate):
agent_uuid = candidate
if idempotency_key and agent_uuid is None:
current_app.logger.warning(
"Skipping webhook idempotency dedup: agent %s has non-UUID id",
agent_id_str,
)
idempotency_key = None
# Agent-scoped (webhooks have no user_id).
scoped_key = _scoped_idempotency_key(idempotency_key, agent_uuid)
# Claim before enqueue; the loser returns the winner's task_id.
predetermined_task_id = None
if scoped_key:
predetermined_task_id = str(uuid.uuid4())
with db_session() as conn:
claimed = IdempotencyRepository(conn).record_webhook(
key=scoped_key,
agent_id=agent_uuid,
task_id=predetermined_task_id,
response_json={
"success": True, "task_id": predetermined_task_id,
},
)
if claimed is None:
with db_readonly() as conn:
cached = IdempotencyRepository(conn).get_webhook(scoped_key)
if cached is not None:
return make_response(jsonify(cached["response_json"]), 200)
return make_response(
jsonify({"success": True, "task_id": "deduplicated"}), 200
)
try:
apply_kwargs = dict(
kwargs={
"agent_id": agent_id_str,
"payload": payload,
# Scoped so the worker dedup row matches the HTTP claim.
"idempotency_key": scoped_key or idempotency_key,
},
)
if predetermined_task_id is not None:
apply_kwargs["task_id"] = predetermined_task_id
task = process_agent_webhook.apply_async(**apply_kwargs)
current_app.logger.info(
f"Task {task.id} enqueued for agent {agent_id_str} ({source_method})."
)
return make_response(jsonify({"success": True, "task_id": task.id}), 200)
response_payload = {"success": True, "task_id": task.id}
return make_response(jsonify(response_payload), 200)
except Exception as err:
current_app.logger.error(
f"Error enqueuing webhook task ({source_method}) for agent {agent_id_str}: {err}",
exc_info=True,
)
if scoped_key:
# Roll back the claim so a retry can succeed.
try:
with db_session() as conn:
conn.execute(
sql_text(
"DELETE FROM webhook_dedup "
"WHERE idempotency_key = :k"
),
{"k": scoped_key},
)
except Exception:
current_app.logger.exception(
"Failed to release webhook_dedup claim for key=%s",
scoped_key,
)
return make_response(
jsonify({"success": False, "message": "Error processing webhook"}), 500
)
@api.doc(
description="Webhook listener for agent events (POST). Expects JSON payload, which is used to trigger processing.",
description=(
"Webhook listener for agent events (POST). Expects JSON payload, which "
"is used to trigger processing. Honors an optional ``Idempotency-Key`` "
"header: a repeat request with the same key within 24h returns the "
"original cached response and does not re-enqueue the task."
),
)
def post(self, webhook_token, agent, agent_id_str):
payload = request.get_json()
@@ -109,11 +213,20 @@ class AgentWebhookListener(Resource):
),
400,
)
return self._enqueue_webhook_task(agent_id_str, payload, source_method="POST")
return self._enqueue_webhook_task(
agent_id_str, payload, source_method="POST", agent=agent,
)
@api.doc(
description="Webhook listener for agent events (GET). Uses URL query parameters as payload to trigger processing.",
description=(
"Webhook listener for agent events (GET). Uses URL query parameters as "
"payload to trigger processing. Honors an optional ``Idempotency-Key`` "
"header: a repeat request with the same key within 24h returns the "
"original cached response and does not re-enqueue the task."
),
)
def get(self, webhook_token, agent, agent_id_str):
payload = request.args.to_dict(flat=True)
return self._enqueue_webhook_task(agent_id_str, payload, source_method="GET")
return self._enqueue_webhook_task(
agent_id_str, payload, source_method="GET", agent=agent,
)

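A hedged client-side sketch of the Idempotency-Key behaviour documented above, using the webhook URL shape the handler builds (/api/webhooks/agents/<webhook_token>). Host and token are placeholders.

import requests

url = "https://docsgpt.example.com/api/webhooks/agents/<webhook_token>"  # placeholder host/token
headers = {"Idempotency-Key": "order-42-created"}
payload = {"event": "order.created", "order_id": 42}

first = requests.post(url, json=payload, headers=headers, timeout=10)
second = requests.post(url, json=payload, headers=headers, timeout=10)

# Within the 24h dedup window the handler caches the claim instead of re-enqueuing,
# so both calls are expected to report the same task_id (assuming the agent row has a UUID id).
assert first.json()["task_id"] == second.json()["task_id"]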

@@ -2,26 +2,84 @@
import datetime
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, request
from flask_restx import fields, Namespace, Resource
from sqlalchemy import text as _sql_text
from application.api import api
from application.api.user.base import (
agents_collection,
conversations_collection,
generate_date_range,
generate_hourly_range,
generate_minute_range,
token_usage_collection,
user_logs_collection,
)
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.repositories.token_usage import TokenUsageRepository
from application.storage.db.repositories.user_logs import UserLogsRepository
from application.storage.db.session import db_readonly
analytics_ns = Namespace(
"analytics", description="Analytics and reporting operations", path="/api"
)
_FILTER_BUCKETS = {
"last_hour": ("minute", "%Y-%m-%d %H:%M:00", "YYYY-MM-DD HH24:MI:00"),
"last_24_hour": ("hour", "%Y-%m-%d %H:00", "YYYY-MM-DD HH24:00"),
"last_7_days": ("day", "%Y-%m-%d", "YYYY-MM-DD"),
"last_15_days": ("day", "%Y-%m-%d", "YYYY-MM-DD"),
"last_30_days": ("day", "%Y-%m-%d", "YYYY-MM-DD"),
}
def _range_for_filter(filter_option: str):
"""Return ``(start_date, end_date, bucket_unit, pg_fmt)`` for the filter.
Returns ``None`` on invalid filter.
"""
if filter_option not in _FILTER_BUCKETS:
return None
end_date = datetime.datetime.now(datetime.timezone.utc)
bucket_unit, _py_fmt, pg_fmt = _FILTER_BUCKETS[filter_option]
if filter_option == "last_hour":
start_date = end_date - datetime.timedelta(hours=1)
elif filter_option == "last_24_hour":
start_date = end_date - datetime.timedelta(hours=24)
else:
days = {
"last_7_days": 6,
"last_15_days": 14,
"last_30_days": 29,
}[filter_option]
start_date = end_date - datetime.timedelta(days=days)
start_date = start_date.replace(hour=0, minute=0, second=0, microsecond=0)
end_date = end_date.replace(
hour=23, minute=59, second=59, microsecond=999999
)
return start_date, end_date, bucket_unit, pg_fmt
def _intervals_for_filter(filter_option, start_date, end_date):
if filter_option == "last_hour":
return generate_minute_range(start_date, end_date)
if filter_option == "last_24_hour":
return generate_hourly_range(start_date, end_date)
return generate_date_range(start_date, end_date)
def _resolve_api_key(conn, api_key_id, user_id):
"""Look up the ``agents.key`` value for a given agent id.
Scoped by ``user_id`` so an authenticated caller can't probe another
user's agents. Accepts either UUID or legacy Mongo ObjectId shape.
"""
if not api_key_id:
return None
agent = AgentsRepository(conn).get_any(api_key_id, user_id)
return (agent or {}).get("key") if agent else None
@analytics_ns.route("/get_message_analytics")
class GetMessageAnalytics(Resource):
get_message_analytics_model = api.model(
@@ -32,13 +90,7 @@ class GetMessageAnalytics(Resource):
required=False,
description="Filter option for analytics",
default="last_30_days",
enum=[
"last_hour",
"last_24_hour",
"last_7_days",
"last_15_days",
"last_30_days",
],
enum=list(_FILTER_BUCKETS.keys()),
),
},
)
@@ -50,88 +102,54 @@ class GetMessageAnalytics(Resource):
if not decoded_token:
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
data = request.get_json()
data = request.get_json() or {}
api_key_id = data.get("api_key_id")
filter_option = data.get("filter_option", "last_30_days")
window = _range_for_filter(filter_option)
if window is None:
return make_response(
jsonify({"success": False, "message": "Invalid option"}), 400
)
start_date, end_date, _bucket_unit, pg_fmt = window
try:
api_key = (
agents_collection.find_one({"_id": ObjectId(api_key_id), "user": user})[
"key"
with db_readonly() as conn:
api_key = _resolve_api_key(conn, api_key_id, user)
# Count messages per bucket, filtered by the conversation's
# owner (user_id) and optionally the agent api_key. The
# ``user_id`` filter is always applied post-cutover to
# prevent cross-tenant leakage on admin dashboards.
clauses = [
"c.user_id = :user_id",
"m.timestamp >= :start",
"m.timestamp <= :end",
]
if api_key_id
else None
)
except Exception as err:
current_app.logger.error(f"Error getting API key: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
end_date = datetime.datetime.now(datetime.timezone.utc)
if filter_option == "last_hour":
start_date = end_date - datetime.timedelta(hours=1)
group_format = "%Y-%m-%d %H:%M:00"
elif filter_option == "last_24_hour":
start_date = end_date - datetime.timedelta(hours=24)
group_format = "%Y-%m-%d %H:00"
else:
if filter_option in ["last_7_days", "last_15_days", "last_30_days"]:
filter_days = (
6
if filter_option == "last_7_days"
else 14 if filter_option == "last_15_days" else 29
)
else:
return make_response(
jsonify({"success": False, "message": "Invalid option"}), 400
)
start_date = end_date - datetime.timedelta(days=filter_days)
start_date = start_date.replace(hour=0, minute=0, second=0, microsecond=0)
end_date = end_date.replace(
hour=23, minute=59, second=59, microsecond=999999
)
group_format = "%Y-%m-%d"
try:
match_stage = {
"$match": {
"user": user,
params: dict = {
"user_id": user,
"start": start_date,
"end": end_date,
"fmt": pg_fmt,
}
}
if api_key:
match_stage["$match"]["api_key"] = api_key
pipeline = [
match_stage,
{"$unwind": "$queries"},
{
"$match": {
"queries.timestamp": {"$gte": start_date, "$lte": end_date}
}
},
{
"$group": {
"_id": {
"$dateToString": {
"format": group_format,
"date": "$queries.timestamp",
}
},
"count": {"$sum": 1},
}
},
{"$sort": {"_id": 1}},
]
if api_key:
clauses.append("c.api_key = :api_key")
params["api_key"] = api_key
where = " AND ".join(clauses)
sql = (
"SELECT to_char(m.timestamp AT TIME ZONE 'UTC', :fmt) AS bucket, "
"COUNT(*) AS count "
"FROM conversation_messages m "
"JOIN conversations c ON c.id = m.conversation_id "
f"WHERE {where} "
"GROUP BY bucket ORDER BY bucket ASC"
)
rows = conn.execute(_sql_text(sql), params).fetchall()
message_data = conversations_collection.aggregate(pipeline)
if filter_option == "last_hour":
intervals = generate_minute_range(start_date, end_date)
elif filter_option == "last_24_hour":
intervals = generate_hourly_range(start_date, end_date)
else:
intervals = generate_date_range(start_date, end_date)
intervals = _intervals_for_filter(filter_option, start_date, end_date)
daily_messages = {interval: 0 for interval in intervals}
for entry in message_data:
daily_messages[entry["_id"]] = entry["count"]
for row in rows:
daily_messages[row._mapping["bucket"]] = int(row._mapping["count"])
except Exception as err:
current_app.logger.error(
f"Error getting message analytics: {err}", exc_info=True
@@ -152,13 +170,7 @@ class GetTokenAnalytics(Resource):
required=False,
description="Filter option for analytics",
default="last_30_days",
enum=[
"last_hour",
"last_24_hour",
"last_7_days",
"last_15_days",
"last_30_days",
],
enum=list(_FILTER_BUCKETS.keys()),
),
},
)
@@ -170,123 +182,36 @@ class GetTokenAnalytics(Resource):
if not decoded_token:
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
data = request.get_json()
data = request.get_json() or {}
api_key_id = data.get("api_key_id")
filter_option = data.get("filter_option", "last_30_days")
try:
api_key = (
agents_collection.find_one({"_id": ObjectId(api_key_id), "user": user})[
"key"
]
if api_key_id
else None
window = _range_for_filter(filter_option)
if window is None:
return make_response(
jsonify({"success": False, "message": "Invalid option"}), 400
)
except Exception as err:
current_app.logger.error(f"Error getting API key: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
end_date = datetime.datetime.now(datetime.timezone.utc)
start_date, end_date, bucket_unit, _pg_fmt = window
if filter_option == "last_hour":
start_date = end_date - datetime.timedelta(hours=1)
group_format = "%Y-%m-%d %H:%M:00"
group_stage = {
"$group": {
"_id": {
"minute": {
"$dateToString": {
"format": group_format,
"date": "$timestamp",
}
}
},
"total_tokens": {
"$sum": {"$add": ["$prompt_tokens", "$generated_tokens"]}
},
}
}
elif filter_option == "last_24_hour":
start_date = end_date - datetime.timedelta(hours=24)
group_format = "%Y-%m-%d %H:00"
group_stage = {
"$group": {
"_id": {
"hour": {
"$dateToString": {
"format": group_format,
"date": "$timestamp",
}
}
},
"total_tokens": {
"$sum": {"$add": ["$prompt_tokens", "$generated_tokens"]}
},
}
}
else:
if filter_option in ["last_7_days", "last_15_days", "last_30_days"]:
filter_days = (
6
if filter_option == "last_7_days"
else (14 if filter_option == "last_15_days" else 29)
)
else:
return make_response(
jsonify({"success": False, "message": "Invalid option"}), 400
)
start_date = end_date - datetime.timedelta(days=filter_days)
start_date = start_date.replace(hour=0, minute=0, second=0, microsecond=0)
end_date = end_date.replace(
hour=23, minute=59, second=59, microsecond=999999
)
group_format = "%Y-%m-%d"
group_stage = {
"$group": {
"_id": {
"day": {
"$dateToString": {
"format": group_format,
"date": "$timestamp",
}
}
},
"total_tokens": {
"$sum": {"$add": ["$prompt_tokens", "$generated_tokens"]}
},
}
}
try:
match_stage = {
"$match": {
"user_id": user,
"timestamp": {"$gte": start_date, "$lte": end_date},
}
}
if api_key:
match_stage["$match"]["api_key"] = api_key
token_usage_data = token_usage_collection.aggregate(
[
match_stage,
group_stage,
{"$sort": {"_id": 1}},
]
)
with db_readonly() as conn:
api_key = _resolve_api_key(conn, api_key_id, user)
# ``bucketed_totals`` applies user_id / api_key filters
# directly — no need to reshape a Mongo pipeline.
rows = TokenUsageRepository(conn).bucketed_totals(
bucket_unit=bucket_unit,
user_id=user,
api_key=api_key,
timestamp_gte=start_date,
timestamp_lt=end_date,
)
if filter_option == "last_hour":
intervals = generate_minute_range(start_date, end_date)
elif filter_option == "last_24_hour":
intervals = generate_hourly_range(start_date, end_date)
else:
intervals = generate_date_range(start_date, end_date)
intervals = _intervals_for_filter(filter_option, start_date, end_date)
daily_token_usage = {interval: 0 for interval in intervals}
for entry in token_usage_data:
if filter_option == "last_hour":
daily_token_usage[entry["_id"]["minute"]] = entry["total_tokens"]
elif filter_option == "last_24_hour":
daily_token_usage[entry["_id"]["hour"]] = entry["total_tokens"]
else:
daily_token_usage[entry["_id"]["day"]] = entry["total_tokens"]
for entry in rows:
daily_token_usage[entry["bucket"]] = int(
entry["prompt_tokens"] + entry["generated_tokens"]
)
except Exception as err:
current_app.logger.error(
f"Error getting token analytics: {err}", exc_info=True
@@ -307,13 +232,7 @@ class GetFeedbackAnalytics(Resource):
required=False,
description="Filter option for analytics",
default="last_30_days",
enum=[
"last_hour",
"last_24_hour",
"last_7_days",
"last_15_days",
"last_30_days",
],
enum=list(_FILTER_BUCKETS.keys()),
),
},
)
@@ -325,128 +244,64 @@ class GetFeedbackAnalytics(Resource):
if not decoded_token:
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
data = request.get_json()
data = request.get_json() or {}
api_key_id = data.get("api_key_id")
filter_option = data.get("filter_option", "last_30_days")
window = _range_for_filter(filter_option)
if window is None:
return make_response(
jsonify({"success": False, "message": "Invalid option"}), 400
)
start_date, end_date, _bucket_unit, pg_fmt = window
try:
api_key = (
agents_collection.find_one({"_id": ObjectId(api_key_id), "user": user})[
"key"
with db_readonly() as conn:
api_key = _resolve_api_key(conn, api_key_id, user)
# Feedback lives inside the ``conversation_messages.feedback``
# JSONB as ``{"text": "like"|"dislike", "timestamp": "..."}``.
# There is no scalar ``feedback_timestamp`` column — extract
# the timestamp from the JSONB and cast it to timestamptz for
# the range filter + bucket grouping.
clauses = [
"c.user_id = :user_id",
"m.feedback IS NOT NULL",
"(m.feedback->>'timestamp')::timestamptz >= :start",
"(m.feedback->>'timestamp')::timestamptz <= :end",
]
if api_key_id
else None
)
except Exception as err:
current_app.logger.error(f"Error getting API key: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
end_date = datetime.datetime.now(datetime.timezone.utc)
if filter_option == "last_hour":
start_date = end_date - datetime.timedelta(hours=1)
group_format = "%Y-%m-%d %H:%M:00"
date_field = {
"$dateToString": {
"format": group_format,
"date": "$queries.feedback_timestamp",
params: dict = {
"user_id": user,
"start": start_date,
"end": end_date,
"fmt": pg_fmt,
}
}
elif filter_option == "last_24_hour":
start_date = end_date - datetime.timedelta(hours=24)
group_format = "%Y-%m-%d %H:00"
date_field = {
"$dateToString": {
"format": group_format,
"date": "$queries.feedback_timestamp",
}
}
else:
if filter_option in ["last_7_days", "last_15_days", "last_30_days"]:
filter_days = (
6
if filter_option == "last_7_days"
else (14 if filter_option == "last_15_days" else 29)
if api_key:
clauses.append("c.api_key = :api_key")
params["api_key"] = api_key
where = " AND ".join(clauses)
sql = (
"SELECT to_char("
"(m.feedback->>'timestamp')::timestamptz AT TIME ZONE 'UTC', :fmt"
") AS bucket, "
"SUM(CASE WHEN m.feedback->>'text' = 'like' THEN 1 ELSE 0 END) AS positive, "
"SUM(CASE WHEN m.feedback->>'text' = 'dislike' THEN 1 ELSE 0 END) AS negative "
"FROM conversation_messages m "
"JOIN conversations c ON c.id = m.conversation_id "
f"WHERE {where} "
"GROUP BY bucket ORDER BY bucket ASC"
)
else:
return make_response(
jsonify({"success": False, "message": "Invalid option"}), 400
)
start_date = end_date - datetime.timedelta(days=filter_days)
start_date = start_date.replace(hour=0, minute=0, second=0, microsecond=0)
end_date = end_date.replace(
hour=23, minute=59, second=59, microsecond=999999
)
group_format = "%Y-%m-%d"
date_field = {
"$dateToString": {
"format": group_format,
"date": "$queries.feedback_timestamp",
}
}
try:
match_stage = {
"$match": {
"queries.feedback_timestamp": {
"$gte": start_date,
"$lte": end_date,
},
"queries.feedback": {"$exists": True},
}
}
if api_key:
match_stage["$match"]["api_key"] = api_key
pipeline = [
match_stage,
{"$unwind": "$queries"},
{"$match": {"queries.feedback": {"$exists": True}}},
{
"$group": {
"_id": {"time": date_field, "feedback": "$queries.feedback"},
"count": {"$sum": 1},
}
},
{
"$group": {
"_id": "$_id.time",
"positive": {
"$sum": {
"$cond": [
{"$eq": ["$_id.feedback", "LIKE"]},
"$count",
0,
]
}
},
"negative": {
"$sum": {
"$cond": [
{"$eq": ["$_id.feedback", "DISLIKE"]},
"$count",
0,
]
}
},
}
},
{"$sort": {"_id": 1}},
]
rows = conn.execute(_sql_text(sql), params).fetchall()
feedback_data = conversations_collection.aggregate(pipeline)
if filter_option == "last_hour":
intervals = generate_minute_range(start_date, end_date)
elif filter_option == "last_24_hour":
intervals = generate_hourly_range(start_date, end_date)
else:
intervals = generate_date_range(start_date, end_date)
intervals = _intervals_for_filter(filter_option, start_date, end_date)
daily_feedback = {
interval: {"positive": 0, "negative": 0} for interval in intervals
}
for entry in feedback_data:
daily_feedback[entry["_id"]] = {
"positive": entry["positive"],
"negative": entry["negative"],
for row in rows:
bucket = row._mapping["bucket"]
daily_feedback[bucket] = {
"positive": int(row._mapping["positive"] or 0),
"negative": int(row._mapping["negative"] or 0),
}
except Exception as err:
current_app.logger.error(
@@ -484,47 +339,89 @@ class GetUserLogs(Resource):
if not decoded_token:
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
data = request.get_json()
data = request.get_json() or {}
page = int(data.get("page", 1))
api_key_id = data.get("api_key_id")
page_size = int(data.get("page_size", 10))
skip = (page - 1) * page_size
try:
api_key = (
agents_collection.find_one({"_id": ObjectId(api_key_id)})["key"]
if api_key_id
else None
)
with db_readonly() as conn:
api_key = _resolve_api_key(conn, api_key_id, user)
logs_repo = UserLogsRepository(conn)
if api_key:
                    # ``find_by_api_key`` filters on ``data->>'api_key'``,
                    # the PG shape of the legacy top-level ``api_key``
                    # filter. Paginate in application code by slicing the
                    # full result (page_size + 1 rows so ``has_more`` can
                    # be derived below).
all_rows = logs_repo.find_by_api_key(api_key)
offset = (page - 1) * page_size
window = all_rows[offset: offset + page_size + 1]
items = window
else:
items, has_more_flag = logs_repo.list_paginated(
user_id=user,
page=page,
page_size=page_size,
)
# list_paginated already trims to page_size and
# returns has_more separately.
results = [
{
"id": str(item.get("id") or item.get("_id")),
"action": (item.get("data") or {}).get("action"),
"level": (item.get("data") or {}).get("level"),
"user": item.get("user_id"),
"question": (item.get("data") or {}).get("question"),
"sources": (item.get("data") or {}).get("sources"),
"retriever_params": (item.get("data") or {}).get(
"retriever_params"
),
"timestamp": (
item["timestamp"].isoformat()
if hasattr(item.get("timestamp"), "isoformat")
else item.get("timestamp")
),
}
for item in items
]
return make_response(
jsonify(
{
"success": True,
"logs": results,
"page": page,
"page_size": page_size,
"has_more": has_more_flag,
}
),
200,
)
has_more = len(items) > page_size
items = items[:page_size]
results = [
{
"id": str(item.get("id") or item.get("_id")),
"action": (item.get("data") or {}).get("action"),
"level": (item.get("data") or {}).get("level"),
"user": item.get("user_id"),
"question": (item.get("data") or {}).get("question"),
"sources": (item.get("data") or {}).get("sources"),
"retriever_params": (item.get("data") or {}).get(
"retriever_params"
),
"timestamp": (
item["timestamp"].isoformat()
if hasattr(item.get("timestamp"), "isoformat")
else item.get("timestamp")
),
}
for item in items
]
except Exception as err:
current_app.logger.error(f"Error getting API key: {err}", exc_info=True)
current_app.logger.error(
f"Error getting user logs: {err}", exc_info=True
)
return make_response(jsonify({"success": False}), 400)
query = {"user": user}
if api_key:
query = {"api_key": api_key}
items_cursor = (
user_logs_collection.find(query)
.sort("timestamp", -1)
.skip(skip)
.limit(page_size + 1)
)
items = list(items_cursor)
results = [
{
"id": str(item.get("_id")),
"action": item.get("action"),
"level": item.get("level"),
"user": item.get("user"),
"question": item.get("question"),
"sources": item.get("sources"),
"retriever_params": item.get("retriever_params"),
"timestamp": item.get("timestamp"),
}
for item in items[:page_size]
]
has_more = len(items) > page_size
return make_response(
jsonify(

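One detail worth a sanity check in `_FILTER_BUCKETS` above: the Python strftime pattern (used by the interval generators) and the Postgres `to_char` pattern (used in the SQL) must render identical bucket keys, otherwise the pre-seeded interval dicts would never be filled. A small illustrative check for the hourly bucket; the expected `to_char` output is stated as an assumption, not executed here.

import datetime

ts = datetime.datetime(2026, 5, 4, 17, 51, 9, tzinfo=datetime.timezone.utc)

py_bucket = ts.strftime("%Y-%m-%d %H:00")  # what generate_hourly_range keys on
pg_bucket = "2026-05-04 17:00"             # expected to_char(ts AT TIME ZONE 'UTC', 'YYYY-MM-DD HH24:00')

assert py_bucket == pg_bucket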

@@ -4,13 +4,16 @@ import os
import tempfile
from pathlib import Path
from bson.objectid import ObjectId
import uuid
from flask import current_app, jsonify, make_response, request
from flask_restx import fields, Namespace, Resource
from application.api import api
from application.cache import get_redis_instance
from application.core.settings import settings
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.session import db_readonly
from application.stt.constants import (
SUPPORTED_AUDIO_EXTENSIONS,
SUPPORTED_AUDIO_MIME_TYPES,
@@ -48,14 +51,13 @@ def _resolve_authenticated_user():
return safe_filename(decoded_token.get("sub"))
if api_key:
from application.api.user.base import agents_collection
agent = agents_collection.find_one({"key": api_key})
with db_readonly() as conn:
agent = AgentsRepository(conn).find_by_key(api_key)
if not agent:
return make_response(
jsonify({"success": False, "message": "Invalid API key"}), 401
)
return safe_filename(agent.get("user"))
return safe_filename(agent.get("user_id"))
return None
@@ -157,7 +159,7 @@ class StoreAttachment(Resource):
for idx, file in enumerate(files):
try:
attachment_id = ObjectId()
attachment_id = uuid.uuid4()
original_filename = safe_filename(os.path.basename(file.filename))
_enforce_uploaded_audio_size_limit(file, original_filename)
relative_path = f"{settings.UPLOAD_FOLDER}/{user}/attachments/{str(attachment_id)}/{original_filename}"

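For reference, the storage path shape built in StoreAttachment above, reproduced with placeholder values; the UPLOAD_FOLDER default is an assumption standing in for settings.UPLOAD_FOLDER.

import uuid

UPLOAD_FOLDER = "inputs"        # assumption: stands in for settings.UPLOAD_FOLDER
user = "local"
attachment_id = uuid.uuid4()
original_filename = "report.pdf"

relative_path = f"{UPLOAD_FOLDER}/{user}/attachments/{attachment_id}/{original_filename}"
# e.g. inputs/local/attachments/3f0c9a6e-…/report.pdf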

@@ -8,15 +8,15 @@ import uuid
from functools import wraps
from typing import Optional, Tuple
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, Response
from pymongo import ReturnDocument
from werkzeug.utils import secure_filename
from application.core.mongo_db import MongoDB
from sqlalchemy import text as _sql_text
from application.core.settings import settings
from application.storage.db.dual_write import dual_write
from application.storage.db.base_repository import looks_like_uuid, row_to_dict
from application.storage.db.repositories.users import UsersRepository
from application.storage.db.session import db_readonly, db_session
from application.storage.storage_creator import StorageCreator
from application.vectorstore.vector_creator import VectorCreator
@@ -24,56 +24,6 @@ from application.vectorstore.vector_creator import VectorCreator
storage = StorageCreator.get_storage()
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
conversations_collection = db["conversations"]
sources_collection = db["sources"]
prompts_collection = db["prompts"]
feedback_collection = db["feedback"]
agents_collection = db["agents"]
agent_folders_collection = db["agent_folders"]
token_usage_collection = db["token_usage"]
shared_conversations_collections = db["shared_conversations"]
users_collection = db["users"]
user_logs_collection = db["user_logs"]
user_tools_collection = db["user_tools"]
attachments_collection = db["attachments"]
workflow_runs_collection = db["workflow_runs"]
workflows_collection = db["workflows"]
workflow_nodes_collection = db["workflow_nodes"]
workflow_edges_collection = db["workflow_edges"]
try:
agents_collection.create_index(
[("shared", 1)],
name="shared_index",
background=True,
)
users_collection.create_index("user_id", unique=True)
workflows_collection.create_index(
[("user", 1)], name="workflow_user_index", background=True
)
workflow_nodes_collection.create_index(
[("workflow_id", 1)], name="node_workflow_index", background=True
)
workflow_nodes_collection.create_index(
[("workflow_id", 1), ("graph_version", 1)],
name="node_workflow_graph_version_index",
background=True,
)
workflow_edges_collection.create_index(
[("workflow_id", 1)], name="edge_workflow_index", background=True
)
workflow_edges_collection.create_index(
[("workflow_id", 1), ("graph_version", 1)],
name="edge_workflow_graph_version_index",
background=True,
)
except Exception as e:
print("Error creating indexes:", e)
current_dir = os.path.dirname(
os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
)
@@ -105,69 +55,95 @@ def generate_date_range(start_date, end_date):
def ensure_user_doc(user_id):
"""
Ensure user document exists with proper agent preferences structure.
Ensure a Postgres ``users`` row exists for ``user_id``.
Returns the row as a dict with the shape legacy callers expect — in
particular ``user_id`` and ``agent_preferences`` (with ``pinned`` and
``shared_with_me`` list keys always present).
Args:
user_id: The user ID to ensure
Returns:
The user document
The user document as a dict.
"""
default_prefs = {
"pinned": [],
"shared_with_me": [],
}
user_doc = users_collection.find_one_and_update(
{"user_id": user_id},
{"$setOnInsert": {"agent_preferences": default_prefs}},
upsert=True,
return_document=ReturnDocument.AFTER,
)
prefs = user_doc.get("agent_preferences", {})
updates = {}
if "pinned" not in prefs:
updates["agent_preferences.pinned"] = []
if "shared_with_me" not in prefs:
updates["agent_preferences.shared_with_me"] = []
if updates:
users_collection.update_one({"user_id": user_id}, {"$set": updates})
user_doc = users_collection.find_one({"user_id": user_id})
dual_write(UsersRepository, lambda repo: repo.upsert(user_id))
with db_session() as conn:
user_doc = UsersRepository(conn).upsert(user_id)
prefs = user_doc.get("agent_preferences") or {}
if not isinstance(prefs, dict):
prefs = {}
prefs.setdefault("pinned", [])
prefs.setdefault("shared_with_me", [])
user_doc["agent_preferences"] = prefs
return user_doc
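A short usage sketch of the shape callers can rely on (the user id is hypothetical; only the keys named in the docstring are guaranteed):
doc = ensure_user_doc("auth0|example-user")  # hypothetical user id
assert doc["user_id"] == "auth0|example-user"
assert isinstance(doc["agent_preferences"]["pinned"], list)
assert isinstance(doc["agent_preferences"]["shared_with_me"], list)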
def resolve_tool_details(tool_ids):
"""
Resolve tool IDs to their details.
Resolve tool IDs to their display details.
Accepts either Postgres UUIDs or legacy Mongo ObjectId strings (mixed
lists are supported — each id is looked up via ``get_any``, which
resolves to whichever column matches). Unknown ids are silently
skipped.
Args:
tool_ids: List of tool IDs
tool_ids: List of tool IDs (UUIDs or legacy Mongo ObjectId strings).
Returns:
List of tool details with id, name, and display_name
List of tool details with ``id``, ``name``, and ``display_name``.
"""
valid_ids = []
if not tool_ids:
return []
uuid_ids: list[str] = []
legacy_ids: list[str] = []
for tid in tool_ids:
try:
valid_ids.append(ObjectId(tid))
except Exception:
if not tid:
continue
tools = user_tools_collection.find(
{"_id": {"$in": valid_ids}}
) if valid_ids else []
tid_str = str(tid)
if looks_like_uuid(tid_str):
uuid_ids.append(tid_str)
else:
legacy_ids.append(tid_str)
if not uuid_ids and not legacy_ids:
return []
rows: list[dict] = []
with db_readonly() as conn:
if uuid_ids:
result = conn.execute(
_sql_text(
"SELECT * FROM user_tools "
"WHERE id = ANY(CAST(:ids AS uuid[]))"
),
{"ids": uuid_ids},
)
rows.extend(row_to_dict(r) for r in result.fetchall())
if legacy_ids:
result = conn.execute(
_sql_text(
"SELECT * FROM user_tools "
"WHERE legacy_mongo_id = ANY(:ids)"
),
{"ids": legacy_ids},
)
rows.extend(row_to_dict(r) for r in result.fetchall())
return [
{
"id": str(tool["_id"]),
"name": tool.get("name", ""),
"display_name": tool.get("customName")
or tool.get("displayName")
or tool.get("name", ""),
"id": str(tool.get("id") or tool.get("legacy_mongo_id") or ""),
"name": tool.get("name", "") or "",
"display_name": (
tool.get("custom_name")
or tool.get("display_name")
or tool.get("name", "")
or ""
),
}
for tool in tools
for tool in rows
]
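A sketch of the mixed-id behaviour described in the docstring (both ids below are invented; unknown ids are silently dropped, and each returned entry carries id, name, and display_name):
details = resolve_tool_details([
    "3f2c9d4e-7b1a-4f6e-9c8d-2a5b6c7d8e9f",  # UUID shape, matched via id = ANY(uuid[])
    "64a7f0c2e4b0a1b2c3d4e5f6",              # legacy ObjectId, matched via legacy_mongo_id
    "not-a-real-id",                          # no match, silently skipped
])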
@@ -237,14 +213,15 @@ def require_agent(func):
@wraps(func)
def wrapper(*args, **kwargs):
from application.storage.db.repositories.agents import AgentsRepository
webhook_token = kwargs.get("webhook_token")
if not webhook_token:
return make_response(
jsonify({"success": False, "message": "Webhook token missing"}), 400
)
agent = agents_collection.find_one(
{"incoming_webhook_token": webhook_token}, {"_id": 1}
)
with db_readonly() as conn:
agent = AgentsRepository(conn).find_by_webhook_token(webhook_token)
if not agent:
current_app.logger.warning(
f"Webhook attempt with invalid token: {webhook_token}"
@@ -253,7 +230,7 @@ def require_agent(func):
jsonify({"success": False, "message": "Agent not found"}), 404
)
kwargs["agent"] = agent
kwargs["agent_id_str"] = str(agent["_id"])
kwargs["agent_id_str"] = str(agent["id"])
return func(*args, **kwargs)
return wrapper

View File

@@ -2,14 +2,15 @@
import datetime
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, request
from flask_restx import fields, Namespace, Resource
from sqlalchemy import text as sql_text
from application.api import api
from application.api.user.base import attachments_collection, conversations_collection
from application.storage.db.dual_write import dual_write
from application.storage.db.base_repository import looks_like_uuid, row_to_dict
from application.storage.db.repositories.attachments import AttachmentsRepository
from application.storage.db.repositories.conversations import ConversationsRepository
from application.storage.db.session import db_readonly, db_session
from application.utils import check_required_fields
conversations_ns = Namespace(
@@ -34,21 +35,16 @@ class DeleteConversation(Resource):
)
user_id = decoded_token["sub"]
try:
conversations_collection.delete_one(
{"_id": ObjectId(conversation_id), "user": user_id}
)
with db_session() as conn:
repo = ConversationsRepository(conn)
conv = repo.get_any(conversation_id, user_id)
if conv is not None:
repo.delete(str(conv["id"]), user_id)
except Exception as err:
current_app.logger.error(
f"Error deleting conversation: {err}", exc_info=True
)
return make_response(jsonify({"success": False}), 400)
def _pg_delete(repo: ConversationsRepository) -> None:
conv = repo.get_by_legacy_id(conversation_id)
if conv is not None:
repo.delete(conv["id"], user_id)
dual_write(ConversationsRepository, _pg_delete)
return make_response(jsonify({"success": True}), 200)
@@ -63,17 +59,13 @@ class DeleteAllConversations(Resource):
return make_response(jsonify({"success": False}), 401)
user_id = decoded_token.get("sub")
try:
conversations_collection.delete_many({"user": user_id})
with db_session() as conn:
ConversationsRepository(conn).delete_all_for_user(user_id)
except Exception as err:
current_app.logger.error(
f"Error deleting all conversations: {err}", exc_info=True
)
return make_response(jsonify({"success": False}), 400)
dual_write(
ConversationsRepository,
lambda r, uid=user_id: r.delete_all_for_user(uid),
)
return make_response(jsonify({"success": True}), 200)
@@ -86,26 +78,21 @@ class GetConversations(Resource):
decoded_token = request.decoded_token
if not decoded_token:
return make_response(jsonify({"success": False}), 401)
user_id = decoded_token.get("sub")
try:
conversations = (
conversations_collection.find(
{
"$or": [
{"api_key": {"$exists": False}},
{"agent_id": {"$exists": True}},
],
"user": decoded_token.get("sub"),
}
with db_readonly() as conn:
conversations = ConversationsRepository(conn).list_for_user(
user_id, limit=30
)
.sort("date", -1)
.limit(30)
)
list_conversations = [
{
"id": str(conversation["_id"]),
"id": str(conversation["id"]),
"name": conversation["name"],
"agent_id": conversation.get("agent_id", None),
"agent_id": (
str(conversation["agent_id"])
if conversation.get("agent_id")
else None
),
"is_shared_usage": conversation.get("is_shared_usage", False),
"shared_token": conversation.get("shared_token", None),
}
@@ -134,38 +121,74 @@ class GetSingleConversation(Resource):
return make_response(
jsonify({"success": False, "message": "ID is required"}), 400
)
user_id = decoded_token.get("sub")
try:
conversation = conversations_collection.find_one(
{"_id": ObjectId(conversation_id), "user": decoded_token.get("sub")}
)
if not conversation:
return make_response(jsonify({"status": "not found"}), 404)
# Process queries to include attachment names
with db_readonly() as conn:
repo = ConversationsRepository(conn)
conversation = repo.get_any(conversation_id, user_id)
if not conversation:
return make_response(jsonify({"status": "not found"}), 404)
conv_pg_id = str(conversation["id"])
messages = repo.get_messages(conv_pg_id)
queries = conversation["queries"]
for query in queries:
if "attachments" in query and query["attachments"]:
attachment_details = []
for attachment_id in query["attachments"]:
try:
attachment = attachments_collection.find_one(
{"_id": ObjectId(attachment_id)}
)
if attachment:
attachment_details.append(
{
"id": str(attachment["_id"]),
"fileName": attachment.get(
"filename", "Unknown file"
),
}
# Resolve attachment details (id, fileName) for each message.
attachments_repo = AttachmentsRepository(conn)
queries = []
for msg in messages:
metadata = msg.get("metadata") or {}
query = {
"prompt": msg.get("prompt"),
"response": msg.get("response"),
"thought": msg.get("thought"),
"sources": msg.get("sources") or [],
"tool_calls": msg.get("tool_calls") or [],
"timestamp": msg.get("timestamp"),
"model_id": msg.get("model_id"),
# Lets the client distinguish placeholder rows from
# finalised answers and tail-poll in-flight ones.
"message_id": str(msg["id"]) if msg.get("id") else None,
"status": msg.get("status"),
"request_id": msg.get("request_id"),
"last_heartbeat_at": metadata.get("last_heartbeat_at"),
}
if metadata:
query["metadata"] = metadata
# Feedback on conversation_messages is a JSONB blob with
# shape {"text": <str>, "timestamp": <iso>}. The legacy
# frontend consumed a flat scalar feedback string, so
# unwrap the ``text`` field for compat.
feedback = msg.get("feedback")
if feedback is not None:
if isinstance(feedback, dict):
query["feedback"] = feedback.get("text")
if feedback.get("timestamp"):
query["feedback_timestamp"] = feedback["timestamp"]
else:
query["feedback"] = feedback
attachments = msg.get("attachments") or []
if attachments:
attachment_details = []
for attachment_id in attachments:
try:
att = attachments_repo.get_any(
str(attachment_id), user_id
)
except Exception as e:
current_app.logger.error(
f"Error retrieving attachment {attachment_id}: {e}",
exc_info=True,
)
query["attachments"] = attachment_details
if att:
attachment_details.append(
{
"id": str(att["id"]),
"fileName": att.get(
"filename", "Unknown file"
),
}
)
except Exception as e:
current_app.logger.error(
f"Error retrieving attachment {attachment_id}: {e}",
exc_info=True,
)
query["attachments"] = attachment_details
queries.append(query)
except Exception as err:
current_app.logger.error(
f"Error retrieving conversation: {err}", exc_info=True
@@ -173,7 +196,9 @@ class GetSingleConversation(Resource):
return make_response(jsonify({"success": False}), 400)
data = {
"queries": queries,
"agent_id": conversation.get("agent_id"),
"agent_id": (
str(conversation["agent_id"]) if conversation.get("agent_id") else None
),
"is_shared_usage": conversation.get("is_shared_usage", False),
"shared_token": conversation.get("shared_token", None),
}
@@ -207,22 +232,16 @@ class UpdateConversationName(Resource):
return missing_fields
user_id = decoded_token.get("sub")
try:
conversations_collection.update_one(
{"_id": ObjectId(data["id"]), "user": user_id},
{"$set": {"name": data["name"]}},
)
with db_session() as conn:
repo = ConversationsRepository(conn)
conv = repo.get_any(data["id"], user_id)
if conv is not None:
repo.rename(str(conv["id"]), user_id, data["name"])
except Exception as err:
current_app.logger.error(
f"Error updating conversation name: {err}", exc_info=True
)
return make_response(jsonify({"success": False}), 400)
def _pg_rename(repo: ConversationsRepository) -> None:
conv = repo.get_by_legacy_id(data["id"])
if conv is not None:
repo.rename(conv["id"], user_id, data["name"])
dual_write(ConversationsRepository, _pg_rename)
return make_response(jsonify({"success": True}), 200)
@@ -260,61 +279,92 @@ class SubmitFeedback(Resource):
missing_fields = check_required_fields(data, required_fields)
if missing_fields:
return missing_fields
user_id = decoded_token.get("sub")
feedback_value = data["feedback"]
question_index = int(data["question_index"])
# Normalize string feedback to lowercase so analytics queries
# (which match 'like'/'dislike') count rows correctly. Tolerate
# legacy uppercase clients on ingest. Non-string values pass through.
if isinstance(feedback_value, str):
feedback_value = feedback_value.lower()
feedback_payload = (
None
if feedback_value is None
else {
"text": feedback_value,
"timestamp": datetime.datetime.now(
datetime.timezone.utc
).isoformat(),
}
)
try:
if data["feedback"] is None:
# Remove feedback and feedback_timestamp if feedback is null
conversations_collection.update_one(
{
"_id": ObjectId(data["conversation_id"]),
"user": decoded_token.get("sub"),
f"queries.{data['question_index']}": {"$exists": True},
},
{
"$unset": {
f"queries.{data['question_index']}.feedback": "",
f"queries.{data['question_index']}.feedback_timestamp": "",
}
},
)
else:
# Set feedback and feedback_timestamp if feedback has a value
conversations_collection.update_one(
{
"_id": ObjectId(data["conversation_id"]),
"user": decoded_token.get("sub"),
f"queries.{data['question_index']}": {"$exists": True},
},
{
"$set": {
f"queries.{data['question_index']}.feedback": data[
"feedback"
],
f"queries.{data['question_index']}.feedback_timestamp": datetime.datetime.now(
datetime.timezone.utc
),
}
},
)
with db_session() as conn:
repo = ConversationsRepository(conn)
conv = repo.get_any(data["conversation_id"], user_id)
if conv is None:
return make_response(
jsonify({"success": False, "message": "Not found"}), 404
)
repo.set_feedback(str(conv["id"]), question_index, feedback_payload)
except Exception as err:
current_app.logger.error(f"Error submitting feedback: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
# Dual-write to Postgres: mirror the per-message feedback set/unset.
feedback_value = data["feedback"]
question_index = int(data["question_index"])
feedback_payload = (
None if feedback_value is None
else {"text": feedback_value, "timestamp": datetime.datetime.now(
datetime.timezone.utc
).isoformat()}
)
def _pg_feedback(repo: ConversationsRepository) -> None:
conv = repo.get_by_legacy_id(data["conversation_id"])
if conv is not None:
repo.set_feedback(conv["id"], question_index, feedback_payload)
dual_write(ConversationsRepository, _pg_feedback)
return make_response(jsonify({"success": True}), 200)
@conversations_ns.route("/messages/<string:message_id>/tail")
class GetMessageTail(Resource):
@api.doc(
description=(
"Current state of one conversation_messages row, scoped to the "
"authenticated user. Used to reconnect to an in-flight stream "
"after a refresh."
),
params={"message_id": "Message UUID"},
)
def get(self, message_id):
decoded_token = request.decoded_token
if not decoded_token:
return make_response(jsonify({"success": False}), 401)
if not looks_like_uuid(message_id):
return make_response(
jsonify({"success": False, "message": "Invalid message id"}), 400
)
user_id = decoded_token.get("sub")
try:
with db_readonly() as conn:
# Owner-or-shared, matching ``ConversationsRepository.get``.
row = conn.execute(
sql_text(
"SELECT m.* FROM conversation_messages m "
"JOIN conversations c ON c.id = m.conversation_id "
"WHERE m.id = CAST(:mid AS uuid) "
"AND (c.user_id = :uid OR :uid = ANY(c.shared_with))"
),
{"mid": message_id, "uid": user_id},
).fetchone()
if row is None:
return make_response(jsonify({"status": "not found"}), 404)
msg = row_to_dict(row)
except Exception as err:
current_app.logger.error(
f"Error tailing message {message_id}: {err}", exc_info=True
)
return make_response(jsonify({"success": False}), 400)
metadata = msg.get("message_metadata") or {}
return make_response(
jsonify(
{
"message_id": str(msg["id"]),
"status": msg.get("status"),
"response": msg.get("response"),
"thought": msg.get("thought"),
"sources": msg.get("sources") or [],
"tool_calls": msg.get("tool_calls") or [],
"request_id": msg.get("request_id"),
"last_heartbeat_at": metadata.get("last_heartbeat_at"),
"error": metadata.get("error"),
}
),
200,
)
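A hedged client-side sketch of reconnecting through this endpoint after a page refresh (the /api prefix, bearer auth, and the terminal-status check are assumptions for illustration; the response keys match the handler above):
import time
import requests

def tail_message(base_url, token, message_id):
    # Poll the tail endpoint until the message leaves its in-flight
    # statuses ('pending'/'streaming', per the reconciler's wording).
    while True:
        resp = requests.get(
            f"{base_url}/api/messages/{message_id}/tail",
            headers={"Authorization": f"Bearer {token}"},
            timeout=10,
        )
        resp.raise_for_status()
        body = resp.json()
        if body.get("status") not in ("pending", "streaming"):
            return body  # final response, sources, tool_calls, error, ...
        time.sleep(2)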

View File

@@ -0,0 +1,237 @@
"""Per-Celery-task idempotency wrapper backed by ``task_dedup``."""
from __future__ import annotations
import functools
import logging
import threading
import uuid
from typing import Any, Callable, Optional
from application.storage.db.repositories.idempotency import IdempotencyRepository
from application.storage.db.session import db_readonly, db_session
logger = logging.getLogger(__name__)
# Poison-loop cap; transient-failure headroom without infinite retry.
MAX_TASK_ATTEMPTS = 5
# 30s heartbeat / 60s TTL → ~2 missed ticks of slack before reclaim.
LEASE_TTL_SECONDS = 60
LEASE_HEARTBEAT_INTERVAL = 30
# 10 × 60s ≈ 10 min of deferral before giving up on a held lease.
LEASE_RETRY_MAX = 10
def with_idempotency(task_name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
"""Short-circuit on completed key; gate concurrent runs via a lease.
Entry short-circuits:
- completed row → return cached result
- live lease held → retry(countdown=LEASE_TTL_SECONDS)
- attempt_count > MAX_TASK_ATTEMPTS → poison-loop alert
Success writes ``completed``; exceptions leave ``pending`` for
autoretry until the poison-loop guard trips.
"""
def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
@functools.wraps(fn)
def wrapper(self, *args: Any, idempotency_key: Any = None, **kwargs: Any) -> Any:
key = idempotency_key if isinstance(idempotency_key, str) and idempotency_key else None
if key is None:
return fn(self, *args, idempotency_key=idempotency_key, **kwargs)
cached = _lookup_completed(key)
if cached is not None:
logger.info(
"idempotency hit for task=%s key=%s — returning cached result",
task_name, key,
)
return cached
owner_id = str(uuid.uuid4())
attempt = _try_claim_lease(
key, task_name, _safe_task_id(self), owner_id,
)
if attempt is None:
# Live lease held by another worker. Re-queue and bail
# quickly — by the time the retry fires (LEASE_TTL
# seconds), Worker 1 has either finalised (we'll hit
# ``_lookup_completed`` and return cached) or its lease
# has expired and we can claim.
logger.info(
"idempotency: live lease held; deferring task=%s key=%s",
task_name, key,
)
raise self.retry(
countdown=LEASE_TTL_SECONDS,
max_retries=LEASE_RETRY_MAX,
)
if attempt > MAX_TASK_ATTEMPTS:
logger.error(
"idempotency poison-loop guard: task=%s key=%s attempts=%s",
task_name, key, attempt,
extra={
"alert": "idempotency_poison_loop",
"task_name": task_name,
"idempotency_key": key,
"attempts": attempt,
},
)
poisoned = {
"success": False,
"error": "idempotency poison-loop guard tripped",
"attempts": attempt,
}
_finalize(key, poisoned, status="failed")
return poisoned
heartbeat_thread, heartbeat_stop = _start_lease_heartbeat(
key, owner_id,
)
try:
result = fn(self, *args, idempotency_key=idempotency_key, **kwargs)
_finalize(key, result, status="completed")
return result
except Exception:
# Drop the lease so the next retry doesn't wait LEASE_TTL.
_release_lease(key, owner_id)
raise
finally:
_stop_lease_heartbeat(heartbeat_thread, heartbeat_stop)
return wrapper
return decorator
def _lookup_completed(key: str) -> Any:
"""Return cached ``result_json`` if a completed row exists for ``key``, else None."""
with db_readonly() as conn:
row = IdempotencyRepository(conn).get_task(key)
if row is None:
return None
if row.get("status") != "completed":
return None
return row.get("result_json")
def _try_claim_lease(
key: str, task_name: str, task_id: str, owner_id: str,
) -> Optional[int]:
"""Atomic CAS; returns ``attempt_count`` or ``None`` when held.
DB outage → treated as ``attempt=1`` so transient failures don't
block all task execution; reconciler repairs the lease columns.
"""
try:
with db_session() as conn:
return IdempotencyRepository(conn).try_claim_lease(
key=key,
task_name=task_name,
task_id=task_id,
owner_id=owner_id,
ttl_seconds=LEASE_TTL_SECONDS,
)
except Exception:
logger.exception(
"idempotency lease-claim failed for key=%s task=%s", key, task_name,
)
return 1
def _finalize(key: str, result_json: Any, *, status: str) -> None:
"""Best-effort terminal write. Never let DB outage fail the task."""
try:
with db_session() as conn:
IdempotencyRepository(conn).finalize_task(
key=key, result_json=result_json, status=status,
)
except Exception:
logger.exception(
"idempotency finalize failed for key=%s status=%s", key, status,
)
def _release_lease(key: str, owner_id: str) -> None:
"""Best-effort lease release on the wrapper's exception path."""
try:
with db_session() as conn:
IdempotencyRepository(conn).release_lease(key, owner_id)
except Exception:
logger.exception("idempotency release-lease failed for key=%s", key)
def _start_lease_heartbeat(
key: str, owner_id: str,
) -> tuple[threading.Thread, threading.Event]:
"""Spawn a daemon thread that bumps ``lease_expires_at`` every
:data:`LEASE_HEARTBEAT_INTERVAL` seconds until ``stop_event`` fires.
Mirrors ``application.worker._start_ingest_heartbeat`` so the two
durability heartbeats share shape and cadence.
"""
stop_event = threading.Event()
thread = threading.Thread(
target=_lease_heartbeat_loop,
args=(key, owner_id, stop_event, LEASE_HEARTBEAT_INTERVAL),
daemon=True,
name=f"idempotency-lease-heartbeat:{key[:32]}",
)
thread.start()
return thread, stop_event
def _stop_lease_heartbeat(
thread: threading.Thread, stop_event: threading.Event,
) -> None:
"""Signal the heartbeat thread to exit and join with a short timeout."""
stop_event.set()
thread.join(timeout=10)
def _lease_heartbeat_loop(
key: str,
owner_id: str,
stop_event: threading.Event,
interval: int,
) -> None:
"""Refresh the lease until ``stop_event`` is set or ownership is lost.
A failed refresh (rowcount 0) means another worker stole the lease
after expiry — at that point the damage is already possible, so we
log and keep ticking. Don't escalate to thread death; the main task
body needs to keep running so its outcome is at least *recorded*.
"""
while not stop_event.wait(interval):
try:
with db_session() as conn:
still_owned = IdempotencyRepository(conn).refresh_lease(
key=key, owner_id=owner_id, ttl_seconds=LEASE_TTL_SECONDS,
)
if not still_owned:
logger.warning(
"idempotency lease lost mid-task for key=%s "
"(another worker may have taken over)",
key,
)
except Exception:
logger.exception(
"idempotency lease-heartbeat tick failed for key=%s", key,
)
def _safe_task_id(task_self: Any) -> str:
"""Best-effort extraction of ``self.request.id`` from a Celery task."""
try:
request = getattr(task_self, "request", None)
task_id: Optional[str] = (
getattr(request, "id", None) if request is not None else None
)
except Exception:
task_id = None
return task_id or "unknown"
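A minimal usage sketch of the decorator contract described above, assuming a Celery app exposed as application.celery_init.celery (the task name and body are hypothetical; the essential points are that the task is bound, accepts an idempotency_key keyword, and lets exceptions propagate so autoretry and the poison-loop guard can act):
# Hypothetical task, illustrating the wrapper contract only.
from application.api.user.idempotency import MAX_TASK_ATTEMPTS, with_idempotency
from application.celery_init import celery  # assumed Celery app location

@celery.task(bind=True, autoretry_for=(Exception,), retry_backoff=True,
             max_retries=MAX_TASK_ATTEMPTS)
@with_idempotency("example_ingest")
def example_ingest(self, source_id, *, idempotency_key=None):
    # Real work goes here. By the time this body runs, the wrapper has
    # already returned any cached result for the key and is holding the
    # lease, with the heartbeat thread refreshing lease_expires_at.
    return {"success": True, "source_id": source_id}

# Caller side: repeating the enqueue with the same key is safe; a second
# run returns the cached result instead of redoing the side effect.
example_ingest.apply_async(args=["src-123"], kwargs={"idempotency_key": "ingest:src-123"})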

View File

@@ -1,18 +1,135 @@
from flask import current_app, jsonify, make_response
"""Model routes.
- ``GET /api/models`` — list available models for the current user.
Combines the built-in catalog with the user's BYOM records.
- ``GET/POST/PATCH/DELETE /api/user/models[/<id>]`` — CRUD for the
user's own OpenAI-compatible model registrations (BYOM).
- ``POST /api/user/models/<id>/test`` — sanity-check the upstream
endpoint with a tiny request.
Every BYOM endpoint is user-scoped at the repository layer
(every query filters on ``user_id`` from ``request.decoded_token``).
"""
from __future__ import annotations
import logging
import requests
from flask import current_app, jsonify, make_response, request
from flask_restx import Namespace, Resource
from application.core.model_settings import ModelRegistry
from application.api import api
from application.core.model_registry import ModelRegistry
from application.security.safe_url import (
UnsafeUserUrlError,
pinned_post,
validate_user_base_url,
)
from application.storage.db.repositories.user_custom_models import (
UserCustomModelsRepository,
)
from application.storage.db.session import db_readonly, db_session
from application.utils import check_required_fields
logger = logging.getLogger(__name__)
models_ns = Namespace("models", description="Available models", path="/api")
_CONTEXT_WINDOW_MIN = 1_000
_CONTEXT_WINDOW_MAX = 10_000_000
def _user_id_or_401():
decoded_token = request.decoded_token
if not decoded_token:
return None, make_response(jsonify({"success": False}), 401)
user_id = decoded_token.get("sub")
if not user_id:
return None, make_response(jsonify({"success": False}), 401)
return user_id, None
def _normalize_capabilities(raw) -> dict:
"""Coerce + bound the user-supplied capabilities payload."""
raw = raw or {}
out = {}
if "supports_tools" in raw:
out["supports_tools"] = bool(raw["supports_tools"])
if "supports_structured_output" in raw:
out["supports_structured_output"] = bool(raw["supports_structured_output"])
if "supports_streaming" in raw:
out["supports_streaming"] = bool(raw["supports_streaming"])
if "attachments" in raw:
atts = raw["attachments"] or []
if not isinstance(atts, list):
raise ValueError("'capabilities.attachments' must be a list")
coerced = [str(a) for a in atts]
# Reject unknown aliases at the API boundary so bad payloads
# never reach the registry layer (where lenient expansion just
# drops them). Raw MIME types (containing ``/``) pass through
# unchanged for parity with the built-in YAML schema.
from application.core.model_yaml import builtin_attachment_aliases
aliases = builtin_attachment_aliases()
for entry in coerced:
if "/" in entry:
continue
if entry not in aliases:
valid = ", ".join(sorted(aliases.keys())) or "<none defined>"
raise ValueError(
f"unknown attachment alias '{entry}' in "
f"'capabilities.attachments'. Valid aliases: {valid}, "
f"or use a raw MIME type like 'image/png'."
)
out["attachments"] = coerced
if "context_window" in raw:
try:
cw = int(raw["context_window"])
except (TypeError, ValueError):
raise ValueError("'capabilities.context_window' must be an integer")
if not (_CONTEXT_WINDOW_MIN <= cw <= _CONTEXT_WINDOW_MAX):
raise ValueError(
f"'capabilities.context_window' must be between "
f"{_CONTEXT_WINDOW_MIN} and {_CONTEXT_WINDOW_MAX}"
)
out["context_window"] = cw
return out
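For illustration, a payload like the one below (values invented) passes validation, while an attachment alias that is not defined in the built-in YAML schema raises ValueError; raw MIME types always pass through because they contain a slash:
# Hypothetical input showing the coercions performed above.
caps = _normalize_capabilities({
    "supports_tools": 1,                              # coerced to True
    "supports_streaming": False,
    "attachments": ["image/png", "application/pdf"],  # raw MIME types
    "context_window": "128000",                       # int + bounds check
})
# caps == {"supports_tools": True, "supports_streaming": False,
#          "attachments": ["image/png", "application/pdf"],
#          "context_window": 128000}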
def _row_to_response(row: dict) -> dict:
"""Wire-format projection — never includes the API key."""
return {
"id": str(row["id"]),
"upstream_model_id": row["upstream_model_id"],
"display_name": row["display_name"],
"description": row.get("description") or "",
"base_url": row["base_url"],
"capabilities": row.get("capabilities") or {},
"enabled": bool(row.get("enabled", True)),
"source": "user",
}
@models_ns.route("/models")
class ModelsListResource(Resource):
def get(self):
"""Get list of available models with their capabilities."""
"""Get list of available models with their capabilities.
When the request is authenticated, the response includes the
user's own BYOM registrations alongside the built-in catalog.
"""
try:
user_id = None
decoded_token = getattr(request, "decoded_token", None)
if decoded_token:
user_id = decoded_token.get("sub")
registry = ModelRegistry.get_instance()
models = registry.get_enabled_models()
models = registry.get_enabled_models(user_id=user_id)
response = {
"models": [model.to_dict() for model in models],
@@ -23,3 +140,382 @@ class ModelsListResource(Resource):
current_app.logger.error(f"Error fetching models: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 500)
return make_response(jsonify(response), 200)
@models_ns.route("/user/models")
class UserModelsCollectionResource(Resource):
@api.doc(description="List the current user's BYOM custom models")
def get(self):
user_id, err = _user_id_or_401()
if err:
return err
try:
with db_readonly() as conn:
rows = UserCustomModelsRepository(conn).list_for_user(user_id)
return make_response(
jsonify({"models": [_row_to_response(r) for r in rows]}), 200
)
except Exception as e:
current_app.logger.error(
f"Error listing user custom models: {e}", exc_info=True
)
return make_response(jsonify({"success": False}), 500)
@api.doc(description="Register a new BYOM custom model")
def post(self):
user_id, err = _user_id_or_401()
if err:
return err
data = request.get_json() or {}
missing = check_required_fields(
data,
["upstream_model_id", "display_name", "base_url", "api_key"],
)
if missing:
return missing
# SECURITY: reject blank api_key — would leak instance API key
# to the user-supplied base_url via LLMCreator fallback.
for required_nonblank in (
"upstream_model_id",
"display_name",
"base_url",
"api_key",
):
value = data.get(required_nonblank)
if not isinstance(value, str) or not value.strip():
return make_response(
jsonify(
{
"success": False,
"error": f"'{required_nonblank}' must be a non-empty string",
}
),
400,
)
# SSRF guard at create time. Re-runs at dispatch time (LLMCreator)
# as defense in depth against DNS rebinding and pre-guard rows.
try:
validate_user_base_url(data["base_url"])
except UnsafeUserUrlError as e:
return make_response(
jsonify({"success": False, "error": str(e)}), 400
)
try:
capabilities = _normalize_capabilities(data.get("capabilities"))
except ValueError as e:
return make_response(
jsonify({"success": False, "error": str(e)}), 400
)
try:
with db_session() as conn:
row = UserCustomModelsRepository(conn).create(
user_id=user_id,
upstream_model_id=data["upstream_model_id"],
display_name=data["display_name"],
description=data.get("description") or "",
base_url=data["base_url"],
api_key_plaintext=data["api_key"],
capabilities=capabilities,
enabled=bool(data.get("enabled", True)),
)
except Exception as e:
current_app.logger.error(
f"Error creating user custom model: {e}", exc_info=True
)
return make_response(jsonify({"success": False}), 500)
ModelRegistry.invalidate_user(user_id)
return make_response(jsonify(_row_to_response(row)), 201)
@models_ns.route("/user/models/<string:model_id>")
class UserModelResource(Resource):
@api.doc(description="Get one BYOM custom model")
def get(self, model_id):
user_id, err = _user_id_or_401()
if err:
return err
try:
with db_readonly() as conn:
row = UserCustomModelsRepository(conn).get(model_id, user_id)
except Exception as e:
current_app.logger.error(
f"Error fetching user custom model: {e}", exc_info=True
)
return make_response(jsonify({"success": False}), 500)
if row is None:
return make_response(jsonify({"success": False}), 404)
return make_response(jsonify(_row_to_response(row)), 200)
@api.doc(description="Update a BYOM custom model (partial)")
def patch(self, model_id):
user_id, err = _user_id_or_401()
if err:
return err
data = request.get_json() or {}
# Reject present-but-blank values for fields where blank doesn't
# mean "no change". (The api_key special case — blank means "keep
# existing" — is handled below.)
for required_nonblank in (
"upstream_model_id",
"display_name",
"base_url",
):
if required_nonblank in data:
value = data[required_nonblank]
if not isinstance(value, str) or not value.strip():
return make_response(
jsonify(
{
"success": False,
"error": f"'{required_nonblank}' cannot be blank",
}
),
400,
)
if "base_url" in data and data["base_url"]:
try:
validate_user_base_url(data["base_url"])
except UnsafeUserUrlError as e:
return make_response(
jsonify({"success": False, "error": str(e)}), 400
)
update_fields: dict = {}
for k in (
"upstream_model_id",
"display_name",
"description",
"base_url",
"enabled",
):
if k in data:
update_fields[k] = data[k]
if "capabilities" in data:
try:
update_fields["capabilities"] = _normalize_capabilities(
data["capabilities"]
)
except ValueError as e:
return make_response(
jsonify({"success": False, "error": str(e)}), 400
)
# PATCH semantics: blank/missing api_key → keep the existing
# ciphertext; non-empty api_key → re-encrypt and replace.
if data.get("api_key"):
update_fields["api_key_plaintext"] = data["api_key"]
if not update_fields:
return make_response(
jsonify({"success": False, "error": "no updatable fields"}), 400
)
try:
with db_session() as conn:
ok = UserCustomModelsRepository(conn).update(
model_id, user_id, update_fields
)
except Exception as e:
current_app.logger.error(
f"Error updating user custom model: {e}", exc_info=True
)
return make_response(jsonify({"success": False}), 500)
if not ok:
return make_response(jsonify({"success": False}), 404)
ModelRegistry.invalidate_user(user_id)
with db_readonly() as conn:
row = UserCustomModelsRepository(conn).get(model_id, user_id)
return make_response(jsonify(_row_to_response(row)), 200)
@api.doc(description="Delete a BYOM custom model")
def delete(self, model_id):
user_id, err = _user_id_or_401()
if err:
return err
try:
with db_session() as conn:
ok = UserCustomModelsRepository(conn).delete(model_id, user_id)
except Exception as e:
current_app.logger.error(
f"Error deleting user custom model: {e}", exc_info=True
)
return make_response(jsonify({"success": False}), 500)
if not ok:
return make_response(jsonify({"success": False}), 404)
ModelRegistry.invalidate_user(user_id)
return make_response(jsonify({"success": True}), 200)
def _run_connection_test(
base_url: str, api_key: str, upstream_model_id: str
):
"""Send a 1-token chat-completion to verify a BYOM endpoint.
Returns ``(body, http_status)``. Upstream errors return 200 with
``ok=False`` so the UI can render inline errors; only local SSRF
rejection returns 400.
"""
url = base_url.rstrip("/") + "/chat/completions"
payload = {
"model": upstream_model_id,
"messages": [{"role": "user", "content": "hi"}],
"max_tokens": 1,
"stream": False,
}
headers = {
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json",
}
try:
# pinned_post closes the DNS-rebinding window. Redirects off
# because 3xx could bounce to an internal address (the SSRF
# guard only validates the supplied URL).
resp = pinned_post(
url,
json=payload,
headers=headers,
timeout=5,
allow_redirects=False,
)
except UnsafeUserUrlError as e:
return {"ok": False, "error": str(e)}, 400
except requests.RequestException as e:
return {"ok": False, "error": f"connection error: {e}"}, 200
if 300 <= resp.status_code < 400:
return (
{
"ok": False,
"error": (
f"upstream returned HTTP {resp.status_code} "
"redirect; refusing to follow"
),
},
200,
)
if resp.status_code >= 400:
# Cap and only reflect JSON to avoid body-exfil via non-API responses.
content_type = (resp.headers.get("Content-Type") or "").lower()
if "application/json" in content_type:
text = (resp.text or "")[:500]
error_msg = f"upstream returned HTTP {resp.status_code}: {text}"
else:
error_msg = f"upstream returned HTTP {resp.status_code}"
return {"ok": False, "error": error_msg}, 200
return {"ok": True}, 200
@models_ns.route("/user/models/test")
class UserModelTestPayloadResource(Resource):
@api.doc(
description=(
"Test an arbitrary BYOM payload (display_name / model id / "
"base_url / api_key) without saving. Used by the UI's 'Test "
"connection' button so the user can validate before they "
"Save. Same SSRF guard, same 1-token request, same 5s "
"timeout as the by-id variant."
)
)
def post(self):
user_id, err = _user_id_or_401()
if err:
return err
data = request.get_json() or {}
missing = check_required_fields(
data, ["base_url", "api_key", "upstream_model_id"]
)
if missing:
return missing
body, status = _run_connection_test(
data["base_url"], data["api_key"], data["upstream_model_id"]
)
return make_response(jsonify(body), status)
@models_ns.route("/user/models/<string:model_id>/test")
class UserModelTestResource(Resource):
@api.doc(
description=(
"Test a saved BYOM record. Defaults to the stored "
"base_url / upstream_model_id / encrypted api_key, but "
"any of those can be overridden via the request body so "
"the UI can test in-flight edits before saving. Used by "
"the 'Test connection' button in edit mode."
)
)
def post(self, model_id):
user_id, err = _user_id_or_401()
if err:
return err
data = request.get_json() or {}
# Per-field overrides; blank/missing falls back to stored value.
override_base_url = (data.get("base_url") or "").strip() or None
override_upstream_model_id = (
data.get("upstream_model_id") or ""
).strip() or None
override_api_key = (data.get("api_key") or "").strip() or None
try:
with db_readonly() as conn:
repo = UserCustomModelsRepository(conn)
row = repo.get(model_id, user_id)
if row is None:
return make_response(jsonify({"success": False}), 404)
stored_api_key = (
repo._decrypt_api_key(
row.get("api_key_encrypted", ""), user_id
)
if not override_api_key
else None
)
except Exception as e:
current_app.logger.error(
f"Error loading user custom model for test: {e}", exc_info=True
)
return make_response(
jsonify({"ok": False, "error": "internal error loading model"}),
500,
)
api_key = override_api_key or stored_api_key
if not api_key:
return make_response(
jsonify(
{
"ok": False,
"error": (
"Stored API key could not be decrypted. The "
"encryption secret may have rotated. Re-save "
"the model with the API key to recover."
),
}
),
400,
)
base_url = override_base_url or row["base_url"]
upstream_model_id = (
override_upstream_model_id or row["upstream_model_id"]
)
body, status = _run_connection_test(
base_url, api_key, upstream_model_id
)
return make_response(jsonify(body), status)
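A hedged end-to-end sketch of the BYOM flow these routes expose (host, token, and upstream values are placeholders; response shapes follow the handlers above):
import requests

BASE = "http://localhost:7091"  # assumed host/port
HEADERS = {"Authorization": "Bearer <jwt>"}

# 1. Validate the endpoint before saving (same SSRF guard and 1-token probe).
test = requests.post(f"{BASE}/api/user/models/test", headers=HEADERS, json={
    "base_url": "https://llm.example.com/v1",
    "api_key": "sk-example",
    "upstream_model_id": "my-model",
}).json()  # {"ok": True} or {"ok": False, "error": "..."}

# 2. Register it; the stored API key is never echoed back in responses.
created = requests.post(f"{BASE}/api/user/models", headers=HEADERS, json={
    "upstream_model_id": "my-model",
    "display_name": "My model",
    "base_url": "https://llm.example.com/v1",
    "api_key": "sk-example",
    "capabilities": {"supports_streaming": True, "context_window": 32000},
}).json()

# 3. The new record now appears alongside the built-in catalog.
models = requests.get(f"{BASE}/api/models", headers=HEADERS).json()["models"]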

View File

@@ -2,14 +2,13 @@
import os
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, request
from flask_restx import fields, Namespace, Resource
from application.api import api
from application.api.user.base import current_dir, prompts_collection
from application.storage.db.dual_write import dual_write
from application.api.user.base import current_dir
from application.storage.db.repositories.prompts import PromptsRepository
from application.storage.db.session import db_readonly, db_session
from application.utils import check_required_fields
prompts_ns = Namespace(
@@ -42,21 +41,9 @@ class CreatePrompt(Resource):
return missing_fields
user = decoded_token.get("sub")
try:
resp = prompts_collection.insert_one(
{
"name": data["name"],
"content": data["content"],
"user": user,
}
)
new_id = str(resp.inserted_id)
dual_write(
PromptsRepository,
lambda repo, u=user, n=data["name"], c=data["content"], mid=new_id: repo.create(
u, n, c, legacy_mongo_id=mid,
),
)
with db_session() as conn:
prompt = PromptsRepository(conn).create(user, data["name"], data["content"])
new_id = str(prompt["id"])
except Exception as err:
current_app.logger.error(f"Error creating prompt: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
@@ -72,17 +59,17 @@ class GetPrompts(Resource):
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
try:
prompts = prompts_collection.find({"user": user})
with db_readonly() as conn:
prompts = PromptsRepository(conn).list_for_user(user)
list_prompts = [
{"id": "default", "name": "default", "type": "public"},
{"id": "creative", "name": "creative", "type": "public"},
{"id": "strict", "name": "strict", "type": "public"},
]
for prompt in prompts:
list_prompts.append(
{
"id": str(prompt["_id"]),
"id": str(prompt["id"]),
"name": prompt["name"],
"type": "private",
}
@@ -127,9 +114,12 @@ class GetSinglePrompt(Resource):
) as f:
chat_reduce_strict = f.read()
return make_response(jsonify({"content": chat_reduce_strict}), 200)
prompt = prompts_collection.find_one(
{"_id": ObjectId(prompt_id), "user": user}
)
with db_readonly() as conn:
prompt = PromptsRepository(conn).get_any(prompt_id, user)
if not prompt:
return make_response(
jsonify({"success": False, "message": "Prompt not found"}), 404
)
except Exception as err:
current_app.logger.error(f"Error retrieving prompt: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
@@ -156,11 +146,15 @@ class DeletePrompt(Resource):
if missing_fields:
return missing_fields
try:
prompts_collection.delete_one({"_id": ObjectId(data["id"]), "user": user})
dual_write(
PromptsRepository,
lambda repo, pid=data["id"], u=user: repo.delete_by_legacy_id(pid, u),
)
with db_session() as conn:
repo = PromptsRepository(conn)
prompt = repo.get_any(data["id"], user)
if not prompt:
return make_response(
jsonify({"success": False, "message": "Prompt not found"}),
404,
)
repo.delete(str(prompt["id"]), user)
except Exception as err:
current_app.logger.error(f"Error deleting prompt: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
@@ -193,16 +187,15 @@ class UpdatePrompt(Resource):
if missing_fields:
return missing_fields
try:
prompts_collection.update_one(
{"_id": ObjectId(data["id"]), "user": user},
{"$set": {"name": data["name"], "content": data["content"]}},
)
dual_write(
PromptsRepository,
lambda repo, pid=data["id"], u=user, n=data["name"], c=data["content"]: repo.update_by_legacy_id(
pid, u, n, c,
),
)
with db_session() as conn:
repo = PromptsRepository(conn)
prompt = repo.get_any(data["id"], user)
if not prompt:
return make_response(
jsonify({"success": False, "message": "Prompt not found"}),
404,
)
repo.update(str(prompt["id"]), user, data["name"], data["content"])
except Exception as err:
current_app.logger.error(f"Error updating prompt: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)

View File

@@ -0,0 +1,196 @@
"""Reconciler tick: sweep stuck rows and escalate to terminal status + alert."""
from __future__ import annotations
import logging
import uuid
from typing import Any, Dict, Optional
from sqlalchemy import Connection
from application.api.user.idempotency import MAX_TASK_ATTEMPTS
from application.core.settings import settings
from application.storage.db.engine import get_engine
from application.storage.db.repositories.reconciliation import (
ReconciliationRepository,
)
from application.storage.db.repositories.stack_logs import StackLogsRepository
logger = logging.getLogger(__name__)
MAX_MESSAGE_RECONCILE_ATTEMPTS = 3
def run_reconciliation() -> Dict[str, Any]:
"""Single tick of the reconciler. Five sweeps, FOR UPDATE SKIP LOCKED.
Stuck ``executed`` tool calls always flip to ``failed`` — operators
handle cleanup manually via the structured alert. The side effect is
assumed to have committed; no automated rollback is attempted.
Stuck ``task_dedup`` rows (lease expired AND attempts >= max)
promote to ``failed`` so a same-key retry can re-claim instead of
sitting in ``pending`` until 24 h TTL.
"""
if not settings.POSTGRES_URI:
return {
"messages_failed": 0,
"tool_calls_failed": 0,
"skipped": "POSTGRES_URI not set",
}
engine = get_engine()
summary = {
"messages_failed": 0,
"tool_calls_failed": 0,
"ingests_stalled": 0,
"idempotency_pending_failed": 0,
}
with engine.begin() as conn:
repo = ReconciliationRepository(conn)
for msg in repo.find_and_lock_stuck_messages():
new_count = repo.increment_message_reconcile_attempts(msg["id"])
if new_count >= MAX_MESSAGE_RECONCILE_ATTEMPTS:
repo.mark_message_failed(
msg["id"],
error=(
"reconciler: stuck in pending/streaming for >5 min "
f"after {new_count} attempts"
),
)
summary["messages_failed"] += 1
_emit_alert(
conn,
name="reconciler_message_failed",
user_id=msg.get("user_id"),
detail={
"message_id": str(msg["id"]),
"attempts": new_count,
},
)
with engine.begin() as conn:
repo = ReconciliationRepository(conn)
for row in repo.find_and_lock_proposed_tool_calls():
repo.mark_tool_call_failed(
row["call_id"],
error=(
"reconciler: stuck in 'proposed' for >5 min; "
"side effect status unknown"
),
)
summary["tool_calls_failed"] += 1
_emit_alert(
conn,
name="reconciler_tool_call_failed_proposed",
user_id=None,
detail={
"call_id": row["call_id"],
"tool_name": row.get("tool_name"),
},
)
with engine.begin() as conn:
repo = ReconciliationRepository(conn)
for row in repo.find_and_lock_executed_tool_calls():
repo.mark_tool_call_failed(
row["call_id"],
error=(
"reconciler: executed-not-confirmed; side effect "
"assumed committed, manual cleanup required"
),
)
summary["tool_calls_failed"] += 1
_emit_alert(
conn,
name="reconciler_tool_call_failed_executed",
user_id=None,
detail={
"call_id": row["call_id"],
"tool_name": row.get("tool_name"),
"action_name": row.get("action_name"),
},
)
# Q4: ingest checkpoints whose heartbeat has gone silent. The
# reconciler only escalates (alerts) — it doesn't kill the worker
# or roll back the partial embed. The next dispatch resumes from
# ``last_index`` thanks to the per-chunk checkpoint, so this is an
# observability sweep, not a recovery action.
with engine.begin() as conn:
repo = ReconciliationRepository(conn)
for row in repo.find_and_lock_stalled_ingests():
summary["ingests_stalled"] += 1
_emit_alert(
conn,
name="reconciler_ingest_stalled",
user_id=None,
detail={
"source_id": str(row.get("source_id")),
"embedded_chunks": row.get("embedded_chunks"),
"total_chunks": row.get("total_chunks"),
"last_updated": str(row.get("last_updated")),
},
)
# Bump the heartbeat so we don't re-alert every tick.
repo.touch_ingest_progress(str(row["source_id"]))
# Q5: idempotency rows whose lease expired with attempts exhausted.
# The wrapper's poison-loop guard normally finalises these, but if
# the wrapper itself died mid-task (worker SIGKILL, OOM during
# heartbeat) the row sits in ``pending`` blocking same-key retries
# via ``_lookup_completed`` returning None for the whole 24 h TTL.
# Promote to ``failed`` so a retry can re-claim and either resume
# or fail loudly.
with engine.begin() as conn:
repo = ReconciliationRepository(conn)
for row in repo.find_stuck_idempotency_pending(
max_attempts=MAX_TASK_ATTEMPTS,
):
error_msg = (
"reconciler: idempotency lease expired with attempts "
f"({row['attempt_count']}) >= {MAX_TASK_ATTEMPTS}; "
"task abandoned"
)
repo.mark_idempotency_pending_failed(
row["idempotency_key"], error=error_msg,
)
summary["idempotency_pending_failed"] += 1
_emit_alert(
conn,
name="reconciler_idempotency_pending_failed",
user_id=None,
detail={
"idempotency_key": row["idempotency_key"],
"task_name": row.get("task_name"),
"task_id": row.get("task_id"),
"attempts": row.get("attempt_count"),
},
)
return summary
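As a rough sketch of the locking pattern named above (the table and column names here are assumptions for illustration, not the repository's actual SQL), FOR UPDATE SKIP LOCKED lets two reconciler instances tick concurrently without double-escalating the same row:
from sqlalchemy import text
from application.storage.db.engine import get_engine

# Assumed shape of one sweep: claim rows stuck in-flight for >5 min,
# skipping any row another transaction has already locked.
STUCK_MESSAGES_SQL = text(
    "SELECT * FROM conversation_messages "
    "WHERE status IN ('pending', 'streaming') "
    "AND updated_at < now() - interval '5 minutes' "
    "FOR UPDATE SKIP LOCKED"
)

with get_engine().begin() as conn:
    for row in conn.execute(STUCK_MESSAGES_SQL):
        ...  # escalate / alert, as run_reconciliation does above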
def _emit_alert(
conn: Connection,
*,
name: str,
user_id: Optional[str],
detail: Dict[str, Any],
) -> None:
"""Structured ``logger.error`` plus a ``stack_logs`` row for operators."""
extra = {"alert": name, **detail}
logger.error("reconciler alert: %s", name, extra=extra)
try:
StackLogsRepository(conn).insert(
activity_id=str(uuid.uuid4()),
endpoint="reconciliation_worker",
level="ERROR",
user_id=user_id,
query=name,
stacks=[extra],
)
except Exception:
logger.exception("reconciler: failed to write stack_logs row for %s", name)

View File

@@ -2,89 +2,126 @@
import uuid
from bson.binary import Binary, UuidRepresentation
from bson.dbref import DBRef
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, request
from flask_restx import fields, inputs, Namespace, Resource
from sqlalchemy import text as _sql_text
from application.api import api
from application.api.user.base import (
agents_collection,
attachments_collection,
conversations_collection,
shared_conversations_collections,
)
from application.storage.db.dual_write import dual_write
from application.storage.db.base_repository import looks_like_uuid
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.repositories.attachments import AttachmentsRepository
from application.storage.db.repositories.conversations import ConversationsRepository
from application.storage.db.repositories.shared_conversations import (
SharedConversationsRepository,
)
from application.storage.db.session import db_readonly, db_session
from application.utils import check_required_fields
def _dual_write_share(
mongo_conv_id: str,
share_uuid: str,
user: str,
*,
is_promptable: bool,
first_n_queries: int,
api_key: str | None,
prompt_id: str | None = None,
chunks: int | None = None,
) -> None:
"""Mirror a Mongo share-record insert into Postgres.
Preserves the Mongo-generated UUID so public ``/shared/{uuid}`` URLs
resolve from both stores during cutover.
"""
def _write(repo: SharedConversationsRepository) -> None:
conv = ConversationsRepository(repo._conn).get_by_legacy_id(
mongo_conv_id, user_id=user,
)
if conv is None:
return
# prompt_id / chunks are only meaningful for promptable shares;
# prompt_id is often the string "default" or an ObjectId that
# hasn't been migrated — pass as-is and let the repo drop
# non-UUID values. Scope the prompt lookup by user_id so an
# authenticated caller can't link another user's prompt into
# their share record.
resolved_prompt_id = None
if prompt_id and len(str(prompt_id)) == 24:
from sqlalchemy import text as _text
row = repo._conn.execute(
_text(
"SELECT id FROM prompts "
"WHERE legacy_mongo_id = :legacy_id AND user_id = :user_id"
),
{"legacy_id": str(prompt_id), "user_id": user},
).fetchone()
if row:
resolved_prompt_id = str(row[0])
# get_or_create is race-free on the PG side thanks to the
# composite partial unique index on the dedup tuple
# (migration 0008). It converges concurrent share requests to
# a single row.
repo.get_or_create(
conv["id"],
user,
is_promptable=is_promptable,
first_n_queries=first_n_queries,
api_key=api_key,
prompt_id=resolved_prompt_id,
chunks=chunks,
share_uuid=share_uuid,
)
dual_write(SharedConversationsRepository, _write)
sharing_ns = Namespace(
"sharing", description="Conversation sharing operations", path="/api"
)
def _resolve_prompt_pg_id(conn, prompt_id_raw, user_id):
"""Translate an incoming prompt id (UUID or legacy Mongo ObjectId) to a PG UUID.
Scoped by ``user_id`` so a caller can't link another user's prompt
into their share record. Returns ``None`` for sentinel values
(``"default"``) or unresolved ids.
"""
if not prompt_id_raw or prompt_id_raw == "default":
return None
value = str(prompt_id_raw)
# Already UUID — trust it but still require ownership. A shape-gate
# (rather than a loose ``len == 36 and '-' in value`` check) keeps
# non-UUID input out of ``CAST(:pid AS uuid)``; the cast would raise
# and poison the readonly transaction otherwise.
if looks_like_uuid(value):
row = conn.execute(
_sql_text(
"SELECT id FROM prompts WHERE id = CAST(:pid AS uuid) "
"AND user_id = :uid"
),
{"pid": value, "uid": user_id},
).fetchone()
return str(row[0]) if row else None
# Legacy Mongo ObjectId fallback.
row = conn.execute(
_sql_text(
"SELECT id FROM prompts WHERE legacy_mongo_id = :pid "
"AND user_id = :uid"
),
{"pid": value, "uid": user_id},
).fetchone()
return str(row[0]) if row else None
def _resolve_source_pg_id(conn, source_raw):
"""Translate a source id (UUID or legacy Mongo ObjectId) to a PG UUID."""
if not source_raw:
return None
value = str(source_raw)
# See ``_resolve_prompt_pg_id`` for the shape-gate rationale.
if looks_like_uuid(value):
row = conn.execute(
_sql_text(
"SELECT id FROM sources WHERE id = CAST(:sid AS uuid)"
),
{"sid": value},
).fetchone()
return str(row[0]) if row else None
row = conn.execute(
_sql_text("SELECT id FROM sources WHERE legacy_mongo_id = :sid"),
{"sid": value},
).fetchone()
return str(row[0]) if row else None
def _find_reusable_share_agent(
conn, user_id, *, prompt_pg_id, chunks, source_pg_id, retriever,
):
"""Find an existing share-as-agent key row matching these parameters.
Mirrors the legacy Mongo ``agents_collection.find_one`` pre-existence
check. Used to reuse an api key across repeated shares of the same
conversation with the same prompt/chunks/source/retriever.
"""
clauses = ["user_id = :uid", "key IS NOT NULL"]
params: dict = {"uid": user_id}
if prompt_pg_id is None:
clauses.append("prompt_id IS NULL")
else:
clauses.append("prompt_id = CAST(:pid AS uuid)")
params["pid"] = prompt_pg_id
if chunks is None:
clauses.append("chunks IS NULL")
else:
clauses.append("chunks = :chunks")
params["chunks"] = int(chunks)
if source_pg_id is None:
clauses.append("source_id IS NULL")
else:
clauses.append("source_id = CAST(:sid AS uuid)")
params["sid"] = source_pg_id
if retriever is None:
clauses.append("retriever IS NULL")
else:
clauses.append("retriever = :retr")
params["retr"] = retriever
sql = (
"SELECT * FROM agents WHERE "
+ " AND ".join(clauses)
+ " LIMIT 1"
)
row = conn.execute(_sql_text(sql), params).fetchone()
if row is None:
return None
mapping = dict(row._mapping)
mapping["id"] = str(mapping["id"]) if mapping.get("id") else None
return mapping
@sharing_ns.route("/share")
class ShareConversation(Resource):
share_conversation_model = api.model(
@@ -119,173 +156,93 @@ class ShareConversation(Resource):
conversation_id = data["conversation_id"]
try:
conversation = conversations_collection.find_one(
{"_id": ObjectId(conversation_id), "user": user}
)
if conversation is None:
return make_response(
jsonify(
{
"status": "error",
"message": "Conversation does not exist",
}
),
404,
)
current_n_queries = len(conversation["queries"])
explicit_binary = Binary.from_uuid(
uuid.uuid4(), UuidRepresentation.STANDARD
)
with db_session() as conn:
conv_repo = ConversationsRepository(conn)
shared_repo = SharedConversationsRepository(conn)
agents_repo = AgentsRepository(conn)
if is_promptable:
prompt_id = data.get("prompt_id", "default")
chunks = data.get("chunks", "2")
name = conversation["name"] + "(shared)"
new_api_key_data = {
"prompt_id": prompt_id,
"chunks": chunks,
"user": user,
}
if "source" in data and ObjectId.is_valid(data["source"]):
new_api_key_data["source"] = DBRef(
"sources", ObjectId(data["source"])
)
if "retriever" in data:
new_api_key_data["retriever"] = data["retriever"]
pre_existing_api_document = agents_collection.find_one(new_api_key_data)
if pre_existing_api_document:
api_uuid = pre_existing_api_document["key"]
pre_existing = shared_conversations_collections.find_one(
{
"conversation_id": ObjectId(conversation_id),
"isPromptable": is_promptable,
"first_n_queries": current_n_queries,
"user": user,
"api_key": api_uuid,
}
)
if pre_existing is not None:
return make_response(
jsonify(
{
"success": True,
"identifier": str(pre_existing["uuid"].as_uuid()),
}
),
200,
)
else:
shared_conversations_collections.insert_one(
conversation = conv_repo.get_any(conversation_id, user)
if conversation is None:
return make_response(
jsonify(
{
"uuid": explicit_binary,
"conversation_id": ObjectId(conversation_id),
"isPromptable": is_promptable,
"first_n_queries": current_n_queries,
"user": user,
"api_key": api_uuid,
"status": "error",
"message": "Conversation does not exist",
}
)
_dual_write_share(
conversation_id,
str(explicit_binary.as_uuid()),
user,
is_promptable=is_promptable,
first_n_queries=current_n_queries,
api_key=api_uuid,
prompt_id=prompt_id,
chunks=int(chunks) if chunks else None,
)
return make_response(
jsonify(
{
"success": True,
"identifier": str(explicit_binary.as_uuid()),
}
),
201,
)
else:
api_uuid = str(uuid.uuid4())
new_api_key_data["key"] = api_uuid
new_api_key_data["name"] = name
if "source" in data and ObjectId.is_valid(data["source"]):
new_api_key_data["source"] = DBRef(
"sources", ObjectId(data["source"])
)
if "retriever" in data:
new_api_key_data["retriever"] = data["retriever"]
agents_collection.insert_one(new_api_key_data)
shared_conversations_collections.insert_one(
{
"uuid": explicit_binary,
"conversation_id": ObjectId(conversation_id),
"isPromptable": is_promptable,
"first_n_queries": current_n_queries,
"user": user,
"api_key": api_uuid,
}
),
404,
)
_dual_write_share(
conversation_id,
str(explicit_binary.as_uuid()),
conv_pg_id = str(conversation["id"])
current_n_queries = conv_repo.message_count(conv_pg_id)
if is_promptable:
prompt_id_raw = data.get("prompt_id", "default")
chunks_raw = data.get("chunks", "2")
try:
chunks_int = int(chunks_raw) if chunks_raw not in (None, "") else None
except (TypeError, ValueError):
chunks_int = None
prompt_pg_id = _resolve_prompt_pg_id(conn, prompt_id_raw, user)
source_pg_id = _resolve_source_pg_id(conn, data.get("source"))
retriever = data.get("retriever")
reusable = _find_reusable_share_agent(
conn, user,
prompt_pg_id=prompt_pg_id,
chunks=chunks_int,
source_pg_id=source_pg_id,
retriever=retriever,
)
if reusable:
api_uuid = reusable.get("key")
else:
api_uuid = str(uuid.uuid4())
name = (conversation.get("name") or "") + "(shared)"
agents_repo.create(
user,
name,
"published",
key=api_uuid,
retriever=retriever,
chunks=chunks_int,
prompt_id=prompt_pg_id,
source_id=source_pg_id,
)
share = shared_repo.get_or_create(
conv_pg_id,
user,
is_promptable=is_promptable,
is_promptable=True,
first_n_queries=current_n_queries,
api_key=api_uuid,
prompt_id=prompt_id,
chunks=int(chunks) if chunks else None,
prompt_id=prompt_pg_id,
chunks=chunks_int,
)
return make_response(
jsonify(
{
"success": True,
"identifier": str(explicit_binary.as_uuid()),
"identifier": str(share["uuid"]),
}
),
201,
201 if reusable is None else 200,
)
pre_existing = shared_conversations_collections.find_one(
{
"conversation_id": ObjectId(conversation_id),
"isPromptable": is_promptable,
"first_n_queries": current_n_queries,
"user": user,
}
)
if pre_existing is not None:
return make_response(
jsonify(
{
"success": True,
"identifier": str(pre_existing["uuid"].as_uuid()),
}
),
200,
)
else:
shared_conversations_collections.insert_one(
{
"uuid": explicit_binary,
"conversation_id": ObjectId(conversation_id),
"isPromptable": is_promptable,
"first_n_queries": current_n_queries,
"user": user,
}
)
_dual_write_share(
conversation_id,
str(explicit_binary.as_uuid()),
# Non-promptable share path.
share = shared_repo.get_or_create(
conv_pg_id,
user,
is_promptable=is_promptable,
is_promptable=False,
first_n_queries=current_n_queries,
api_key=None,
)
return make_response(
jsonify(
{"success": True, "identifier": str(explicit_binary.as_uuid())}
{
"success": True,
"identifier": str(share["uuid"]),
}
),
201,
)
@@ -301,37 +258,13 @@ class GetPubliclySharedConversations(Resource):
@api.doc(description="Get publicly shared conversations by identifier")
def get(self, identifier: str):
try:
query_uuid = Binary.from_uuid(
uuid.UUID(identifier), UuidRepresentation.STANDARD
)
shared = shared_conversations_collections.find_one({"uuid": query_uuid})
conversation_queries = []
with db_readonly() as conn:
shared_repo = SharedConversationsRepository(conn)
conv_repo = ConversationsRepository(conn)
attach_repo = AttachmentsRepository(conn)
if (
shared
and "conversation_id" in shared
):
# Handle DBRef (legacy), ObjectId, dict, and string formats for conversation_id
conversation_id = shared["conversation_id"]
if isinstance(conversation_id, DBRef):
conversation_id = conversation_id.id
elif isinstance(conversation_id, dict):
# Handle dict representation of DBRef (e.g., {"$ref": "...", "$id": "..."})
if "$id" in conversation_id:
conv_id = conversation_id["$id"]
# $id might be a dict like {"$oid": "..."} or a string
if isinstance(conv_id, dict) and "$oid" in conv_id:
conversation_id = ObjectId(conv_id["$oid"])
else:
conversation_id = ObjectId(conv_id)
elif "_id" in conversation_id:
conversation_id = ObjectId(conversation_id["_id"])
elif isinstance(conversation_id, str):
conversation_id = ObjectId(conversation_id)
conversation = conversations_collection.find_one(
{"_id": conversation_id}
)
if conversation is None:
shared = shared_repo.find_by_uuid(identifier)
if not shared or not shared.get("conversation_id"):
return make_response(
jsonify(
{
@@ -341,22 +274,60 @@ class GetPubliclySharedConversations(Resource):
),
404,
)
conversation_queries = conversation["queries"][
: (shared["first_n_queries"])
]
conv_pg_id = str(shared["conversation_id"])
owner_user = shared.get("user_id")
for query in conversation_queries:
if "attachments" in query and query["attachments"]:
conversation = conv_repo.get_owned(conv_pg_id, owner_user) if owner_user else None
if conversation is None:
# Fall back to any-user lookup in case shared row's
# user_id is missing — still keyed by PG UUID.
row = conn.execute(
_sql_text(
"SELECT * FROM conversations WHERE id = CAST(:id AS uuid)"
),
{"id": conv_pg_id},
).fetchone()
if row is None:
return make_response(
jsonify(
{
"success": False,
"error": "might have broken url or the conversation does not exist",
}
),
404,
)
conversation = dict(row._mapping)
messages = conv_repo.get_messages(conv_pg_id)
first_n = shared.get("first_n_queries") or 0
conversation_queries = []
for msg in messages[:first_n]:
query = {
"prompt": msg.get("prompt"),
"response": msg.get("response"),
"thought": msg.get("thought"),
"sources": msg.get("sources") or [],
"tool_calls": msg.get("tool_calls") or [],
"timestamp": (
msg["timestamp"].isoformat()
if hasattr(msg.get("timestamp"), "isoformat")
else msg.get("timestamp")
),
"feedback": msg.get("feedback"),
}
attachments = msg.get("attachments") or []
if attachments:
attachment_details = []
for attachment_id in query["attachments"]:
for attachment_id in attachments:
try:
attachment = attachments_collection.find_one(
{"_id": ObjectId(attachment_id)}
)
attachment = attach_repo.get_any(
str(attachment_id), owner_user,
) if owner_user else None
if attachment:
attachment_details.append(
{
"id": str(attachment["_id"]),
"id": str(attachment["id"]),
"fileName": attachment.get(
"filename", "Unknown file"
),
@@ -368,26 +339,23 @@ class GetPubliclySharedConversations(Resource):
exc_info=True,
)
query["attachments"] = attachment_details
else:
return make_response(
jsonify(
{
"success": False,
"error": "might have broken url or the conversation does not exist",
}
),
404,
conversation_queries.append(query)
created = conversation.get("created_at") or conversation.get("date")
date_iso = (
created.isoformat()
if hasattr(created, "isoformat")
else (str(created) if created is not None else None)
)
date = conversation["_id"].generation_time.isoformat()
res = {
"success": True,
"queries": conversation_queries,
"title": conversation["name"],
"timestamp": date,
}
if shared["isPromptable"] and "api_key" in shared:
res["api_key"] = shared["api_key"]
return make_response(jsonify(res), 200)
res = {
"success": True,
"queries": conversation_queries,
"title": conversation.get("name"),
"timestamp": date_iso,
}
if shared.get("is_promptable") and shared.get("api_key"):
res["api_key"] = shared["api_key"]
return make_response(jsonify(res), 200)
except Exception as err:
current_app.logger.error(
f"Error getting shared conversation: {err}", exc_info=True

View File

@@ -1,11 +1,12 @@
"""Source document management chunk management."""
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, request
from flask_restx import fields, Namespace, Resource
from application.api import api
from application.api.user.base import get_vector_store, sources_collection
from application.api.user.base import get_vector_store
from application.storage.db.repositories.sources import SourcesRepository
from application.storage.db.session import db_readonly
from application.utils import check_required_fields, num_tokens_from_string
sources_chunks_ns = Namespace(
@@ -13,6 +14,15 @@ sources_chunks_ns = Namespace(
)
def _resolve_source(doc_id: str, user: str):
"""Resolve a source (UUID or legacy ObjectId) for the caller.
Returns the row dict (with PG UUID in ``id``) or ``None`` if missing.
"""
with db_readonly() as conn:
return SourcesRepository(conn).get_any(doc_id, user)
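# Illustrative usage (assumed ids, not from this change): both legacy Mongo
# ObjectId strings and new PG UUIDs resolve through the same helper, so the
# routes below never branch on id format.
#
#   doc = _resolve_source("507f1f77bcf86cd799439011", "alice")                  # legacy ObjectId
#   doc = _resolve_source("2b1f7c7e-0d7a-4b9a-9a5e-2f6a4f1f9c3d", "alice")      # PG UUID
#   resolved_id = str(doc["id"]) if doc else None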
@sources_chunks_ns.route("/get_chunks")
class GetChunks(Resource):
@api.doc(
@@ -36,36 +46,34 @@ class GetChunks(Resource):
path = request.args.get("path")
search_term = request.args.get("search", "").strip().lower()
if not ObjectId.is_valid(doc_id):
if not doc_id:
return make_response(jsonify({"error": "Invalid doc_id"}), 400)
try:
doc = _resolve_source(doc_id, user)
except Exception as e:
current_app.logger.error(f"Error resolving source: {e}", exc_info=True)
return make_response(jsonify({"error": "Invalid doc_id"}), 400)
doc = sources_collection.find_one({"_id": ObjectId(doc_id), "user": user})
if not doc:
return make_response(
jsonify({"error": "Document not found or access denied"}), 404
)
resolved_id = str(doc["id"])
try:
store = get_vector_store(doc_id)
store = get_vector_store(resolved_id)
chunks = store.get_chunks()
filtered_chunks = []
for chunk in chunks:
metadata = chunk.get("metadata", {})
# Filter by path if provided
if path:
chunk_source = metadata.get("source", "")
chunk_file_path = metadata.get("file_path", "")
# Check if the chunk matches the requested path
# For file uploads: source ends with path (e.g., "inputs/.../file.pdf" ends with "file.pdf")
# For crawlers: file_path ends with path (e.g., "guides/setup.md" ends with "setup.md")
source_match = chunk_source and chunk_source.endswith(path)
file_path_match = chunk_file_path and chunk_file_path.endswith(path)
if not (source_match or file_path_match):
continue
# Filter by search term if provided
if search_term:
text_match = search_term in chunk.get("text", "").lower()
title_match = search_term in metadata.get("title", "").lower()
@@ -132,15 +140,17 @@ class AddChunk(Resource):
token_count = num_tokens_from_string(text)
metadata["token_count"] = token_count
if not ObjectId.is_valid(doc_id):
try:
doc = _resolve_source(doc_id, user)
except Exception as e:
current_app.logger.error(f"Error resolving source: {e}", exc_info=True)
return make_response(jsonify({"error": "Invalid doc_id"}), 400)
doc = sources_collection.find_one({"_id": ObjectId(doc_id), "user": user})
if not doc:
return make_response(
jsonify({"error": "Document not found or access denied"}), 404
)
try:
store = get_vector_store(doc_id)
store = get_vector_store(str(doc["id"]))
chunk_id = store.add_chunk(text, metadata)
return make_response(
jsonify({"message": "Chunk added successfully", "chunk_id": chunk_id}),
@@ -165,15 +175,17 @@ class DeleteChunk(Resource):
doc_id = request.args.get("id")
chunk_id = request.args.get("chunk_id")
if not ObjectId.is_valid(doc_id):
try:
doc = _resolve_source(doc_id, user)
except Exception as e:
current_app.logger.error(f"Error resolving source: {e}", exc_info=True)
return make_response(jsonify({"error": "Invalid doc_id"}), 400)
doc = sources_collection.find_one({"_id": ObjectId(doc_id), "user": user})
if not doc:
return make_response(
jsonify({"error": "Document not found or access denied"}), 404
)
try:
store = get_vector_store(doc_id)
store = get_vector_store(str(doc["id"]))
deleted = store.delete_chunk(chunk_id)
if deleted:
return make_response(
@@ -232,15 +244,17 @@ class UpdateChunk(Resource):
if metadata is None:
metadata = {}
metadata["token_count"] = token_count
if not ObjectId.is_valid(doc_id):
try:
doc = _resolve_source(doc_id, user)
except Exception as e:
current_app.logger.error(f"Error resolving source: {e}", exc_info=True)
return make_response(jsonify({"error": "Invalid doc_id"}), 400)
doc = sources_collection.find_one({"_id": ObjectId(doc_id), "user": user})
if not doc:
return make_response(
jsonify({"error": "Document not found or access denied"}), 404
)
try:
store = get_vector_store(doc_id)
store = get_vector_store(str(doc["id"]))
chunks = store.get_chunks()
existing_chunk = next((c for c in chunks if c["doc_id"] == chunk_id), None)

View File

@@ -3,14 +3,14 @@
import json
import math
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, redirect, request
from flask_restx import fields, Namespace, Resource
from application.api import api
from application.api.user.base import sources_collection
from application.api.user.tasks import sync_source
from application.core.settings import settings
from application.storage.db.repositories.sources import SourcesRepository
from application.storage.db.session import db_readonly, db_session
from application.storage.storage_creator import StorageCreator
from application.utils import check_required_fields
from application.vectorstore.vector_creator import VectorCreator
@@ -56,11 +56,20 @@ class CombinedJson(Resource):
]
try:
for index in sources_collection.find({"user": user}).sort("date", -1):
with db_readonly() as conn:
indexes = SourcesRepository(conn).list_for_user(user)
# list_for_user sorts by created_at DESC; legacy shape sorted by
# "date" DESC. Both are monotonic on creation so the ordering is
# equivalent for dev; re-sort defensively.
indexes = sorted(
indexes, key=lambda r: r.get("date") or r.get("created_at") or "",
reverse=True,
)
for index in indexes:
provider = _get_provider_from_remote_data(index.get("remote_data"))
data.append(
{
"id": str(index["_id"]),
"id": str(index["id"]),
"name": index.get("name"),
"date": index.get("date"),
"model": settings.EMBEDDINGS_NAME,
@@ -70,9 +79,7 @@ class CombinedJson(Resource):
"syncFrequency": index.get("sync_frequency", ""),
"provider": provider,
"is_nested": bool(index.get("directory_structure")),
"type": index.get(
"type", "file"
), # Add type field with default "file"
"type": index.get("type", "file"),
}
)
except Exception as err:
@@ -89,61 +96,55 @@ class PaginatedSources(Resource):
if not decoded_token:
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
sort_field = request.args.get("sort", "date") # Default to 'date'
sort_order = request.args.get("order", "desc") # Default to 'desc'
page = int(request.args.get("page", 1)) # Default to 1
rows_per_page = int(request.args.get("rows", 10)) # Default to 10
# add .strip() to remove leading and trailing whitespaces
search_term = request.args.get(
"search", ""
).strip() # add search for filter documents
# Prepare query for filtering
query = {"user": user}
if search_term:
query["name"] = {
"$regex": search_term,
"$options": "i", # using case-insensitive search
}
total_documents = sources_collection.count_documents(query)
total_pages = max(1, math.ceil(total_documents / rows_per_page))
page = min(
max(1, page), total_pages
) # add this to make sure page inbound is within the range
sort_order = 1 if sort_order == "asc" else -1
skip = (page - 1) * rows_per_page
sort_field = request.args.get("sort", "date")
sort_order = request.args.get("order", "desc")
page = max(1, int(request.args.get("page", 1)))
rows_per_page = max(1, int(request.args.get("rows", 10)))
search_term = request.args.get("search", "").strip() or None
try:
documents = (
sources_collection.find(query)
.sort(sort_field, sort_order)
.skip(skip)
.limit(rows_per_page)
)
with db_readonly() as conn:
repo = SourcesRepository(conn)
total_documents = repo.count_for_user(
user, search_term=search_term,
)
# Prior in-Python implementation returned ``totalPages = 1``
# for empty result sets (``max(1, ceil(0/rows))``); we
# preserve that contract so the frontend pager stays stable.
total_pages = max(1, math.ceil(total_documents / rows_per_page))
effective_page = min(page, total_pages)
offset = (effective_page - 1) * rows_per_page
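# Worked example (illustrative): 0 matching rows at rows_per_page=10 gives
# total_pages = max(1, ceil(0/10)) = 1, effective_page = min(page, 1) = 1,
# offset = 0, so the empty-result response matches the legacy pager exactly.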
window = repo.list_for_user(
user,
limit=rows_per_page,
offset=offset,
search_term=search_term,
sort_field=sort_field,
sort_order=sort_order,
)
paginated_docs = []
for doc in documents:
for doc in window:
provider = _get_provider_from_remote_data(doc.get("remote_data"))
doc_data = {
"id": str(doc["_id"]),
"name": doc.get("name", ""),
"date": doc.get("date", ""),
"model": settings.EMBEDDINGS_NAME,
"location": "local",
"tokens": doc.get("tokens", ""),
"retriever": doc.get("retriever", "classic"),
"syncFrequency": doc.get("sync_frequency", ""),
"provider": provider,
"isNested": bool(doc.get("directory_structure")),
"type": doc.get("type", "file"),
}
paginated_docs.append(doc_data)
paginated_docs.append(
{
"id": str(doc["id"]),
"name": doc.get("name", ""),
"date": doc.get("date", ""),
"model": settings.EMBEDDINGS_NAME,
"location": "local",
"tokens": doc.get("tokens", ""),
"retriever": doc.get("retriever", "classic"),
"syncFrequency": doc.get("sync_frequency", ""),
"provider": provider,
"isNested": bool(doc.get("directory_structure")),
"type": doc.get("type", "file"),
}
)
response = {
"total": total_documents,
"totalPages": total_pages,
"currentPage": page,
"currentPage": effective_page,
"paginated": paginated_docs,
}
return make_response(jsonify(response), 200)
@@ -154,28 +155,6 @@ class PaginatedSources(Resource):
return make_response(jsonify({"success": False}), 400)
@sources_ns.route("/delete_by_ids")
class DeleteByIds(Resource):
@api.doc(
description="Deletes documents from the vector store by IDs",
params={"path": "Comma-separated list of IDs"},
)
def get(self):
ids = request.args.get("path")
if not ids:
return make_response(
jsonify({"success": False, "message": "Missing required fields"}), 400
)
try:
result = sources_collection.delete_index(ids=ids)
if result:
return make_response(jsonify({"success": True}), 200)
except Exception as err:
current_app.logger.error(f"Error deleting indexes: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
return make_response(jsonify({"success": False}), 400)
@sources_ns.route("/delete_old")
class DeleteOldIndexes(Resource):
@api.doc(
@@ -186,30 +165,33 @@ class DeleteOldIndexes(Resource):
decoded_token = request.decoded_token
if not decoded_token:
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
source_id = request.args.get("source_id")
if not source_id:
return make_response(
jsonify({"success": False, "message": "Missing required fields"}), 400
)
doc = sources_collection.find_one(
{"_id": ObjectId(source_id), "user": decoded_token.get("sub")}
)
try:
with db_readonly() as conn:
doc = SourcesRepository(conn).get_any(source_id, user)
except Exception as err:
current_app.logger.error(f"Error looking up source: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
if not doc:
return make_response(jsonify({"status": "not found"}), 404)
storage = StorageCreator.get_storage()
resolved_id = str(doc["id"])
try:
# Delete vector index
if settings.VECTOR_STORE == "faiss":
index_path = f"indexes/{str(doc['_id'])}"
index_path = f"indexes/{resolved_id}"
if storage.file_exists(f"{index_path}/index.faiss"):
storage.delete_file(f"{index_path}/index.faiss")
if storage.file_exists(f"{index_path}/index.pkl"):
storage.delete_file(f"{index_path}/index.pkl")
else:
vectorstore = VectorCreator.create_vectorstore(
settings.VECTOR_STORE, source_id=str(doc["_id"])
settings.VECTOR_STORE, source_id=resolved_id
)
vectorstore.delete_index()
if "file_path" in doc and doc["file_path"]:
@@ -227,7 +209,14 @@ class DeleteOldIndexes(Resource):
f"Error deleting files and indexes: {err}", exc_info=True
)
return make_response(jsonify({"success": False}), 400)
sources_collection.delete_one({"_id": ObjectId(source_id)})
try:
with db_session() as conn:
SourcesRepository(conn).delete(resolved_id, user)
except Exception as err:
current_app.logger.error(
f"Error deleting source row: {err}", exc_info=True
)
return make_response(jsonify({"success": False}), 400)
return make_response(jsonify({"success": True}), 200)
@@ -272,15 +261,16 @@ class ManageSync(Resource):
return make_response(
jsonify({"success": False, "message": "Invalid frequency"}), 400
)
update_data = {"$set": {"sync_frequency": sync_frequency}}
try:
sources_collection.update_one(
{
"_id": ObjectId(source_id),
"user": user,
},
update_data,
)
with db_session() as conn:
repo = SourcesRepository(conn)
doc = repo.get_any(source_id, user)
if doc is None:
return make_response(
jsonify({"success": False, "message": "Source not found"}),
404,
)
repo.update(str(doc["id"]), user, {"sync_frequency": sync_frequency})
except Exception as err:
current_app.logger.error(
f"Error updating sync frequency: {err}", exc_info=True
@@ -309,19 +299,20 @@ class SyncSource(Resource):
if missing_fields:
return missing_fields
source_id = data["source_id"]
if not ObjectId.is_valid(source_id):
try:
with db_readonly() as conn:
doc = SourcesRepository(conn).get_any(source_id, user)
except Exception as err:
current_app.logger.error(f"Error looking up source: {err}", exc_info=True)
return make_response(
jsonify({"success": False, "message": "Invalid source ID"}), 400
)
doc = sources_collection.find_one(
{"_id": ObjectId(source_id), "user": user}
)
if not doc:
return make_response(
jsonify({"success": False, "message": "Source not found"}), 404
)
source_type = doc.get("type", "")
if source_type.startswith("connector"):
if source_type and source_type.startswith("connector"):
return make_response(
jsonify(
{
@@ -344,7 +335,7 @@ class SyncSource(Resource):
loader=source_type,
sync_frequency=doc.get("sync_frequency", "never"),
retriever=doc.get("retriever", "classic"),
doc_id=source_id,
doc_id=str(doc["id"]),
)
except Exception as err:
current_app.logger.error(
@@ -370,10 +361,9 @@ class DirectoryStructure(Resource):
if not doc_id:
return make_response(jsonify({"error": "Document ID is required"}), 400)
if not ObjectId.is_valid(doc_id):
return make_response(jsonify({"error": "Invalid document ID"}), 400)
try:
doc = sources_collection.find_one({"_id": ObjectId(doc_id), "user": user})
with db_readonly() as conn:
doc = SourcesRepository(conn).get_any(doc_id, user)
if not doc:
return make_response(
jsonify({"error": "Document not found or access denied"}), 404
@@ -387,6 +377,8 @@ class DirectoryStructure(Resource):
if isinstance(remote_data, str) and remote_data:
remote_data_obj = json.loads(remote_data)
provider = remote_data_obj.get("provider")
elif isinstance(remote_data, dict):
provider = remote_data.get("provider")
except Exception as e:
current_app.logger.warning(
f"Failed to parse remote_data for doc {doc_id}: {e}"
@@ -406,4 +398,7 @@ class DirectoryStructure(Resource):
current_app.logger.error(
f"Error retrieving directory structure: {e}", exc_info=True
)
return make_response(jsonify({"success": False, "error": "Failed to retrieve directory structure"}), 500)
return make_response(
jsonify({"success": False, "error": "Failed to retrieve directory structure"}),
500,
)

View File

@@ -3,18 +3,21 @@
import json
import os
import tempfile
import uuid
import zipfile
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, request
from flask_restx import fields, Namespace, Resource
from sqlalchemy import text as sql_text
from application.api import api
from application.api.user.base import sources_collection
from application.api.user.tasks import ingest, ingest_connector_task, ingest_remote
from application.core.settings import settings
from application.parser.connectors.connector_creator import ConnectorCreator
from application.parser.file.constants import SUPPORTED_SOURCE_EXTENSIONS
from application.storage.db.repositories.idempotency import IdempotencyRepository
from application.storage.db.repositories.sources import SourcesRepository
from application.storage.db.session import db_readonly, db_session
from application.storage.storage_creator import StorageCreator
from application.stt.upload_limits import (
AudioFileTooLargeError,
@@ -30,6 +33,79 @@ sources_upload_ns = Namespace(
)
_IDEMPOTENCY_KEY_MAX_LEN = 256
def _read_idempotency_key():
"""Return (key, error_response). Empty header → (None, None); oversized → (None, 400)."""
key = request.headers.get("Idempotency-Key")
if not key:
return None, None
if len(key) > _IDEMPOTENCY_KEY_MAX_LEN:
return None, make_response(
jsonify(
{
"success": False,
"message": (
f"Idempotency-Key exceeds maximum length of "
f"{_IDEMPOTENCY_KEY_MAX_LEN} characters"
),
}
),
400,
)
return key, None
def _scoped_idempotency_key(idempotency_key, scope):
"""``{scope}:{key}`` so different users can't collide on the same key."""
if not idempotency_key or not scope:
return None
return f"{scope}:{idempotency_key}"
def _claim_task_or_get_cached(key, task_name):
"""Claim ``key`` for this request OR return the winner's cached payload.
Pre-generates the celery task_id so a losing writer sees the same
id immediately. Returns ``(task_id, cached_response)``; non-None
cached means the caller should return without enqueuing.
"""
predetermined_id = str(uuid.uuid4())
with db_session() as conn:
claimed = IdempotencyRepository(conn).claim_task(
key=key, task_name=task_name, task_id=predetermined_id,
)
if claimed is not None:
return claimed["task_id"], None
with db_readonly() as conn:
existing = IdempotencyRepository(conn).get_task(key)
cached_id = existing.get("task_id") if existing else None
return None, {
"success": True,
"task_id": cached_id or "deduplicated",
}
def _release_claim(key):
"""Drop a pending claim so a client retry can re-claim it."""
try:
with db_session() as conn:
conn.execute(
sql_text(
"DELETE FROM task_dedup WHERE idempotency_key = :k "
"AND status = 'pending'"
),
{"k": key},
)
except Exception:
current_app.logger.exception(
"Failed to release task_dedup claim for key=%s", key,
)
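# Illustrative sketch (not part of this change; the function name and its
# callers are assumptions): how the three helpers above compose around any
# celery task. Claim first, short-circuit duplicates to the winner's task_id,
# and release the claim on any failure before the task is actually enqueued.
def _example_claim_then_enqueue(task, task_name, user, idempotency_key, task_kwargs):
    scoped = _scoped_idempotency_key(idempotency_key, user)
    predetermined_id, cached = None, None
    if scoped:
        predetermined_id, cached = _claim_task_or_get_cached(scoped, task_name)
        if cached is not None:
            # Duplicate request within the TTL: return the winner's cached payload.
            return cached
    try:
        result = task.apply_async(kwargs=task_kwargs, task_id=predetermined_id)
    except Exception:
        if scoped:
            # Pre-enqueue failure: release so a client retry can re-claim the key.
            _release_claim(scoped)
        raise
    return {"success": True, "task_id": predetermined_id or result.id}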
def _enforce_audio_path_size_limit(file_path: str, filename: str) -> None:
if not is_audio_filename(filename):
return
@@ -49,17 +125,38 @@ class UploadFile(Resource):
)
)
@api.doc(
description="Uploads a file to be vectorized and indexed",
description=(
"Uploads a file to be vectorized and indexed. Honors an optional "
"``Idempotency-Key`` header: a repeat request with the same key "
"within 24h returns the original cached response without re-enqueuing."
),
)
def post(self):
decoded_token = request.decoded_token
if not decoded_token:
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
idempotency_key, key_error = _read_idempotency_key()
if key_error is not None:
return key_error
# User-scoped to avoid cross-user collisions; also feeds
# ``_derive_source_id`` so uuid5 stays user-disjoint.
scoped_key = _scoped_idempotency_key(idempotency_key, user)
# Claim before enqueue; the loser returns the winner's task_id.
predetermined_task_id = None
if scoped_key:
predetermined_task_id, cached = _claim_task_or_get_cached(
scoped_key, "ingest",
)
if cached is not None:
return make_response(jsonify(cached), 200)
data = request.form
files = request.files.getlist("file")
required_fields = ["user", "name"]
missing_fields = check_required_fields(data, required_fields)
if missing_fields or not files or all(file.filename == "" for file in files):
if scoped_key:
_release_claim(scoped_key)
return make_response(
jsonify(
{
@@ -69,7 +166,6 @@ class UploadFile(Resource):
),
400,
)
user = decoded_token.get("sub")
job_name = request.form["name"]
# Create safe versions for filesystem operations
@@ -140,16 +236,27 @@ class UploadFile(Resource):
file_path = f"{base_path}/{safe_file}"
with open(temp_file_path, "rb") as f:
storage.save_file(f, file_path)
task = ingest.delay(
settings.UPLOAD_FOLDER,
list(SUPPORTED_SOURCE_EXTENSIONS),
job_name,
user,
file_path=base_path,
filename=dir_name,
file_name_map=file_name_map,
ingest_kwargs = dict(
args=(
settings.UPLOAD_FOLDER,
list(SUPPORTED_SOURCE_EXTENSIONS),
job_name,
user,
),
kwargs={
"file_path": base_path,
"filename": dir_name,
"file_name_map": file_name_map,
# Scoped so the worker dedup row matches the HTTP claim.
"idempotency_key": scoped_key or idempotency_key,
},
)
if predetermined_task_id is not None:
ingest_kwargs["task_id"] = predetermined_task_id
task = ingest.apply_async(**ingest_kwargs)
except AudioFileTooLargeError:
if scoped_key:
_release_claim(scoped_key)
return make_response(
jsonify(
{
@@ -161,8 +268,13 @@ class UploadFile(Resource):
)
except Exception as err:
current_app.logger.error(f"Error uploading file: {err}", exc_info=True)
if scoped_key:
_release_claim(scoped_key)
return make_response(jsonify({"success": False}), 400)
return make_response(jsonify({"success": True, "task_id": task.id}), 200)
# Predetermined id matches the dedup-claim row, so a losing duplicate request sees the same id.

response_task_id = predetermined_task_id or task.id
response_payload = {"success": True, "task_id": response_task_id}
return make_response(jsonify(response_payload), 200)
@sources_upload_ns.route("/remote")
@@ -182,17 +294,38 @@ class UploadRemote(Resource):
)
)
@api.doc(
description="Uploads remote source for vectorization",
description=(
"Uploads remote source for vectorization. Honors an optional "
"``Idempotency-Key`` header: a repeat request with the same key "
"within 24h returns the original cached response without re-enqueuing."
),
)
def post(self):
decoded_token = request.decoded_token
if not decoded_token:
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
idempotency_key, key_error = _read_idempotency_key()
if key_error is not None:
return key_error
scoped_key = _scoped_idempotency_key(idempotency_key, user)
data = request.form
required_fields = ["user", "source", "name", "data"]
missing_fields = check_required_fields(data, required_fields)
if missing_fields:
return missing_fields
task_name_for_dedup = (
"ingest_connector_task"
if data.get("source") in ConnectorCreator.get_supported_connectors()
else "ingest_remote"
)
predetermined_task_id = None
if scoped_key:
predetermined_task_id, cached = _claim_task_or_get_cached(
scoped_key, task_name_for_dedup,
)
if cached is not None:
return make_response(jsonify(cached), 200)
try:
config = json.loads(data["data"])
source_data = None
@@ -208,6 +341,8 @@ class UploadRemote(Resource):
elif data["source"] in ConnectorCreator.get_supported_connectors():
session_token = config.get("session_token")
if not session_token:
if scoped_key:
_release_claim(scoped_key)
return make_response(
jsonify(
{
@@ -236,31 +371,47 @@ class UploadRemote(Resource):
config["file_ids"] = file_ids
config["folder_ids"] = folder_ids
task = ingest_connector_task.delay(
job_name=data["name"],
user=decoded_token.get("sub"),
source_type=data["source"],
session_token=session_token,
file_ids=file_ids,
folder_ids=folder_ids,
recursive=config.get("recursive", False),
retriever=config.get("retriever", "classic"),
)
return make_response(
jsonify({"success": True, "task_id": task.id}), 200
)
task = ingest_remote.delay(
source_data=source_data,
job_name=data["name"],
user=decoded_token.get("sub"),
loader=data["source"],
)
connector_kwargs = {
"kwargs": {
"job_name": data["name"],
"user": user,
"source_type": data["source"],
"session_token": session_token,
"file_ids": file_ids,
"folder_ids": folder_ids,
"recursive": config.get("recursive", False),
"retriever": config.get("retriever", "classic"),
"idempotency_key": scoped_key or idempotency_key,
},
}
if predetermined_task_id is not None:
connector_kwargs["task_id"] = predetermined_task_id
task = ingest_connector_task.apply_async(**connector_kwargs)
response_task_id = predetermined_task_id or task.id
response_payload = {"success": True, "task_id": response_task_id}
return make_response(jsonify(response_payload), 200)
remote_kwargs = {
"kwargs": {
"source_data": source_data,
"job_name": data["name"],
"user": user,
"loader": data["source"],
"idempotency_key": scoped_key or idempotency_key,
},
}
if predetermined_task_id is not None:
remote_kwargs["task_id"] = predetermined_task_id
task = ingest_remote.apply_async(**remote_kwargs)
except Exception as err:
current_app.logger.error(
f"Error uploading remote source: {err}", exc_info=True
)
if scoped_key:
_release_claim(scoped_key)
return make_response(jsonify({"success": False}), 400)
return make_response(jsonify({"success": True, "task_id": task.id}), 200)
response_task_id = predetermined_task_id or task.id
response_payload = {"success": True, "task_id": response_task_id}
return make_response(jsonify(response_payload), 200)
@sources_upload_ns.route("/manage_source_files")
@@ -305,6 +456,10 @@ class ManageSourceFiles(Resource):
jsonify({"success": False, "message": "Unauthorized"}), 401
)
user = decoded_token.get("sub")
idempotency_key, key_error = _read_idempotency_key()
if key_error is not None:
return key_error
scoped_key = _scoped_idempotency_key(idempotency_key, user)
source_id = request.form.get("source_id")
operation = request.form.get("operation")
@@ -329,15 +484,8 @@ class ManageSourceFiles(Resource):
400,
)
try:
ObjectId(source_id)
except Exception:
return make_response(
jsonify({"success": False, "message": "Invalid source ID format"}), 400
)
try:
source = sources_collection.find_one(
{"_id": ObjectId(source_id), "user": user}
)
with db_readonly() as conn:
source = SourcesRepository(conn).get_any(source_id, user)
if not source:
return make_response(
jsonify(
@@ -353,6 +501,13 @@ class ManageSourceFiles(Resource):
return make_response(
jsonify({"success": False, "message": "Database error"}), 500
)
resolved_source_id = str(source["id"])
# Flips to True after each branch's ``apply_async`` returns
# successfully — at that point the worker owns the predetermined
# task_id. The outer ``except`` only releases the claim while
# this is False, so a post-``apply_async`` failure (jsonify,
# make_response, etc.) doesn't double-enqueue on the next retry.
claim_transferred = False
try:
storage = StorageCreator.get_storage()
source_file_path = source.get("file_path", "")
@@ -385,6 +540,21 @@ class ManageSourceFiles(Resource):
),
400,
)
# Claim before any storage mutation so a duplicate request
# short-circuits without touching the filesystem. Mirrors
# the pattern in ``UploadFile.post`` / ``UploadRemote.post``
# — without it ``.delay()`` would enqueue twice for two
# racing same-key POSTs (the worker decorator only
# deduplicates *after* completion).
predetermined_task_id = None
if scoped_key:
predetermined_task_id, cached = _claim_task_or_get_cached(
scoped_key, "reingest_source_task",
)
if cached is not None:
return make_response(jsonify(cached), 200)
added_files = []
map_updated = False
@@ -411,15 +581,24 @@ class ManageSourceFiles(Resource):
map_updated = True
if map_updated:
sources_collection.update_one(
{"_id": ObjectId(source_id)},
{"$set": {"file_name_map": file_name_map}},
)
with db_session() as conn:
SourcesRepository(conn).update(
resolved_source_id, user,
{"file_name_map": dict(file_name_map)},
)
# Trigger re-ingestion pipeline
from application.api.user.tasks import reingest_source_task
task = reingest_source_task.delay(source_id=source_id, user=user)
task = reingest_source_task.apply_async(
kwargs={
"source_id": resolved_source_id,
"user": user,
"idempotency_key": scoped_key or idempotency_key,
},
task_id=predetermined_task_id,
)
claim_transferred = True
return make_response(
jsonify(
@@ -458,10 +637,8 @@ class ManageSourceFiles(Resource):
),
400,
)
# Remove files from storage and directory structure
removed_files = []
map_updated = False
# Path-traversal guard runs *before* the claim so a 400
# for an invalid path doesn't leave a pending dedup row.
for file_path in file_paths:
if ".." in str(file_path) or str(file_path).startswith("/"):
return make_response(
@@ -473,6 +650,22 @@ class ManageSourceFiles(Resource):
),
400,
)
# Claim before any storage mutation. See ``add`` branch
# comment for rationale.
predetermined_task_id = None
if scoped_key:
predetermined_task_id, cached = _claim_task_or_get_cached(
scoped_key, "reingest_source_task",
)
if cached is not None:
return make_response(jsonify(cached), 200)
# Remove files from storage and directory structure
removed_files = []
map_updated = False
for file_path in file_paths:
full_path = f"{source_file_path}/{file_path}"
# Remove from storage
@@ -485,15 +678,24 @@ class ManageSourceFiles(Resource):
map_updated = True
if map_updated and isinstance(file_name_map, dict):
sources_collection.update_one(
{"_id": ObjectId(source_id)},
{"$set": {"file_name_map": file_name_map}},
)
with db_session() as conn:
SourcesRepository(conn).update(
resolved_source_id, user,
{"file_name_map": dict(file_name_map)},
)
# Trigger re-ingestion pipeline
from application.api.user.tasks import reingest_source_task
task = reingest_source_task.delay(source_id=source_id, user=user)
task = reingest_source_task.apply_async(
kwargs={
"source_id": resolved_source_id,
"user": user,
"idempotency_key": scoped_key or idempotency_key,
},
task_id=predetermined_task_id,
)
claim_transferred = True
return make_response(
jsonify(
@@ -552,6 +754,16 @@ class ManageSourceFiles(Resource):
),
404,
)
# Claim before mutation. See ``add`` branch for rationale.
predetermined_task_id = None
if scoped_key:
predetermined_task_id, cached = _claim_task_or_get_cached(
scoped_key, "reingest_source_task",
)
if cached is not None:
return make_response(jsonify(cached), 200)
success = storage.remove_directory(full_directory_path)
if not success:
@@ -560,6 +772,11 @@ class ManageSourceFiles(Resource):
f"User: {user}, Source ID: {source_id}, Directory path: {directory_path}, "
f"Full path: {full_directory_path}"
)
# Release so a client retry can reclaim — otherwise
# the next request would silently 200-cache to the
# task_id that never enqueued.
if scoped_key:
_release_claim(scoped_key)
return make_response(
jsonify(
{"success": False, "message": "Failed to remove directory"}
@@ -581,16 +798,25 @@ class ManageSourceFiles(Resource):
if keys_to_remove:
for key in keys_to_remove:
file_name_map.pop(key, None)
sources_collection.update_one(
{"_id": ObjectId(source_id)},
{"$set": {"file_name_map": file_name_map}},
)
with db_session() as conn:
SourcesRepository(conn).update(
resolved_source_id, user,
{"file_name_map": dict(file_name_map)},
)
# Trigger re-ingestion pipeline
from application.api.user.tasks import reingest_source_task
task = reingest_source_task.delay(source_id=source_id, user=user)
task = reingest_source_task.apply_async(
kwargs={
"source_id": resolved_source_id,
"user": user,
"idempotency_key": scoped_key or idempotency_key,
},
task_id=predetermined_task_id,
)
claim_transferred = True
return make_response(
jsonify(
@@ -604,6 +830,14 @@ class ManageSourceFiles(Resource):
200,
)
except Exception as err:
# Release the dedup claim only if it wasn't transferred to
# a worker. Without this, a same-key retry within the 24h
# TTL would 200-cache to a predetermined task_id whose
# ``apply_async`` never ran (or ran but the response builder
# blew up afterward — only the first case matters in
# practice; the flag protects both).
if scoped_key and not claim_transferred:
_release_claim(scoped_key)
error_context = f"operation={operation}, user={user}, source_id={source_id}"
if operation == "remove_directory":
directory_path = request.form.get("directory_path", "")

View File

@@ -1,5 +1,6 @@
from datetime import timedelta
from application.api.user.idempotency import with_idempotency
from application.celery_init import celery
from application.worker import (
agent_webhook_worker,
@@ -13,9 +14,32 @@ from application.worker import (
)
@celery.task(bind=True)
# Shared decorator config for long-running, side-effecting tasks. ``acks_late``
# is also the celeryconfig default but stays explicit here so each task's
# durability story is grep-able next to the body. Combined with
# ``autoretry_for=(Exception,)`` and a bounded ``max_retries`` so a poison
# message can't loop forever.
DURABLE_TASK = dict(
bind=True,
acks_late=True,
autoretry_for=(Exception,),
retry_kwargs={"max_retries": 3, "countdown": 60},
retry_backoff=True,
)
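# Illustrative only (``example_cleanup`` is an invented name, not part of this
# change): any new long-running task opts into the same durability contract by
# unpacking DURABLE_TASK and accepting the idempotency_key kwarg, mirroring the
# real tasks below.
#
#   @celery.task(**DURABLE_TASK)
#   @with_idempotency(task_name="example_cleanup")
#   def example_cleanup(self, resource_id, idempotency_key=None):
#       ...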
@celery.task(**DURABLE_TASK)
@with_idempotency(task_name="ingest")
def ingest(
self, directory, formats, job_name, user, file_path, filename, file_name_map=None
self,
directory,
formats,
job_name,
user,
file_path,
filename,
file_name_map=None,
idempotency_key=None,
):
resp = ingest_worker(
self,
@@ -26,25 +50,35 @@ def ingest(
filename,
user,
file_name_map=file_name_map,
idempotency_key=idempotency_key,
)
return resp
@celery.task(bind=True)
def ingest_remote(self, source_data, job_name, user, loader):
resp = remote_worker(self, source_data, job_name, user, loader)
@celery.task(**DURABLE_TASK)
@with_idempotency(task_name="ingest_remote")
def ingest_remote(self, source_data, job_name, user, loader, idempotency_key=None):
resp = remote_worker(
self, source_data, job_name, user, loader,
idempotency_key=idempotency_key,
)
return resp
@celery.task(bind=True)
def reingest_source_task(self, source_id, user):
@celery.task(**DURABLE_TASK)
@with_idempotency(task_name="reingest_source_task")
def reingest_source_task(self, source_id, user, idempotency_key=None):
from application.worker import reingest_source_worker
resp = reingest_source_worker(self, source_id, user)
return resp
@celery.task(bind=True)
# Beat-driven dispatch tasks stay on the default ``acks_late=False``:
# redelivering a killed beat tick is only harmless when the dispatch itself
# is idempotent, so we keep these early-ACK and the broker never replays a
# dispatch that has already enqueued downstream work.
@celery.task(bind=True, acks_late=False)
def schedule_syncs(self, frequency):
resp = sync_worker(self, frequency)
return resp
@@ -74,19 +108,22 @@ def sync_source(
return resp
@celery.task(bind=True)
def store_attachment(self, file_info, user):
@celery.task(**DURABLE_TASK)
@with_idempotency(task_name="store_attachment")
def store_attachment(self, file_info, user, idempotency_key=None):
resp = attachment_worker(self, file_info, user)
return resp
@celery.task(bind=True)
def process_agent_webhook(self, agent_id, payload):
@celery.task(**DURABLE_TASK)
@with_idempotency(task_name="process_agent_webhook")
def process_agent_webhook(self, agent_id, payload, idempotency_key=None):
resp = agent_webhook_worker(self, agent_id, payload)
return resp
@celery.task(bind=True)
@celery.task(**DURABLE_TASK)
@with_idempotency(task_name="ingest_connector_task")
def ingest_connector_task(
self,
job_name,
@@ -100,6 +137,7 @@ def ingest_connector_task(
operation_mode="upload",
doc_id=None,
sync_frequency="never",
idempotency_key=None,
):
from application.worker import ingest_connector
@@ -116,6 +154,7 @@ def ingest_connector_task(
operation_mode=operation_mode,
doc_id=doc_id,
sync_frequency=sync_frequency,
idempotency_key=idempotency_key,
)
return resp
@@ -140,6 +179,24 @@ def setup_periodic_tasks(sender, **kwargs):
cleanup_pending_tool_state.s(),
name="cleanup-pending-tool-state",
)
# Pure housekeeping for ``task_dedup`` / ``webhook_dedup`` — the
# upsert paths already handle stale rows, so cadence only bounds
# table size. Hourly is plenty for typical traffic.
sender.add_periodic_task(
timedelta(hours=1),
cleanup_idempotency_dedup.s(),
name="cleanup-idempotency-dedup",
)
sender.add_periodic_task(
timedelta(seconds=30),
reconciliation_task.s(),
name="reconciliation",
)
sender.add_periodic_task(
timedelta(hours=7),
version_check_task.s(),
name="version-check",
)
@celery.task(bind=True)
@@ -154,18 +211,12 @@ def mcp_oauth_status_task(self, task_id):
return resp
@celery.task(bind=True)
@celery.task(bind=True, acks_late=False)
def cleanup_pending_tool_state(self):
"""Delete pending_tool_state rows past their TTL.
Replaces Mongo's ``expireAfterSeconds=0`` TTL index — Postgres has
no native TTL, so this task runs every 60 seconds to keep
``pending_tool_state`` bounded. No-ops if ``POSTGRES_URI`` isn't
configured (keeps the task runnable in Mongo-only environments).
"""
"""Revert stale ``resuming`` rows, then delete TTL-expired rows."""
from application.core.settings import settings
if not settings.POSTGRES_URI:
return {"deleted": 0, "skipped": "POSTGRES_URI not set"}
return {"deleted": 0, "reverted": 0, "skipped": "POSTGRES_URI not set"}
from application.storage.db.engine import get_engine
from application.storage.db.repositories.pending_tool_state import (
@@ -174,5 +225,54 @@ def cleanup_pending_tool_state(self):
engine = get_engine()
with engine.begin() as conn:
deleted = PendingToolStateRepository(conn).cleanup_expired()
return {"deleted": deleted}
repo = PendingToolStateRepository(conn)
reverted = repo.revert_stale_resuming(grace_seconds=600)
deleted = repo.cleanup_expired()
return {"deleted": deleted, "reverted": reverted}
@celery.task(bind=True, acks_late=False)
def cleanup_idempotency_dedup(self):
"""Delete TTL-expired rows from ``task_dedup`` and ``webhook_dedup``.
Pure housekeeping — the upsert paths already ignore stale rows
(TTL-aware ``ON CONFLICT DO UPDATE``), so this only bounds table
growth and keeps SELECT planning tight on large deployments.
"""
from application.core.settings import settings
if not settings.POSTGRES_URI:
return {
"task_dedup_deleted": 0,
"webhook_dedup_deleted": 0,
"skipped": "POSTGRES_URI not set",
}
from application.storage.db.engine import get_engine
from application.storage.db.repositories.idempotency import (
IdempotencyRepository,
)
engine = get_engine()
with engine.begin() as conn:
return IdempotencyRepository(conn).cleanup_expired()
@celery.task(bind=True, acks_late=False)
def reconciliation_task(self):
"""Sweep stuck durability rows and escalate them to terminal status + alert."""
from application.api.user.reconciliation import run_reconciliation
return run_reconciliation()
@celery.task(bind=True, acks_late=False)
def version_check_task(self):
"""Periodic anonymous version check.
Complements the ``worker_ready`` boot trigger so long-running
deployments (>6h cache TTL) still refresh advisories. ``run_check``
is fail-silent and coordinates across replicas via Redis lock +
cache (see ``application.updates.version_check``).
"""
from application.updates.version_check import run_check
run_check()

View File

@@ -3,27 +3,24 @@
import json
from urllib.parse import urlencode, urlparse
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, redirect, request
from flask_restx import Namespace, Resource, fields
from application.agents.tools.mcp_tool import MCPOAuthManager, MCPTool
from application.api import api
from application.api.user.base import user_tools_collection
from application.api.user.tools.routes import transform_actions
from application.cache import get_redis_instance
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.core.url_validation import SSRFError, validate_url
from application.security.encryption import decrypt_credentials, encrypt_credentials
from application.storage.db.repositories.connector_sessions import (
ConnectorSessionsRepository,
)
from application.storage.db.repositories.user_tools import UserToolsRepository
from application.storage.db.session import db_readonly, db_session
from application.utils import check_required_fields
tools_mcp_ns = Namespace("tools", description="Tool management operations", path="/api")
_mongo = MongoDB.get_client()
_db = _mongo[settings.MONGO_DB_NAME]
_connector_sessions = _db["connector_sessions"]
_ALLOWED_TRANSPORTS = {"auto", "sse", "http"}
@@ -252,15 +249,18 @@ class MCPServerSave(Resource):
storage_config = config.copy()
tool_id = data.get("id")
existing_doc = None
existing_encrypted = None
if tool_id:
existing_doc = user_tools_collection.find_one(
{"_id": ObjectId(tool_id), "user": user, "name": "mcp_tool"}
)
if existing_doc:
existing_encrypted = existing_doc.get("config", {}).get(
with db_readonly() as conn:
repo = UserToolsRepository(conn)
existing_doc = repo.get_any(tool_id, user)
if existing_doc and existing_doc.get("name") == "mcp_tool":
existing_encrypted = (existing_doc.get("config") or {}).get(
"encrypted_credentials"
)
else:
existing_doc = None
if auth_credentials:
if existing_encrypted:
@@ -283,47 +283,88 @@ class MCPServerSave(Resource):
]:
storage_config.pop(field, None)
transformed_actions = transform_actions(actions_metadata)
tool_data = {
"name": "mcp_tool",
"displayName": data["displayName"],
"customName": data["displayName"],
"description": f"MCP Server: {storage_config.get('server_url', 'Unknown')}",
"config": storage_config,
"actions": transformed_actions,
"status": data.get("status", True),
"user": user,
}
if tool_id:
result = user_tools_collection.update_one(
{"_id": ObjectId(tool_id), "user": user, "name": "mcp_tool"},
{"$set": {k: v for k, v in tool_data.items() if k != "user"}},
)
if result.matched_count == 0:
return make_response(
jsonify(
{
"success": False,
"error": "Tool not found or access denied",
}
),
404,
display_name = data["displayName"]
description = f"MCP Server: {storage_config.get('server_url', 'Unknown')}"
status_bool = bool(data.get("status", True))
with db_session() as conn:
repo = UserToolsRepository(conn)
if existing_doc:
repo.update(
str(existing_doc["id"]), user,
{
"display_name": display_name,
"custom_name": display_name,
"description": description,
"config": storage_config,
"actions": transformed_actions,
"status": status_bool,
},
)
response_data = {
"success": True,
"id": tool_id,
"message": f"MCP server updated successfully! Discovered {len(transformed_actions)} tools.",
"tools_count": len(transformed_actions),
}
else:
result = user_tools_collection.insert_one(tool_data)
tool_id = str(result.inserted_id)
response_data = {
"success": True,
"id": tool_id,
"message": f"MCP server created successfully! Discovered {len(transformed_actions)} tools.",
"tools_count": len(transformed_actions),
}
saved_id = str(existing_doc["id"])
response_data = {
"success": True,
"id": saved_id,
"message": f"MCP server updated successfully! Discovered {len(transformed_actions)} tools.",
"tools_count": len(transformed_actions),
}
else:
# Fall back to find_by_user_and_name — the original
# dual-write path also ran an existence check before
# deciding between insert and update.
existing_by_name = repo.find_by_user_and_name(user, "mcp_tool")
if tool_id is None and existing_by_name and (
(existing_by_name.get("config") or {}).get("server_url")
== storage_config.get("server_url")
):
repo.update(
str(existing_by_name["id"]), user,
{
"display_name": display_name,
"custom_name": display_name,
"description": description,
"config": storage_config,
"actions": transformed_actions,
"status": status_bool,
},
)
saved_id = str(existing_by_name["id"])
response_data = {
"success": True,
"id": saved_id,
"message": f"MCP server updated successfully! Discovered {len(transformed_actions)} tools.",
"tools_count": len(transformed_actions),
}
else:
created = repo.create(
user, "mcp_tool",
config=storage_config,
custom_name=display_name,
display_name=display_name,
description=description,
config_requirements={},
actions=transformed_actions,
status=status_bool,
)
saved_id = str(created["id"])
response_data = {
"success": True,
"id": saved_id,
"message": f"MCP server created successfully! Discovered {len(transformed_actions)} tools.",
"tools_count": len(transformed_actions),
}
if tool_id and existing_doc is None:
# Client requested update on a non-existent tool id.
return make_response(
jsonify(
{
"success": False,
"error": "Tool not found or access denied",
}
),
404,
)
return make_response(jsonify(response_data), 200)
except ValueError as e:
current_app.logger.warning(f"Invalid MCP server save request: {e}")
@@ -459,49 +500,59 @@ class MCPAuthStatus(Resource):
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
try:
mcp_tools = list(
user_tools_collection.find(
{"user": user, "name": "mcp_tool"},
{"_id": 1, "config": 1},
)
)
if not mcp_tools:
return make_response(jsonify({"success": True, "statuses": {}}), 200)
oauth_server_urls = {}
statuses = {}
for tool in mcp_tools:
tool_id = str(tool["_id"])
config = tool.get("config", {})
auth_type = config.get("auth_type", "none")
if auth_type == "oauth":
server_url = config.get("server_url", "")
if server_url:
parsed = urlparse(server_url)
base_url = f"{parsed.scheme}://{parsed.netloc}"
oauth_server_urls[tool_id] = base_url
else:
statuses[tool_id] = "needs_auth"
else:
statuses[tool_id] = "configured"
if oauth_server_urls:
unique_urls = list(set(oauth_server_urls.values()))
sessions = list(
_connector_sessions.find(
{"user_id": user, "server_url": {"$in": unique_urls}},
{"server_url": 1, "tokens": 1},
with db_readonly() as conn:
tools_repo = UserToolsRepository(conn)
sessions_repo = ConnectorSessionsRepository(conn)
all_tools = tools_repo.list_for_user(user)
mcp_tools = [t for t in all_tools if t.get("name") == "mcp_tool"]
if not mcp_tools:
return make_response(
jsonify({"success": True, "statuses": {}}), 200
)
)
url_has_tokens = {
doc["server_url"]: bool(doc.get("tokens", {}).get("access_token"))
for doc in sessions
}
for tool_id, base_url in oauth_server_urls.items():
if url_has_tokens.get(base_url):
statuses[tool_id] = "connected"
oauth_server_urls: dict = {}
statuses: dict = {}
for tool in mcp_tools:
tool_id = str(tool["id"])
config = tool.get("config") or {}
auth_type = config.get("auth_type", "none")
if auth_type == "oauth":
server_url = config.get("server_url", "")
if server_url:
parsed = urlparse(server_url)
base_url = f"{parsed.scheme}://{parsed.netloc}"
oauth_server_urls[tool_id] = base_url
else:
statuses[tool_id] = "needs_auth"
else:
statuses[tool_id] = "needs_auth"
statuses[tool_id] = "configured"
if oauth_server_urls:
# Look up a session per distinct base URL. MCP sessions
# are stored with ``provider = "mcp:<server_url>"``
# and the URL in ``server_url``; reuse the repo's
# per-URL accessor rather than an ad-hoc $in query.
url_has_tokens: dict = {}
for base_url in set(oauth_server_urls.values()):
session = sessions_repo.get_by_user_and_server_url(
user, base_url,
)
tokens = (
(session or {}).get("session_data", {}) or {}
).get("tokens", {}) or {}
# MCP code also stashes tokens into token_info on
# the row; consider either present as "connected".
token_info = (session or {}).get("token_info") or {}
url_has_tokens[base_url] = bool(
tokens.get("access_token")
or token_info.get("access_token")
)
for tool_id, base_url in oauth_server_urls.items():
if url_has_tokens.get(base_url):
statuses[tool_id] = "connected"
else:
statuses[tool_id] = "needs_auth"
return make_response(jsonify({"success": True, "statuses": statuses}), 200)
except Exception as e:

View File

@@ -1,23 +1,59 @@
"""Tool management routes."""
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, request
from flask_restx import fields, Namespace, Resource
from application.agents.tools.spec_parser import parse_spec
from application.agents.tools.tool_manager import ToolManager
from application.api import api
from application.api.user.base import user_tools_collection
from application.core.url_validation import SSRFError, validate_url
from application.storage.db.dual_write import dual_write
from application.storage.db.repositories.user_tools import UserToolsRepository
from application.security.encryption import decrypt_credentials, encrypt_credentials
from application.storage.db.repositories.notes import NotesRepository
from application.storage.db.repositories.todos import TodosRepository
from application.storage.db.repositories.user_tools import UserToolsRepository
from application.storage.db.session import db_readonly, db_session
from application.utils import check_required_fields, validate_function_name
tool_config = {}
tool_manager = ToolManager(config=tool_config)
# ---------------------------------------------------------------------------
# Shape translation helpers
# ---------------------------------------------------------------------------
# The frontend speaks camelCase (``displayName`` / ``customName`` /
# ``configRequirements``). The PG ``user_tools`` table stores snake_case
# (``display_name`` / ``custom_name`` / ``config_requirements``). Keep the
# translation localized to this module so repositories stay pure.
_CAMEL_TO_SNAKE = {
"displayName": "display_name",
"customName": "custom_name",
"configRequirements": "config_requirements",
}
_SNAKE_TO_CAMEL = {v: k for k, v in _CAMEL_TO_SNAKE.items()}
def _row_to_api(row: dict) -> dict:
"""Rename DB-native snake_case keys to the camelCase shape the frontend expects."""
out = dict(row)
for snake, camel in _SNAKE_TO_CAMEL.items():
if snake in out:
out[camel] = out.pop(snake)
# ``user_id`` is exposed as ``user`` in the legacy API shape.
if "user_id" in out:
out["user"] = out.pop("user_id")
return out
def _api_to_update_fields(data: dict) -> dict:
"""Rename incoming camelCase update keys to the repo's snake_case columns."""
fields_out: dict = {}
for key, value in data.items():
fields_out[_CAMEL_TO_SNAKE.get(key, key)] = value
return fields_out
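# Illustrative round trip (made-up row, not part of this change): a DB row
# renders to the legacy camelCase shape, and an incoming camelCase update maps
# back onto snake_case columns.
#
#   _row_to_api({"id": "t1", "user_id": "alice", "display_name": "Slack"})
#   #   -> {"id": "t1", "user": "alice", "displayName": "Slack"}
#   _api_to_update_fields({"displayName": "Slack v2", "status": True})
#   #   -> {"display_name": "Slack v2", "status": True}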
def _encrypt_secret_fields(config, config_requirements, user_id):
secret_keys = [
key for key, spec in config_requirements.items()
@@ -170,12 +206,11 @@ class GetTools(Resource):
if not decoded_token:
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
tools = user_tools_collection.find({"user": user})
with db_readonly() as conn:
rows = UserToolsRepository(conn).list_for_user(user)
user_tools = []
for tool in tools:
tool_copy = {**tool}
tool_copy["id"] = str(tool["_id"])
tool_copy.pop("_id", None)
for row in rows:
tool_copy = _row_to_api(row)
config_req = tool_copy.get("configRequirements", {})
if not config_req:
@@ -283,26 +318,19 @@ class CreateTool(Resource):
storage_config = _encrypt_secret_fields(
data["config"], config_requirements, user
)
new_tool = {
"user": user,
"name": data["name"],
"displayName": data["displayName"],
"description": data["description"],
"customName": data.get("customName", ""),
"actions": transformed_actions,
"config": storage_config,
"configRequirements": config_requirements,
"status": data["status"],
}
resp = user_tools_collection.insert_one(new_tool)
new_id = str(resp.inserted_id)
dual_write(
UserToolsRepository,
lambda repo, u=user, t=new_tool: repo.create(
u, t["name"], config=t.get("config"),
custom_name=t.get("customName"), display_name=t.get("displayName"),
),
)
with db_session() as conn:
created = UserToolsRepository(conn).create(
user,
data["name"],
config=storage_config,
custom_name=data.get("customName", ""),
display_name=data["displayName"],
description=data["description"],
config_requirements=config_requirements,
actions=transformed_actions,
status=bool(data.get("status", True)),
)
new_id = str(created["id"])
except Exception as err:
current_app.logger.error(f"Error creating tool: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
@@ -340,17 +368,10 @@ class UpdateTool(Resource):
if missing_fields:
return missing_fields
try:
update_data = {}
if "name" in data:
update_data["name"] = data["name"]
if "displayName" in data:
update_data["displayName"] = data["displayName"]
if "customName" in data:
update_data["customName"] = data["customName"]
if "description" in data:
update_data["description"] = data["description"]
if "actions" in data:
update_data["actions"] = data["actions"]
update_data: dict = {}
for key in ("name", "displayName", "customName", "description", "actions"):
if key in data:
update_data[key] = data[key]
if "config" in data:
if "actions" in data["config"]:
for action_name in list(data["config"]["actions"].keys()):
@@ -365,46 +386,61 @@ class UpdateTool(Resource):
),
400,
)
tool_doc = user_tools_collection.find_one(
{"_id": ObjectId(data["id"]), "user": user}
)
if not tool_doc:
return make_response(
jsonify({"success": False, "message": "Tool not found"}),
404,
)
tool_name = tool_doc.get("name", data.get("name"))
tool_instance = tool_manager.tools.get(tool_name)
config_requirements = (
tool_instance.get_config_requirements() if tool_instance else {}
)
existing_config = tool_doc.get("config", {})
has_existing_secrets = "encrypted_credentials" in existing_config
if config_requirements:
validation_errors = _validate_config(
data["config"], config_requirements,
has_existing_secrets=has_existing_secrets,
)
if validation_errors:
with db_session() as conn:
repo = UserToolsRepository(conn)
tool_doc = repo.get_any(data["id"], user)
if not tool_doc:
return make_response(
jsonify({
"success": False,
"message": "Validation failed",
"errors": validation_errors,
}),
400,
jsonify({"success": False, "message": "Tool not found"}),
404,
)
tool_name = tool_doc.get("name", data.get("name"))
tool_instance = tool_manager.tools.get(tool_name)
config_requirements = (
tool_instance.get_config_requirements()
if tool_instance
else {}
)
existing_config = tool_doc.get("config", {}) or {}
has_existing_secrets = "encrypted_credentials" in existing_config
update_data["config"] = _merge_secrets_on_update(
data["config"], existing_config, config_requirements, user
)
if "status" in data:
update_data["status"] = data["status"]
user_tools_collection.update_one(
{"_id": ObjectId(data["id"]), "user": user},
{"$set": update_data},
)
if config_requirements:
validation_errors = _validate_config(
data["config"], config_requirements,
has_existing_secrets=has_existing_secrets,
)
if validation_errors:
return make_response(
jsonify({
"success": False,
"message": "Validation failed",
"errors": validation_errors,
}),
400,
)
update_data["config"] = _merge_secrets_on_update(
data["config"], existing_config, config_requirements, user
)
if "status" in data:
update_data["status"] = bool(data["status"])
repo.update(
str(tool_doc["id"]), user, _api_to_update_fields(update_data),
)
else:
if "status" in data:
update_data["status"] = bool(data["status"])
with db_session() as conn:
repo = UserToolsRepository(conn)
tool_doc = repo.get_any(data["id"], user)
if not tool_doc:
return make_response(
jsonify({"success": False, "message": "Tool not found"}),
404,
)
repo.update(
str(tool_doc["id"]), user, _api_to_update_fields(update_data),
)
except Exception as err:
current_app.logger.error(f"Error updating tool: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
@@ -436,53 +472,50 @@ class UpdateToolConfig(Resource):
if missing_fields:
return missing_fields
try:
tool_doc = user_tools_collection.find_one(
{"_id": ObjectId(data["id"]), "user": user}
)
if not tool_doc:
return make_response(jsonify({"success": False}), 404)
with db_session() as conn:
repo = UserToolsRepository(conn)
tool_doc = repo.get_any(data["id"], user)
if not tool_doc:
return make_response(jsonify({"success": False}), 404)
tool_name = tool_doc.get("name")
if tool_name == "mcp_tool":
server_url = (data["config"].get("server_url") or "").strip()
if server_url:
try:
validate_url(server_url)
except SSRFError:
tool_name = tool_doc.get("name")
if tool_name == "mcp_tool":
server_url = (data["config"].get("server_url") or "").strip()
if server_url:
try:
validate_url(server_url)
except SSRFError:
return make_response(
jsonify({"success": False, "message": "Invalid server URL"}),
400,
)
tool_instance = tool_manager.tools.get(tool_name)
config_requirements = (
tool_instance.get_config_requirements() if tool_instance else {}
)
existing_config = tool_doc.get("config", {}) or {}
has_existing_secrets = "encrypted_credentials" in existing_config
if config_requirements:
validation_errors = _validate_config(
data["config"], config_requirements,
has_existing_secrets=has_existing_secrets,
)
if validation_errors:
return make_response(
jsonify({"success": False, "message": "Invalid server URL"}),
jsonify({
"success": False,
"message": "Validation failed",
"errors": validation_errors,
}),
400,
)
tool_instance = tool_manager.tools.get(tool_name)
config_requirements = (
tool_instance.get_config_requirements() if tool_instance else {}
)
existing_config = tool_doc.get("config", {})
has_existing_secrets = "encrypted_credentials" in existing_config
if config_requirements:
validation_errors = _validate_config(
data["config"], config_requirements,
has_existing_secrets=has_existing_secrets,
final_config = _merge_secrets_on_update(
data["config"], existing_config, config_requirements, user
)
if validation_errors:
return make_response(
jsonify({
"success": False,
"message": "Validation failed",
"errors": validation_errors,
}),
400,
)
final_config = _merge_secrets_on_update(
data["config"], existing_config, config_requirements, user
)
user_tools_collection.update_one(
{"_id": ObjectId(data["id"]), "user": user},
{"$set": {"config": final_config}},
)
repo.update(str(tool_doc["id"]), user, {"config": final_config})
except Exception as err:
current_app.logger.error(
f"Error updating tool config: {err}", exc_info=True
@@ -518,10 +551,17 @@ class UpdateToolActions(Resource):
if missing_fields:
return missing_fields
try:
user_tools_collection.update_one(
{"_id": ObjectId(data["id"]), "user": user},
{"$set": {"actions": data["actions"]}},
)
with db_session() as conn:
repo = UserToolsRepository(conn)
tool_doc = repo.get_any(data["id"], user)
if not tool_doc:
return make_response(
jsonify({"success": False, "message": "Tool not found"}),
404,
)
repo.update(
str(tool_doc["id"]), user, {"actions": data["actions"]},
)
except Exception as err:
current_app.logger.error(
f"Error updating tool actions: {err}", exc_info=True
@@ -555,10 +595,17 @@ class UpdateToolStatus(Resource):
if missing_fields:
return missing_fields
try:
user_tools_collection.update_one(
{"_id": ObjectId(data["id"]), "user": user},
{"$set": {"status": data["status"]}},
)
with db_session() as conn:
repo = UserToolsRepository(conn)
tool_doc = repo.get_any(data["id"], user)
if not tool_doc:
return make_response(
jsonify({"success": False, "message": "Tool not found"}),
404,
)
repo.update(
str(tool_doc["id"]), user, {"status": bool(data["status"])},
)
except Exception as err:
current_app.logger.error(
f"Error updating tool status: {err}", exc_info=True
@@ -587,17 +634,14 @@ class DeleteTool(Resource):
if missing_fields:
return missing_fields
try:
result = user_tools_collection.delete_one(
{"_id": ObjectId(data["id"]), "user": user}
)
dual_write(
UserToolsRepository,
lambda repo, tid=data["id"], u=user: repo.delete(tid, u),
)
if result.deleted_count == 0:
return make_response(
jsonify({"success": False, "message": "Tool not found"}), 404
)
with db_session() as conn:
repo = UserToolsRepository(conn)
tool_doc = repo.get_any(data["id"], user)
if not tool_doc:
return make_response(
jsonify({"success": False, "message": "Tool not found"}), 404
)
repo.delete(str(tool_doc["id"]), user)
except Exception as err:
current_app.logger.error(f"Error deleting tool: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
@@ -666,70 +710,88 @@ class GetArtifact(Resource):
user_id = decoded_token.get("sub")
try:
obj_id = ObjectId(artifact_id)
except Exception:
return make_response(
jsonify({"success": False, "message": "Invalid artifact ID"}), 400
with db_readonly() as conn:
notes_repo = NotesRepository(conn)
todos_repo = TodosRepository(conn)
# Artifact IDs may be PG UUIDs (post-cutover) or legacy
# Mongo ObjectIds embedded in older conversation history.
# Both repos' ``get_any`` methods handle the id-shape branching
# internally, so a non-UUID input never reaches
# ``CAST(:id AS uuid)`` (which would poison the readonly
# transaction and break the fallback below).
note_doc = notes_repo.get_any(artifact_id, user_id)
if note_doc:
content = note_doc.get("note", "") or note_doc.get("content", "")
line_count = len(content.split("\n")) if content else 0
updated = note_doc.get("updated_at")
artifact = {
"artifact_type": "note",
"data": {
"content": content,
"line_count": line_count,
"updated_at": (
updated.isoformat()
if hasattr(updated, "isoformat")
else updated
),
},
}
return make_response(
jsonify({"success": True, "artifact": artifact}), 200
)
todo_doc = todos_repo.get_any(artifact_id, user_id)
if todo_doc:
tool_id = todo_doc.get("tool_id")
all_todos = todos_repo.list_for_tool(user_id, tool_id) if tool_id else []
items = []
open_count = 0
completed_count = 0
for t in all_todos:
# PG ``todos`` stores a ``completed BOOLEAN`` column;
# the legacy Mongo shape used a ``status`` string.
# Keep the response shape stable by translating here.
status = "completed" if t.get("completed") else "open"
if status == "open":
open_count += 1
else:
completed_count += 1
created = t.get("created_at")
updated = t.get("updated_at")
items.append({
"todo_id": t.get("todo_id"),
"title": t.get("title", ""),
"status": status,
"created_at": (
created.isoformat()
if hasattr(created, "isoformat")
else created
),
"updated_at": (
updated.isoformat()
if hasattr(updated, "isoformat")
else updated
),
})
artifact = {
"artifact_type": "todo_list",
"data": {
"items": items,
"total_count": len(items),
"open_count": open_count,
"completed_count": completed_count,
},
}
return make_response(
jsonify({"success": True, "artifact": artifact}), 200
)
except Exception as err:
current_app.logger.error(
f"Error retrieving artifact: {err}", exc_info=True
)
from application.core.mongo_db import MongoDB
from application.core.settings import settings
db = MongoDB.get_client()[settings.MONGO_DB_NAME]
note_doc = db["notes"].find_one({"_id": obj_id, "user_id": user_id})
if note_doc:
content = note_doc.get("note", "")
line_count = len(content.split("\n")) if content else 0
artifact = {
"artifact_type": "note",
"data": {
"content": content,
"line_count": line_count,
"updated_at": (
note_doc["updated_at"].isoformat()
if note_doc.get("updated_at")
else None
),
},
}
return make_response(jsonify({"success": True, "artifact": artifact}), 200)
todo_doc = db["todos"].find_one({"_id": obj_id, "user_id": user_id})
if todo_doc:
tool_id = todo_doc.get("tool_id")
query = {"user_id": user_id, "tool_id": tool_id}
all_todos = list(db["todos"].find(query))
items = []
open_count = 0
completed_count = 0
for t in all_todos:
status = t.get("status", "open")
if status == "open":
open_count += 1
elif status == "completed":
completed_count += 1
items.append({
"todo_id": t.get("todo_id"),
"title": t.get("title", ""),
"status": status,
"created_at": (
t["created_at"].isoformat() if t.get("created_at") else None
),
"updated_at": (
t["updated_at"].isoformat() if t.get("updated_at") else None
),
})
artifact = {
"artifact_type": "todo_list",
"data": {
"items": items,
"total_count": len(items),
"open_count": open_count,
"completed_count": completed_count,
},
}
return make_response(jsonify({"success": True, "artifact": artifact}), 200)
return make_response(jsonify({"success": False}), 400)
return make_response(
jsonify({"success": False, "message": "Artifact not found"}), 404

View File

@@ -1,290 +1,61 @@
"""Centralized utilities for API routes."""
"""Centralized utilities for API routes.
Post-Mongo-cutover slim: the old Mongo-shaped helpers (``validate_object_id``,
``check_resource_ownership``, ``paginated_response``, ``serialize_object_id``,
``safe_db_operation``, ``validate_enum``, ``extract_sort_params``) have been
removed — they carried ``bson`` / ``pymongo`` imports and had zero callers.
"""
from functools import wraps
from typing import Any, Callable, Dict, List, Optional, Tuple
from typing import Callable, Optional
from bson.errors import InvalidId
from bson.objectid import ObjectId
from flask import (
Response,
current_app,
has_app_context,
jsonify,
make_response,
request,
)
from pymongo.collection import Collection
def get_user_id() -> Optional[str]:
"""
Extract user ID from decoded JWT token.
Returns:
User ID string or None if not authenticated
"""
"""Extract user ID from decoded JWT token, or None if unauthenticated."""
decoded_token = getattr(request, "decoded_token", None)
return decoded_token.get("sub") if decoded_token else None
def require_auth(func: Callable) -> Callable:
"""
Decorator to require authentication for route handlers.
Usage:
@require_auth
def get(self):
user_id = get_user_id()
...
"""
"""Decorator to require authentication. Returns 401 when absent."""
@wraps(func)
def wrapper(*args, **kwargs):
user_id = get_user_id()
if not user_id:
return error_response("Unauthorized", 401)
return make_response(jsonify({"success": False, "error": "Unauthorized"}), 401)
return func(*args, **kwargs)
return wrapper
def success_response(
data: Optional[Dict[str, Any]] = None, status: int = 200
data=None, message: Optional[str] = None, status: int = 200
) -> Response:
"""
Create a standardized success response.
Args:
data: Optional data dictionary to include in response
status: HTTP status code (default: 200)
Returns:
Flask Response object
Example:
return success_response({"users": [...], "total": 10})
"""
response = {"success": True}
if data:
response.update(data)
return make_response(jsonify(response), status)
"""Shape a successful JSON response."""
body = {"success": True}
if data is not None:
body["data"] = data
if message is not None:
body["message"] = message
return make_response(jsonify(body), status)
def error_response(message: str, status: int = 400, **kwargs) -> Response:
"""
Create a standardized error response.
Args:
message: Error message string
status: HTTP status code (default: 400)
**kwargs: Additional fields to include in response
Returns:
Flask Response object
Example:
return error_response("Resource not found", 404)
return error_response("Invalid input", 400, errors=["field1", "field2"])
"""
response = {"success": False, "message": message}
response.update(kwargs)
return make_response(jsonify(response), status)
"""Shape an error JSON response; any kwargs are merged into the body."""
body = {"success": False, "error": message, **kwargs}
return make_response(jsonify(body), status)
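# Illustrative only: a hedged sketch (not part of this changeset) of the body
# shapes the two helpers above produce; the payloads are hypothetical.
def _example_response_shapes() -> None:
    from flask import Flask
    with Flask(__name__).test_request_context():
        ok = success_response({"items": []}, message="ok")
        assert ok.status_code == 200
        assert ok.get_json() == {
            "success": True, "data": {"items": []}, "message": "ok",
        }
        err = error_response("Workflow not found", 404, errors=["missing id"])
        assert err.status_code == 404
        assert err.get_json() == {
            "success": False, "error": "Workflow not found", "errors": ["missing id"],
        }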
def validate_object_id(
id_string: str, resource_name: str = "Resource"
) -> Tuple[Optional[ObjectId], Optional[Response]]:
"""
Validate and convert string to ObjectId.
Args:
id_string: String to convert
resource_name: Name of resource for error message
Returns:
Tuple of (ObjectId or None, error_response or None)
Example:
obj_id, error = validate_object_id(workflow_id, "Workflow")
if error:
return error
"""
try:
return ObjectId(id_string), None
except (InvalidId, TypeError):
return None, error_response(f"Invalid {resource_name} ID format")
def validate_pagination(
default_limit: int = 20, max_limit: int = 100
) -> Tuple[int, int, Optional[Response]]:
"""
Extract and validate pagination parameters from request.
Args:
default_limit: Default items per page
max_limit: Maximum allowed items per page
Returns:
Tuple of (limit, skip, error_response or None)
Example:
limit, skip, error = validate_pagination()
if error:
return error
"""
try:
limit = min(int(request.args.get("limit", default_limit)), max_limit)
skip = int(request.args.get("skip", 0))
if limit < 1 or skip < 0:
return 0, 0, error_response("Invalid pagination parameters")
return limit, skip, None
except ValueError:
return 0, 0, error_response("Invalid pagination parameters")
def check_resource_ownership(
collection: Collection,
resource_id: ObjectId,
user_id: str,
resource_name: str = "Resource",
) -> Tuple[Optional[Dict], Optional[Response]]:
"""
Check if resource exists and belongs to user.
Args:
collection: MongoDB collection
resource_id: Resource ObjectId
user_id: User ID string
resource_name: Name of resource for error messages
Returns:
Tuple of (resource_dict or None, error_response or None)
Example:
workflow, error = check_resource_ownership(
workflows_collection,
workflow_id,
user_id,
"Workflow"
)
if error:
return error
"""
resource = collection.find_one({"_id": resource_id, "user": user_id})
if not resource:
return None, error_response(f"{resource_name} not found", 404)
return resource, None
def serialize_object_id(
obj: Dict[str, Any], id_field: str = "_id", new_field: str = "id"
) -> Dict[str, Any]:
"""
Convert ObjectId to string in a dictionary.
Args:
obj: Dictionary containing ObjectId
id_field: Field name containing ObjectId
new_field: New field name for string ID
Returns:
Modified dictionary
Example:
user = serialize_object_id(user_doc)
# user["id"] = "507f1f77bcf86cd799439011"
"""
if id_field in obj:
obj[new_field] = str(obj[id_field])
if id_field != new_field:
obj.pop(id_field, None)
return obj
def serialize_list(items: List[Dict], serializer: Callable[[Dict], Dict]) -> List[Dict]:
"""
Apply serializer function to list of items.
Args:
items: List of dictionaries
serializer: Function to apply to each item
Returns:
List of serialized items
Example:
workflows = serialize_list(workflow_docs, serialize_workflow)
"""
return [serializer(item) for item in items]
def paginated_response(
collection: Collection,
query: Dict[str, Any],
serializer: Callable[[Dict], Dict],
limit: int,
skip: int,
sort_field: str = "created_at",
sort_order: int = -1,
response_key: str = "items",
) -> Response:
"""
Create paginated response for collection query.
Args:
collection: MongoDB collection
query: Query dictionary
serializer: Function to serialize each item
limit: Items per page
skip: Number of items to skip
sort_field: Field to sort by
sort_order: Sort order (1=asc, -1=desc)
response_key: Key name for items in response
Returns:
Flask Response with paginated data
Example:
return paginated_response(
workflows_collection,
{"user": user_id},
serialize_workflow,
limit, skip,
response_key="workflows"
)
"""
items = list(
collection.find(query).sort(sort_field, sort_order).skip(skip).limit(limit)
)
total = collection.count_documents(query)
return success_response(
{
response_key: serialize_list(items, serializer),
"total": total,
"limit": limit,
"skip": skip,
}
)
def require_fields(required: List[str]) -> Callable:
"""
Decorator to validate required fields in request JSON.
Args:
required: List of required field names
Returns:
Decorator function
Example:
@require_fields(["name", "description"])
def post(self):
data = request.get_json()
...
"""
def require_fields(required: list) -> Callable:
"""Decorator: return 400 if any listed field is missing/falsy in the JSON body."""
def decorator(func: Callable) -> Callable:
@wraps(func)
@@ -294,94 +65,11 @@ def require_fields(required: List[str]) -> Callable:
return error_response("Request body required")
missing = [field for field in required if not data.get(field)]
if missing:
return error_response(f"Missing required fields: {', '.join(missing)}")
return error_response(
f"Missing required fields: {', '.join(missing)}"
)
return func(*args, **kwargs)
return wrapper
return decorator
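# Illustrative usage sketch (not part of this changeset); the route body and
# field names are hypothetical.
def _example_require_fields():
    @require_fields(["name", "description"])
    def create_resource():
        data = request.get_json()
        # Both fields are guaranteed non-empty here; a missing one already
        # returned a 400 error_response from the decorator.
        return success_response({"name": data["name"]}, status=201)
    return create_resource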
def safe_db_operation(
operation: Callable, error_message: str = "Database operation failed"
) -> Tuple[Any, Optional[Response]]:
"""
Safely execute database operation with error handling.
Args:
operation: Function to execute
error_message: Error message if operation fails
Returns:
Tuple of (result or None, error_response or None)
Example:
result, error = safe_db_operation(
lambda: collection.insert_one(doc),
"Failed to create resource"
)
if error:
return error
"""
try:
result = operation()
return result, None
except Exception as err:
if has_app_context():
current_app.logger.error(f"{error_message}: {err}", exc_info=True)
return None, error_response(error_message)
def validate_enum(
value: Any, allowed: List[Any], field_name: str
) -> Optional[Response]:
"""
Validate that value is in allowed list.
Args:
value: Value to validate
allowed: List of allowed values
field_name: Field name for error message
Returns:
error_response if invalid, None if valid
Example:
error = validate_enum(status, ["draft", "published"], "status")
if error:
return error
"""
if value not in allowed:
allowed_str = ", ".join(f"'{v}'" for v in allowed)
return error_response(f"Invalid {field_name}. Must be one of: {allowed_str}")
return None
def extract_sort_params(
default_field: str = "created_at",
default_order: str = "desc",
allowed_fields: Optional[List[str]] = None,
) -> Tuple[str, int]:
"""
Extract and validate sort parameters from request.
Args:
default_field: Default sort field
default_order: Default sort order ("asc" or "desc")
allowed_fields: List of allowed sort fields (None = no validation)
Returns:
Tuple of (sort_field, sort_order)
Example:
sort_field, sort_order = extract_sort_params(
allowed_fields=["name", "date", "status"]
)
"""
sort_field = request.args.get("sort", default_field)
sort_order_str = request.args.get("order", default_order).lower()
if allowed_fields and sort_field not in allowed_fields:
sort_field = default_field
sort_order = -1 if sort_order_str == "desc" else 1
return sort_field, sort_order

View File

@@ -1,34 +1,26 @@
"""Workflow management routes."""
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Set
from flask import current_app, request
from flask_restx import Namespace, Resource
from application.api.user.base import (
workflow_edges_collection,
workflow_nodes_collection,
workflows_collection,
)
from application.storage.db.dual_write import dual_write
from application.storage.db.base_repository import looks_like_uuid
from application.storage.db.repositories.workflow_edges import WorkflowEdgesRepository
from application.storage.db.repositories.workflow_nodes import WorkflowNodesRepository
from application.storage.db.repositories.workflows import WorkflowsRepository
from application.storage.db.session import db_readonly, db_session
from application.core.json_schema_utils import (
JsonSchemaValidationError,
normalize_json_schema_payload,
)
from application.core.model_utils import get_model_capabilities
from application.api.user.utils import (
check_resource_ownership,
error_response,
get_user_id,
require_auth,
require_fields,
safe_db_operation,
success_response,
validate_object_id,
)
workflows_ns = Namespace("workflows", path="/api")
@@ -39,109 +31,15 @@ def _workflow_error_response(message: str, err: Exception):
return error_response(message)
# ---------------------------------------------------------------------------
# Postgres dual-write helpers
#
# Workflows are unusual relative to other Phase 3 tables: a single user
# action (create / update) writes to three collections in concert
# (workflows + workflow_nodes + workflow_edges) and the edges reference
# nodes by user-provided string ids. The Postgres mirror needs to:
#
# 1. Run all three writes inside one PG transaction (so the just-created
# nodes are visible when we resolve their UUIDs for the edge insert).
# 2. Translate edge source_id/target_id strings → workflow_nodes.id UUIDs
# after the bulk_create returns them.
#
# Each helper opens exactly one ``dual_write`` call (one PG txn) and uses
# the connection from whichever repo it was instantiated with to spin up
# any sibling repos it needs.
# ---------------------------------------------------------------------------
def _dual_write_workflow_create(
mongo_workflow_id: str,
user_id: str,
name: str,
description: str,
nodes_data: List[Dict],
edges_data: List[Dict],
graph_version: int = 1,
) -> None:
"""Mirror a Mongo workflow create into Postgres."""
def _do(repo: WorkflowsRepository) -> None:
conn = repo._conn
wf = repo.create(
user_id,
name,
description=description,
legacy_mongo_id=mongo_workflow_id,
)
_write_graph(conn, wf["id"], graph_version, nodes_data, edges_data)
dual_write(WorkflowsRepository, _do)
def _dual_write_workflow_update(
mongo_workflow_id: str,
user_id: str,
name: str,
description: str,
nodes_data: List[Dict],
edges_data: List[Dict],
next_graph_version: int,
) -> None:
"""Mirror a Mongo workflow update into Postgres.
Mirrors the Mongo route: insert the new graph_version's nodes/edges,
bump the workflow's name/description/current_graph_version, then drop
every other graph_version's nodes/edges.
"""
def _do(repo: WorkflowsRepository) -> None:
conn = repo._conn
wf = _resolve_pg_workflow(conn, mongo_workflow_id)
if wf is None:
return
_write_graph(conn, wf["id"], next_graph_version, nodes_data, edges_data)
repo.update(wf["id"], user_id, {
"name": name,
"description": description,
"current_graph_version": next_graph_version,
})
WorkflowNodesRepository(conn).delete_other_versions(
wf["id"], next_graph_version,
)
WorkflowEdgesRepository(conn).delete_other_versions(
wf["id"], next_graph_version,
)
dual_write(WorkflowsRepository, _do)
def _dual_write_workflow_delete(mongo_workflow_id: str, user_id: str) -> None:
"""Mirror a Mongo workflow delete into Postgres.
The CASCADE on workflows.id → workflow_nodes/workflow_edges takes
care of the children automatically.
"""
def _do(repo: WorkflowsRepository) -> None:
wf = _resolve_pg_workflow(repo._conn, mongo_workflow_id)
if wf is not None:
repo.delete(wf["id"], user_id)
dual_write(WorkflowsRepository, _do)
def _resolve_pg_workflow(conn, mongo_workflow_id: str) -> Optional[Dict]:
"""Look up a Postgres workflow by its Mongo ObjectId string."""
from sqlalchemy import text as _text
row = conn.execute(
_text("SELECT id FROM workflows WHERE legacy_mongo_id = :legacy_id"),
{"legacy_id": mongo_workflow_id},
).fetchone()
return {"id": str(row[0])} if row else None
def _resolve_workflow(repo: WorkflowsRepository, workflow_id: str, user_id: str):
"""Resolve a workflow by UUID or legacy Mongo id, scoped to user."""
if not workflow_id:
return None
if looks_like_uuid(workflow_id):
row = repo.get(workflow_id, user_id)
if row is not None:
return row
return repo.get_by_legacy_id(workflow_id, user_id)
def _write_graph(
@@ -150,14 +48,13 @@ def _write_graph(
graph_version: int,
nodes_data: List[Dict],
edges_data: List[Dict],
) -> None:
"""Bulk-create nodes + edges for one graph version inside one txn.
) -> List[Dict]:
"""Bulk-create nodes + edges for one graph version. Uses ON CONFLICT upsert.
Edges arrive with source/target as user-provided node-id strings
(the same shape the Mongo route stores). We bulk-insert nodes first,
capture their ``node_id → UUID`` map from the returned rows, then
translate edge source/target strings to those UUIDs before the edge
bulk insert. Edges referencing missing nodes are dropped (logged).
Edges arrive with source/target as user-provided node-id strings. We
insert nodes first, capture their ``node_id → UUID`` map, then
translate edges before insertion. Edges referencing missing nodes are
dropped with a warning.
"""
nodes_repo = WorkflowNodesRepository(conn)
edges_repo = WorkflowEdgesRepository(conn)
@@ -173,13 +70,13 @@ def _write_graph(
"description": n.get("description", ""),
"position": n.get("position", {"x": 0, "y": 0}),
"config": n.get("data", {}),
"legacy_mongo_id": n.get("legacy_mongo_id"),
}
for n in nodes_data
],
)
node_uuid_by_str = {n["node_id"]: n["id"] for n in created_nodes}
else:
created_nodes = []
node_uuid_by_str = {}
if edges_data:
@@ -191,7 +88,7 @@ def _write_graph(
to_uuid = node_uuid_by_str.get(tgt)
if not from_uuid or not to_uuid:
current_app.logger.warning(
"PG dual-write: dropping edge %s; node refs unresolved "
"Workflow graph write: dropping edge %s; node refs unresolved "
"(source=%s, target=%s)",
e.get("id"), src, tgt,
)
@@ -204,36 +101,42 @@ def _write_graph(
"target_handle": e.get("targetHandle"),
})
if translated_edges:
edges_repo.bulk_create(pg_workflow_id, graph_version, translated_edges)
edges_repo.bulk_create(
pg_workflow_id, graph_version, translated_edges,
)
return created_nodes
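# Illustrative only: a hedged sketch (not part of this changeset) of the
# node_id -> UUID translation ``_write_graph`` performs before inserting
# edges; the ids below are hypothetical placeholders.
def _example_edge_translation() -> None:
    created_nodes = [
        {"node_id": "start-1", "id": "uuid-for-start-1"},
        {"node_id": "agent-1", "id": "uuid-for-agent-1"},
    ]
    node_uuid_by_str = {n["node_id"]: n["id"] for n in created_nodes}
    edge = {"id": "e1", "source": "start-1", "target": "agent-1"}
    assert node_uuid_by_str[edge["source"]] == "uuid-for-start-1"
    assert node_uuid_by_str[edge["target"]] == "uuid-for-agent-1"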
def serialize_workflow(w: Dict) -> Dict:
"""Serialize workflow document to API response format."""
"""Serialize workflow row to API response format."""
created_at = w.get("created_at")
updated_at = w.get("updated_at")
return {
"id": str(w["_id"]),
"id": str(w["id"]),
"name": w.get("name"),
"description": w.get("description"),
"created_at": w["created_at"].isoformat() if w.get("created_at") else None,
"updated_at": w["updated_at"].isoformat() if w.get("updated_at") else None,
"created_at": created_at.isoformat() if hasattr(created_at, "isoformat") else created_at,
"updated_at": updated_at.isoformat() if hasattr(updated_at, "isoformat") else updated_at,
}
def serialize_node(n: Dict) -> Dict:
"""Serialize workflow node document to API response format."""
"""Serialize workflow node row to API response format."""
return {
"id": n["id"],
"type": n["type"],
"id": n["node_id"],
"type": n["node_type"],
"title": n.get("title"),
"description": n.get("description"),
"position": n.get("position"),
"data": n.get("config", {}),
"data": n.get("config", {}) or {},
}
def serialize_edge(e: Dict) -> Dict:
"""Serialize workflow edge document to API response format."""
"""Serialize workflow edge row to API response format."""
return {
"id": e["id"],
"id": e["edge_id"],
"source": e.get("source_id"),
"target": e.get("target_id"),
"sourceHandle": e.get("source_handle"),
@@ -242,7 +145,7 @@ def serialize_edge(e: Dict) -> Dict:
def get_workflow_graph_version(workflow: Dict) -> int:
"""Get current graph version with legacy fallback."""
"""Get current graph version with fallback."""
raw_version = workflow.get("current_graph_version", 1)
try:
version = int(raw_version)
@@ -251,22 +154,6 @@ def get_workflow_graph_version(workflow: Dict) -> int:
return 1
def fetch_graph_documents(collection, workflow_id: str, graph_version: int) -> List[Dict]:
"""Fetch graph docs for active version, with fallback for legacy unversioned data."""
docs = list(
collection.find({"workflow_id": workflow_id, "graph_version": graph_version})
)
if docs:
return docs
if graph_version == 1:
return list(
collection.find(
{"workflow_id": workflow_id, "graph_version": {"$exists": False}}
)
)
return docs
def validate_json_schema_payload(
json_schema: Any,
) -> tuple[Optional[Dict[str, Any]], Optional[str]]:
@@ -311,8 +198,14 @@ def normalize_agent_node_json_schemas(nodes: List[Dict]) -> List[Dict]:
return normalized_nodes
def validate_workflow_structure(nodes: List[Dict], edges: List[Dict]) -> List[str]:
"""Validate workflow graph structure."""
def validate_workflow_structure(
nodes: List[Dict], edges: List[Dict], user_id: str | None = None
) -> List[str]:
"""Validate workflow graph structure.
``user_id`` is required so per-user BYOM custom-model UUIDs resolve
when checking each agent node's structured-output capability.
"""
errors = []
if not nodes:
@@ -456,7 +349,7 @@ def validate_workflow_structure(nodes: List[Dict], edges: List[Dict]) -> List[st
model_id = raw_config.get("model_id")
if has_json_schema and isinstance(model_id, str) and model_id.strip():
capabilities = get_model_capabilities(model_id.strip())
capabilities = get_model_capabilities(model_id.strip(), user_id=user_id)
if capabilities and not capabilities.get("supports_structured_output", False):
errors.append(
f"Agent node '{agent_title}' selected model does not support structured output"
@@ -487,53 +380,6 @@ def _can_reach_end(
return any(_can_reach_end(t, edges, node_map, end_ids, visited) for t in outgoing if t)
def create_workflow_nodes(
workflow_id: str, nodes_data: List[Dict], graph_version: int
) -> List[Dict]:
"""Insert workflow nodes into Mongo and return rows with Mongo ids."""
if nodes_data:
mongo_nodes = [
{
"id": n["id"],
"workflow_id": workflow_id,
"graph_version": graph_version,
"type": n["type"],
"title": n.get("title", ""),
"description": n.get("description", ""),
"position": n.get("position", {"x": 0, "y": 0}),
"config": n.get("data", {}),
}
for n in nodes_data
]
result = workflow_nodes_collection.insert_many(mongo_nodes)
return [
{**node, "legacy_mongo_id": str(inserted_id)}
for node, inserted_id in zip(nodes_data, result.inserted_ids)
]
return []
def create_workflow_edges(
workflow_id: str, edges_data: List[Dict], graph_version: int
) -> None:
"""Insert workflow edges into database."""
if edges_data:
workflow_edges_collection.insert_many(
[
{
"id": e["id"],
"workflow_id": workflow_id,
"graph_version": graph_version,
"source_id": e.get("source"),
"target_id": e.get("target"),
"source_handle": e.get("sourceHandle"),
"target_handle": e.get("targetHandle"),
}
for e in edges_data
]
)
@workflows_ns.route("/workflows")
class WorkflowList(Resource):
@@ -545,54 +391,29 @@ class WorkflowList(Resource):
data = request.get_json()
name = data.get("name", "").strip()
description = data.get("description", "")
nodes_data = data.get("nodes", [])
edges_data = data.get("edges", [])
validation_errors = validate_workflow_structure(nodes_data, edges_data)
validation_errors = validate_workflow_structure(
nodes_data, edges_data, user_id=user_id
)
if validation_errors:
return error_response(
"Workflow validation failed", errors=validation_errors
)
nodes_data = normalize_agent_node_json_schemas(nodes_data)
now = datetime.now(timezone.utc)
workflow_doc = {
"name": name,
"description": data.get("description", ""),
"user": user_id,
"created_at": now,
"updated_at": now,
"current_graph_version": 1,
}
result, error = safe_db_operation(
lambda: workflows_collection.insert_one(workflow_doc),
"Failed to create workflow",
)
if error:
return error
workflow_id = str(result.inserted_id)
try:
created_nodes = create_workflow_nodes(workflow_id, nodes_data, 1)
create_workflow_edges(workflow_id, edges_data, 1)
with db_session() as conn:
repo = WorkflowsRepository(conn)
workflow = repo.create(user_id, name, description=description)
pg_workflow_id = str(workflow["id"])
_write_graph(conn, pg_workflow_id, 1, nodes_data, edges_data)
except Exception as err:
workflow_nodes_collection.delete_many({"workflow_id": workflow_id})
workflow_edges_collection.delete_many({"workflow_id": workflow_id})
workflows_collection.delete_one({"_id": result.inserted_id})
return _workflow_error_response("Failed to create workflow structure", err)
return _workflow_error_response("Failed to create workflow", err)
_dual_write_workflow_create(
workflow_id,
user_id,
name,
data.get("description", ""),
created_nodes,
edges_data,
)
return success_response({"id": workflow_id}, 201)
return success_response({"id": pg_workflow_id}, 201)
@workflows_ns.route("/workflows/<string:workflow_id>")
@@ -602,23 +423,22 @@ class WorkflowDetail(Resource):
def get(self, workflow_id: str):
"""Get workflow details with nodes and edges."""
user_id = get_user_id()
obj_id, error = validate_object_id(workflow_id, "Workflow")
if error:
return error
workflow, error = check_resource_ownership(
workflows_collection, obj_id, user_id, "Workflow"
)
if error:
return error
graph_version = get_workflow_graph_version(workflow)
nodes = fetch_graph_documents(
workflow_nodes_collection, workflow_id, graph_version
)
edges = fetch_graph_documents(
workflow_edges_collection, workflow_id, graph_version
)
try:
with db_readonly() as conn:
repo = WorkflowsRepository(conn)
workflow = _resolve_workflow(repo, workflow_id, user_id)
if workflow is None:
return error_response("Workflow not found", 404)
pg_workflow_id = str(workflow["id"])
graph_version = get_workflow_graph_version(workflow)
nodes = WorkflowNodesRepository(conn).find_by_version(
pg_workflow_id, graph_version,
)
edges = WorkflowEdgesRepository(conn).find_by_version(
pg_workflow_id, graph_version,
)
except Exception as err:
return _workflow_error_response("Failed to fetch workflow", err)
return success_response(
{
@@ -633,89 +453,51 @@ class WorkflowDetail(Resource):
def put(self, workflow_id: str):
"""Update workflow and replace nodes/edges."""
user_id = get_user_id()
obj_id, error = validate_object_id(workflow_id, "Workflow")
if error:
return error
workflow, error = check_resource_ownership(
workflows_collection, obj_id, user_id, "Workflow"
)
if error:
return error
data = request.get_json()
name = data.get("name", "").strip()
description = data.get("description", "")
nodes_data = data.get("nodes", [])
edges_data = data.get("edges", [])
validation_errors = validate_workflow_structure(nodes_data, edges_data)
validation_errors = validate_workflow_structure(
nodes_data, edges_data, user_id=user_id
)
if validation_errors:
return error_response(
"Workflow validation failed", errors=validation_errors
)
nodes_data = normalize_agent_node_json_schemas(nodes_data)
current_graph_version = get_workflow_graph_version(workflow)
next_graph_version = current_graph_version + 1
try:
created_nodes = create_workflow_nodes(
workflow_id, nodes_data, next_graph_version,
)
create_workflow_edges(workflow_id, edges_data, next_graph_version)
except Exception as err:
workflow_nodes_collection.delete_many(
{"workflow_id": workflow_id, "graph_version": next_graph_version}
)
workflow_edges_collection.delete_many(
{"workflow_id": workflow_id, "graph_version": next_graph_version}
)
return _workflow_error_response("Failed to update workflow structure", err)
with db_session() as conn:
repo = WorkflowsRepository(conn)
workflow = _resolve_workflow(repo, workflow_id, user_id)
if workflow is None:
return error_response("Workflow not found", 404)
pg_workflow_id = str(workflow["id"])
current_graph_version = get_workflow_graph_version(workflow)
next_graph_version = current_graph_version + 1
now = datetime.now(timezone.utc)
_, error = safe_db_operation(
lambda: workflows_collection.update_one(
{"_id": obj_id},
{
"$set": {
_write_graph(
conn, pg_workflow_id, next_graph_version,
nodes_data, edges_data,
)
repo.update(
pg_workflow_id, user_id,
{
"name": name,
"description": data.get("description", ""),
"updated_at": now,
"description": description,
"current_graph_version": next_graph_version,
}
},
),
"Failed to update workflow",
)
if error:
workflow_nodes_collection.delete_many(
{"workflow_id": workflow_id, "graph_version": next_graph_version}
)
workflow_edges_collection.delete_many(
{"workflow_id": workflow_id, "graph_version": next_graph_version}
)
return error
try:
workflow_nodes_collection.delete_many(
{"workflow_id": workflow_id, "graph_version": {"$ne": next_graph_version}}
)
workflow_edges_collection.delete_many(
{"workflow_id": workflow_id, "graph_version": {"$ne": next_graph_version}}
)
except Exception as cleanup_err:
current_app.logger.warning(
f"Failed to clean old workflow graph versions for {workflow_id}: {cleanup_err}"
)
_dual_write_workflow_update(
workflow_id,
user_id,
name,
data.get("description", ""),
created_nodes,
edges_data,
next_graph_version,
)
},
)
WorkflowNodesRepository(conn).delete_other_versions(
pg_workflow_id, next_graph_version,
)
WorkflowEdgesRepository(conn).delete_other_versions(
pg_workflow_id, next_graph_version,
)
except Exception as err:
return _workflow_error_response("Failed to update workflow", err)
return success_response()
@@ -723,23 +505,15 @@ class WorkflowDetail(Resource):
def delete(self, workflow_id: str):
"""Delete workflow and its graph."""
user_id = get_user_id()
obj_id, error = validate_object_id(workflow_id, "Workflow")
if error:
return error
workflow, error = check_resource_ownership(
workflows_collection, obj_id, user_id, "Workflow"
)
if error:
return error
try:
workflow_nodes_collection.delete_many({"workflow_id": workflow_id})
workflow_edges_collection.delete_many({"workflow_id": workflow_id})
workflows_collection.delete_one({"_id": workflow["_id"], "user": user_id})
with db_session() as conn:
repo = WorkflowsRepository(conn)
workflow = _resolve_workflow(repo, workflow_id, user_id)
if workflow is None:
return error_response("Workflow not found", 404)
# ON DELETE CASCADE on workflow_nodes/edges cleans children.
repo.delete(str(workflow["id"]), user_id)
except Exception as err:
return _workflow_error_response("Failed to delete workflow", err)
_dual_write_workflow_delete(workflow_id, user_id)
return success_response()

View File

@@ -9,6 +9,7 @@ import json
import logging
import time
import traceback
from datetime import datetime
from typing import Any, Dict, Generator, Optional
from flask import Blueprint, jsonify, make_response, request, Response
@@ -20,8 +21,8 @@ from application.api.v1.translator import (
translate_response,
translate_stream_event,
)
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.session import db_readonly
logger = logging.getLogger(__name__)
@@ -39,9 +40,8 @@ def _extract_bearer_token() -> Optional[str]:
def _lookup_agent(api_key: str) -> Optional[Dict]:
"""Look up the agent document for this API key."""
try:
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
return db["agents"].find_one({"key": api_key})
with db_readonly() as conn:
return AgentsRepository(conn).find_by_key(api_key)
except Exception:
logger.warning("Failed to look up agent for API key", exc_info=True)
return None
@@ -90,8 +90,14 @@ def chat_completions():
)
# Link decoded_token to the agent's owner so continuation state,
# logs, and tool execution use the correct user identity.
agent_user = agent_doc.get("user") if agent_doc else None
# logs, and tool execution use the correct user identity. The PG
# ``agents`` row exposes the owner via ``user_id`` (``user`` is the
# legacy Mongo field name, kept by ``row_to_dict`` only for the
# ``id``/``_id`` mapping).
agent_user = (
(agent_doc.get("user_id") or agent_doc.get("user"))
if agent_doc else None
)
decoded_token = {"sub": agent_user or "api_key_user"}
try:
@@ -208,6 +214,7 @@ def _stream_response(
decoded_token=processor.decoded_token,
agent_id=processor.agent_id,
model_id=processor.model_id,
model_user_id=processor.model_user_id,
should_save_conversation=should_save_conversation,
_continuation=continuation,
)
@@ -252,6 +259,7 @@ def _non_stream_response(
decoded_token=processor.decoded_token,
agent_id=processor.agent_id,
model_id=processor.model_id,
model_user_id=processor.model_user_id,
should_save_conversation=should_save_conversation,
_continuation=continuation,
)
@@ -290,39 +298,41 @@ def list_models():
)
try:
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
agents_collection = db["agents"]
with db_readonly() as conn:
agents_repo = AgentsRepository(conn)
agent = agents_repo.find_by_key(api_key)
if not agent:
return make_response(
jsonify({"error": {"message": "Invalid API key", "type": "auth_error"}}),
401,
)
# Find the agent for this api_key
agent = agents_collection.find_one({"key": api_key})
if not agent:
return make_response(
jsonify({"error": {"message": "Invalid API key", "type": "auth_error"}}),
401,
)
user = agent.get("user")
# Return all agents belonging to this user
user_agents = list(agents_collection.find({"user": user}))
models = []
for ag in user_agents:
created = ag.get("createdAt")
created_ts = int(created.timestamp()) if created else int(time.time())
model_id = str(ag.get("_id") or ag.get("id") or "")
models.append({
"id": model_id,
"object": "model",
"created": created_ts,
"owned_by": "docsgpt",
"name": ag.get("name", ""),
"description": ag.get("description", ""),
})
# Repository rows now go through ``coerce_pg_native`` at SELECT
# time, so timestamps arrive as ISO 8601 strings. Parse before
# taking ``.timestamp()``; fall back to ``time.time()`` only when
# the value is genuinely missing or unparseable.
created = agent.get("created_at") or agent.get("createdAt")
if isinstance(created, str):
try:
created = datetime.fromisoformat(created)
except (ValueError, TypeError):
created = None
created_ts = (
int(created.timestamp()) if hasattr(created, "timestamp")
else int(time.time())
)
model_id = str(agent.get("id") or agent.get("_id") or "")
model = {
"id": model_id,
"object": "model",
"created": created_ts,
"owned_by": "docsgpt",
"name": agent.get("name", ""),
"description": agent.get("description", ""),
}
return make_response(
jsonify({"object": "list", "data": models}),
jsonify({"object": "list", "data": [model]}),
200,
)
except Exception as e:

View File

@@ -1,13 +1,15 @@
import logging
import os
import platform
import uuid
import dotenv
from flask import Flask, jsonify, redirect, request
from flask import Flask, Response, jsonify, redirect, request
from jose import jwt
from application.auth import handle_auth
from application.core import log_context
from application.core.logging_config import setup_logging
setup_logging()
@@ -20,6 +22,7 @@ from application.api.connector.routes import connector # noqa: E402
from application.api.v1 import v1_bp # noqa: E402
from application.celery_init import celery # noqa: E402
from application.core.settings import settings # noqa: E402
from application.storage.db.bootstrap import ensure_database_ready # noqa: E402
from application.stt.upload_limits import ( # noqa: E402
build_stt_file_size_limit_message,
should_reject_stt_request,
@@ -32,6 +35,17 @@ if platform.system() == "Windows":
pathlib.PosixPath = pathlib.WindowsPath
dotenv.load_dotenv()
# Self-bootstrap the user-data Postgres DB. Runs before any blueprint or
# repository touches the engine, so the first request can't race the
# schema being created. Gated by AUTO_CREATE_DB / AUTO_MIGRATE settings
# (default ON for dev; disable in prod if schema is managed out-of-band).
ensure_database_ready(
settings.POSTGRES_URI,
create_db=settings.AUTO_CREATE_DB,
migrate=settings.AUTO_MIGRATE,
logger=logging.getLogger("application.app"),
)
app = Flask(__name__)
app.register_blueprint(user)
app.register_blueprint(answer)
@@ -99,6 +113,38 @@ def generate_token():
return jsonify({"error": "Token generation not allowed in current auth mode"}), 400
_LOG_CTX_TOKEN_ATTR = "_log_ctx_token"
@app.before_request
def _bind_log_context():
"""Bind activity_id + endpoint for the duration of this request.
Runs before ``authenticate_request``; ``user_id`` is overlaid in a
follow-up handler once the JWT has been decoded.
"""
if request.method == "OPTIONS":
return None
activity_id = str(uuid.uuid4())
request.activity_id = activity_id
token = log_context.bind(
activity_id=activity_id,
endpoint=request.endpoint,
)
setattr(request, _LOG_CTX_TOKEN_ATTR, token)
return None
@app.teardown_request
def _reset_log_context(_exc):
# SSE streams keep yielding after teardown fires, but a2wsgi runs each
# request inside ``copy_context().run(...)``, so this reset doesn't
# leak into the stream's view of the context.
token = getattr(request, _LOG_CTX_TOKEN_ATTR, None)
if token is not None:
log_context.reset(token)
@app.before_request
def enforce_stt_request_size_limits():
if request.method == "OPTIONS":
@@ -120,6 +166,12 @@ def enforce_stt_request_size_limits():
def authenticate_request():
if request.method == "OPTIONS":
return "", 200
# OpenAI-compatible routes authenticate via opaque agent API keys in the
# Authorization header, which the JWT decoder below would reject. Defer
# auth to the route handlers (see application/api/v1/routes.py).
if request.path.startswith("/v1/"):
request.decoded_token = None
return None
decoded_token = handle_auth(request)
if not decoded_token:
request.decoded_token = None
@@ -129,13 +181,29 @@ def authenticate_request():
request.decoded_token = decoded_token
@app.before_request
def _bind_user_id_to_log_context():
# Registered after ``authenticate_request`` (Flask runs before_request
# handlers in registration order), so ``request.decoded_token`` is
# populated by the time we read it. ``teardown_request`` unwinds the
# whole request-level bind, so no separate reset token is needed here.
if request.method == "OPTIONS":
return None
decoded_token = getattr(request, "decoded_token", None)
user_id = decoded_token.get("sub") if isinstance(decoded_token, dict) else None
if user_id:
log_context.bind(user_id=user_id)
return None
@app.after_request
def after_request(response):
response.headers.add("Access-Control-Allow-Origin", "*")
response.headers.add("Access-Control-Allow-Headers", "Content-Type, Authorization")
response.headers.add(
"Access-Control-Allow-Methods", "GET, POST, PUT, DELETE, OPTIONS"
def after_request(response: Response) -> Response:
"""Add CORS headers for the pure Flask development entrypoint."""
response.headers["Access-Control-Allow-Origin"] = "*"
response.headers["Access-Control-Allow-Headers"] = (
"Content-Type, Authorization, Idempotency-Key"
)
response.headers["Access-Control-Allow-Methods"] = "GET, POST, PUT, PATCH, DELETE, OPTIONS"
return response

38
application/asgi.py Normal file
View File

@@ -0,0 +1,38 @@
"""ASGI entrypoint: Flask (WSGI) + FastMCP on the same process."""
from __future__ import annotations
from a2wsgi import WSGIMiddleware
from starlette.applications import Starlette
from starlette.middleware import Middleware
from starlette.middleware.cors import CORSMiddleware
from starlette.routing import Mount
from application.app import app as flask_app
from application.mcp_server import mcp
_WSGI_THREADPOOL = 32
mcp_app = mcp.http_app(path="/")
asgi_app = Starlette(
routes=[
Mount("/mcp", app=mcp_app),
Mount("/", app=WSGIMiddleware(flask_app, workers=_WSGI_THREADPOOL)),
],
middleware=[
Middleware(
CORSMiddleware,
allow_origins=["*"],
allow_methods=["GET", "POST", "PUT", "PATCH", "DELETE", "OPTIONS"],
allow_headers=[
"Content-Type",
"Authorization",
"Mcp-Session-Id",
"Idempotency-Key",
],
expose_headers=["Mcp-Session-Id"],
),
],
lifespan=mcp_app.lifespan,
)

View File

@@ -1,3 +1,4 @@
import hashlib
import json
import logging
import time
@@ -10,6 +11,14 @@ from application.utils import get_hash
logger = logging.getLogger(__name__)
def _cache_default(value):
# Image attachments arrive inline as bytes (see GoogleLLM.prepare_messages_with_attachments);
# hash them so the cache key stays bounded in size and stable across identical content.
if isinstance(value, (bytes, bytearray, memoryview)):
return f"<bytes:sha256:{hashlib.sha256(bytes(value)).hexdigest()}>"
return repr(value)
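# Illustrative only: a hedged sketch (not part of this changeset) showing that
# identical byte payloads produce the same bounded token.
def _example_cache_default() -> None:
    payload = b"\x89PNG\r\n..."
    token = _cache_default(payload)
    assert token == _cache_default(bytearray(payload))
    assert token.startswith("<bytes:sha256:") and len(token) < 100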
_redis_instance = None
_redis_creation_failed = False
_instance_lock = Lock()
@@ -36,7 +45,7 @@ def get_redis_instance():
def gen_cache_key(messages, model="docgpt", tools=None):
if not all(isinstance(msg, dict) for msg in messages):
raise ValueError("All messages must be dictionaries.")
messages_str = json.dumps(messages)
messages_str = json.dumps(messages, default=_cache_default)
tools_str = json.dumps(str(tools)) if tools else ""
combined = f"{model}_{messages_str}_{tools_str}"
cache_key = get_hash(combined)

View File

@@ -1,6 +1,17 @@
import inspect
import logging
import threading
from celery import Celery
from application.core import log_context
from application.core.settings import settings
from celery.signals import setup_logging, worker_process_init
from celery.signals import (
setup_logging,
task_postrun,
task_prerun,
worker_process_init,
worker_ready,
)
def make_celery(app_name=__name__):
@@ -39,5 +50,73 @@ def _dispose_db_engine_on_fork(*args, **kwargs):
dispose_engine()
# Most tasks in this repo accept ``user`` where the log context wants
# ``user_id``; map task parameter names to context keys explicitly.
_TASK_PARAM_TO_CTX_KEY: dict[str, str] = {
"user": "user_id",
"user_id": "user_id",
"agent_id": "agent_id",
"conversation_id": "conversation_id",
}
_task_log_tokens: dict[str, object] = {}
@task_prerun.connect
def _bind_task_log_context(task_id, task, args, kwargs, **_):
# Resolve task args by parameter name — nearly every task in this repo
# is called positionally, so ``kwargs.get('user')`` would bind nothing.
ctx = {"activity_id": task_id}
try:
sig = inspect.signature(task.run)
bound = sig.bind_partial(*args, **kwargs).arguments
except (TypeError, ValueError):
bound = dict(kwargs)
for param_name, value in bound.items():
ctx_key = _TASK_PARAM_TO_CTX_KEY.get(param_name)
if ctx_key and value:
ctx[ctx_key] = value
_task_log_tokens[task_id] = log_context.bind(**ctx)
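# Illustrative only: a hedged sketch (not part of this changeset) of why
# ``bind_partial`` is used above; the task signature is hypothetical.
def _example_positional_binding() -> None:
    def ingest(user, source_id):  # stand-in for a task's ``run``
        return None

    bound = inspect.signature(ingest).bind_partial("alice", "docs-1").arguments
    # A positional call still resolves parameter names, whereas
    # ``kwargs.get("user")`` would have found nothing for this call shape.
    assert dict(bound) == {"user": "alice", "source_id": "docs-1"}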
@task_postrun.connect
def _unbind_task_log_context(task_id, **_):
# ``task_postrun`` fires on both success and failure. Required for
# Celery: unlike the Flask path, tasks aren't isolated in their own
# ``copy_context().run(...)``, so a missing reset would leak the
# bind onto the next task on the same worker.
token = _task_log_tokens.pop(task_id, None)
if token is None:
return
try:
log_context.reset(token)
except ValueError:
# task_prerun and task_postrun ran on different threads (non-default
# Celery pool); the token isn't valid in this context. Drop it.
logging.getLogger(__name__).debug(
"log_context reset skipped for task %s", task_id
)
@worker_ready.connect
def _run_version_check(*args, **kwargs):
"""Kick off the anonymous version check on worker startup.
Runs in a daemon thread so a slow endpoint or bad DNS never holds
up the worker becoming ready for tasks. The check itself is
fail-silent (see ``application.updates.version_check.run_check``);
this handler's only job is to launch it and get out of the way.
Import is lazy so the symbol resolution never fires at module
import time — consistent with the ``_dispose_db_engine_on_fork``
pattern above.
"""
try:
from application.updates.version_check import run_check
except Exception:
return
threading.Thread(target=run_check, name="version-check", daemon=True).start()
celery = make_celery()
celery.config_from_object("application.celeryconfig")

View File

@@ -1,7 +1,10 @@
import os
from application.core.settings import settings
broker_url = os.getenv("CELERY_BROKER_URL")
result_backend = os.getenv("CELERY_RESULT_BACKEND")
# Pydantic loads .env into ``settings`` but does not inject values into
# ``os.environ`` — read directly from settings so beat startup (which
# imports this module before any explicit env load) sees a real URL.
broker_url = settings.CELERY_BROKER_URL
result_backend = settings.CELERY_RESULT_BACKEND
task_serializer = 'json'
result_serializer = 'json'
@@ -9,3 +12,22 @@ accept_content = ['json']
# Autodiscover tasks
imports = ('application.api.user.tasks',)
# Project-scoped queue so a stray sibling worker on the same broker
# (other repo, same default ``celery`` queue) can't grab DocsGPT tasks.
task_default_queue = "docsgpt"
task_default_exchange = "docsgpt"
task_default_routing_key = "docsgpt"
beat_scheduler = "redbeat.RedBeatScheduler"
redbeat_redis_url = broker_url
redbeat_key_prefix = "redbeat:docsgpt:"
redbeat_lock_timeout = 90
# Survive worker SIGKILL/OOM without silently dropping in-flight tasks.
task_acks_late = True
task_reject_on_worker_lost = True
worker_prefetch_multiplier = settings.CELERY_WORKER_PREFETCH_MULTIPLIER
broker_transport_options = {"visibility_timeout": settings.CELERY_VISIBILITY_TIMEOUT}
result_expires = 86400 * 7
task_track_started = True

View File

@@ -0,0 +1,57 @@
"""Per-activity logging context backed by ``contextvars``.
The ``_ContextFilter`` installed by ``logging_config.setup_logging`` stamps
every ``LogRecord`` emitted inside a ``bind`` block with the bound keys, so
they land as first-class attributes on the OTLP log export rather than being
buried inside formatted message bodies.
A single ``ContextVar`` holds a dict so nested binds reset atomically (LIFO)
via the token returned by ``bind``.
"""
from __future__ import annotations
from contextvars import ContextVar, Token
from typing import Mapping
_CTX_KEYS: frozenset[str] = frozenset(
{
"activity_id",
"parent_activity_id",
"user_id",
"agent_id",
"conversation_id",
"endpoint",
"model",
}
)
_ctx: ContextVar[Mapping[str, str]] = ContextVar("log_ctx", default={})
def bind(**kwargs: object) -> Token:
"""Overlay the given keys onto the current context.
Returns a ``Token`` so the caller can ``reset`` in a ``finally`` block.
Keys outside :data:`_CTX_KEYS` are silently dropped (so a typo can't
stamp a stray field name onto every record), as are ``None`` values
(a missing attribute is more useful than the literal string ``"None"``).
"""
overlay = {
k: str(v)
for k, v in kwargs.items()
if k in _CTX_KEYS and v is not None
}
new = {**_ctx.get(), **overlay}
return _ctx.set(new)
def reset(token: Token) -> None:
"""Restore the context to the snapshot captured by the matching ``bind``."""
_ctx.reset(token)
def snapshot() -> Mapping[str, str]:
"""Return the current context dict. Treat as read-only; use :func:`bind`."""
return _ctx.get()
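# Illustrative usage sketch (not part of the module API above): bind at
# the start of a unit of work, reset in ``finally`` so nested binds
# unwind LIFO. The function name and its arguments are placeholders.
def _example_handle_activity(activity_id: str, user_id: str) -> None:
    token = bind(activity_id=activity_id, user_id=user_id)
    try:
        # Every LogRecord emitted in here is stamped with activity_id and
        # user_id by the handler-level context filter.
        ...
    finally:
        reset(token)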

View File

@@ -1,11 +1,75 @@
import logging
import os
from logging.config import dictConfig
def setup_logging():
from application.core.log_context import snapshot as _ctx_snapshot
# Loggers with ``propagate=False`` don't share root's handlers, so the
# context filter has to be installed on their handlers directly.
_NON_PROPAGATING_LOGGERS: tuple[str, ...] = (
"uvicorn",
"uvicorn.access",
"uvicorn.error",
"celery.app.trace",
"celery.worker.strategy",
"gunicorn.error",
"gunicorn.access",
)
class _ContextFilter(logging.Filter):
"""Stamp the current ``log_context`` snapshot onto every ``LogRecord``.
Must be installed on **handlers**, not loggers: Python skips logger-level
filters when a child logger's record propagates up. The ``hasattr`` guard
keeps an explicit ``logger.info(..., extra={...})`` from being overwritten.
"""
def filter(self, record: logging.LogRecord) -> bool:
for key, value in _ctx_snapshot().items():
if not hasattr(record, key):
setattr(record, key, value)
return True
def _otlp_logs_enabled() -> bool:
"""Return True when the user has opted in to OTLP log export.
Gated by the standard OTEL env vars so no project-specific knob is needed:
set ``OTEL_LOGS_EXPORTER=otlp`` (and leave ``OTEL_SDK_DISABLED`` unset or
false) to flip it on. When false, ``setup_logging`` keeps its original
console-only behavior.
"""
exporter = os.getenv("OTEL_LOGS_EXPORTER", "").strip().lower()
disabled = os.getenv("OTEL_SDK_DISABLED", "false").strip().lower() == "true"
return exporter == "otlp" and not disabled
def setup_logging() -> None:
"""Configure the root logger with a stdout console handler.
When OTLP log export is enabled, ``opentelemetry-instrument`` attaches a
``LoggingHandler`` to the root logger before this function runs. The
``dictConfig`` call below replaces ``root.handlers`` with the console
handler, which would silently drop the OTEL handler. To make OTLP log
export work without forcing every contributor to opt in, snapshot the
OTEL handlers up front and re-attach them after ``dictConfig``.
"""
preserved_handlers: list[logging.Handler] = []
if _otlp_logs_enabled():
preserved_handlers = [
h
for h in logging.getLogger().handlers
if h.__class__.__module__.startswith("opentelemetry")
]
dictConfig({
'version': 1,
'formatters': {
'default': {
'format': '[%(asctime)s] %(levelname)s in %(module)s: %(message)s',
"version": 1,
"disable_existing_loggers": False,
"formatters": {
"default": {
"format": "[%(asctime)s] %(levelname)s in %(module)s: %(message)s",
}
},
"handlers": {
@@ -15,8 +79,34 @@ def setup_logging():
"formatter": "default",
}
},
'root': {
'level': 'INFO',
'handlers': ['console'],
"root": {
"level": "INFO",
"handlers": ["console"],
},
})
})
if preserved_handlers:
root = logging.getLogger()
for handler in preserved_handlers:
if handler not in root.handlers:
root.addHandler(handler)
_install_context_filter()
def _install_context_filter() -> None:
"""Attach :class:`_ContextFilter` to root's handlers + every handler on
the known non-propagating loggers. Skipping handlers that already carry
one keeps repeat ``setup_logging`` calls from stacking filters.
"""
def _has_ctx_filter(handler: logging.Handler) -> bool:
return any(isinstance(f, _ContextFilter) for f in handler.filters)
for handler in logging.getLogger().handlers:
if not _has_ctx_filter(handler):
handler.addFilter(_ContextFilter())
for name in _NON_PROPAGATING_LOGGERS:
for handler in logging.getLogger(name).handlers:
if not _has_ctx_filter(handler):
handler.addFilter(_ContextFilter())
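# Illustrative sketch (not part of the module): once ``setup_logging`` has
# run, records emitted inside a ``log_context.bind`` block carry the bound
# keys as attributes via ``_ContextFilter``. The bound values here are
# placeholders.
def _example_context_stamping() -> None:
    from application.core import log_context

    setup_logging()
    token = log_context.bind(activity_id="demo-activity", user_id="demo-user")
    try:
        logging.getLogger(__name__).info("stamped with activity_id/user_id")
    finally:
        log_context.reset(token)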

View File

@@ -1,266 +0,0 @@
"""
Model configurations for all supported LLM providers.
"""
from application.core.model_settings import (
AvailableModel,
ModelCapabilities,
ModelProvider,
)
# Base image attachment types supported by most vision-capable LLMs
IMAGE_ATTACHMENTS = [
"image/png",
"image/jpeg",
"image/jpg",
"image/webp",
"image/gif",
]
# PDF excluded: most OpenAI-compatible endpoints don't support native PDF uploads.
# When excluded, PDFs are synthetically processed by converting pages to images.
OPENAI_ATTACHMENTS = IMAGE_ATTACHMENTS
GOOGLE_ATTACHMENTS = ["application/pdf"] + IMAGE_ATTACHMENTS
ANTHROPIC_ATTACHMENTS = IMAGE_ATTACHMENTS
OPENROUTER_ATTACHMENTS = IMAGE_ATTACHMENTS
NOVITA_ATTACHMENTS = IMAGE_ATTACHMENTS
OPENAI_MODELS = [
AvailableModel(
id="gpt-5.1",
provider=ModelProvider.OPENAI,
display_name="GPT-5.1",
description="Flagship model with enhanced reasoning, coding, and agentic capabilities",
capabilities=ModelCapabilities(
supports_tools=True,
supports_structured_output=True,
supported_attachment_types=OPENAI_ATTACHMENTS,
context_window=200000,
),
),
AvailableModel(
id="gpt-5-mini",
provider=ModelProvider.OPENAI,
display_name="GPT-5 Mini",
description="Faster, cost-effective variant of GPT-5.1",
capabilities=ModelCapabilities(
supports_tools=True,
supports_structured_output=True,
supported_attachment_types=OPENAI_ATTACHMENTS,
context_window=200000,
),
)
]
ANTHROPIC_MODELS = [
AvailableModel(
id="claude-3-5-sonnet-20241022",
provider=ModelProvider.ANTHROPIC,
display_name="Claude 3.5 Sonnet (Latest)",
description="Latest Claude 3.5 Sonnet with enhanced capabilities",
capabilities=ModelCapabilities(
supports_tools=True,
supported_attachment_types=ANTHROPIC_ATTACHMENTS,
context_window=200000,
),
),
AvailableModel(
id="claude-3-5-sonnet",
provider=ModelProvider.ANTHROPIC,
display_name="Claude 3.5 Sonnet",
description="Balanced performance and capability",
capabilities=ModelCapabilities(
supports_tools=True,
supported_attachment_types=ANTHROPIC_ATTACHMENTS,
context_window=200000,
),
),
AvailableModel(
id="claude-3-opus",
provider=ModelProvider.ANTHROPIC,
display_name="Claude 3 Opus",
description="Most capable Claude model",
capabilities=ModelCapabilities(
supports_tools=True,
supported_attachment_types=ANTHROPIC_ATTACHMENTS,
context_window=200000,
),
),
AvailableModel(
id="claude-3-haiku",
provider=ModelProvider.ANTHROPIC,
display_name="Claude 3 Haiku",
description="Fastest Claude model",
capabilities=ModelCapabilities(
supports_tools=True,
supported_attachment_types=ANTHROPIC_ATTACHMENTS,
context_window=200000,
),
),
]
GOOGLE_MODELS = [
AvailableModel(
id="gemini-flash-latest",
provider=ModelProvider.GOOGLE,
display_name="Gemini Flash (Latest)",
description="Latest experimental Gemini model",
capabilities=ModelCapabilities(
supports_tools=True,
supports_structured_output=True,
supported_attachment_types=GOOGLE_ATTACHMENTS,
context_window=int(1e6),
),
),
AvailableModel(
id="gemini-flash-lite-latest",
provider=ModelProvider.GOOGLE,
display_name="Gemini Flash Lite (Latest)",
description="Fast with huge context window",
capabilities=ModelCapabilities(
supports_tools=True,
supports_structured_output=True,
supported_attachment_types=GOOGLE_ATTACHMENTS,
context_window=int(1e6),
),
),
AvailableModel(
id="gemini-3-pro-preview",
provider=ModelProvider.GOOGLE,
display_name="Gemini 3 Pro",
description="Most capable Gemini model",
capabilities=ModelCapabilities(
supports_tools=True,
supports_structured_output=True,
supported_attachment_types=GOOGLE_ATTACHMENTS,
context_window=2000000,
),
),
]
GROQ_MODELS = [
AvailableModel(
id="llama-3.3-70b-versatile",
provider=ModelProvider.GROQ,
display_name="Llama 3.3 70B",
description="Latest Llama model with high-speed inference",
capabilities=ModelCapabilities(
supports_tools=True,
context_window=128000,
),
),
AvailableModel(
id="openai/gpt-oss-120b",
provider=ModelProvider.GROQ,
display_name="GPT-OSS 120B",
description="Open-source GPT model optimized for speed",
capabilities=ModelCapabilities(
supports_tools=True,
context_window=128000,
),
),
]
OPENROUTER_MODELS = [
AvailableModel(
id="qwen/qwen3-coder:free",
provider=ModelProvider.OPENROUTER,
display_name="Qwen 3 Coder",
description="Latest Qwen model with high-speed inference",
capabilities=ModelCapabilities(
supports_tools=True,
context_window=128000,
supported_attachment_types=OPENROUTER_ATTACHMENTS
),
),
AvailableModel(
id="google/gemma-3-27b-it:free",
provider=ModelProvider.OPENROUTER,
display_name="Gemma 3 27B",
description="Latest Gemma model with high-speed inference",
capabilities=ModelCapabilities(
supports_tools=True,
context_window=128000,
supported_attachment_types=OPENROUTER_ATTACHMENTS
),
),
]
NOVITA_MODELS = [
AvailableModel(
id="moonshotai/kimi-k2.5",
provider=ModelProvider.NOVITA,
display_name="Kimi K2.5",
description="MoE model with function calling, structured output, reasoning, and vision",
capabilities=ModelCapabilities(
supports_tools=True,
supports_structured_output=True,
supported_attachment_types=NOVITA_ATTACHMENTS,
context_window=262144,
),
),
AvailableModel(
id="zai-org/glm-5",
provider=ModelProvider.NOVITA,
display_name="GLM-5",
description="MoE model with function calling, structured output, and reasoning",
capabilities=ModelCapabilities(
supports_tools=True,
supports_structured_output=True,
supported_attachment_types=[],
context_window=202800,
),
),
AvailableModel(
id="minimax/minimax-m2.5",
provider=ModelProvider.NOVITA,
display_name="MiniMax M2.5",
description="MoE model with function calling, structured output, and reasoning",
capabilities=ModelCapabilities(
supports_tools=True,
supports_structured_output=True,
supported_attachment_types=[],
context_window=204800,
),
),
]
AZURE_OPENAI_MODELS = [
AvailableModel(
id="azure-gpt-4",
provider=ModelProvider.AZURE_OPENAI,
display_name="Azure OpenAI GPT-4",
description="Azure-hosted GPT model",
capabilities=ModelCapabilities(
supports_tools=True,
supports_structured_output=True,
supported_attachment_types=OPENAI_ATTACHMENTS,
context_window=8192,
),
),
]
def create_custom_openai_model(model_name: str, base_url: str) -> AvailableModel:
"""Create a custom OpenAI-compatible model (e.g., LM Studio, Ollama)."""
return AvailableModel(
id=model_name,
provider=ModelProvider.OPENAI,
display_name=model_name,
description=f"Custom OpenAI-compatible model at {base_url}",
base_url=base_url,
capabilities=ModelCapabilities(
supports_tools=True,
supported_attachment_types=OPENAI_ATTACHMENTS,
),
)

View File

@@ -0,0 +1,385 @@
"""Layered model registry.
Loads model catalogs from YAML files (built-in + operator-supplied),
groups them by provider name, then for each registered provider plugin
calls ``get_models`` to produce the final per-provider model list.
End-user BYOM (per-user model records in Postgres) is layered on top:
when a lookup arrives with a ``user_id``, the registry consults a
per-user cache first (loaded from the ``user_custom_models`` table on
miss) and falls through to the built-in catalog.
Cross-process invalidation: ``ModelRegistry`` is a per-process
singleton, so a CRUD write only evicts the cache in the process that
served it. Other gunicorn workers and Celery workers would otherwise
keep using a deleted/disabled/key-rotated BYOM record indefinitely.
``invalidate_user`` therefore both drops the local layer *and* bumps a
Redis-side version counter; other processes notice the bump on their
next access (after the local TTL window) and reload from Postgres. If
Redis is unreachable the per-process TTL still bounds staleness — pure
TTL semantics, no regression.
"""
from __future__ import annotations
import logging
import time
from collections import defaultdict
from typing import Dict, List, Optional, Tuple
from application.core.model_settings import AvailableModel
from application.core.model_yaml import (
BUILTIN_MODELS_DIR,
ProviderCatalog,
load_model_yamls,
)
logger = logging.getLogger(__name__)
_USER_CACHE_TTL_SECONDS = 60.0
_USER_VERSION_KEY_PREFIX = "byom:registry_version:"
class ModelRegistry:
"""Singleton registry of available models."""
_instance: Optional["ModelRegistry"] = None
_initialized: bool = False
def __new__(cls):
if cls._instance is None:
cls._instance = super().__new__(cls)
return cls._instance
def __init__(self):
if not ModelRegistry._initialized:
self.models: Dict[str, AvailableModel] = {}
self.default_model_id: Optional[str] = None
# Per-user BYOM cache. Each entry is
# ``(layer, version_at_load, loaded_at_monotonic)``:
# * ``layer`` — {model_id: AvailableModel}
# * ``version_at_load`` — Redis-side counter snapshot at
# reload time, or ``None`` if Redis was unreachable
# * ``loaded_at_monotonic`` — for TTL bookkeeping
# Populated lazily, evicted by TTL + cross-process
# invalidation (see ``invalidate_user``).
self._user_models: Dict[
str,
Tuple[Dict[str, AvailableModel], Optional[int], float],
] = {}
self._load_models()
ModelRegistry._initialized = True
@classmethod
def get_instance(cls) -> "ModelRegistry":
return cls()
@classmethod
def reset(cls) -> None:
"""Clear the singleton. Intended for test fixtures."""
cls._instance = None
cls._initialized = False
@classmethod
def invalidate_user(cls, user_id: str) -> None:
"""Drop the cached per-user model layer for ``user_id``.
Called by the BYOM REST routes after every create/update/delete.
Two effects:
* Local: pop the entry from this process's cache so the next
lookup re-reads from Postgres immediately.
* Cross-process: ``INCR`` a Redis-side version counter for this
user. Other gunicorn/Celery processes notice the counter
changed on their next TTL-driven recheck (see
``_user_models_for``) and reload. If Redis is unreachable we
log and continue — local invalidation still happened, and
peers fall back to TTL-only staleness bounds.
"""
if cls._instance is not None:
cls._instance._user_models.pop(user_id, None)
try:
from application.cache import get_redis_instance
client = get_redis_instance()
if client is not None:
client.incr(_USER_VERSION_KEY_PREFIX + user_id)
except Exception as e:
logger.warning(
"BYOM invalidate: failed to publish version bump for "
"user %s (Redis unreachable?): %s",
user_id,
e,
)
@classmethod
def _read_user_version(cls, user_id: str) -> Optional[int]:
"""Return the Redis-side invalidation counter for ``user_id``.
``0`` if the key has never been bumped; ``None`` if Redis is
unreachable or the read failed (callers fall back to TTL-only
staleness in that case).
"""
try:
from application.cache import get_redis_instance
client = get_redis_instance()
if client is None:
return None
raw = client.get(_USER_VERSION_KEY_PREFIX + user_id)
if raw is None:
return 0
return int(raw)
except Exception:
return None
def _load_models(self) -> None:
from pathlib import Path
from application.core.settings import settings
from application.llm.providers import ALL_PROVIDERS
directories = [BUILTIN_MODELS_DIR]
operator_dir = getattr(settings, "MODELS_CONFIG_DIR", None)
if operator_dir:
op_path = Path(operator_dir)
if not op_path.exists():
logger.warning(
"MODELS_CONFIG_DIR=%s does not exist; no operator "
"model YAMLs will be loaded.",
operator_dir,
)
elif not op_path.is_dir():
logger.warning(
"MODELS_CONFIG_DIR=%s is not a directory; no operator "
"model YAMLs will be loaded.",
operator_dir,
)
else:
directories.append(op_path)
catalogs = load_model_yamls(directories)
# Validate every catalog targets a known plugin before doing any
# registry work, so an unknown provider name in YAML aborts boot
# with a clear error.
plugin_names = {p.name for p in ALL_PROVIDERS}
for c in catalogs:
if c.provider not in plugin_names:
raise ValueError(
f"{c.source_path}: YAML declares unknown provider "
f"{c.provider!r}; no Provider plugin is registered "
f"under that name. Known: {sorted(plugin_names)}"
)
catalogs_by_provider: Dict[str, List[ProviderCatalog]] = defaultdict(list)
for c in catalogs:
catalogs_by_provider[c.provider].append(c)
self.models.clear()
for provider in ALL_PROVIDERS:
if not provider.is_enabled(settings):
continue
for model in provider.get_models(
settings, catalogs_by_provider.get(provider.name, [])
):
self.models[model.id] = model
self.default_model_id = self._resolve_default(settings)
logger.info(
"ModelRegistry loaded %d models, default: %s",
len(self.models),
self.default_model_id,
)
def _resolve_default(self, settings) -> Optional[str]:
if settings.LLM_NAME:
for name in self._parse_model_names(settings.LLM_NAME):
if name in self.models:
return name
if settings.LLM_NAME in self.models:
return settings.LLM_NAME
if settings.LLM_PROVIDER and settings.API_KEY:
for model_id, model in self.models.items():
if model.provider.value == settings.LLM_PROVIDER:
return model_id
if self.models:
return next(iter(self.models.keys()))
return None
@staticmethod
def _parse_model_names(llm_name: str) -> List[str]:
if not llm_name:
return []
return [name.strip() for name in llm_name.split(",") if name.strip()]
# Per-user (BYOM) layer
def _user_models_for(self, user_id: str) -> Dict[str, AvailableModel]:
"""Return the user's BYOM models keyed by registry id (UUID).
Loaded lazily from Postgres on first access; cached subject to
a per-process TTL (``_USER_CACHE_TTL_SECONDS``) and a Redis-
backed version counter for cross-process invalidation. The TTL
bounds staleness even when Redis is unreachable, while the
version stamp lets peers refresh without a DB read on the
common case (no invalidation since last load). Decryption
failures and DB errors yield an empty layer (logged) — the
user simply doesn't see their custom models on this request,
never a 500.
"""
cached = self._user_models.get(user_id)
now = time.monotonic()
if cached is not None:
layer, cached_version, loaded_at = cached
if (now - loaded_at) < _USER_CACHE_TTL_SECONDS:
return layer
# TTL elapsed: peek at the cross-process counter. If it
# matches what we saw at load time, no invalidation has
# happened — extend the TTL without touching Postgres. If
# Redis is unreachable (``current_version is None``) we
# fall through to a real reload, which keeps staleness
# bounded to the TTL.
current_version = self._read_user_version(user_id)
if (
current_version is not None
and cached_version is not None
and current_version == cached_version
):
self._user_models[user_id] = (layer, cached_version, now)
return layer
# Capture the counter *before* the DB read so a CRUD write that lands
# mid-reload isn't masked: the next access will see a newer version and
# reload again.
version_before_read = self._read_user_version(user_id)
layer: Dict[str, AvailableModel] = {}
try:
from application.core.model_settings import (
ModelCapabilities,
ModelProvider,
)
from application.storage.db.repositories.user_custom_models import (
UserCustomModelsRepository,
)
from application.storage.db.session import db_readonly
with db_readonly() as conn:
repo = UserCustomModelsRepository(conn)
rows = repo.list_for_user(user_id)
for row in rows:
api_key = repo._decrypt_api_key(
row.get("api_key_encrypted", ""), user_id
)
if not api_key:
# SECURITY: do NOT register an unroutable BYOM
# record. If we did, LLMCreator would fall back
# to the caller-passed api_key (settings.API_KEY
# for openai_compatible) and POST it to the
# user-supplied base_url — leaking the instance
# credential to the user's chosen endpoint.
# Most likely cause is ENCRYPTION_SECRET_KEY
# having rotated; user must re-save the model.
logger.warning(
"user_custom_models: skipping model %s for "
"user %s — api_key could not be decrypted "
"(rotated ENCRYPTION_SECRET_KEY?). Re-save "
"the model to recover.",
row.get("id"),
user_id,
)
continue
caps_raw = row.get("capabilities") or {}
# Stored attachments may be aliases (``image``) or
# raw MIME types. Built-in YAML models expand at
# load time; mirror that here so downstream MIME-
# type comparisons (handlers/base.prepare_messages)
# match concrete types like ``image/png`` rather
# than the bare alias.
from application.core.model_yaml import (
expand_attachments_lenient,
)
raw_attachments = caps_raw.get("attachments", []) or []
expanded_attachments = expand_attachments_lenient(
raw_attachments,
f"user_custom_models[user={user_id}, model={row.get('id')}]",
)
caps = ModelCapabilities(
supports_tools=bool(caps_raw.get("supports_tools", False)),
supports_structured_output=bool(
caps_raw.get("supports_structured_output", False)
),
supports_streaming=bool(
caps_raw.get("supports_streaming", True)
),
supported_attachment_types=expanded_attachments,
context_window=int(
caps_raw.get("context_window") or 128000
),
)
model_id = str(row["id"])
layer[model_id] = AvailableModel(
id=model_id,
provider=ModelProvider.OPENAI_COMPATIBLE,
display_name=row["display_name"],
description=row.get("description") or "",
capabilities=caps,
enabled=bool(row.get("enabled", True)),
base_url=row["base_url"],
upstream_model_id=row["upstream_model_id"],
source="user",
api_key=api_key,
)
except Exception as e:
logger.warning(
"user_custom_models: failed to load layer for user %s: %s",
user_id,
e,
)
layer = {}
self._user_models[user_id] = (layer, version_before_read, now)
return layer
# Lookup API. ``user_id`` enables the BYOM per-user layer; without
# it, callers see only the built-in + operator catalog.
def get_model(
self, model_id: str, user_id: Optional[str] = None
) -> Optional[AvailableModel]:
if user_id:
user_layer = self._user_models_for(user_id)
if model_id in user_layer:
return user_layer[model_id]
return self.models.get(model_id)
def get_all_models(
self, user_id: Optional[str] = None
) -> List[AvailableModel]:
out = list(self.models.values())
if user_id:
out.extend(self._user_models_for(user_id).values())
return out
def get_enabled_models(
self, user_id: Optional[str] = None
) -> List[AvailableModel]:
out = [m for m in self.models.values() if m.enabled]
if user_id:
out.extend(
m for m in self._user_models_for(user_id).values() if m.enabled
)
return out
def model_exists(
self, model_id: str, user_id: Optional[str] = None
) -> bool:
if user_id and model_id in self._user_models_for(user_id):
return True
return model_id in self.models
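# Illustrative usage sketch (not part of the class above): how route code
# is expected to consult the registry. The ids are placeholders.
def _example_registry_lookup(user_id: str, model_id: str) -> None:
    registry = ModelRegistry.get_instance()
    # Without a user_id only the built-in + operator catalog is visible;
    # passing one layers the caller's BYOM records on top.
    model = registry.get_model(model_id, user_id=user_id)
    if model is None:
        raise ValueError(f"unknown model: {model_id}")
    # After a BYOM create/update/delete, the REST routes call
    # ``invalidate_user`` so this process reloads immediately and peers
    # notice the Redis version bump on their next TTL recheck.
    ModelRegistry.invalidate_user(user_id)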

View File

@@ -5,9 +5,16 @@ from typing import Dict, List, Optional
logger = logging.getLogger(__name__)
# Re-exported here so existing call sites (and tests) that do
# ``from application.core.model_settings import ModelRegistry`` keep
# working. The implementation lives in ``application/core/model_registry.py``.
# Imported lazily inside ``__getattr__`` to avoid an import cycle with
# ``model_yaml`` → ``model_settings`` (this file).
class ModelProvider(str, Enum):
OPENAI = "openai"
OPENAI_COMPATIBLE = "openai_compatible"
OPENROUTER = "openrouter"
AZURE_OPENAI = "azure_openai"
ANTHROPIC = "anthropic"
@@ -41,11 +48,21 @@ class AvailableModel:
capabilities: ModelCapabilities = field(default_factory=ModelCapabilities)
enabled: bool = True
base_url: Optional[str] = None
# User-facing label distinct from dispatch provider (e.g. mistral
# routed through openai_compatible).
display_provider: Optional[str] = None
# Sent in the API call's ``model`` field; falls back to ``self.id``
# for built-ins where id IS the upstream name.
upstream_model_id: Optional[str] = None
# "builtin" for catalog YAMLs, "user" for BYOM records.
source: str = "builtin"
# Decrypted/resolved at registry-merge time. Never serialized.
api_key: Optional[str] = field(default=None, repr=False, compare=False)
def to_dict(self) -> Dict:
result = {
"id": self.id,
"provider": self.provider.value,
"provider": self.display_provider or self.provider.value,
"display_name": self.display_name,
"description": self.description,
"supported_attachment_types": self.capabilities.supported_attachment_types,
@@ -54,261 +71,21 @@ class AvailableModel:
"supports_streaming": self.capabilities.supports_streaming,
"context_window": self.capabilities.context_window,
"enabled": self.enabled,
"source": self.source,
}
if self.base_url:
result["base_url"] = self.base_url
return result
class ModelRegistry:
_instance = None
_initialized = False
def __getattr__(name):
"""Lazy re-export of ``ModelRegistry`` from ``model_registry.py``.
def __new__(cls):
if cls._instance is None:
cls._instance = super().__new__(cls)
return cls._instance
Done lazily to avoid an import cycle: ``model_registry`` imports
``model_yaml`` which imports the dataclasses from this file.
"""
if name == "ModelRegistry":
from application.core.model_registry import ModelRegistry as _MR
def __init__(self):
if not ModelRegistry._initialized:
self.models: Dict[str, AvailableModel] = {}
self.default_model_id: Optional[str] = None
self._load_models()
ModelRegistry._initialized = True
@classmethod
def get_instance(cls) -> "ModelRegistry":
return cls()
def _load_models(self):
from application.core.settings import settings
self.models.clear()
# Skip DocsGPT model if using custom OpenAI-compatible endpoint
if not settings.OPENAI_BASE_URL:
self._add_docsgpt_models(settings)
if (
settings.OPENAI_API_KEY
or (settings.LLM_PROVIDER == "openai" and settings.API_KEY)
or settings.OPENAI_BASE_URL
):
self._add_openai_models(settings)
if settings.OPENAI_API_BASE or (
settings.LLM_PROVIDER == "azure_openai" and settings.API_KEY
):
self._add_azure_openai_models(settings)
if settings.ANTHROPIC_API_KEY or (
settings.LLM_PROVIDER == "anthropic" and settings.API_KEY
):
self._add_anthropic_models(settings)
if settings.GOOGLE_API_KEY or (
settings.LLM_PROVIDER == "google" and settings.API_KEY
):
self._add_google_models(settings)
if settings.GROQ_API_KEY or (
settings.LLM_PROVIDER == "groq" and settings.API_KEY
):
self._add_groq_models(settings)
if settings.OPEN_ROUTER_API_KEY or (
settings.LLM_PROVIDER == "openrouter" and settings.API_KEY
):
self._add_openrouter_models(settings)
if settings.NOVITA_API_KEY or (
settings.LLM_PROVIDER == "novita" and settings.API_KEY
):
self._add_novita_models(settings)
if settings.HUGGINGFACE_API_KEY or (
settings.LLM_PROVIDER == "huggingface" and settings.API_KEY
):
self._add_huggingface_models(settings)
# Default model selection
if settings.LLM_NAME:
# Parse LLM_NAME (may be comma-separated)
model_names = self._parse_model_names(settings.LLM_NAME)
# First model in the list becomes default
for model_name in model_names:
if model_name in self.models:
self.default_model_id = model_name
break
# Backward compat: try exact match if no parsed model found
if not self.default_model_id and settings.LLM_NAME in self.models:
self.default_model_id = settings.LLM_NAME
if not self.default_model_id:
if settings.LLM_PROVIDER and settings.API_KEY:
for model_id, model in self.models.items():
if model.provider.value == settings.LLM_PROVIDER:
self.default_model_id = model_id
break
if not self.default_model_id and self.models:
self.default_model_id = next(iter(self.models.keys()))
logger.info(
f"ModelRegistry loaded {len(self.models)} models, default: {self.default_model_id}"
)
def _add_openai_models(self, settings):
from application.core.model_configs import (
OPENAI_MODELS,
create_custom_openai_model,
)
# Check if using local OpenAI-compatible endpoint (Ollama, LM Studio, etc.)
using_local_endpoint = bool(
settings.OPENAI_BASE_URL and settings.OPENAI_BASE_URL.strip()
)
if using_local_endpoint:
# When OPENAI_BASE_URL is set, ONLY register custom models from LLM_NAME
# Do NOT add standard OpenAI models (gpt-5.1, etc.)
if settings.LLM_NAME:
model_names = self._parse_model_names(settings.LLM_NAME)
for model_name in model_names:
custom_model = create_custom_openai_model(
model_name, settings.OPENAI_BASE_URL
)
self.models[model_name] = custom_model
logger.info(
f"Registered custom OpenAI model: {model_name} at {settings.OPENAI_BASE_URL}"
)
else:
# Standard OpenAI API usage - add standard models if API key is valid
if settings.OPENAI_API_KEY:
for model in OPENAI_MODELS:
self.models[model.id] = model
def _add_azure_openai_models(self, settings):
from application.core.model_configs import AZURE_OPENAI_MODELS
if settings.LLM_PROVIDER == "azure_openai" and settings.LLM_NAME:
for model in AZURE_OPENAI_MODELS:
if model.id == settings.LLM_NAME:
self.models[model.id] = model
return
for model in AZURE_OPENAI_MODELS:
self.models[model.id] = model
def _add_anthropic_models(self, settings):
from application.core.model_configs import ANTHROPIC_MODELS
if settings.ANTHROPIC_API_KEY:
for model in ANTHROPIC_MODELS:
self.models[model.id] = model
return
if settings.LLM_PROVIDER == "anthropic" and settings.LLM_NAME:
for model in ANTHROPIC_MODELS:
if model.id == settings.LLM_NAME:
self.models[model.id] = model
return
for model in ANTHROPIC_MODELS:
self.models[model.id] = model
def _add_google_models(self, settings):
from application.core.model_configs import GOOGLE_MODELS
if settings.GOOGLE_API_KEY:
for model in GOOGLE_MODELS:
self.models[model.id] = model
return
if settings.LLM_PROVIDER == "google" and settings.LLM_NAME:
for model in GOOGLE_MODELS:
if model.id == settings.LLM_NAME:
self.models[model.id] = model
return
for model in GOOGLE_MODELS:
self.models[model.id] = model
def _add_groq_models(self, settings):
from application.core.model_configs import GROQ_MODELS
if settings.GROQ_API_KEY:
for model in GROQ_MODELS:
self.models[model.id] = model
return
if settings.LLM_PROVIDER == "groq" and settings.LLM_NAME:
for model in GROQ_MODELS:
if model.id == settings.LLM_NAME:
self.models[model.id] = model
return
for model in GROQ_MODELS:
self.models[model.id] = model
def _add_openrouter_models(self, settings):
from application.core.model_configs import OPENROUTER_MODELS
if settings.OPEN_ROUTER_API_KEY:
for model in OPENROUTER_MODELS:
self.models[model.id] = model
return
if settings.LLM_PROVIDER == "openrouter" and settings.LLM_NAME:
for model in OPENROUTER_MODELS:
if model.id == settings.LLM_NAME:
self.models[model.id] = model
return
for model in OPENROUTER_MODELS:
self.models[model.id] = model
def _add_novita_models(self, settings):
from application.core.model_configs import NOVITA_MODELS
if settings.NOVITA_API_KEY:
for model in NOVITA_MODELS:
self.models[model.id] = model
return
if settings.LLM_PROVIDER == "novita" and settings.LLM_NAME:
for model in NOVITA_MODELS:
if model.id == settings.LLM_NAME:
self.models[model.id] = model
return
for model in NOVITA_MODELS:
self.models[model.id] = model
def _add_docsgpt_models(self, settings):
model_id = "docsgpt-local"
model = AvailableModel(
id=model_id,
provider=ModelProvider.DOCSGPT,
display_name="DocsGPT Model",
description="Local model",
capabilities=ModelCapabilities(
supports_tools=False,
supported_attachment_types=[],
),
)
self.models[model_id] = model
def _add_huggingface_models(self, settings):
model_id = "huggingface-local"
model = AvailableModel(
id=model_id,
provider=ModelProvider.HUGGINGFACE,
display_name="Hugging Face Model",
description="Local Hugging Face model",
capabilities=ModelCapabilities(
supports_tools=False,
supported_attachment_types=[],
),
)
self.models[model_id] = model
def _parse_model_names(self, llm_name: str) -> List[str]:
"""
Parse LLM_NAME which may contain comma-separated model names.
E.g., 'deepseek-r1:1.5b,gemma:2b' -> ['deepseek-r1:1.5b', 'gemma:2b']
"""
if not llm_name:
return []
return [name.strip() for name in llm_name.split(",") if name.strip()]
def get_model(self, model_id: str) -> Optional[AvailableModel]:
return self.models.get(model_id)
def get_all_models(self) -> List[AvailableModel]:
return list(self.models.values())
def get_enabled_models(self) -> List[AvailableModel]:
return [m for m in self.models.values() if m.enabled]
def model_exists(self, model_id: str) -> bool:
return model_id in self.models
return _MR
raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

View File

@@ -1,47 +1,59 @@
from typing import Any, Dict, Optional
from application.core.model_settings import ModelRegistry
from application.core.model_registry import ModelRegistry
def get_api_key_for_provider(provider: str) -> Optional[str]:
"""Get the appropriate API key for a provider"""
"""Get the appropriate API key for a provider.
Delegates to the provider plugin's ``get_api_key``. Falls back to the
generic ``settings.API_KEY`` for unknown providers.
"""
from application.core.settings import settings
from application.llm.providers import PROVIDERS_BY_NAME
provider_key_map = {
"openai": settings.OPENAI_API_KEY,
"openrouter": settings.OPEN_ROUTER_API_KEY,
"novita": settings.NOVITA_API_KEY,
"anthropic": settings.ANTHROPIC_API_KEY,
"google": settings.GOOGLE_API_KEY,
"groq": settings.GROQ_API_KEY,
"huggingface": settings.HUGGINGFACE_API_KEY,
"azure_openai": settings.API_KEY,
"docsgpt": None,
"llama.cpp": None,
}
provider_key = provider_key_map.get(provider)
if provider_key:
return provider_key
plugin = PROVIDERS_BY_NAME.get(provider)
if plugin is not None:
key = plugin.get_api_key(settings)
if key:
return key
return settings.API_KEY
def get_all_available_models() -> Dict[str, Dict[str, Any]]:
"""Get all available models with metadata for API response"""
def get_all_available_models(
user_id: Optional[str] = None,
) -> Dict[str, Dict[str, Any]]:
"""Get all available models with metadata for API response.
When ``user_id`` is supplied, the user's BYOM custom-model records
are merged into the result alongside the built-in catalog.
"""
registry = ModelRegistry.get_instance()
return {model.id: model.to_dict() for model in registry.get_enabled_models()}
return {
model.id: model.to_dict()
for model in registry.get_enabled_models(user_id=user_id)
}
def validate_model_id(model_id: str) -> bool:
"""Check if a model ID exists in registry"""
def validate_model_id(model_id: str, user_id: Optional[str] = None) -> bool:
"""Check if a model ID exists in registry.
``user_id`` enables resolution of per-user BYOM records (UUIDs).
Without it, only built-in catalog ids resolve.
"""
registry = ModelRegistry.get_instance()
return registry.model_exists(model_id)
return registry.model_exists(model_id, user_id=user_id)
def get_model_capabilities(model_id: str) -> Optional[Dict[str, Any]]:
"""Get capabilities for a specific model"""
def get_model_capabilities(
model_id: str, user_id: Optional[str] = None
) -> Optional[Dict[str, Any]]:
"""Get capabilities for a specific model.
``user_id`` enables resolution of per-user BYOM records.
"""
registry = ModelRegistry.get_instance()
model = registry.get_model(model_id)
model = registry.get_model(model_id, user_id=user_id)
if model:
return {
"supported_attachment_types": model.capabilities.supported_attachment_types,
@@ -58,36 +70,68 @@ def get_default_model_id() -> str:
return registry.default_model_id
def get_provider_from_model_id(model_id: str) -> Optional[str]:
"""Get the provider name for a given model_id"""
def get_provider_from_model_id(
model_id: str, user_id: Optional[str] = None
) -> Optional[str]:
"""Get the provider name for a given model_id.
``user_id`` enables resolution of per-user BYOM records (UUIDs).
Without it, BYOM model ids return ``None`` and the caller falls
back to the deployment default.
"""
registry = ModelRegistry.get_instance()
model = registry.get_model(model_id)
model = registry.get_model(model_id, user_id=user_id)
if model:
return model.provider.value
return None
def get_token_limit(model_id: str) -> int:
"""
Get context window (token limit) for a model.
Returns model's context_window or default 128000 if model not found.
def get_token_limit(model_id: str, user_id: Optional[str] = None) -> int:
"""Get context window (token limit) for a model.
Returns the model's ``context_window`` or ``DEFAULT_LLM_TOKEN_LIMIT``
if not found. ``user_id`` enables resolution of per-user BYOM records.
"""
from application.core.settings import settings
registry = ModelRegistry.get_instance()
model = registry.get_model(model_id)
model = registry.get_model(model_id, user_id=user_id)
if model:
return model.capabilities.context_window
return settings.DEFAULT_LLM_TOKEN_LIMIT
def get_base_url_for_model(model_id: str) -> Optional[str]:
"""
Get the custom base_url for a specific model if configured.
Returns None if no custom base_url is set.
def get_base_url_for_model(
model_id: str, user_id: Optional[str] = None
) -> Optional[str]:
"""Get the custom base_url for a specific model if configured.
Returns ``None`` if no custom base_url is set. ``user_id`` enables
resolution of per-user BYOM records.
"""
registry = ModelRegistry.get_instance()
model = registry.get_model(model_id)
model = registry.get_model(model_id, user_id=user_id)
if model:
return model.base_url
return None
def get_api_key_for_model(
model_id: str, user_id: Optional[str] = None
) -> Optional[str]:
"""Resolve the API key to use when invoking ``model_id``.
Priority:
1. The model record's own ``api_key`` (BYOM records and
``openai_compatible`` YAMLs populate this).
2. The provider plugin's settings-based key.
``user_id`` enables resolution of per-user BYOM records.
"""
registry = ModelRegistry.get_instance()
model = registry.get_model(model_id, user_id=user_id)
if model is not None and model.api_key:
return model.api_key
if model is not None:
return get_api_key_for_provider(model.provider.value)
return None
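# Illustrative usage sketch (not part of the module): resolving everything
# a caller needs before invoking ``model_id``, BYOM or built-in. The
# arguments are placeholders.
def _example_resolve(model_id: str, user_id: str) -> Dict[str, Any]:
    return {
        "provider": get_provider_from_model_id(model_id, user_id=user_id),
        "api_key": get_api_key_for_model(model_id, user_id=user_id),
        "base_url": get_base_url_for_model(model_id, user_id=user_id),
        "token_limit": get_token_limit(model_id, user_id=user_id),
    }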

View File

@@ -0,0 +1,358 @@
"""YAML loader for model catalog files under ``application/core/models/``.
Each ``*.yaml`` file declares one provider's static model catalog. Files
are validated with Pydantic at load time; any parse, schema, or alias
error aborts startup with the offending file path in the message.
For most providers, one YAML maps to one catalog. The
``openai_compatible`` provider is special: each YAML file represents a
distinct logical endpoint (Mistral, Together, Ollama, ...) with its own
``api_key_env`` and ``base_url``. The loader returns a flat list so the
registry can distinguish multiple files with the same ``provider:`` value.
"""
from __future__ import annotations
import logging
from pathlib import Path
from typing import Dict, List, Optional, Sequence
import yaml
from pydantic import BaseModel, ConfigDict, Field, field_validator
from application.core.model_settings import (
AvailableModel,
ModelCapabilities,
ModelProvider,
)
logger = logging.getLogger(__name__)
BUILTIN_MODELS_DIR = Path(__file__).parent / "models"
DEFAULTS_FILENAME = "_defaults.yaml"
class _DefaultsFile(BaseModel):
"""Schema for ``_defaults.yaml``. Currently just attachment aliases."""
model_config = ConfigDict(extra="forbid")
attachment_aliases: Dict[str, List[str]] = Field(default_factory=dict)
class _CapabilityFields(BaseModel):
"""Capability fields shared between provider ``defaults:`` and per-model overrides.
All fields are optional so a per-model override can selectively replace
a single field from the provider-level defaults.
"""
model_config = ConfigDict(extra="forbid")
supports_tools: Optional[bool] = None
supports_structured_output: Optional[bool] = None
supports_streaming: Optional[bool] = None
attachments: Optional[List[str]] = None
context_window: Optional[int] = None
input_cost_per_token: Optional[float] = None
output_cost_per_token: Optional[float] = None
class _ModelEntry(_CapabilityFields):
"""Schema for one model row inside a YAML's ``models:`` list."""
id: str
display_name: Optional[str] = None
description: str = ""
enabled: bool = True
base_url: Optional[str] = None
aliases: List[str] = Field(default_factory=list)
@field_validator("id")
@classmethod
def _id_nonempty(cls, v: str) -> str:
if not v or not v.strip():
raise ValueError("model id must be a non-empty string")
return v
class _ProviderFile(BaseModel):
"""Schema for one ``<provider>.yaml`` catalog file."""
model_config = ConfigDict(extra="forbid")
provider: str
defaults: _CapabilityFields = Field(default_factory=_CapabilityFields)
models: List[_ModelEntry] = Field(default_factory=list)
# openai_compatible metadata. Optional for other providers.
display_provider: Optional[str] = None
api_key_env: Optional[str] = None
base_url: Optional[str] = None
class ProviderCatalog(BaseModel):
"""One YAML file's parsed contents, ready for the registry.
For most providers, multiple catalogs with the same ``provider`` get
merged later by the registry. The ``openai_compatible`` provider is
the exception: each catalog is treated as a distinct endpoint, with
its own ``api_key_env`` and ``base_url``.
"""
provider: str
models: List[AvailableModel]
source_path: Optional[Path] = None
display_provider: Optional[str] = None
api_key_env: Optional[str] = None
base_url: Optional[str] = None
model_config = ConfigDict(arbitrary_types_allowed=True)
class ModelYAMLError(ValueError):
"""Raised when a model YAML fails parsing, schema, or alias validation."""
def _expand_attachments(
attachments: Sequence[str], aliases: Dict[str, List[str]], source: str
) -> List[str]:
"""Resolve attachment shorthands (``image``, ``pdf``) to MIME types.
Raw MIME-typed entries (containing ``/``) pass through unchanged.
Unknown aliases raise ``ModelYAMLError``.
"""
expanded: List[str] = []
seen: set = set()
for entry in attachments:
if "/" in entry:
if entry not in seen:
expanded.append(entry)
seen.add(entry)
continue
if entry not in aliases:
valid = ", ".join(sorted(aliases.keys())) or "<none defined>"
raise ModelYAMLError(
f"{source}: unknown attachment alias '{entry}'. "
f"Valid aliases: {valid}. "
"(Or use a raw MIME type like 'image/png'.)"
)
for mime in aliases[entry]:
if mime not in seen:
expanded.append(mime)
seen.add(mime)
return expanded
def _load_defaults(directory: Path) -> Dict[str, List[str]]:
"""Load ``_defaults.yaml`` from ``directory`` if it exists."""
path = directory / DEFAULTS_FILENAME
if not path.exists():
return {}
try:
raw = yaml.safe_load(path.read_text(encoding="utf-8")) or {}
except yaml.YAMLError as e:
raise ModelYAMLError(f"{path}: invalid YAML: {e}") from e
try:
parsed = _DefaultsFile.model_validate(raw)
except Exception as e:
raise ModelYAMLError(f"{path}: schema error: {e}") from e
return parsed.attachment_aliases
def _resolve_provider_enum(name: str, source: Path) -> ModelProvider:
try:
return ModelProvider(name)
except ValueError as e:
valid = ", ".join(p.value for p in ModelProvider)
raise ModelYAMLError(
f"{source}: unknown provider '{name}'. Valid: {valid}"
) from e
def _build_model(
entry: _ModelEntry,
defaults: _CapabilityFields,
provider: ModelProvider,
aliases: Dict[str, List[str]],
source: Path,
display_provider: Optional[str] = None,
) -> AvailableModel:
"""Merge defaults + per-model overrides into a final ``AvailableModel``."""
def pick(field_name: str, fallback):
v = getattr(entry, field_name)
if v is not None:
return v
d = getattr(defaults, field_name)
if d is not None:
return d
return fallback
raw_attachments = entry.attachments
if raw_attachments is None:
raw_attachments = defaults.attachments
if raw_attachments is None:
raw_attachments = []
expanded = _expand_attachments(
raw_attachments, aliases, f"{source} [model={entry.id}]"
)
caps = ModelCapabilities(
supports_tools=pick("supports_tools", False),
supports_structured_output=pick("supports_structured_output", False),
supports_streaming=pick("supports_streaming", True),
supported_attachment_types=expanded,
context_window=pick("context_window", 128000),
input_cost_per_token=pick("input_cost_per_token", None),
output_cost_per_token=pick("output_cost_per_token", None),
)
return AvailableModel(
id=entry.id,
provider=provider,
display_name=entry.display_name or entry.id,
description=entry.description,
capabilities=caps,
enabled=entry.enabled,
base_url=entry.base_url,
display_provider=display_provider,
)
def _load_one_yaml(
path: Path, aliases: Dict[str, List[str]]
) -> ProviderCatalog:
try:
raw = yaml.safe_load(path.read_text(encoding="utf-8")) or {}
except yaml.YAMLError as e:
raise ModelYAMLError(f"{path}: invalid YAML: {e}") from e
try:
parsed = _ProviderFile.model_validate(raw)
except Exception as e:
raise ModelYAMLError(f"{path}: schema error: {e}") from e
provider_enum = _resolve_provider_enum(parsed.provider, path)
models = [
_build_model(
entry,
parsed.defaults,
provider_enum,
aliases,
path,
display_provider=parsed.display_provider,
)
for entry in parsed.models
]
return ProviderCatalog(
provider=parsed.provider,
models=models,
source_path=path,
display_provider=parsed.display_provider,
api_key_env=parsed.api_key_env,
base_url=parsed.base_url,
)
_BUILTIN_ALIASES_CACHE: Optional[Dict[str, List[str]]] = None
def builtin_attachment_aliases() -> Dict[str, List[str]]:
"""Return the built-in attachment alias map from ``_defaults.yaml``.
Cached after first read so repeat calls are cheap.
"""
global _BUILTIN_ALIASES_CACHE
if _BUILTIN_ALIASES_CACHE is None:
_BUILTIN_ALIASES_CACHE = _load_defaults(BUILTIN_MODELS_DIR)
return _BUILTIN_ALIASES_CACHE
def resolve_attachment_alias(alias: str) -> List[str]:
"""Resolve a single attachment alias (e.g. ``"image"``) to its
canonical MIME-type list. Raises ``ModelYAMLError`` if unknown.
"""
aliases = builtin_attachment_aliases()
if alias not in aliases:
valid = ", ".join(sorted(aliases.keys())) or "<none defined>"
raise ModelYAMLError(
f"Unknown attachment alias '{alias}'. Valid: {valid}"
)
return list(aliases[alias])
def expand_attachments_lenient(
attachments: Sequence[str], source: str
) -> List[str]:
"""Expand attachment aliases to MIME types, tolerating unknowns.
Mirrors ``_expand_attachments`` but logs and skips unknown aliases
rather than raising. Used for runtime call sites (BYOM registry
load) where an operator-side alias-map edit must not drop the
entire user's BYOM layer; the strict raise still happens at the
API validation boundary.
"""
aliases = builtin_attachment_aliases()
expanded: List[str] = []
seen: set = set()
for entry in attachments:
if "/" in entry:
if entry not in seen:
expanded.append(entry)
seen.add(entry)
continue
mime_list = aliases.get(entry)
if mime_list is None:
logger.warning(
"%s: skipping unknown attachment alias %r", source, entry,
)
continue
for mime in mime_list:
if mime not in seen:
expanded.append(mime)
seen.add(mime)
return expanded
def load_model_yamls(directories: Sequence[Path]) -> List[ProviderCatalog]:
"""Load every ``*.yaml`` file (excluding ``_defaults.yaml``) under each
directory in order and return a flat list of catalogs.
Caller is responsible for merging multiple catalogs that target the
same provider plugin. The flat-list shape lets ``openai_compatible``
keep each file separate (one logical endpoint per file).
When the same model ``id`` appears in more than one YAML across the
directory list, a warning is logged. Order in the returned list
preserves load order, so the registry's "later wins" merge gives the
later directory's definition.
"""
catalogs: List[ProviderCatalog] = []
seen_ids: Dict[str, Path] = {}
aliases: Dict[str, List[str]] = {}
for d in directories:
if not d or not d.exists():
continue
aliases.update(_load_defaults(d))
for d in directories:
if not d or not d.exists():
continue
for path in sorted(d.glob("*.yaml")):
if path.name == DEFAULTS_FILENAME:
continue
catalog = _load_one_yaml(path, aliases)
catalogs.append(catalog)
for m in catalog.models:
prior = seen_ids.get(m.id)
if prior is not None and prior != path:
logger.warning(
"Model id %r redefined: %s overrides %s (later wins)",
m.id,
path,
prior,
)
seen_ids[m.id] = path
return catalogs
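# Illustrative usage sketch (not part of the module): loading the built-in
# catalog plus an operator directory, mirroring what the registry does at
# boot. The operator path is a placeholder; non-existent directories are
# skipped by ``load_model_yamls``.
def _example_load_catalogs() -> None:
    operator_dir = Path("/etc/docsgpt/models")
    catalogs = load_model_yamls([BUILTIN_MODELS_DIR, operator_dir])
    for catalog in catalogs:
        logger.info(
            "catalog %s (provider=%s) defines %d models",
            catalog.source_path,
            catalog.provider,
            len(catalog.models),
        )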

View File

@@ -0,0 +1,213 @@
# Model catalogs
Each `*.yaml` file in this directory declares one provider's model
catalog. The registry loads every YAML at boot and joins it to the
matching provider plugin under `application/llm/providers/`.
To add or edit models, you usually only need to touch a YAML file here;
no Python code is required.
## Add a model to an existing provider
Open the provider's YAML (e.g. `anthropic.yaml`) and append two lines
under `models:`:
```yaml
models:
- id: claude-3-7-sonnet
display_name: Claude 3.7 Sonnet
```
Capabilities default to the provider's `defaults:` block. Override
per-model only when needed:
```yaml
- id: claude-3-7-sonnet
display_name: Claude 3.7 Sonnet
context_window: 500000
```
Restart the app. The new model appears in `/api/models`.
> The model `id` is what gets stored in agent / workflow records. Once
> users start picking the model, **don't rename it** — agent and
> workflow rows reference it as a free-form string and silently fall
> back to the system default if the id disappears.
## Add an OpenAI-compatible provider (zero Python)
Drop a YAML in this directory (or in your `MODELS_CONFIG_DIR`) that uses
the `openai_compatible` plugin. Set the env var named in `api_key_env`
and you're done — no Python, no settings.py edit, no LLMCreator change:
```yaml
# mistral.yaml
provider: openai_compatible
display_provider: mistral # shown in /api/models response
api_key_env: MISTRAL_API_KEY # env var the plugin reads at boot
base_url: https://api.mistral.ai/v1
defaults:
supports_tools: true
context_window: 128000
models:
- id: mistral-large-latest
display_name: Mistral Large
- id: mistral-small-latest
display_name: Mistral Small
```
Set `MISTRAL_API_KEY=sk-...` and restart: Mistral models appear in
`/api/models` with `provider: "mistral"`. They route through the OpenAI
wire format (it's `OpenAILLM` under the hood) but with Mistral's
endpoint and key.
Multiple `openai_compatible` YAMLs coexist: each file is one logical
endpoint with its own `api_key_env` and `base_url`. Drop in
`together.yaml`, `fireworks.yaml`, etc. side by side. If a catalog's env
var isn't set, that catalog is skipped at boot (logged at INFO) rather
than raising an error.
Working example: `examples/mistral.yaml.example`. Files inside
`examples/` aren't loaded by the registry; the glob only picks up
`*.yaml` at the top level.
## Add a provider with its own SDK
For a provider that doesn't speak OpenAI's wire format, add one Python
file to `application/llm/providers/<name>.py`:
```python
from application.llm.providers.base import Provider
from application.llm.my_provider import MyLLM
class MyProvider(Provider):
name = "my_provider"
llm_class = MyLLM
def get_api_key(self, settings):
return settings.MY_PROVIDER_API_KEY
```
Register it in `application/llm/providers/__init__.py` (one line in
`ALL_PROVIDERS`), add `MY_PROVIDER_API_KEY` to `settings.py`, and create
`my_provider.yaml` here with the model catalog.
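For reference, registration is just adding the plugin to the provider
list. A minimal sketch, assuming `ALL_PROVIDERS` holds plugin instances
and `PROVIDERS_BY_NAME` is derived from it (adapt to the actual layout of
`application/llm/providers/__init__.py`):
```python
# application/llm/providers/__init__.py (sketch, not the verbatim file)
from application.llm.providers.my_provider import MyProvider

ALL_PROVIDERS = [
    # ... existing provider plugins ...
    MyProvider(),
]

# Name -> plugin lookup; assumed to be derived from ALL_PROVIDERS.
PROVIDERS_BY_NAME = {p.name: p for p in ALL_PROVIDERS}
```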
## Schema reference
```yaml
provider: <string, required> # matches the Provider plugin's `name`
# openai_compatible only — required for that provider, ignored for others
display_provider: <string> # label shown in /api/models response
api_key_env: <string> # name of the env var carrying the key
base_url: <string> # endpoint URL
defaults: # optional, applied to every model below
supports_tools: bool # default false
supports_structured_output: bool # default false
supports_streaming: bool # default true
attachments: [<alias-or-mime>, ...] # default []
context_window: int # default 128000
input_cost_per_token: float # default null
output_cost_per_token: float # default null
models: # required
- id: <string, required> # the value persisted in agent records
display_name: <string> # default: id
description: <string> # default: ""
enabled: bool # default true; false hides from /api/models
base_url: <string> # optional custom endpoint for this model
# All `defaults:` fields above can be overridden here per-model.
```
### Attachment aliases
The `attachments:` list can mix human-readable aliases with raw MIME
types. Aliases are defined in `_defaults.yaml`:
| Alias | Expands to |
|---|---|
| `image` | `image/png`, `image/jpeg`, `image/jpg`, `image/webp`, `image/gif` |
| `pdf` | `application/pdf` |
| `audio` | `audio/mpeg`, `audio/wav`, `audio/ogg` |
Use raw MIME types when you need surgical control:
```yaml
attachments: [image/png, image/webp] # only these two
```
## Operator-supplied YAMLs (`MODELS_CONFIG_DIR`)
Set the `MODELS_CONFIG_DIR` env var (or `.env` entry) to a directory
path. Every `*.yaml` in that directory is loaded **after** the built-in
catalog under `application/core/models/`. Operators use this to:
- Add new `openai_compatible` providers (Mistral, Together, Fireworks,
Ollama, ...) without forking the repo.
- Extend an existing provider's catalog with extra models — append
models under `provider: anthropic` and they show up alongside the
built-ins.
- Override a built-in model's capabilities — declare the same `id`
with different fields (e.g. a higher `context_window`). Later wins;
the override is logged as a `WARNING` so you can audit it.
Things you cannot do via `MODELS_CONFIG_DIR`:
- Add a brand-new non-OpenAI provider — that needs a Python plugin
under `application/llm/providers/` (see "Add a provider with its own
SDK" above). Operator YAMLs may only target a `provider:` value that
already has a registered plugin.
### Example: Docker
Mount your model YAMLs into the container and point the env var at the
mount path:
```yaml
# docker-compose.yml
services:
app:
image: arc53/docsgpt
environment:
MODELS_CONFIG_DIR: /etc/docsgpt/models
MISTRAL_API_KEY: ${MISTRAL_API_KEY}
volumes:
- ./my-models:/etc/docsgpt/models:ro
```
Then `./my-models/mistral.yaml` (the file from
`examples/mistral.yaml.example`) gets picked up at boot.
### Example: Kubernetes
Mount a `ConfigMap` containing your YAMLs at a known path and set
`MODELS_CONFIG_DIR` on the deployment. The same `examples/mistral.yaml.example`
becomes a key in the ConfigMap.
### Misconfiguration
If `MODELS_CONFIG_DIR` is set but the path doesn't exist (or isn't a
directory), the app logs a `WARNING` at boot and continues with just
the built-in catalog. The app does *not* fail to start — operators can
ship config drift without taking down the service — but the warning is
loud enough to surface in any reasonable log aggregator.
## Validation
YAMLs are parsed with Pydantic at boot. The app fails to start with a
clear error message if:
- a top-level key is unknown
- a model is missing `id`
- an attachment alias isn't defined
- the `provider:` value isn't registered as a plugin
This is intentional — silent fallbacks would mean users don't notice
their model picks broke until they hit the API.
## Reserved fields (not yet implemented)
- `aliases:` on a model — old IDs that resolve to this model. Reserved
for future renames; the schema accepts the field but it is not yet
acted on.

View File

@@ -0,0 +1,18 @@
# Global defaults applied across every model YAML in this directory.
# Keep this file sparse — per-provider `defaults:` blocks are clearer
# than a deep global default chain. This file is for things that
# genuinely never vary, like the meaning of "image".
attachment_aliases:
image:
- image/png
- image/jpeg
- image/jpg
- image/webp
- image/gif
pdf:
- application/pdf
audio:
- audio/mpeg
- audio/wav
- audio/ogg

View File

@@ -0,0 +1,23 @@
provider: anthropic
defaults:
supports_tools: true
attachments: [image]
context_window: 200000
models:
- id: claude-opus-4-7
display_name: Claude Opus 4.7
description: Most capable Claude model for complex reasoning and agentic coding
context_window: 1000000
supports_structured_output: true
- id: claude-sonnet-4-6
display_name: Claude Sonnet 4.6
description: Best balance of speed and intelligence with extended thinking
context_window: 1000000
supports_structured_output: true
- id: claude-haiku-4-5
display_name: Claude Haiku 4.5
description: Fastest Claude model with near-frontier intelligence
supports_structured_output: true

View File

@@ -0,0 +1,31 @@
# Azure OpenAI catalog.
#
# IMPORTANT: For Azure OpenAI, the `id` field is the **deployment name**, not
# a model name. Deployment names are arbitrary strings the operator chooses
# in Azure portal (or via ARM/Bicep/Terraform) when they create a deployment
# for a given underlying model + version.
#
# The IDs below are sensible defaults that mirror the underlying OpenAI
# model name (prefixed with `azure-`). Operators almost always need to
# override them via `MODELS_CONFIG_DIR` to match the deployment names that
# actually exist in their Azure resource. The `display_name`, capability
# flags, and `context_window` reflect the underlying OpenAI model.
provider: azure_openai
defaults:
supports_tools: true
supports_structured_output: true
attachments: [image]
context_window: 400000
models:
- id: azure-gpt-5.5
display_name: Azure OpenAI GPT-5.5
description: Azure-hosted flagship frontier model for complex reasoning, coding, and agentic work with a 1M-token context window
context_window: 1050000
- id: azure-gpt-5.4-mini
display_name: Azure OpenAI GPT-5.4 Mini
description: Azure-hosted cost-efficient GPT-5.4-class model for high-volume coding, computer use, and subagent workloads
- id: azure-gpt-5.4-nano
display_name: Azure OpenAI GPT-5.4 Nano
description: Azure-hosted cheapest GPT-5.4-class model, optimized for simple high-volume tasks where speed and cost matter most

View File

@@ -0,0 +1,7 @@
provider: docsgpt
models:
- id: docsgpt-local
display_name: DocsGPT Model
description: Local model
supports_tools: false

View File

@@ -0,0 +1,31 @@
# EXAMPLE — copy this file to ../mistral.yaml (or to your
# MODELS_CONFIG_DIR) and set MISTRAL_API_KEY in your environment.
#
# This is the entire integration. No Python required: the
# `openai_compatible` plugin reads `api_key_env` and `base_url` from
# the file and routes calls through the OpenAI wire format.
#
# Files in this `examples/` directory are NOT loaded by the registry
# (the loader globs *.yaml at the top level only).
provider: openai_compatible
display_provider: mistral # shown in /api/models response
api_key_env: MISTRAL_API_KEY # env var the plugin reads
base_url: https://api.mistral.ai/v1 # OpenAI-compatible endpoint
defaults:
supports_tools: true
context_window: 128000
models:
- id: mistral-large-latest
display_name: Mistral Large
description: Top-tier reasoning model
- id: mistral-small-latest
display_name: Mistral Small
description: Fast, cost-efficient
- id: codestral-latest
display_name: Codestral
description: Code-specialized model

View File

@@ -0,0 +1,17 @@
provider: google
defaults:
supports_tools: true
supports_structured_output: true
attachments: [pdf, image]
context_window: 1048576
models:
- id: gemini-3.1-pro-preview
display_name: Gemini 3.1 Pro
description: Most capable Gemini 3 model with advanced reasoning and agentic coding (preview)
- id: gemini-3-flash-preview
display_name: Gemini 3 Flash
description: Frontier-class performance for low-latency, high-volume tasks (preview)
- id: gemini-3.1-flash-lite-preview
display_name: Gemini 3.1 Flash-Lite
description: Cost-efficient frontier-class multimodal model for high-throughput workloads (preview)

View File

@@ -0,0 +1,16 @@
provider: groq
defaults:
supports_tools: true
context_window: 131072
models:
- id: openai/gpt-oss-120b
display_name: GPT-OSS 120B
description: OpenAI's open-weight 120B flagship served on Groq's LPU hardware; strong general reasoning with strict structured output support
supports_structured_output: true
- id: llama-3.3-70b-versatile
display_name: Llama 3.3 70B Versatile
description: Meta's Llama 3.3 70B for general-purpose chat with parallel tool use
- id: llama-3.1-8b-instant
display_name: Llama 3.1 8B Instant
description: Small, very low-latency Llama model (~560 tok/s) with parallel tool use

View File

@@ -0,0 +1,7 @@
provider: huggingface
models:
- id: huggingface-local
display_name: Hugging Face Model
description: Local Hugging Face model
supports_tools: false

View File

@@ -0,0 +1,21 @@
provider: novita
defaults:
supports_tools: true
supports_structured_output: true
models:
- id: deepseek/deepseek-v4-pro
display_name: DeepSeek V4 Pro
description: 1.6T MoE (49B active) with 1M context, hybrid CSA/HCA attention, top-tier reasoning and agentic coding
context_window: 1048576
- id: moonshotai/kimi-k2.6
display_name: Kimi K2.6
description: 1T-parameter open-weight MoE with native vision/video, multi-step tool calling, and agentic long-horizon execution
attachments: [image]
context_window: 262144
- id: zai-org/glm-5
display_name: GLM-5
description: Z.AI 754B-parameter MoE with strong general reasoning, function calling, and structured output
context_window: 202800

View File

@@ -0,0 +1,18 @@
provider: openai
defaults:
supports_tools: true
supports_structured_output: true
attachments: [image]
context_window: 400000
models:
- id: gpt-5.5
display_name: GPT-5.5
description: Flagship frontier model for complex reasoning, coding, and agentic work with a 1M-token context window
context_window: 1050000
- id: gpt-5.4-mini
display_name: GPT-5.4 Mini
description: Cost-efficient GPT-5.4-class model for high-volume coding, computer use, and subagent workloads
- id: gpt-5.4-nano
display_name: GPT-5.4 Nano
description: Cheapest GPT-5.4-class model, optimized for simple high-volume tasks where speed and cost matter most

View File

@@ -0,0 +1,25 @@
provider: openrouter
defaults:
supports_tools: true
attachments: [image]
context_window: 128000
models:
- id: qwen/qwen3-coder:free
display_name: Qwen3 Coder (free)
description: Free-tier 480B MoE coder model with strong agentic tool use; rate-limited
context_window: 262000
attachments: []
- id: deepseek/deepseek-v3.2
display_name: DeepSeek V3.2
description: Open-weights reasoning model, very low cost (~$0.25 in / $0.38 out per 1M)
context_window: 131072
attachments: []
supports_structured_output: true
- id: anthropic/claude-sonnet-4.6
display_name: Claude Sonnet 4.6 (via OpenRouter)
description: Frontier Sonnet-class model with 1M context, vision, and extended thinking
context_window: 1000000
supports_structured_output: true

View File

@@ -1,24 +0,0 @@
from application.core.settings import settings
from pymongo import MongoClient
class MongoDB:
_client = None
@classmethod
def get_client(cls):
"""
Get the MongoDB client instance, creating it if necessary.
"""
if cls._client is None:
cls._client = MongoClient(settings.MONGO_URI)
return cls._client
@classmethod
def close_client(cls):
"""
Close the MongoDB client connection.
"""
if cls._client is not None:
cls._client.close()
cls._client = None

View File

@@ -23,16 +23,27 @@ class Settings(BaseSettings):
EMBEDDINGS_NAME: str = "huggingface_sentence-transformers/all-mpnet-base-v2"
EMBEDDINGS_BASE_URL: Optional[str] = None # Remote embeddings API URL (OpenAI-compatible)
EMBEDDINGS_KEY: Optional[str] = None # api key for embeddings (if using openai, just copy API_KEY)
# Optional directory of operator-supplied model YAMLs, loaded after the
# built-in catalog under application/core/models/. Later wins on
# duplicate model id. See application/core/models/README.md.
MODELS_CONFIG_DIR: Optional[str] = None
CELERY_BROKER_URL: str = "redis://localhost:6379/0"
CELERY_RESULT_BACKEND: str = "redis://localhost:6379/1"
MONGO_URI: str = "mongodb://localhost:27017/docsgpt"
MONGO_DB_NAME: str = "docsgpt"
# Prefetch=1 caps SIGKILL loss to one task. Visibility timeout must exceed
# the longest legitimate task runtime (ingest, agent webhook) but stay
# short enough that SIGKILLed tasks redeliver promptly. 1h matches Onyx
# and Dify defaults; long ingests can override via env.
CELERY_WORKER_PREFETCH_MULTIPLIER: int = 1
CELERY_VISIBILITY_TIMEOUT: int = 3600
# Only consulted when VECTOR_STORE=mongodb or when running scripts/db/backfill.py; user data lives in Postgres.
MONGO_URI: Optional[str] = None
# User-data Postgres DB.
POSTGRES_URI: Optional[str] = None
# MongoDB→Postgres migration: dual-write to Postgres (Mongo stays source of truth)
USE_POSTGRES: bool = False
# On app startup, apply pending Alembic migrations. Default ON for dev; disable in prod if you manage schema out-of-band.
AUTO_MIGRATE: bool = True
# On app startup, create the target Postgres database if it's missing (requires CREATEDB privilege). Dev-friendly default.
AUTO_CREATE_DB: bool = True
LLM_PATH: str = os.path.join(current_dir, "models/docsgpt-7b-f16.gguf")
DEFAULT_MAX_HISTORY: int = 150
DEFAULT_LLM_TOKEN_LIMIT: int = 128000 # Fallback when model not found in registry
@@ -148,6 +159,9 @@ class Settings(BaseSettings):
FLASK_DEBUG_MODE: bool = False
STORAGE_TYPE: str = "local" # local or s3
# Anonymous startup version check for security issues.
VERSION_CHECK: bool = True
URL_STRATEGY: str = "backend" # backend or s3
JWT_SECRET_KEY: str = ""
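The registry loading code itself is not part of this diff, but the `MODELS_CONFIG_DIR` comment above pins down the precedence: built-in catalog first, then the operator directory, with later files winning on a duplicate model id. A hypothetical sketch of that merge, assuming a top-level `*.yaml` glob as the `examples/` note earlier describes:

```python
# Hypothetical sketch of "later wins on duplicate model id". Names and
# structure are assumptions; only the precedence rule and the top-level
# glob come from the comments in this diff.
import glob
import os
import yaml  # PyYAML

def load_catalog(builtin_dir: str, models_config_dir: str | None) -> dict:
    models: dict[str, dict] = {}
    dirs = [builtin_dir] + ([models_config_dir] if models_config_dir else [])
    for directory in dirs:
        # Top-level *.yaml only, so examples/ subdirectories are skipped.
        for path in sorted(glob.glob(os.path.join(directory, "*.yaml"))):
            with open(path) as f:
                data = yaml.safe_load(f) or {}
            for entry in data.get("models", []):
                # Later directories/files overwrite earlier entries with the same id.
                models[entry["id"]] = {
                    **data.get("defaults", {}),
                    **entry,
                    "provider": data.get("provider"),
                }
    return models
```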

View File

@@ -0,0 +1,72 @@
"""Gunicorn config — keeps uvicorn's access log in NCSA format."""
from __future__ import annotations
import logging
import logging.config
# NCSA common log format:
# %(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"
# Uvicorn's access formatter exposes a ``client_addr``/``request_line``/
# ``status_code`` trio but not the full NCSA field set, so we re-derive
# what we can.
_NCSA_FMT = (
'%(client_addr)s - - [%(asctime)s] "%(request_line)s" %(status_code)s'
)
logconfig_dict = {
"version": 1,
"disable_existing_loggers": False,
"formatters": {
"ncsa_access": {
"()": "uvicorn.logging.AccessFormatter",
"fmt": _NCSA_FMT,
"datefmt": "%d/%b/%Y:%H:%M:%S %z",
"use_colors": False,
},
"default": {
"format": "[%(asctime)s] [%(process)d] [%(levelname)s] %(name)s: %(message)s",
},
},
"handlers": {
"access": {
"class": "logging.StreamHandler",
"formatter": "ncsa_access",
"stream": "ext://sys.stdout",
},
"default": {
"class": "logging.StreamHandler",
"formatter": "default",
"stream": "ext://sys.stderr",
},
},
"loggers": {
"uvicorn": {"handlers": ["default"], "level": "INFO", "propagate": False},
"uvicorn.error": {
"handlers": ["default"],
"level": "INFO",
"propagate": False,
},
"uvicorn.access": {
"handlers": ["access"],
"level": "INFO",
"propagate": False,
},
"gunicorn.error": {
"handlers": ["default"],
"level": "INFO",
"propagate": False,
},
"gunicorn.access": {
"handlers": ["access"],
"level": "INFO",
"propagate": False,
},
},
"root": {"handlers": ["default"], "level": "INFO"},
}
def on_starting(server): # pragma: no cover — gunicorn hook
"""Ensure gunicorn's own loggers use the configured handlers."""
logging.config.dictConfig(logconfig_dict)

View File

@@ -11,6 +11,7 @@ logger = logging.getLogger(__name__)
class AnthropicLLM(BaseLLM):
provider_name = "anthropic"
def __init__(self, api_key=None, user_api_key=None, base_url=None, *args, **kwargs):

View File

@@ -1,5 +1,6 @@
import logging
from abc import ABC, abstractmethod
from typing import ClassVar
from application.cache import gen_cache, stream_cache
@@ -10,6 +11,10 @@ logger = logging.getLogger(__name__)
class BaseLLM(ABC):
# Stamped onto the ``llm_stream_start`` event so dashboards can group
# calls by vendor. Subclasses override.
provider_name: ClassVar[str] = "unknown"
def __init__(
self,
decoded_token=None,
@@ -17,6 +22,8 @@ class BaseLLM(ABC):
model_id=None,
base_url=None,
backup_models=None,
model_user_id=None,
capabilities=None,
):
self.decoded_token = decoded_token
self.agent_id = str(agent_id) if agent_id else None
@@ -25,6 +32,12 @@ class BaseLLM(ABC):
self.token_usage = {"prompt_tokens": 0, "generated_tokens": 0}
self._backup_models = backup_models or []
self._fallback_llm = None
# Registry-resolved per-model capability overrides (BYOM caps,
# operator YAML). None falls back to provider-class defaults.
self.capabilities = capabilities
# BYOM-resolution scope captured at LLM creation time so backup
# / fallback lookups hit the same per-user layer as the primary.
self.model_user_id = model_user_id
@property
def fallback_llm(self):
@@ -39,10 +52,19 @@ class BaseLLM(ABC):
get_api_key_for_provider,
)
# Try per-agent backup models first
# model_user_id (BYOM scope) takes precedence over the caller's
# sub so shared-agent backups resolve under the owner's layer.
caller_sub = (
self.decoded_token.get("sub")
if isinstance(self.decoded_token, dict)
else None
)
backup_user_id = self.model_user_id or caller_sub
for backup_model_id in self._backup_models:
try:
provider = get_provider_from_model_id(backup_model_id)
provider = get_provider_from_model_id(
backup_model_id, user_id=backup_user_id
)
if not provider:
logger.warning(
f"Could not resolve provider for backup model: {backup_model_id}"
@@ -56,6 +78,15 @@ class BaseLLM(ABC):
decoded_token=self.decoded_token,
model_id=backup_model_id,
agent_id=self.agent_id,
model_user_id=self.model_user_id,
)
# Tag the fallback LLM so its rows land as
# ``source='fallback'`` in cost-attribution dashboards.
# Propagate the parent's ``_request_id`` so a user
# request that ran fallback is still grouped under one id.
self._fallback_llm._token_usage_source = "fallback"
self._fallback_llm._request_id = getattr(
self, "_request_id", None,
)
logger.info(
f"Fallback LLM initialized from agent backup model: "
@@ -68,7 +99,10 @@ class BaseLLM(ABC):
)
continue
# Fall back to global FALLBACK_* settings
# Fall back to global FALLBACK_* settings. Forward
# ``model_user_id`` here too: deployments can configure
# ``FALLBACK_LLM_NAME`` to a BYOM UUID, and that UUID is owned
# by the same user the primary model was resolved under.
if settings.FALLBACK_LLM_PROVIDER:
try:
self._fallback_llm = LLMCreator.create_llm(
@@ -78,6 +112,12 @@ class BaseLLM(ABC):
decoded_token=self.decoded_token,
model_id=settings.FALLBACK_LLM_NAME,
agent_id=self.agent_id,
model_user_id=self.model_user_id,
)
# Same rationale as the agent-backup branch.
self._fallback_llm._token_usage_source = "fallback"
self._fallback_llm._request_id = getattr(
self, "_request_id", None,
)
logger.info(
f"Fallback LLM initialized from global settings: "
@@ -96,6 +136,26 @@ class BaseLLM(ABC):
return args_dict
return {k: v for k, v in args_dict.items() if v is not None}
@staticmethod
def _is_non_retriable_client_error(exc: BaseException) -> bool:
"""4xx errors mean the request itself is malformed — retrying with
a different model fails identically and doubles the work. Only
transient/5xx/connection errors should trigger fallback."""
try:
from google.genai.errors import ClientError as _GenaiClientError
if isinstance(exc, _GenaiClientError):
return True
except ImportError:
pass
for attr in ("status_code", "code", "http_status"):
v = getattr(exc, attr, None)
if isinstance(v, int) and 400 <= v < 500:
return True
resp = getattr(exc, "response", None)
v = getattr(resp, "status_code", None)
return isinstance(v, int) and 400 <= v < 500
def _execute_with_fallback(
self, method_name: str, decorators: list, *args, **kwargs
):
@@ -119,12 +179,18 @@ class BaseLLM(ABC):
if is_stream:
return self._stream_with_fallback(
decorated_method, method_name, *args, **kwargs
decorated_method, method_name, decorators, *args, **kwargs
)
try:
return decorated_method()
except Exception as e:
if self._is_non_retriable_client_error(e):
logger.error(
f"Primary LLM failed with non-retriable client error; "
f"skipping fallback: {str(e)}"
)
raise
if not self.fallback_llm:
logger.error(f"Primary LLM failed and no fallback configured: {str(e)}")
raise
@@ -134,14 +200,27 @@ class BaseLLM(ABC):
f"{fallback.model_id}. Error: {str(e)}"
)
fallback_method = getattr(
fallback, method_name.replace("_raw_", "")
)
# Apply decorators to fallback's raw method directly — calling
# fallback.gen() would re-enter the orchestrator and recurse via
# fallback.fallback_llm.
fallback_method = getattr(fallback, method_name)
for decorator in decorators:
fallback_method = decorator(fallback_method)
fallback_kwargs = {**kwargs, "model": fallback.model_id}
return fallback_method(*args, **fallback_kwargs)
try:
return fallback_method(fallback, *args, **fallback_kwargs)
except Exception as e2:
if self._is_non_retriable_client_error(e2):
logger.error(
f"Fallback LLM failed with non-retriable client "
f"error; giving up: {str(e2)}"
)
else:
logger.error(f"Fallback LLM also failed; giving up: {str(e2)}")
raise
def _stream_with_fallback(
self, decorated_method, method_name, *args, **kwargs
self, decorated_method, method_name, decorators, *args, **kwargs
):
"""
Wrapper generator that catches mid-stream errors and falls back.
@@ -154,6 +233,12 @@ class BaseLLM(ABC):
try:
yield from decorated_method()
except Exception as e:
if self._is_non_retriable_client_error(e):
logger.error(
f"Primary LLM failed mid-stream with non-retriable client "
f"error; skipping fallback: {str(e)}"
)
raise
if not self.fallback_llm:
logger.error(
f"Primary LLM failed and no fallback configured: {str(e)}"
@@ -164,11 +249,37 @@ class BaseLLM(ABC):
f"Primary LLM failed mid-stream. Falling back to "
f"{fallback.model_id}. Error: {str(e)}"
)
fallback_method = getattr(
fallback, method_name.replace("_raw_", "")
# Apply decorators to fallback's raw stream method directly —
# calling fallback.gen_stream() would re-enter the orchestrator
# and recurse via fallback.fallback_llm. Emit the stream-start
# event manually so dashboards still see the fallback's
# provider/model when the response actually comes from it.
fallback._emit_stream_start_log(
fallback.model_id,
kwargs.get("messages"),
kwargs.get("tools"),
bool(
kwargs.get("_usage_attachments")
or kwargs.get("attachments")
),
)
fallback_method = getattr(fallback, method_name)
for decorator in decorators:
fallback_method = decorator(fallback_method)
fallback_kwargs = {**kwargs, "model": fallback.model_id}
yield from fallback_method(*args, **fallback_kwargs)
try:
yield from fallback_method(fallback, *args, **fallback_kwargs)
except Exception as e2:
if self._is_non_retriable_client_error(e2):
logger.error(
f"Fallback LLM failed mid-stream with non-retriable "
f"client error; giving up: {str(e2)}"
)
else:
logger.error(
f"Fallback LLM also failed mid-stream; giving up: {str(e2)}"
)
raise
def gen(self, model, messages, stream=False, tools=None, *args, **kwargs):
decorators = [gen_token_usage, gen_cache]
@@ -183,7 +294,58 @@ class BaseLLM(ABC):
**kwargs,
)
def _emit_stream_start_log(self, model, messages, tools, has_attachments):
# Stamped with ``self.provider_name`` so dashboards can group calls
# by vendor; the fallback path emits its own copy on the fallback
# instance so the actual responding provider is recorded.
logging.info(
"llm_stream_start",
extra={
"model": model,
"provider": self.provider_name,
"message_count": len(messages) if messages is not None else 0,
"has_attachments": bool(has_attachments),
"has_tools": bool(tools),
},
)
def _emit_stream_finished_log(
self,
model,
*,
prompt_tokens,
completion_tokens,
latency_ms,
cached_tokens=None,
error=None,
):
# Paired with ``llm_stream_start`` so cost dashboards can sum tokens
# by user/agent/provider. Token counts are client-side estimates
# from ``stream_token_usage``; vendor-reported counts (incl.
# ``cached_tokens`` for prompt caching) require per-provider
# extraction in each ``_raw_gen_stream`` and aren't wired yet.
extra = {
"model": model,
"provider": self.provider_name,
"prompt_tokens": int(prompt_tokens),
"completion_tokens": int(completion_tokens),
"latency_ms": int(latency_ms),
"status": "error" if error is not None else "ok",
}
if cached_tokens is not None:
extra["cached_tokens"] = int(cached_tokens)
if error is not None:
extra["error_class"] = type(error).__name__
logging.info("llm_stream_finished", extra=extra)
def gen_stream(self, model, messages, stream=True, tools=None, *args, **kwargs):
# Attachments arrive as ``_usage_attachments`` from ``Agent._llm_gen``;
# the ``stream_token_usage`` decorator pops that key, but the log
# fires before the decorator runs so it's still in ``kwargs`` here.
has_attachments = bool(
kwargs.get("_usage_attachments") or kwargs.get("attachments")
)
self._emit_stream_start_log(model, messages, tools, has_attachments)
decorators = [stream_cache, stream_token_usage]
return self._execute_with_fallback(
"_raw_gen_stream",

View File

@@ -6,6 +6,8 @@ DOCSGPT_BASE_URL = "https://oai.arc53.com"
DOCSGPT_MODEL = "docsgpt"
class DocsGPTAPILLM(OpenAILLM):
provider_name = "docsgpt"
def __init__(self, api_key=None, user_api_key=None, base_url=None, *args, **kwargs):
super().__init__(
api_key=DOCSGPT_API_KEY,

View File

@@ -6,10 +6,13 @@ from google.genai import types
from application.core.settings import settings
from application.llm.base import BaseLLM
from application.llm.handlers.google import _decode_thought_signature
from application.storage.storage_creator import StorageCreator
class GoogleLLM(BaseLLM):
provider_name = "google"
def __init__(
self, api_key=None, user_api_key=None, decoded_token=None, *args, **kwargs
):
@@ -79,24 +82,39 @@ class GoogleLLM(BaseLLM):
for attachment in attachments:
mime_type = attachment.get("mime_type")
if mime_type in self.get_supported_attachment_types():
try:
if mime_type not in self.get_supported_attachment_types():
continue
try:
# Images go inline as bytes per Google's guidance for
# requests under 20MB; the Files API can return before
# the upload reaches ACTIVE state and yield an empty URI.
if mime_type.startswith("image/"):
file_bytes = self._read_attachment_bytes(attachment)
files.append(
{"file_bytes": file_bytes, "mime_type": mime_type}
)
else:
file_uri = self._upload_file_to_google(attachment)
if not file_uri:
raise ValueError(
f"Google Files API returned empty URI for "
f"{attachment.get('path', 'unknown')}"
)
logging.info(
f"GoogleLLM: Successfully uploaded file, got URI: {file_uri}"
)
files.append({"file_uri": file_uri, "mime_type": mime_type})
except Exception as e:
logging.error(
f"GoogleLLM: Error uploading file: {e}", exc_info=True
except Exception as e:
logging.error(
f"GoogleLLM: Error processing attachment: {e}", exc_info=True
)
if "content" in attachment:
prepared_messages[user_message_index]["content"].append(
{
"type": "text",
"text": f"[File could not be processed: {attachment.get('path', 'unknown')}]",
}
)
if "content" in attachment:
prepared_messages[user_message_index]["content"].append(
{
"type": "text",
"text": f"[File could not be processed: {attachment.get('path', 'unknown')}]",
}
)
if files:
logging.info(f"GoogleLLM: Adding {len(files)} files to message")
prepared_messages[user_message_index]["content"].append({"files": files})
@@ -112,7 +130,9 @@ class GoogleLLM(BaseLLM):
Returns:
str: Google AI file URI for the uploaded file.
"""
if "google_file_uri" in attachment:
# Truthy check, not membership: a poisoned cache row of "" or
# None must be treated as a miss and trigger a fresh upload.
if attachment.get("google_file_uri"):
return attachment["google_file_uri"]
file_path = attachment.get("path")
if not file_path:
@@ -126,21 +146,63 @@ class GoogleLLM(BaseLLM):
file=local_path
).uri,
)
from application.core.mongo_db import MongoDB
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
attachments_collection = db["attachments"]
if "_id" in attachment:
attachments_collection.update_one(
{"_id": attachment["_id"]}, {"$set": {"google_file_uri": file_uri}}
if not file_uri:
raise ValueError(
f"Google Files API upload returned empty URI for {file_path}"
)
# Cache the Google file URI on the attachment row so we don't
# re-upload on the next LLM call. Accept either a PG UUID
# (``id``) or a legacy Mongo ObjectId (``_id``). Opened per
# write — this runs mid-LLM-call, so we don't wrap the
# surrounding generator in a long-lived session.
attachment_id = attachment.get("id") or attachment.get("_id")
if attachment_id:
user_id = None
decoded = getattr(self, "decoded_token", None)
if isinstance(decoded, dict):
user_id = decoded.get("sub")
from application.storage.db.repositories.attachments import (
AttachmentsRepository,
)
from application.storage.db.session import db_session
try:
with db_session() as conn:
AttachmentsRepository(conn).update_any(
str(attachment_id),
user_id,
{"google_file_uri": file_uri},
)
except Exception as cache_err:
logging.warning(
f"Failed to cache google_file_uri on attachment {attachment_id}: {cache_err}"
)
return file_uri
except Exception as e:
logging.error(f"Error uploading file to Google AI: {e}", exc_info=True)
raise
def _read_attachment_bytes(self, attachment):
"""
Read attachment bytes from storage for inline transmission.
Args:
attachment (dict): Attachment dictionary with path and metadata.
Returns:
bytes: Raw file bytes.
"""
file_path = attachment.get("path")
if not file_path:
raise ValueError("No file path provided in attachment")
if not self.storage.file_exists(file_path):
raise FileNotFoundError(f"File not found: {file_path}")
return self.storage.process_file(
file_path,
lambda local_path, **kwargs: open(local_path, "rb").read(),
)
def _clean_messages_google(self, messages):
"""
Convert OpenAI format messages to Google AI format and collect system prompts.
@@ -197,7 +259,7 @@ class GoogleLLM(BaseLLM):
except (_json.JSONDecodeError, TypeError):
args = {}
cleaned_args = self._remove_null_values(args)
thought_sig = tc.get("thought_signature")
thought_sig = _decode_thought_signature(tc.get("thought_signature"))
if thought_sig:
parts.append(
types.Part(
@@ -261,7 +323,9 @@ class GoogleLLM(BaseLLM):
name=item["function_call"]["name"],
args=cleaned_args,
),
thoughtSignature=item["thought_signature"],
thoughtSignature=_decode_thought_signature(
item["thought_signature"]
),
)
)
else:
@@ -280,12 +344,24 @@ class GoogleLLM(BaseLLM):
)
elif "files" in item:
for file_data in item["files"]:
parts.append(
types.Part.from_uri(
file_uri=file_data["file_uri"],
mime_type=file_data["mime_type"],
if "file_bytes" in file_data:
parts.append(
types.Part.from_bytes(
data=file_data["file_bytes"],
mime_type=file_data["mime_type"],
)
)
elif file_data.get("file_uri"):
parts.append(
types.Part.from_uri(
file_uri=file_data["file_uri"],
mime_type=file_data["mime_type"],
)
)
else:
logging.warning(
"GoogleLLM: dropping file part with empty URI and no bytes"
)
)
else:
raise ValueError(
f"Unexpected content dictionary format:{item}"
@@ -523,22 +599,6 @@ class GoogleLLM(BaseLLM):
config.response_mime_type = "application/json"
# Check if we have both tools and file attachments
has_attachments = False
for message in messages:
for part in message.parts:
if hasattr(part, "file_data") and part.file_data is not None:
has_attachments = True
break
if has_attachments:
break
messages_summary = self._summarize_messages_for_log(messages)
logging.info(
"GoogleLLM: Starting stream generation. Model: %s, Messages: %s, Has attachments: %s",
model,
messages_summary,
has_attachments,
)
response = client.models.generate_content_stream(
model=model,
contents=messages,

View File

@@ -5,6 +5,8 @@ GROQ_BASE_URL = "https://api.groq.com/openai/v1"
class GroqLLM(OpenAILLM):
provider_name = "groq"
def __init__(self, api_key=None, user_api_key=None, base_url=None, *args, **kwargs):
super().__init__(
api_key=api_key or settings.GROQ_API_KEY or settings.API_KEY,

View File

@@ -10,6 +10,18 @@ from application.logging import build_stack_data
logger = logging.getLogger(__name__)
# Cap the agent tool-call loop. Without this an LLM that keeps
# requesting more tool calls (preview models, sparse tool results,
# under-specified prompts) can chain searches indefinitely and the
# stream never finalises. 25 mirrors Dify's default.
MAX_TOOL_ITERATIONS = 25
_FINALIZE_INSTRUCTION = (
f"You have made {MAX_TOOL_ITERATIONS} tool calls. Provide a final "
"response to the user based on what you have, without making any "
"additional tool calls."
)
@dataclass
class ToolCall:
"""Represents a tool/function call from the LLM."""
@@ -280,7 +292,26 @@ class LLMHandler(ABC):
# Keep serialized function calls/responses so the compressor sees actions
parts_text.append(str(item))
elif "files" in item:
parts_text.append(str(item))
# Image attachments arrive with raw bytes / base64
# inline (see GoogleLLM.prepare_messages_with_attachments).
# ``str(item)`` would dump the whole byte/base64
# blob into the compression prompt and bust the
# compression LLM's input limit.
files = item.get("files") or []
descriptors = []
if isinstance(files, list):
for f in files:
if isinstance(f, dict):
descriptors.append(
f.get("mime_type") or "file"
)
elif isinstance(f, str):
descriptors.append(f)
if not descriptors:
descriptors = ["file"]
parts_text.append(
f"[attachment: {', '.join(descriptors)}]"
)
return "\n".join(parts_text)
return ""
@@ -470,10 +501,14 @@ class LLMHandler(ABC):
)
return self._perform_in_memory_compression(agent, messages)
# Use orchestrator to perform compression
# Use orchestrator to perform compression. ``model_user_id``
# keeps BYOM registry resolution scoped to the model owner
# (shared-agent dispatch) while ``user_id`` stays the caller
# for the conversation access check.
result = orchestrator.compress_mid_execution(
conversation_id=agent.conversation_id,
user_id=agent.initial_user_id,
model_user_id=getattr(agent, "model_user_id", None),
model_id=agent.model_id,
decoded_token=getattr(agent, "decoded_token", {}),
current_conversation=conversation,
@@ -577,7 +612,20 @@ class LLMHandler(ABC):
if settings.COMPRESSION_MODEL_OVERRIDE
else agent.model_id
)
provider = get_provider_from_model_id(compression_model)
agent_decoded = getattr(agent, "decoded_token", None)
caller_sub = (
agent_decoded.get("sub")
if isinstance(agent_decoded, dict)
else None
)
# Use model-owner scope (mirrors orchestrator path) so
# shared-agent owner-BYOM resolves under the owner's layer.
compression_user_id = (
getattr(agent, "model_user_id", None) or caller_sub
)
provider = get_provider_from_model_id(
compression_model, user_id=compression_user_id
)
api_key = get_api_key_for_provider(provider)
compression_llm = LLMCreator.create_llm(
provider,
@@ -586,7 +634,12 @@ class LLMHandler(ABC):
getattr(agent, "decoded_token", None),
model_id=compression_model,
agent_id=getattr(agent, "agent_id", None),
model_user_id=compression_user_id,
)
# Side-channel LLM tag — see ``orchestrator.py`` for rationale.
compression_llm._token_usage_source = "compression"
compression_llm._request_id = getattr(agent, "_request_id", None) \
or getattr(getattr(agent, "llm", None), "_request_id", None)
# Create service without DB persistence capability
compression_service = CompressionService(
@@ -897,7 +950,9 @@ class LLMHandler(ABC):
parsed = self.parse_response(response)
self.llm_calls.append(build_stack_data(agent.llm))
iteration = 0
while parsed.requires_tool_call:
iteration += 1
tool_handler_gen = self.handle_tool_calls(
agent, parsed.tool_calls, tools_dict, messages
)
@@ -921,15 +976,46 @@ class LLMHandler(ABC):
}
return ""
# Cap reached: force one final tool-less call so the stream
# always ends with content rather than cutting off.
if iteration >= MAX_TOOL_ITERATIONS:
logger.warning(
"agent tool loop hit cap (%d); forcing finalize",
MAX_TOOL_ITERATIONS,
)
messages.append(
{"role": "system", "content": _FINALIZE_INSTRUCTION},
)
response = agent.llm.gen(
model=getattr(agent.llm, "model_id", None) or agent.model_id,
messages=messages,
tools=None,
)
parsed = self.parse_response(response)
self.llm_calls.append(build_stack_data(agent.llm))
break
# ``agent.model_id`` is the registry id (a UUID for BYOM
# records). Use the LLM's own model_id, which LLMCreator
# already resolved to the upstream model name. Built-ins:
# the two are equal; BYOM: the upstream name like
# "mistral-large-latest" instead of the UUID.
response = agent.llm.gen(
model=agent.model_id, messages=messages, tools=agent.tools
model=getattr(agent.llm, "model_id", None) or agent.model_id,
messages=messages,
tools=agent.tools,
)
parsed = self.parse_response(response)
self.llm_calls.append(build_stack_data(agent.llm))
return parsed.content
def handle_streaming(
self, agent, response: Any, tools_dict: Dict, messages: List[Dict]
self,
agent,
response: Any,
tools_dict: Dict,
messages: List[Dict],
_iteration: int = 0,
) -> Generator:
"""
Handle streaming response flow.
@@ -998,6 +1084,9 @@ class LLMHandler(ABC):
}
return
next_iteration = _iteration + 1
cap_reached = next_iteration >= MAX_TOOL_ITERATIONS
# Check if context limit was reached during tool execution
if hasattr(agent, 'context_limit_reached') and agent.context_limit_reached:
# Add system message warning about context limit
@@ -1010,13 +1099,32 @@ class LLMHandler(ABC):
)
})
logger.info("Context limit reached - instructing agent to wrap up")
elif cap_reached:
logger.warning(
"agent tool loop hit cap (%d); forcing finalize",
MAX_TOOL_ITERATIONS,
)
messages.append(
{"role": "system", "content": _FINALIZE_INSTRUCTION},
)
# See note above on agent.model_id vs llm.model_id.
response = agent.llm.gen_stream(
model=agent.model_id, messages=messages, tools=agent.tools if not agent.context_limit_reached else None
model=getattr(agent.llm, "model_id", None) or agent.model_id,
messages=messages,
tools=(
None
if cap_reached
or getattr(agent, "context_limit_reached", False)
else agent.tools
),
)
self.llm_calls.append(build_stack_data(agent.llm))
yield from self.handle_streaming(agent, response, tools_dict, messages)
yield from self.handle_streaming(
agent, response, tools_dict, messages,
_iteration=next_iteration,
)
return
if parsed.content:
buffer += parsed.content

View File

@@ -1,9 +1,35 @@
import base64
import binascii
import uuid
from typing import Any, Dict, Generator
from typing import Any, Dict, Generator, Optional, Union
from application.llm.handlers.base import LLMHandler, LLMResponse, ToolCall
def _encode_thought_signature(sig: Optional[Union[bytes, str]]) -> Optional[str]:
# Gemini's Python SDK returns thought_signature as raw bytes, but the
# field is typed Optional[str] downstream and gets json.dumps'd into
# SSE events. Encode once at ingress so callers only ever see a str.
if isinstance(sig, bytes):
return base64.b64encode(sig).decode("ascii")
return sig
def _decode_thought_signature(
sig: Optional[Union[bytes, str]],
) -> Optional[Union[bytes, str]]:
# Reverse of _encode_thought_signature — Gemini's SDK expects bytes
# back when we replay a tool call. ``validate=True`` keeps ASCII
# strings that happen to be loosely decodable from being silently
# turned into bytes; non-base64 inputs pass through unchanged.
if isinstance(sig, str):
try:
return base64.b64decode(sig.encode("ascii"), validate=True)
except (binascii.Error, ValueError):
return sig
return sig
class GoogleLLMHandler(LLMHandler):
"""Handler for Google's GenAI API."""
@@ -23,7 +49,7 @@ class GoogleLLMHandler(LLMHandler):
for idx, part in enumerate(parts):
if hasattr(part, "function_call") and part.function_call is not None:
has_sig = hasattr(part, "thought_signature") and part.thought_signature is not None
thought_sig = part.thought_signature if has_sig else None
thought_sig = _encode_thought_signature(part.thought_signature) if has_sig else None
tool_calls.append(
ToolCall(
id=str(uuid.uuid4()),
@@ -50,7 +76,7 @@ class GoogleLLMHandler(LLMHandler):
tool_calls = []
if hasattr(response, "function_call") and response.function_call is not None:
has_sig = hasattr(response, "thought_signature") and response.thought_signature is not None
thought_sig = response.thought_signature if has_sig else None
thought_sig = _encode_thought_signature(response.thought_signature) if has_sig else None
tool_calls.append(
ToolCall(
id=str(uuid.uuid4()),
@@ -70,8 +96,15 @@ class GoogleLLMHandler(LLMHandler):
"""Create a tool result message in the standard internal format."""
import json as _json
from application.storage.db.serialization import PGNativeJSONEncoder
# PostgresTool results commonly include PG-native types
# (datetime / UUID / Decimal / bytea) when SELECT touches
# timestamptz / numeric / uuid / bytea columns. The shared
# encoder handles all five — bytes get base64 (lossless) instead
# of the ``str(b'...')`` repr that ``default=str`` would emit.
content = (
_json.dumps(result)
_json.dumps(result, cls=PGNativeJSONEncoder)
if not isinstance(result, str)
else result
)

View File

@@ -40,8 +40,15 @@ class OpenAILLMHandler(LLMHandler):
"""Create a tool result message in the standard internal format."""
import json as _json
from application.storage.db.serialization import PGNativeJSONEncoder
# PostgresTool results commonly include PG-native types
# (datetime / UUID / Decimal / bytea) when SELECT touches
# timestamptz / numeric / uuid / bytea columns. The shared
# encoder handles all five — bytes get base64 (lossless) instead
# of the ``str(b'...')`` repr that ``default=str`` would emit.
content = (
_json.dumps(result)
_json.dumps(result, cls=PGNativeJSONEncoder)
if not isinstance(result, str)
else result
)

View File

@@ -26,6 +26,8 @@ class LlamaSingleton:
class LlamaCpp(BaseLLM):
provider_name = "llama_cpp"
def __init__(
self,
api_key=None,

View File

@@ -1,34 +1,11 @@
import logging
from application.llm.anthropic import AnthropicLLM
from application.llm.docsgpt_provider import DocsGPTAPILLM
from application.llm.google_ai import GoogleLLM
from application.llm.groq import GroqLLM
from application.llm.llama_cpp import LlamaCpp
from application.llm.novita import NovitaLLM
from application.llm.openai import AzureOpenAILLM, OpenAILLM
from application.llm.premai import PremAILLM
from application.llm.sagemaker import SagemakerAPILLM
from application.llm.open_router import OpenRouterLLM
from application.llm.providers import PROVIDERS_BY_NAME
logger = logging.getLogger(__name__)
class LLMCreator:
llms = {
"openai": OpenAILLM,
"azure_openai": AzureOpenAILLM,
"sagemaker": SagemakerAPILLM,
"llama.cpp": LlamaCpp,
"anthropic": AnthropicLLM,
"docsgpt": DocsGPTAPILLM,
"premai": PremAILLM,
"groq": GroqLLM,
"google": GoogleLLM,
"novita": NovitaLLM,
"openrouter": OpenRouterLLM,
}
@classmethod
def create_llm(
cls,
@@ -39,28 +16,111 @@ class LLMCreator:
model_id=None,
agent_id=None,
backup_models=None,
model_user_id=None,
*args,
**kwargs,
):
from application.core.model_utils import get_base_url_for_model
"""Construct an LLM for the given provider ``type``.
llm_class = cls.llms.get(type.lower())
if not llm_class:
``model_user_id`` is the BYOM-resolution scope. Defaults to
``decoded_token['sub']`` (the caller). Pass it explicitly when
the model record belongs to a *different* user — most notably
for shared-agent dispatch, where the agent's stored
``default_model_id`` is the owner's BYOM UUID but
``decoded_token`` represents the caller.
"""
from application.core.model_registry import ModelRegistry
from application.security.safe_url import (
UnsafeUserUrlError,
pinned_httpx_client,
validate_user_base_url,
)
plugin = PROVIDERS_BY_NAME.get(type.lower())
if plugin is None or plugin.llm_class is None:
raise ValueError(f"No LLM class found for type {type}")
# Extract base_url from model configuration if model_id is provided
# Prefer per-model endpoint config from the registry. This is what
# makes openai_compatible AND end-user BYOM work without changing
# every call site: if the registered AvailableModel carries its
# own api_key / base_url, they win over whatever the caller
# resolved via the provider plugin.
#
# End-user BYOM lookups need the user_id from decoded_token to
# find the user's per-user models layer (built-in models resolve
# without it, so this stays back-compat).
base_url = None
upstream_model_id = model_id
capabilities = None
if model_id:
base_url = get_base_url_for_model(model_id)
user_id = model_user_id
if user_id is None:
user_id = (
(decoded_token or {}).get("sub") if decoded_token else None
)
model = ModelRegistry.get_instance().get_model(model_id, user_id=user_id)
if model is not None:
# Forward registry caps so the LLM enforces them at
# dispatch (built-in classes hard-code True otherwise).
capabilities = getattr(model, "capabilities", None)
# SECURITY: refuse user-source dispatch without its own
# api_key (would leak settings.API_KEY to base_url).
if (
getattr(model, "source", "builtin") == "user"
and not model.api_key
):
raise ValueError(
f"Custom model {model_id!r} has no usable API key "
"(decryption may have failed). Re-save the model "
"in settings to dispatch it."
)
if model.api_key:
api_key = model.api_key
if model.base_url:
base_url = model.base_url
# For BYOM the registry id is a UUID; the upstream API
# call needs the user's typed model name instead.
if model.upstream_model_id:
upstream_model_id = model.upstream_model_id
return llm_class(
# SECURITY: re-validate at dispatch (defense in depth
# for pre-guard rows / YAML-supplied entries). The
# pinned httpx.Client below is what actually closes the
# DNS-rebinding TOCTOU window.
if base_url and getattr(model, "source", "builtin") == "user":
try:
validate_user_base_url(base_url)
except UnsafeUserUrlError as e:
raise ValueError(
f"Refusing to dispatch model {model_id!r}: {e}"
) from e
# Pinned httpx.Client: resolves once, validates, and
# binds the SDK's outbound socket to the validated IP
# (preserves Host / SNI). Future BYOM providers must
# opt in explicitly — only openai_compatible takes
# http_client today.
if plugin.name == "openai_compatible":
try:
kwargs["http_client"] = pinned_httpx_client(
base_url
)
except UnsafeUserUrlError as e:
raise ValueError(
f"Refusing to dispatch model {model_id!r}: {e}"
) from e
# Forward model_user_id so backup/fallback resolves under the
# owner's scope on shared-agent dispatch.
return plugin.llm_class(
api_key,
user_api_key,
decoded_token=decoded_token,
model_id=model_id,
model_id=upstream_model_id,
agent_id=agent_id,
base_url=base_url,
backup_models=backup_models,
model_user_id=model_user_id,
capabilities=capabilities,
*args,
**kwargs,
)

View File

@@ -5,6 +5,8 @@ NOVITA_BASE_URL = "https://api.novita.ai/openai"
class NovitaLLM(OpenAILLM):
provider_name = "novita"
def __init__(self, api_key=None, user_api_key=None, base_url=None, *args, **kwargs):
super().__init__(
api_key=api_key or settings.NOVITA_API_KEY or settings.API_KEY,

View File

@@ -5,6 +5,8 @@ OPEN_ROUTER_BASE_URL = "https://openrouter.ai/api/v1"
class OpenRouterLLM(OpenAILLM):
provider_name = "openrouter"
def __init__(self, api_key=None, user_api_key=None, base_url=None, *args, **kwargs):
super().__init__(
api_key=api_key or settings.OPEN_ROUTER_API_KEY or settings.API_KEY,

View File

@@ -61,8 +61,17 @@ def _truncate_base64_for_logging(messages):
class OpenAILLM(BaseLLM):
provider_name = "openai"
def __init__(self, api_key=None, user_api_key=None, base_url=None, *args, **kwargs):
def __init__(
self,
api_key=None,
user_api_key=None,
base_url=None,
http_client=None,
*args,
**kwargs,
):
super().__init__(*args, **kwargs)
self.api_key = api_key or settings.OPENAI_API_KEY or settings.API_KEY
@@ -80,7 +89,18 @@ class OpenAILLM(BaseLLM):
else:
effective_base_url = "https://api.openai.com/v1"
self.client = OpenAI(api_key=self.api_key, base_url=effective_base_url)
# http_client (set by LLMCreator for BYOM) is a DNS-rebinding-safe
# httpx.Client; without it the SDK re-resolves DNS per request.
if http_client is not None:
self.client = OpenAI(
api_key=self.api_key,
base_url=effective_base_url,
http_client=http_client,
)
else:
self.client = OpenAI(
api_key=self.api_key, base_url=effective_base_url
)
self.storage = StorageCreator.get_storage()
def _clean_messages_openai(self, messages):
@@ -243,6 +263,13 @@ class OpenAILLM(BaseLLM):
if "max_tokens" in kwargs:
kwargs["max_completion_tokens"] = kwargs.pop("max_tokens")
# Defense-in-depth: drop tools / response_format if the
# registry's capability flags deny them.
if tools and not self._supports_tools():
tools = None
if response_format and not self._supports_structured_output():
response_format = None
request_params = {
"model": model,
"messages": messages,
@@ -279,6 +306,13 @@ class OpenAILLM(BaseLLM):
if "max_tokens" in kwargs:
kwargs["max_completion_tokens"] = kwargs.pop("max_tokens")
# See _raw_gen for rationale — drop tools/response_format when the
# registry-provided capabilities say the model doesn't support them.
if tools and not self._supports_tools():
tools = None
if response_format and not self._supports_structured_output():
response_format = None
request_params = {
"model": model,
"messages": messages,
@@ -320,9 +354,17 @@ class OpenAILLM(BaseLLM):
response.close()
def _supports_tools(self):
# When the LLM was constructed via LLMCreator with a registered
# AvailableModel, ``self.capabilities`` is the per-model record.
# BYOM users can disable tool support; respect that. Otherwise
# OpenAI's API supports tools by default.
if self.capabilities is not None:
return bool(self.capabilities.supports_tools)
return True
def _supports_structured_output(self):
if self.capabilities is not None:
return bool(self.capabilities.supports_structured_output)
return True
def prepare_structured_output_format(self, json_schema):
@@ -389,8 +431,14 @@ class OpenAILLM(BaseLLM):
Returns:
list: List of supported MIME types
"""
from application.core.model_configs import OPENAI_ATTACHMENTS
return OPENAI_ATTACHMENTS
# Per-model caps from the registry win when present — a BYOM
# endpoint that doesn't accept images would otherwise still be
# sent base64 image parts because the OpenAI default below
# advertises the image alias unconditionally.
if self.capabilities is not None:
return list(self.capabilities.supported_attachment_types or [])
from application.core.model_yaml import resolve_attachment_alias
return resolve_attachment_alias("image")
def prepare_messages_with_attachments(self, messages, attachments=None):
"""
@@ -527,15 +575,34 @@ class OpenAILLM(BaseLLM):
).id,
)
from application.core.mongo_db import MongoDB
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
attachments_collection = db["attachments"]
if "_id" in attachment:
attachments_collection.update_one(
{"_id": attachment["_id"]}, {"$set": {"openai_file_id": file_id}}
# Cache the OpenAI file id on the attachment row so we don't
# re-upload the same blob on the next LLM call. Prefer the PG
# UUID (``id``) when present; fall back to the legacy Mongo
# ObjectId string (``_id``). Opened per-write — this runs
# inside the hot LLM path, so we don't want a long-lived
# session wrapping the generator.
attachment_id = attachment.get("id") or attachment.get("_id")
if attachment_id:
user_id = None
decoded = getattr(self, "decoded_token", None)
if isinstance(decoded, dict):
user_id = decoded.get("sub")
from application.storage.db.repositories.attachments import (
AttachmentsRepository,
)
from application.storage.db.session import db_session
try:
with db_session() as conn:
AttachmentsRepository(conn).update_any(
str(attachment_id),
user_id,
{"openai_file_id": file_id},
)
except Exception as cache_err:
logging.warning(
f"Failed to cache openai_file_id on attachment {attachment_id}: {cache_err}"
)
return file_id
except Exception as e:
logging.error(f"Error uploading file to OpenAI: {e}", exc_info=True)

Some files were not shown because too many files have changed in this diff.