Compare commits

...

195 Commits

Author SHA1 Message Date
Alex
80a3d424d6 fix: tests 2026-04-18 13:06:39 +01:00
Alex
d8670b65a4 Merge branch 'main' into pg-4 2026-04-18 12:51:03 +01:00
Alex
c7266bf507 fix: mini improvements 2026-04-18 12:48:37 +01:00
Alex
402fa18819 feat: qa tests 2026-04-18 12:00:43 +01:00
Alex
e103799f81 feat: postgres prep on start 2026-04-17 20:45:22 +01:00
Alex
66541e934b fix: more pg fixes 2026-04-17 17:46:12 +01:00
Alex
a7ef7b4402 fix: mini issues 2026-04-17 16:29:13 +01:00
Alex
cf513371ec fix: agent chunk view 2026-04-17 16:08:48 +01:00
Alex
0ebd326fb0 fix: mini codeql warning 2026-04-17 12:18:00 +01:00
Alex
5e32defe12 fix: mcp, mongo and more 2026-04-17 12:08:52 +01:00
Alex
ca0c943559 feat: e2e tests 2026-04-17 11:38:37 +01:00
Siddhant Rai
681485c21f feat: enhance workflow engine and API routes; add document retrieval and source handling 2026-04-17 15:06:35 +05:30
Manish Madan
7f6360b4ff Merge pull request #2396 from arc53/dependabot/npm_and_yarn/frontend/lucide-react-1.8.0
chore(deps): bump lucide-react from 0.562.0 to 1.8.0 in /frontend
2026-04-17 14:23:19 +05:30
dependabot[bot]
c68f18a0ae chore(deps): bump lucide-react from 0.562.0 to 1.8.0 in /frontend
Bumps [lucide-react](https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react) from 0.562.0 to 1.8.0.
- [Release notes](https://github.com/lucide-icons/lucide/releases)
- [Commits](https://github.com/lucide-icons/lucide/commits/1.8.0/packages/lucide-react)

---
updated-dependencies:
- dependency-name: lucide-react
  dependency-version: 1.8.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-17 08:50:18 +00:00
Manish Madan
684b29e73c Merge pull request #2393 from arc53/dependabot/npm_and_yarn/frontend/eslint-plugin-prettier-5.5.5
chore(deps-dev): bump eslint-plugin-prettier from 5.5.4 to 5.5.5 in /frontend
2026-04-17 14:18:50 +05:30
Alex
a1efea81d0 patch: sitemap and web loader 2026-04-15 23:39:44 +01:00
Alex
f61095b32e fix: remove mongo mock 2026-04-15 22:55:37 +01:00
Alex
928e8588ca fix: ruff fixes 2026-04-15 22:38:22 +01:00
Alex
318acf548a feat: decent coverage 2026-04-15 22:32:28 +01:00
dependabot[bot]
9eb34262e0 chore(deps-dev): bump eslint-plugin-prettier in /frontend
Bumps [eslint-plugin-prettier](https://github.com/prettier/eslint-plugin-prettier) from 5.5.4 to 5.5.5.
- [Release notes](https://github.com/prettier/eslint-plugin-prettier/releases)
- [Changelog](https://github.com/prettier/eslint-plugin-prettier/blob/main/CHANGELOG.md)
- [Commits](https://github.com/prettier/eslint-plugin-prettier/compare/v5.5.4...v5.5.5)

---
updated-dependencies:
- dependency-name: eslint-plugin-prettier
  dependency-version: 5.5.5
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-15 20:53:23 +00:00
Alex
951bdb8365 Merge pull request #2391 from arc53/codex/draft-threat-model-for-docsgpt
Add public threat model document (.github/THREAT_MODEL.md)
2026-04-15 18:38:55 +01:00
Alex
c18f85a050 docs: clarify tool-access boundary in prompt injection section 2026-04-15 18:31:42 +01:00
Alex
c10d474156 feat: more tests 2026-04-15 18:26:20 +01:00
Alex
6325d5f044 feat: better tests 4 2026-04-15 14:30:53 +01:00
Alex
1bbd2b8b65 fix: tests coverage 2026-04-15 13:38:10 +01:00
Alex
f7964dcc8d fix: test mongo 2026-04-15 12:57:35 +01:00
Alex
ef7ff1613b fix: codeql columns thing 2026-04-15 12:56:11 +01:00
Alex
ff0f02c2f0 vale lint fix 2 2026-04-15 12:49:42 +01:00
Alex
4248e4fcc7 fix: mini suggestions 2026-04-15 12:47:23 +01:00
Manish Madan
5ecb174567 Merge pull request #2322 from arc53/dependabot/npm_and_yarn/extensions/react-widget/svgo-4.0.1
chore(deps-dev): bump svgo from 3.3.3 to 4.0.1 in /extensions/react-widget
2026-04-15 17:06:40 +05:30
Alex
0ec8208427 Potential fix for pull request finding 'CodeQL / Clear-text logging of sensitive information'
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2026-04-15 12:35:34 +01:00
Alex
13c21e5a7e fix: vale 2026-04-15 12:32:52 +01:00
Alex
a1ceb1ea8a fix: ruff 2026-04-15 12:30:19 +01:00
Alex
2aa1bc04b0 feat: test fixes 2026-04-15 12:27:01 +01:00
dependabot[bot]
ed7212d016 chore(deps-dev): bump svgo in /extensions/react-widget
Bumps [svgo](https://github.com/svg/svgo) from 3.3.3 to 4.0.1.
- [Release notes](https://github.com/svg/svgo/releases)
- [Commits](https://github.com/svg/svgo/compare/v3.3.3...v4.0.1)

---
updated-dependencies:
- dependency-name: svgo
  dependency-version: 4.0.1
  dependency-type: direct:development
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-15 10:43:02 +00:00
Manish Madan
f82acdab5d Merge pull request #2384 from arc53/dependabot/npm_and_yarn/frontend/eslint-plugin-n-17.24.0
chore(deps-dev): bump eslint-plugin-n from 17.23.1 to 17.24.0 in /frontend
2026-04-15 16:03:56 +05:30
dependabot[bot]
361aebc34c chore(deps-dev): bump eslint-plugin-n in /frontend
Bumps [eslint-plugin-n](https://github.com/eslint-community/eslint-plugin-n) from 17.23.1 to 17.24.0.
- [Release notes](https://github.com/eslint-community/eslint-plugin-n/releases)
- [Changelog](https://github.com/eslint-community/eslint-plugin-n/blob/master/CHANGELOG.md)
- [Commits](https://github.com/eslint-community/eslint-plugin-n/compare/v17.23.1...v17.24.0)

---
updated-dependencies:
- dependency-name: eslint-plugin-n
  dependency-version: 17.24.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-15 10:28:46 +00:00
Alex
aa880d0b31 fix: tests and k8s mongo stuff 2026-04-15 11:08:40 +01:00
Alex
ea4bd608a8 fix: mini code mongo removals 2026-04-15 10:24:56 +01:00
Alex
b35572c0fc feat: adjust docs and compose files 2026-04-15 09:49:01 +01:00
Alex
fa0b358f42 feat: mongo cutoff 2026-04-15 09:36:51 +01:00
Manish Madan
bf194c1a0f Merge pull request #2311 from arc53/dependabot/npm_and_yarn/extensions/react-widget/babel/preset-env-7.29.2
chore(deps-dev): bump @babel/preset-env from 7.24.6 to 7.29.2 in /extensions/react-widget
2026-04-15 13:16:41 +05:30
Manish Madan
54c396750b Merge pull request #2385 from arc53/dependabot/npm_and_yarn/frontend/multi-2a6546692b
chore(deps): bump react-dom and @types/react-dom in /frontend
2026-04-15 13:12:49 +05:30
dependabot[bot]
9adebfec69 chore(deps): bump react-dom and @types/react-dom in /frontend
Bumps [react-dom](https://github.com/facebook/react/tree/HEAD/packages/react-dom) and [@types/react-dom](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react-dom). These dependencies needed to be updated together.

Updates `react-dom` from 19.2.0 to 19.2.5
- [Release notes](https://github.com/facebook/react/releases)
- [Changelog](https://github.com/facebook/react/blob/main/CHANGELOG.md)
- [Commits](https://github.com/facebook/react/commits/v19.2.5/packages/react-dom)

Updates `@types/react-dom` from 19.2.2 to 19.2.3
- [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases)
- [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react-dom)

---
updated-dependencies:
- dependency-name: react-dom
  dependency-version: 19.2.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
- dependency-name: "@types/react-dom"
  dependency-version: 19.2.3
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-15 07:41:54 +00:00
Manish Madan
92c321f163 Merge pull request #2386 from arc53/dependabot/npm_and_yarn/frontend/tailwindcss/postcss-4.2.2
chore(deps-dev): bump @tailwindcss/postcss from 4.1.16 to 4.2.2 in /frontend
2026-04-15 13:10:30 +05:30
dependabot[bot]
e3d36b9e52 chore(deps-dev): bump @tailwindcss/postcss in /frontend
Bumps [@tailwindcss/postcss](https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/@tailwindcss-postcss) from 4.1.16 to 4.2.2.
- [Release notes](https://github.com/tailwindlabs/tailwindcss/releases)
- [Changelog](https://github.com/tailwindlabs/tailwindcss/blob/main/CHANGELOG.md)
- [Commits](https://github.com/tailwindlabs/tailwindcss/commits/v4.2.2/packages/@tailwindcss-postcss)

---
updated-dependencies:
- dependency-name: "@tailwindcss/postcss"
  dependency-version: 4.2.2
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-15 07:39:25 +00:00
Manish Madan
8950e11208 Merge pull request #2388 from arc53/dependabot/npm_and_yarn/frontend/tailwindcss-4.2.2
chore(deps-dev): bump tailwindcss from 4.2.1 to 4.2.2 in /frontend
2026-04-15 12:39:31 +05:30
dependabot[bot]
5de0132a65 chore(deps-dev): bump tailwindcss from 4.2.1 to 4.2.2 in /frontend
Bumps [tailwindcss](https://github.com/tailwindlabs/tailwindcss/tree/HEAD/packages/tailwindcss) from 4.2.1 to 4.2.2.
- [Release notes](https://github.com/tailwindlabs/tailwindcss/releases)
- [Changelog](https://github.com/tailwindlabs/tailwindcss/blob/main/CHANGELOG.md)
- [Commits](https://github.com/tailwindlabs/tailwindcss/commits/v4.2.2/packages/tailwindcss)

---
updated-dependencies:
- dependency-name: tailwindcss
  dependency-version: 4.2.2
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-15 07:06:26 +00:00
Manish Madan
b92ca91512 Merge pull request #2387 from arc53/dependabot/npm_and_yarn/frontend/mermaid-11.14.0
chore(deps): bump mermaid from 11.13.0 to 11.14.0 in /frontend
2026-04-15 12:34:28 +05:30
dependabot[bot]
8e0b2844a2 chore(deps): bump mermaid from 11.13.0 to 11.14.0 in /frontend
Bumps [mermaid](https://github.com/mermaid-js/mermaid) from 11.13.0 to 11.14.0.
- [Release notes](https://github.com/mermaid-js/mermaid/releases)
- [Commits](https://github.com/mermaid-js/mermaid/compare/mermaid@11.13.0...mermaid@11.14.0)

---
updated-dependencies:
- dependency-name: mermaid
  dependency-version: 11.14.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-15 07:02:28 +00:00
dependabot[bot]
0969db5e30 chore(deps-dev): bump @babel/preset-env in /extensions/react-widget
Bumps [@babel/preset-env](https://github.com/babel/babel/tree/HEAD/packages/babel-preset-env) from 7.24.6 to 7.29.2.
- [Release notes](https://github.com/babel/babel/releases)
- [Changelog](https://github.com/babel/babel/blob/main/CHANGELOG.md)
- [Commits](https://github.com/babel/babel/commits/v7.29.2/packages/babel-preset-env)

---
updated-dependencies:
- dependency-name: "@babel/preset-env"
  dependency-version: 7.29.2
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-15 05:30:32 +00:00
Manish Madan
af335a27e8 Merge pull request #2309 from arc53/dependabot/npm_and_yarn/extensions/react-widget/babel/core-7.29.0
chore(deps-dev): bump @babel/core from 7.24.6 to 7.29.0 in /extensions/react-widget
2026-04-15 10:59:07 +05:30
dependabot[bot]
0ae3139284 chore(deps-dev): bump @babel/core in /extensions/react-widget
Bumps [@babel/core](https://github.com/babel/babel/tree/HEAD/packages/babel-core) from 7.24.6 to 7.29.0.
- [Release notes](https://github.com/babel/babel/releases)
- [Changelog](https://github.com/babel/babel/blob/main/CHANGELOG.md)
- [Commits](https://github.com/babel/babel/commits/v7.29.0/packages/babel-core)

---
updated-dependencies:
- dependency-name: "@babel/core"
  dependency-version: 7.29.0
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-14 23:05:51 +00:00
Manish Madan
7529ca3dd6 Merge pull request #2389 from arc53/dependabot/pip/application/pip-3344959f9f
chore(deps): bump cryptography from 46.0.6 to 46.0.7 in /application in the pip group across 1 directory
2026-04-15 04:32:07 +05:30
dependabot[bot]
1b813320f1 chore(deps): bump cryptography
Bumps the pip group with 1 update in the /application directory: [cryptography](https://github.com/pyca/cryptography).


Updates `cryptography` from 46.0.6 to 46.0.7
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pyca/cryptography/compare/46.0.6...46.0.7)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-version: 46.0.7
  dependency-type: direct:production
  dependency-group: pip
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-14 21:46:13 +00:00
Manish Madan
02012e9a0b Merge pull request #2366 from arc53/dependabot/pip/application/tzdata-2026.1
chore(deps): bump tzdata from 2025.3 to 2026.1 in /application
2026-04-15 03:14:25 +05:30
dependabot[bot]
c2f027265a chore(deps): bump tzdata from 2025.3 to 2026.1 in /application
Bumps [tzdata](https://github.com/python/tzdata) from 2025.3 to 2026.1.
- [Release notes](https://github.com/python/tzdata/releases)
- [Changelog](https://github.com/python/tzdata/blob/master/NEWS.md)
- [Commits](https://github.com/python/tzdata/compare/2025.3...2026.1)

---
updated-dependencies:
- dependency-name: tzdata
  dependency-version: '2026.1'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-14 21:30:10 +00:00
Manish Madan
0ae615c10e Merge pull request #2365 from arc53/dependabot/pip/application/langchain-core-1.2.26
chore(deps): bump langchain-core from 1.2.23 to 1.2.26 in /application
2026-04-15 02:58:36 +05:30
dependabot[bot]
881d0da344 chore(deps): bump langchain-core from 1.2.23 to 1.2.26 in /application
Bumps [langchain-core](https://github.com/langchain-ai/langchain) from 1.2.23 to 1.2.26.
- [Release notes](https://github.com/langchain-ai/langchain/releases)
- [Commits](https://github.com/langchain-ai/langchain/compare/langchain-core==1.2.23...langchain-core==1.2.26)

---
updated-dependencies:
- dependency-name: langchain-core
  dependency-version: 1.2.26
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-14 21:10:32 +00:00
Manish Madan
1376de6bae Merge pull request #2364 from arc53/dependabot/pip/application/langsmith-0.7.26
chore(deps): bump langsmith from 0.7.23 to 0.7.26 in /application
2026-04-15 02:39:17 +05:30
Alex
571f132f3e feat: mongo cutoff 2026-04-14 21:43:06 +01:00
dependabot[bot]
362ebfcc0a chore(deps): bump langsmith from 0.7.23 to 0.7.26 in /application
Bumps [langsmith](https://github.com/langchain-ai/langsmith-sdk) from 0.7.23 to 0.7.26.
- [Release notes](https://github.com/langchain-ai/langsmith-sdk/releases)
- [Commits](https://github.com/langchain-ai/langsmith-sdk/compare/v0.7.23...v0.7.26)

---
updated-dependencies:
- dependency-name: langsmith
  dependency-version: 0.7.26
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-14 20:29:55 +00:00
Manish Madan
bc77eed3d8 Merge pull request #2363 from arc53/dependabot/pip/application/jsonpointer-3.1.1
chore(deps): bump jsonpointer from 3.0.0 to 3.1.1 in /application
2026-04-15 01:58:23 +05:30
dependabot[bot]
1f346588e7 chore(deps): bump jsonpointer from 3.0.0 to 3.1.1 in /application
Bumps [jsonpointer](https://github.com/stefankoegl/python-json-pointer) from 3.0.0 to 3.1.1.
- [Commits](https://github.com/stefankoegl/python-json-pointer/compare/v3.0.0...v3.1.1)

---
updated-dependencies:
- dependency-name: jsonpointer
  dependency-version: 3.1.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-14 20:13:38 +00:00
Alex
2fed5c882b Merge pull request #2383 from arc53/codex/add-zizmor-ci-configuration
Add GitHub Actions zizmor security workflow
2026-04-14 18:02:18 +01:00
Alex
aa938d76d7 Add GitHub Actions zizmor security workflow 2026-04-14 17:56:14 +01:00
Alex
e7f1e2a376 feat: postgres tests 2026-04-14 17:07:02 +01:00
Manish Madan
2940628aa6 Merge pull request #2319 from arc53/dependabot/npm_and_yarn/frontend/npm_and_yarn-e5a595f223
chore(deps-dev): bump flatted from 3.4.1 to 3.4.2 in /frontend in the npm_and_yarn group across 1 directory
2026-04-14 21:30:54 +05:30
dependabot[bot]
7f23928134 chore(deps-dev): bump flatted
Bumps the npm_and_yarn group with 1 update in the /frontend directory: [flatted](https://github.com/WebReflection/flatted).


Updates `flatted` from 3.4.1 to 3.4.2
- [Commits](https://github.com/WebReflection/flatted/compare/v3.4.1...v3.4.2)

---
updated-dependencies:
- dependency-name: flatted
  dependency-version: 3.4.2
  dependency-type: indirect
  dependency-group: npm_and_yarn
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-14 15:10:18 +00:00
Alex
20e17c84c7 Merge pull request #2379 from arc53/codex/refine-and-review-incident-response-plan
Add INCIDENT_RESPONSE.md and reference it from SECURITY.md
2026-04-14 14:59:04 +01:00
copilot-swe-agent[bot]
389ddf6068 Fix secret references in INCIDENT_RESPONSE.md to match actual DocsGPT config
Agent-Logs-Url: https://github.com/arc53/DocsGPT/sessions/c6bfd68d-4dac-46ec-8404-fe5bfda0e8f3

Co-authored-by: dartpain <15183589+dartpain@users.noreply.github.com>
2026-04-14 10:51:22 +00:00
Alex
1e2443fb90 Update .github/INCIDENT_RESPONSE.md
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2026-04-14 11:49:13 +01:00
Manish Madan
6387bd1892 Merge pull request #2303 from arc53/dependabot/npm_and_yarn/frontend/typescript-eslint/eslint-plugin-8.57.1
chore(deps-dev): bump @typescript-eslint/eslint-plugin from 8.46.3 to 8.57.1 in /frontend
2026-04-14 14:16:35 +05:30
dependabot[bot]
7d22724d1c chore(deps-dev): bump @typescript-eslint/eslint-plugin in /frontend
Bumps [@typescript-eslint/eslint-plugin](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/eslint-plugin) from 8.46.3 to 8.57.1.
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases)
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/eslint-plugin/CHANGELOG.md)
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v8.57.1/packages/eslint-plugin)

---
updated-dependencies:
- dependency-name: "@typescript-eslint/eslint-plugin"
  dependency-version: 8.57.1
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-14 08:39:33 +00:00
Manish Madan
f6f12f6895 Merge pull request #2302 from arc53/dependabot/npm_and_yarn/frontend/prettier-plugin-tailwindcss-0.7.2
chore(deps-dev): bump prettier-plugin-tailwindcss from 0.7.1 to 0.7.2 in /frontend
2026-04-14 14:07:38 +05:30
dependabot[bot]
934127f323 chore(deps-dev): bump prettier-plugin-tailwindcss in /frontend
Bumps [prettier-plugin-tailwindcss](https://github.com/tailwindlabs/prettier-plugin-tailwindcss) from 0.7.1 to 0.7.2.
- [Release notes](https://github.com/tailwindlabs/prettier-plugin-tailwindcss/releases)
- [Changelog](https://github.com/tailwindlabs/prettier-plugin-tailwindcss/blob/main/CHANGELOG.md)
- [Commits](https://github.com/tailwindlabs/prettier-plugin-tailwindcss/compare/v0.7.1...v0.7.2)

---
updated-dependencies:
- dependency-name: prettier-plugin-tailwindcss
  dependency-version: 0.7.2
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-14 08:25:09 +00:00
Manish Madan
1780e3cc91 Merge pull request #2301 from arc53/dependabot/npm_and_yarn/frontend/react-i18next-16.5.8
chore(deps): bump react-i18next from 16.2.4 to 16.5.8 in /frontend
2026-04-14 13:53:12 +05:30
ManishMadan2882
5e7fab2f34 (chore:fe) i18next 2026-04-14 13:50:03 +05:30
Alex
92ae76f95e Merge pull request #2381 from arc53/pg-3
feat: pre depriciation
2026-04-14 08:33:42 +01:00
Alex
18755bdd9b fix: workflow tests 2026-04-14 00:35:57 +01:00
Alex
0f20adcbf4 feat: pre depriciation 2026-04-14 00:19:50 +01:00
Alex
18e2a829c9 docs: apply revised incident response plan wording 2026-04-13 14:11:45 +01:00
dependabot[bot]
cd44501a71 chore(deps): bump react-i18next from 16.2.4 to 16.5.8 in /frontend
Bumps [react-i18next](https://github.com/i18next/react-i18next) from 16.2.4 to 16.5.8.
- [Changelog](https://github.com/i18next/react-i18next/blob/master/CHANGELOG.md)
- [Commits](https://github.com/i18next/react-i18next/compare/v16.2.4...v16.5.8)

---
updated-dependencies:
- dependency-name: react-i18next
  dependency-version: 16.5.8
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-13 12:36:35 +00:00
Manish Madan
f8ebdf3fd4 Merge pull request #2300 from arc53/dependabot/npm_and_yarn/frontend/i18next-browser-languagedetector-8.2.1
chore(deps): bump i18next-browser-languagedetector from 8.2.0 to 8.2.1 in /frontend
2026-04-13 18:03:46 +05:30
dependabot[bot]
7c6fca18ad chore(deps): bump i18next-browser-languagedetector in /frontend
Bumps [i18next-browser-languagedetector](https://github.com/i18next/i18next-browser-languageDetector) from 8.2.0 to 8.2.1.
- [Changelog](https://github.com/i18next/i18next-browser-languageDetector/blob/master/CHANGELOG.md)
- [Commits](https://github.com/i18next/i18next-browser-languageDetector/compare/v8.2.0...v8.2.1)

---
updated-dependencies:
- dependency-name: i18next-browser-languagedetector
  dependency-version: 8.2.1
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-04-13 12:28:26 +00:00
Alex
5fab798707 Merge pull request #2377 from arc53/pg-2
feat: pg-2
2026-04-12 14:09:52 +01:00
Alex
cb30a24e05 feat: fixes on pg2 2026-04-12 13:51:29 +01:00
Alex
530761d08c feat: pg-2 2026-04-12 13:35:32 +01:00
Alex
73fbc28744 Merge pull request #2376 from arc53/pg-1
Pg 1
2026-04-12 12:44:12 +01:00
Alex
b5b6538762 fix: tests 2026-04-12 12:35:23 +01:00
Alex
a9761061fc fix: mini issues 2026-04-12 12:24:58 +01:00
Alex
9388996a15 fix: ruff 2026-04-12 12:18:31 +01:00
Alex
875868b7e5 fix: comment 2026-04-12 12:16:19 +01:00
Alex
502819ae52 feat: pg migration, more tables 2026-04-12 12:15:59 +01:00
Alex
cada1a44fc Merge pull request #2373 from siiddhantt/feat/confluence-connector
feat: add Confluence connector for data ingestion
2026-04-12 11:33:49 +01:00
Alex
6192767451 fix: sanitize attachment filenames, drop dateutil dep, add connector docs 2026-04-12 11:32:24 +01:00
Alex
5c3e6eca54 Merge pull request #2375 from arc53/pg
feat: init pg migration
2026-04-12 10:44:31 +01:00
Alex
59d9d4ac50 fix: comments in settings 2026-04-12 10:42:46 +01:00
Alex
3931ccccee Merge pull request #2374 from ManishMadan2882/main
UX: Conversation scroll experience
2026-04-12 00:37:11 +01:00
Alex
55717043f6 fix: vale 2026-04-12 00:29:23 +01:00
Alex
ececcb8b17 feat: init pg migration 2026-04-12 00:07:24 +01:00
ManishMadan2882
420e9d3dd5 (feat) conversation: scroll experience 2026-04-10 19:27:07 +05:30
Siddhant Rai
749eed3d0b feat: add Confluence integration with authentication and file loading capabilities
- Enhanced settings.py to include Confluence client ID and secret
- Created ConfluenceAuth class for handling authentication with Confluence
- Implemented ConfluenceLoader class for loading data from Confluence
- Updated connector_creator.py to register Confluence as a connector
- Added confluence.svg asset for UI representation
- Modified ConnectorAuth component to support Confluence connection
- Updated FilePicker component to include Confluence as a file source
- Added localization support for Confluence in multiple languages (de, en, es, jp, ru, zh-TW, zh)
- Enhanced Upload component to handle Confluence file selection
- Updated ingestor types to include Confluence and its configuration
2026-04-10 19:10:35 +05:30
Alex
bd03a513e3 Merge pull request #2372 from arc53/fast-ebook
feat: faster ebook parsing
2026-04-09 18:38:13 +01:00
Alex
fcdb4fb5e8 feat: faster ebook parsing 2026-04-09 18:31:06 +01:00
Alex
e787c896eb upd Security.md 2026-04-08 12:49:20 +01:00
Alex
23aeaff5db Merge pull request #2362 from arc53/v1-mini-improvements
feat: history overwrite
2026-04-06 15:02:32 +01:00
Alex
689dd79597 fix: lang 2026-04-06 14:57:51 +01:00
Alex
0c15af90b1 feat: history overwrite 2026-04-06 14:42:01 +01:00
Alex
cdd6ff6557 chore: bump deps 2026-04-04 12:45:34 +01:00
Alex
72b3d94453 fix: tests 2026-04-03 18:30:46 +01:00
Alex
7e88d09e5d Merge branch 'main' of https://github.com/arc53/DocsGPT 2026-04-03 18:26:37 +01:00
Alex
74a4a237dc fix: bump deps 2026-04-03 18:26:29 +01:00
Alex
c3f01c6619 Merge pull request #2347 from ManishMadan2882/main
Minor frontend updates
2026-04-03 18:17:27 +01:00
Alex
6b408823d4 fix: mini theme color edits 2026-04-03 18:16:07 +01:00
Alex
3fc81ac5d8 fix: clean error 2026-04-03 18:08:38 +01:00
Alex
2652f8a5b0 fix: chatwoot 2026-04-03 18:04:49 +01:00
Alex
d711eefe96 patch: agent usage limits 2026-04-03 18:03:31 +01:00
Alex
79206f3919 fix: harden faiss 2026-04-03 17:57:49 +01:00
Alex
de971d9452 fix: validate mcp url 2026-04-03 17:52:48 +01:00
Alex
1b4d5ca0dd patch: mcp identity 2026-04-03 17:40:22 +01:00
Alex
81989e8258 fix: patch /v1/models 2026-04-03 17:37:09 +01:00
Alex
dc262d1698 patch: error 2026-04-03 17:30:23 +01:00
Alex
69f9c93869 patch: s3 2026-04-03 17:28:09 +01:00
Alex
74bf80b25c patch: sharing convos 2026-04-03 17:20:06 +01:00
Alex
d9a92a7208 feat: improve setup scripts 2026-04-03 17:15:21 +01:00
Alex
02e93d993d patch: available tools 2026-04-03 17:12:36 +01:00
Alex
6b6495f48c patch: key 2026-04-03 17:06:35 +01:00
Alex
249dd9ce37 patch: paths 2026-04-03 16:45:03 +01:00
Alex
9134ab0478 Merge branch 'main' of https://github.com/arc53/DocsGPT 2026-04-03 16:40:50 +01:00
Alex
10ef68c9d0 Revise vulnerability reporting process
Updated vulnerability reporting instructions to use GitHub's private reporting flow.
2026-04-03 16:36:10 +01:00
Alex
7d65cf1c2b chore: bump deps 2026-04-03 16:35:10 +01:00
Alex
13c6cc59c1 Merge pull request #2349 from arc53/messages-format
Messages format
2026-04-03 16:26:57 +01:00
Alex
6381f7dd4e fix: remove bad tests 2026-04-03 16:20:15 +01:00
Alex
e6ac4008fe feat: better tool names for llms 2026-04-03 15:35:50 +01:00
Alex
1af09f114d fix: tool mapping 2026-04-03 13:32:55 +01:00
Alex
be7da983e7 fix: remove internal tools when creating tools and better Approval gate UX 2026-04-03 10:36:48 +01:00
Alex
8b9e595d85 fix: structure improvements of messages 2026-04-01 14:58:44 +01:00
Alex
398f3acc8d fix: clean error 2026-04-01 13:01:02 +01:00
Alex
e04baa7ed8 feat: tests and approval gate 2026-04-01 12:49:32 +01:00
Alex
e5586b6f20 feat: fronted connection to api 2026-04-01 10:55:54 +01:00
Alex
addf57cab7 feat: compatible api 2026-03-31 23:10:09 +01:00
ManishMadan2882
648b3f1d20 (fix) lint/fe 2026-04-01 03:30:44 +05:30
ManishMadan2882
a75a9e23f9 (feat:fe) minor good things 2026-04-01 03:19:03 +05:30
Alex
73256389cf feat: client side tools 2026-03-31 22:20:55 +01:00
Alex
d609efca49 feat: continuation messages 2026-03-31 21:30:24 +01:00
Alex
772860b667 fix: mini fe changes 2026-03-31 11:59:38 +01:00
Alex
ea2fd8b04a chore: remove unused deps 2026-03-31 11:57:01 +01:00
Alex
2c73deac20 deps upgrades 2026-03-31 11:32:55 +01:00
Alex
47f3907e5e Merge pull request #2340 from arc53/coverage-3
chore: more tests
2026-03-31 00:50:46 +01:00
Alex
727495c553 fix: mongo in unit tests 2026-03-31 00:34:49 +01:00
Alex
a3b08a5b44 More tests 2026-03-31 00:07:19 +01:00
Alex
81532ada2a Merge pull request #2318 from siiddhantt/feat/standardize-css
feat: update styles and improve accessibility across frontend
2026-03-30 23:26:45 +01:00
ManishMadan2882
43f71374e5 (chore:fe) lint-fix 2026-03-30 23:26:11 +05:30
Alex
d5c0322e2a chore: more tests 2026-03-30 16:13:08 +01:00
Siddhant Rai
3b66a3176c fix: improve option matching logic in Dropdown component + selected style 2026-03-30 18:36:35 +05:30
Alex
dc6db847ca Merge pull request #2339 from arc53/test-handlers
chore: handlers tests
2026-03-30 13:06:53 +01:00
Alex
ed0063aada chore: handlers tests 2026-03-30 12:53:50 +01:00
Siddhant Rai
9a6a55b6da Merge branch 'main' into feat/standardize-css 2026-03-30 13:14:43 +05:30
Siddhant Rai
12a8368216 fix: merge conflicts 2026-03-30 13:12:24 +05:30
Alex
3f6d6f15ea Merge pull request #2338 from arc53/tests-utils
chore: utils tests
2026-03-29 11:54:59 +01:00
Alex
126fa01b14 chore: utils tests 2026-03-29 11:49:35 +01:00
Alex
e06debad5f chore: connector tests 2026-03-29 10:32:48 +01:00
Alex
6492852f7d fix: lint on seeder test 2026-03-29 09:48:46 +01:00
Alex
00a621f33a tests: retriever and seeder 2026-03-29 09:46:07 +01:00
Alex
e92ffc6fdc Merge pull request #2337 from arc53/tests-api
chore: api and tool tests
2026-03-28 22:55:10 +00:00
Alex
fe185e5b8d chore: api and tool tests 2026-03-28 21:51:47 +00:00
Alex
9f3d9ab860 docs: agent types 2026-03-28 18:50:50 +00:00
Alex
1c0adde380 chore: docs update 2026-03-28 17:04:06 +00:00
Alex
3c56bd0d0b docs: fix conflicts 2026-03-28 16:16:15 +00:00
Alex
86664ebda2 Merge pull request #2320 from Alex-wuhu/novita-integration
feat: complete Novita AI provider integration
2026-03-28 13:22:02 +00:00
Alex
db18b743d1 fix tests 2 2026-03-28 13:10:41 +00:00
Alex
9e85cc9065 fix: test errors 2026-03-28 13:04:21 +00:00
Alex
aaaa6f002d Merge branch 'main' of https://github.com/arc53/DocsGPT 2026-03-28 12:03:27 +00:00
Alex
47dcbcb74b fix: tests and sources on workflow agent 2026-03-28 12:03:16 +00:00
Alex
ddbfd94193 Adjust demo GIF height in README
Update the height of the demo GIF in README.
2026-03-28 11:09:05 +00:00
Alex
8dec60ab8b Update demo GIF in README
Replaced demo video GIF in README with a new example.
2026-03-28 11:08:10 +00:00
Alex
84b2e4bab4 fix: end node multi input 2026-03-28 10:00:01 +00:00
Siddhant Rai
193ca6fd63 fix: lint errors + redundant css 2026-03-28 15:02:16 +05:30
Alex
2afdd7f026 fix: enable tools in workflow agents 2026-03-27 13:43:09 +00:00
Alex
f364475f64 Merge pull request #2335 from arc53/tests-worker
tests: worker coverage
2026-03-27 12:14:56 +00:00
Alex
b254de6ed6 tests: worker coverage 2026-03-27 12:03:55 +00:00
Alex
08dedcaf95 Merge pull request #2334 from arc53/tests-vectors
tests: vectors
2026-03-26 19:11:20 +00:00
Alex
c726eb8ebd fix: ruff 2026-03-26 18:48:53 +00:00
Alex
5f0d39e5f1 tests: vectors 2026-03-26 18:42:59 +00:00
Alex
8c82fc5495 Merge pull request #2333 from arc53/chore/bump-npm-v0.6.3
chore: bump npm libraries to v0.6.3
2026-03-26 14:17:09 +00:00
github-actions[bot]
6d81a15e97 chore: bump npm libraries to v0.6.3 2026-03-26 14:16:32 +00:00
Alex
5478e4234c chore: bump npm again 2026-03-26 14:06:05 +00:00
Alex
4056278fef chore: bump npm 2026-03-26 13:48:05 +00:00
Siddhant Rai
174dee0fe6 fix: inconsistencies with prev color patterns 2026-03-26 18:32:57 +05:30
Siddhant Rai
844167ba06 Merge branch 'feat/standardize-css' of https://github.com/siiddhantt/DocsGPT into feat/standardize-css 2026-03-25 19:36:35 +05:30
Siddhant Rai
6fa3acb1ca style: standardize colors across components according to figma 2026-03-25 19:36:32 +05:30
Alex
9fd063266b Mini fixes 2026-03-24 01:40:29 +00:00
Alex-wuhu
eaf39bb15b feat: add Novita AI as LLM provider
Add Novita AI (https://novita.ai) as a new LLM provider option.
Novita offers OpenAI-compatible API endpoints with competitive pricing.
2026-03-23 10:52:26 +08:00
Siddhant Rai
324a8cd4cf refactor: update styles and improve accessibility across frontend
- Updated text colors to use foreground and muted-foreground for better contrast.
- Replaced hardcoded colors with theme-based classes for consistency.
- Enhanced input fields with icons for improved usability.
- Adjusted button styles for a more cohesive design.
- Refactored search input components to use consistent styling and layout.
- Improved layout and spacing in various components for better user experience.
- Updated tool and source titles and subtitles for clarity.
2026-03-20 17:10:27 +05:30
596 changed files with 835578 additions and 13706 deletions

View File

@@ -3,6 +3,14 @@ LLM_NAME=docsgpt
VITE_API_STREAMING=true
INTERNAL_KEY=<internal key for worker-to-backend authentication>
# Provider-specific API keys (optional - use these to enable multiple providers)
# OPENAI_API_KEY=<your-openai-api-key>
# ANTHROPIC_API_KEY=<your-anthropic-api-key>
# GOOGLE_API_KEY=<your-google-api-key>
# GROQ_API_KEY=<your-groq-api-key>
# NOVITA_API_KEY=<your-novita-api-key>
# OPEN_ROUTER_API_KEY=<your-openrouter-api-key>
# Remote Embeddings (Optional - for using a remote embeddings API instead of local SentenceTransformer)
# When set, the app will use the remote API and won't load SentenceTransformer (saves RAM)
EMBEDDINGS_BASE_URL=
@@ -26,3 +34,9 @@ MICROSOFT_TENANT_ID=your-azure-ad-tenant-id
#or "https://login.microsoftonline.com/contoso.onmicrosoft.com".
#Alternatively, use "https://login.microsoftonline.com/common" for multi-tenant app.
MICROSOFT_AUTHORITY=https://{tenantId}.ciamlogin.com/{tenantId}
# User-data Postgres DB (Phase 0 of the MongoDB→Postgres migration).
# Standard Postgres URI — `postgres://` and `postgresql://` both work.
# Leave unset while the migration is still being rolled out; the app will
# fall back to MongoDB for user data until POSTGRES_URI is configured.
# POSTGRES_URI=postgresql://docsgpt:docsgpt@localhost:5432/docsgpt

99
.github/INCIDENT_RESPONSE.md vendored Normal file
View File

@@ -0,0 +1,99 @@
# DocsGPT Incident Response Plan (IRP)
This playbook describes how maintainers respond to confirmed or suspected security incidents.
- Vulnerability reporting: [`SECURITY.md`](../SECURITY.md)
- Non-security bugs/features: [`CONTRIBUTING.md`](../CONTRIBUTING.md)
## Severity
| Severity | Definition | Typical examples |
|---|---|---|
| **Critical** | Active exploitation, supply-chain compromise, or confirmed data breach requiring immediate user action. | Compromised release artifact/image; remote execution. |
| **High** | Serious undisclosed vulnerability with no practical workaround, or CVSS >= 7.0. | key leakage; prompt injection enabling cross-tenant access. |
| **Medium** | Material impact but constrained by preconditions/scope, or a practical workaround exists. | Auth-required exploit; dependency CVE with limited reachability. |
| **Low** | Defense-in-depth or narrow availability impact with no confirmed data exposure. | Missing rate limiting; hardening gap without exploit evidence. |
## Response workflow
### 1) Triage (target: initial response within 48 hours)
1. Acknowledge report.
2. Validate on latest release and `main`.
3. Confirm in-scope security issue vs. hardening item (per `SECURITY.md`).
4. Assign severity and open a **draft GitHub Security Advisory (GHSA)** (no public issue).
5. Determine whether root cause is DocsGPT code or upstream dependency/provider.
### 2) Investigation
1. Identify affected components, versions, and deployment scope (self-hosted, cloud, or both).
2. For AI issues, explicitly evaluate prompt injection, document isolation, and output leakage.
3. Request a CVE through GHSA for **Medium+** issues.
### 3) Containment, fix, and disclosure
1. Implement and test fix in private security workflow (GHSA private fork/branch).
2. Merge fix to `main`, cut patched release, and verify published artifacts/images.
3. Patch managed cloud deployment (`app.docsgpt.cloud`) and other deployments as soon as validated.
4. Publish GHSA with CVE (if assigned), affected/fixed versions, CVSS, mitigations, and upgrade guidance.
5. **Critical/High:** coordinate disclosure timing with reporter (goal: <= 90 days) and publish a notice.
6. **Medium/Low:** include in next scheduled release unless risk requires immediate out-of-band patching.
### 4) Post-incident
1. Monitor support channels (GitHub/Discord) for regressions or exploitation reports.
2. Run a short retrospective (root cause, detection, response gaps, prevention work).
3. Track follow-up hardening actions with owners/dates.
4. Update this IRP and related runbooks as needed.
## Scenario playbooks
### Supply-chain compromise
1. Freeze releases and investigate blast radius.
2. Rotate credentials in order: Docker Hub -> GitHub tokens -> LLM provider keys -> DB credentials -> `JWT_SECRET_KEY` -> `ENCRYPTION_SECRET_KEY` -> `INTERNAL_KEY`.
3. Replace compromised artifacts/tags with clean releases and revoke/remove bad tags where possible.
4. Publish advisory with exact affected versions and required user actions.
### Data exposure
1. Determine scope (users, documents, keys, logs, time window).
2. Disable affected path or hotfix immediately for managed cloud.
3. Notify affected users with concrete remediation steps (for example, rotate keys).
4. Continue through standard fix/disclosure workflow.
### Critical regression with security impact
1. Identify introducing change (`git bisect` if needed).
2. Publish workaround within 24 hours (for example, pin to known-good version).
3. Ship patch release with regression test and close incident with public summary.
## AI-specific guidance
Treat confirmed AI-specific abuse as security incidents:
- Prompt injection causing sensitive data exfiltration (from tools that don't belong to the agent) -> **High**
- Cross-tenant retrieval/isolation failure -> **High**
- API key disclosure in output -> **High**
## Secret rotation quick reference
| Secret | Standard rotation action |
|---|---|
| Docker Hub credentials | Revoke/replace in Docker Hub; update CI/CD secrets |
| GitHub tokens/PATs | Revoke/replace in GitHub; update automation secrets |
| LLM provider API keys | Rotate in provider console; update runtime/deploy secrets |
| Database credentials | Rotate in DB platform; redeploy with new secrets |
| `JWT_SECRET_KEY` | Rotate and redeploy (invalidates all active user sessions/tokens) |
| `ENCRYPTION_SECRET_KEY` | Rotate and redeploy (re-encrypt stored data if possible; existing encrypted data may become inaccessible) |
| `INTERNAL_KEY` | Rotate and redeploy (invalidates worker-to-backend authentication) |
## Maintenance
Review this document:
- after every **Critical/High** incident, and
- at least annually.
Changes should be proposed via pull request to `main`.

144
.github/THREAT_MODEL.md vendored Normal file
View File

@@ -0,0 +1,144 @@
# DocsGPT Public Threat Model
**Classification:** Public
**Last updated:** 2026-04-15
**Applies to:** Open-source and self-hosted DocsGPT deployments
## 1) Overview
DocsGPT ingests content (files/URLs/connectors), indexes it, and answers queries via LLM-backed APIs and optional tools.
Core components:
- Backend API (`application/`)
- Workers/ingestion (`application/worker.py` and related modules)
- Datastores (MongoDB/Redis/vector stores)
- Frontend (`frontend/`)
- Optional extensions/integrations (`extensions/`)
## 2) Scope and assumptions
In scope:
- Application-level threats in this repository.
- Local and internet-exposed self-hosted deployments.
Assumptions:
- Internet-facing instances enable auth and use strong secrets.
- Datastores/internal services are not publicly exposed.
Out of scope:
- Cloud hardware/provider compromise.
- Security guarantees of external LLM vendors.
- Full security audits of third-party systems targeted by tools (external DBs/MCP servers/code-exec APIs).
## 3) Security objectives
- Protect document/conversation confidentiality.
- Preserve integrity of prompts, agents, tools, and indexed data.
- Maintain API/worker availability.
- Enforce tenant isolation in authenticated deployments.
## 4) Assets
- Documents, attachments, chunks/embeddings, summaries.
- Conversations, agents, workflows, prompt templates.
- Secrets (JWT secret, `INTERNAL_KEY`, provider/API/OAuth credentials).
- Operational capacity (worker throughput, queue depth, model quota/cost).
## 5) Trust boundaries and untrusted input
Trust boundaries:
- Internet ↔ Frontend
- Frontend ↔ Backend API
- Backend ↔ Workers/internal APIs
- Backend/workers ↔ Datastores
- Backend ↔ External LLM/connectors/remote URLs
Untrusted input includes API payloads, file uploads, remote URLs, OAuth/webhook data, retrieved content, and LLM/tool arguments.
## 6) Main attack surfaces
1. Auth/authz paths and sharing tokens.
2. File upload + parsing pipeline.
3. Remote URL fetching and connectors (SSRF risk).
4. Agent/tool execution from LLM output.
5. Template/workflow rendering.
6. Frontend rendering + token storage.
7. Internal service endpoints (`INTERNAL_KEY`).
8. High-impact integrations (SQL tool, generic API tool, remote MCP tools).
## 7) Key threats and expected mitigations
### A. Auth/authz misconfiguration
- Threat: weak/no auth or leaked tokens leads to broad data access.
- Mitigations: require auth for public deployments, short-lived tokens, rotation/revocation, least-privilege sharing.
### B. Untrusted file ingestion
- Threat: malicious files/archives trigger traversal, parser exploits, or resource exhaustion.
- Mitigations: strict path checks, archive safeguards, file limits, patched parser dependencies.
### C. SSRF/outbound abuse
- Threat: URL loaders/tools access private/internal/metadata endpoints.
- Mitigations: validate URLs + redirects, block private/link-local ranges, apply egress controls/allowlists.
### D. Prompt injection + tool abuse
- Threat: retrieved text manipulates model behavior and causes unsafe tool calls.
- Threat: never rely on the model to "choose correctly" under adversarial input.
- Mitigations: treat retrieved/model output as untrusted, enforce tool policies, only expose tools explicitly assigned by the user/admin to that agent, separate system instructions from retrieved content, audit tool calls.
### E. Dangerous tool capability chaining (SQL/API/MCP)
- Threat: write-capable SQL credentials allow destructive queries.
- Threat: API tool can trigger side effects (infra/payment/webhook/code-exec endpoints).
- Threat: remote MCP tools may expose privileged operations.
- Mitigations: read-only-by-default credentials, destination allowlists, explicit approval for write/exec actions, per-tool policy enforcement + logging.
### F. Frontend/XSS + token theft
- Threat: XSS can steal local tokens and call APIs.
- Mitigations: reduce unsafe rendering paths, strong CSP, scoped short-lived credentials.
### G. Internal endpoint exposure
- Threat: weak/unset `INTERNAL_KEY` enables internal API abuse.
- Mitigations: fail closed, require strong random keys, keep internal APIs private.
### H. DoS and cost abuse
- Threat: request floods, large ingestion jobs, expensive prompts/crawls.
- Mitigations: rate limits, quotas, timeouts, queue backpressure, usage budgets.
## 8) Example attacker stories
- Internet-exposed deployment runs with weak/no auth and receives unauthorized data access/abuse.
- Intranet deployment intentionally using weak/no auth is vulnerable to insider misuse and lateral-movement abuse.
- Crafted archive attempts path traversal during extraction.
- Malicious URL/redirect chain targets internal services.
- Poisoned document causes data exfiltration through tool calls.
- Over-privileged SQL/API/MCP tool performs destructive side effects.
## 9) Severity calibration
- **Critical:** unauthenticated public data access; prompt-injection-driven exfiltration; SSRF to sensitive internal endpoints.
- **High:** cross-tenant leakage, persistent token compromise, over-privileged destructive tools.
- **Medium:** DoS/cost amplification and non-critical information disclosure.
- **Low:** minor hardening gaps with limited impact.
## 10) Baseline controls for public deployments
1. Enforce authentication and secure defaults.
2. Set/rotate strong secrets (`JWT`, `INTERNAL_KEY`, encryption keys).
3. Restrict CORS and front API with a hardened proxy.
4. Add rate limiting/quotas for answer/upload/crawl/token endpoints.
5. Enforce URL+redirect SSRF protections and egress restrictions.
6. Apply upload/archive/parsing hardening.
7. Require least-privilege tool credentials and auditable tool execution.
8. Monitor auth failures, tool anomalies, ingestion spikes, and cost anomalies.
9. Keep dependencies/images patched and scanned.
10. Validate multi-tenant isolation with explicit tests.
## 11) Maintenance
Review this model after major auth, ingestion, connector, tool, or workflow changes.
## References
- [OWASP Top 10 for LLM Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/)
- [OWASP ASVS](https://owasp.org/www-project-application-security-verification-standard/)
- [STRIDE overview](https://learn.microsoft.com/azure/security/develop/threat-modeling-tool-threats)
- [DocsGPT SECURITY.md](../SECURITY.md)

View File

@@ -1,46 +1,80 @@
Ollama
Qdrant
Milvus
Chatwoot
Nextra
VSCode
npm
LLMs
Agentic
Anthropic's
api
APIs
Groq
SGLang
LMDeploy
OAuth
Vite
LLM
JSONPath
UIs
Atlassian
automations
autoescaping
Autoescaping
backfill
backfills
bool
boolean
brave_web_search
chatbot
Chatwoot
config
configs
uncomment
qdrant
vectorstore
CSVs
dev
diarization
Docling
docsgpt
llm
docstrings
Entra
env
enqueues
EOL
ESLint
feedbacks
Figma
GPUs
Groq
hardcode
hardcoding
Idempotency
JSONPath
kubectl
Lightsail
enqueues
chatbot
VSCode's
Shareability
feedbacks
automations
llama_cpp
llm
LLM
LLMs
LMDeploy
Milvus
Mixtral
namespace
namespaces
needs_auth
Nextra
Novita
npm
OAuth
Ollama
opencode
parsable
passthrough
PDFs
pgvector
Postgres
Premade
Signup
Pydantic
pytest
Qdrant
qdrant
Repo
repo
env
URl
agentic
llama_cpp
parsable
Sanitization
SDKs
boolean
bool
hardcode
EOL
SGLang
Shareability
Signup
Supabase
UIs
uncomment
URl
vectorstore
Vite
VSCode
VSCode's
widget's

View File

@@ -11,7 +11,6 @@ on:
permissions:
contents: read
pull-requests: write
jobs:
vale:
@@ -20,11 +19,16 @@ jobs:
- name: Checkout code
uses: actions/checkout@v4
- name: Vale linter
uses: errata-ai/vale-action@v2
with:
files: docs
fail_on_error: false
version: 3.0.5
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Install Vale
run: |
curl -fsSL -o vale.tar.gz \
https://github.com/errata-ai/vale/releases/download/v3.0.5/vale_3.0.5_Linux_64-bit.tar.gz
tar -xzf vale.tar.gz
sudo mv vale /usr/local/bin/vale
vale --version
- name: Sync Vale packages
run: vale sync
- name: Run Vale
run: vale --minAlertLevel=error docs

25
.github/workflows/zizmor.yml vendored Normal file
View File

@@ -0,0 +1,25 @@
name: GitHub Actions Security Analysis
on:
push:
branches: ["master"]
pull_request:
branches: ["**"]
permissions: {}
jobs:
zizmor:
runs-on: ubuntu-latest
permissions:
security-events: write # Required for upload-sarif (used by zizmor-action) to upload SARIF files.
steps:
- name: Checkout repository
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Run zizmor 🌈
uses: zizmorcore/zizmor-action@71321a20a9ded102f6e9ce5718a2fcec2c4f70d8 # v0.5.2

11
.gitignore vendored
View File

@@ -108,6 +108,8 @@ celerybeat.pid
# Environments
.env
.venv
# Machine-specific Claude Code guidance (see CLAUDE.md preamble)
CLAUDE.md
env/
venv/
ENV/
@@ -181,5 +183,14 @@ application/vectors/
node_modules/
.vscode/settings.json
.vscode/sftp.json
/models/
model/
# E2E test artifacts
.e2e-tmp/
/tmp/docsgpt-e2e/
tests/e2e/node_modules/
tests/e2e/playwright-report/
tests/e2e/test-results/
tests/e2e/.e2e-last-run.json

View File

@@ -1,5 +1,7 @@
MinAlertLevel = warning
StylesPath = .github/styles
Vocab = DocsGPT
[*.{md,mdx}]
BasedOnStyles = DocsGPT

View File

@@ -10,9 +10,15 @@
For feature work, do **not** assume the environment needs to be recreated.
- Check whether the user already has a Python virtual environment such as `venv/` or `.venv/`.
- Check whether MongoDB is already running.
- Check whether Postgres is already running and reachable via `POSTGRES_URI` (the canonical user-data store).
- Check whether Redis is already running.
- Reuse what is already working. Do not stop or recreate MongoDB, Redis, or the Python environment unless the task is environment setup or troubleshooting.
- Reuse what is already working. Do not stop or recreate Postgres, Redis, or the Python environment unless the task is environment setup or troubleshooting.
> MongoDB is **not** required for the default install. It is only needed if
> the user opts into the Mongo vector-store backend (`VECTOR_STORE=mongodb`)
> or is running the one-shot `scripts/db/backfill.py` to migrate existing
> user data from the legacy Mongo-based install. In those cases, `pymongo`
> is available as an optional extra, not a core dependency.
## Normal local development commands

View File

@@ -29,7 +29,7 @@
<div align="center">
<br>
<img src="https://d3dg1063dc54p9.cloudfront.net/videos/demov7.gif" alt="video-example-of-docs-gpt" width="800" height="450">
<img src="https://d3dg1063dc54p9.cloudfront.net/videos/demo-26.gif" alt="video-example-of-docs-gpt" width="800" height="480">
</div>
<h3 align="left">
<strong>Key Features:</strong>

View File

@@ -2,13 +2,21 @@
## Supported Versions
Supported Versions:
Currently, we support security patches by committing changes and bumping the version published on Github.
Security patches target the latest release and the `main` branch. We recommend always running the most recent version.
## Reporting a Vulnerability
Found a vulnerability? Please email us:
Preferred method: use GitHub's private vulnerability reporting flow:
https://github.com/arc53/DocsGPT/security
security@arc53.com
Then click **Report a vulnerability**.
Alternatively, email us at: security@arc53.com
We aim to acknowledge reports within 48 hours.
## Incident Handling
For the public incident response process, see [`INCIDENT_RESPONSE.md`](./.github/INCIDENT_RESPONSE.md). If you believe an active exploit is occurring, include **URGENT** in your report subject line.

View File

@@ -1,7 +1,8 @@
import json
import logging
import uuid
from abc import ABC, abstractmethod
from typing import Dict, Generator, List, Optional
from typing import Any, Dict, Generator, List, Optional
from application.agents.tool_executor import ToolExecutor
from application.core.json_schema_utils import (
@@ -9,6 +10,7 @@ from application.core.json_schema_utils import (
normalize_json_schema_payload,
)
from application.core.settings import settings
from application.llm.handlers.base import ToolCall
from application.llm.handlers.handler_creator import LLMHandlerCreator
from application.llm.llm_creator import LLMCreator
from application.logging import build_stack_data, log_activity, LogContext
@@ -113,6 +115,153 @@ class BaseAgent(ABC):
) -> Generator[Dict, None, None]:
pass
def gen_continuation(
self,
messages: List[Dict],
tools_dict: Dict,
pending_tool_calls: List[Dict],
tool_actions: List[Dict],
) -> Generator[Dict, None, None]:
"""Resume generation after tool actions are resolved.
Processes the client-provided *tool_actions* (approvals, denials,
or client-side results), appends the resulting messages, then
hands back to the LLM to continue the conversation.
Args:
messages: The saved messages array from the pause point.
tools_dict: The saved tools dictionary.
pending_tool_calls: The pending tool call descriptors from the pause.
tool_actions: Client-provided actions resolving the pending calls.
"""
self._prepare_tools(tools_dict)
actions_by_id = {a["call_id"]: a for a in tool_actions}
# Build a single assistant message containing all tool calls so
# the message history matches the format LLM providers expect
# (one assistant message with N tool_calls, followed by N tool results).
tc_objects: List[Dict[str, Any]] = []
for pending in pending_tool_calls:
call_id = pending["call_id"]
args = pending["arguments"]
args_str = (
json.dumps(args) if isinstance(args, dict) else (args or "{}")
)
tc_obj: Dict[str, Any] = {
"id": call_id,
"type": "function",
"function": {
"name": pending["name"],
"arguments": args_str,
},
}
if pending.get("thought_signature"):
tc_obj["thought_signature"] = pending["thought_signature"]
tc_objects.append(tc_obj)
messages.append({
"role": "assistant",
"content": None,
"tool_calls": tc_objects,
})
# Now process each pending call and append tool result messages
for pending in pending_tool_calls:
call_id = pending["call_id"]
args = pending["arguments"]
action = actions_by_id.get(call_id)
if not action:
action = {
"call_id": call_id,
"decision": "denied",
"comment": "No response provided",
}
if action.get("decision") == "approved":
# Execute the tool server-side
tc = ToolCall(
id=call_id,
name=pending["name"],
arguments=(
json.dumps(args) if isinstance(args, dict) else args
),
)
tool_gen = self._execute_tool_action(tools_dict, tc)
tool_response = None
while True:
try:
event = next(tool_gen)
yield event
except StopIteration as e:
tool_response, _ = e.value
break
messages.append(
self.llm_handler.create_tool_message(tc, tool_response)
)
elif action.get("decision") == "denied":
comment = action.get("comment", "")
denial = (
f"Tool execution denied by user. Reason: {comment}"
if comment
else "Tool execution denied by user."
)
tc = ToolCall(
id=call_id, name=pending["name"], arguments=args
)
messages.append(
self.llm_handler.create_tool_message(tc, denial)
)
yield {
"type": "tool_call",
"data": {
"tool_name": pending.get("tool_name", "unknown"),
"call_id": call_id,
"action_name": pending.get("llm_name", pending["name"]),
"arguments": args,
"status": "denied",
},
}
elif "result" in action:
result = action["result"]
result_str = (
json.dumps(result)
if not isinstance(result, str)
else result
)
tc = ToolCall(
id=call_id, name=pending["name"], arguments=args
)
messages.append(
self.llm_handler.create_tool_message(tc, result_str)
)
yield {
"type": "tool_call",
"data": {
"tool_name": pending.get("tool_name", "unknown"),
"call_id": call_id,
"action_name": pending.get("llm_name", pending["name"]),
"arguments": args,
"result": (
result_str[:50] + "..."
if len(result_str) > 50
else result_str
),
"status": "completed",
},
}
# Resume the LLM loop with the updated messages
llm_response = self._llm_gen(messages)
yield from self._handle_response(
llm_response, tools_dict, messages, None
)
yield {"sources": self.retrieved_docs}
yield {"tool_calls": self._get_truncated_tool_calls()}
# ---- Tool delegation (thin wrappers around ToolExecutor) ----
@property
@@ -267,28 +416,35 @@ class BaseAgent(ABC):
if "tool_calls" in i:
for tool_call in i["tool_calls"]:
call_id = tool_call.get("call_id") or str(uuid.uuid4())
function_call_dict = {
"function_call": {
"name": tool_call.get("action_name"),
"args": tool_call.get("arguments"),
"call_id": call_id,
}
}
function_response_dict = {
"function_response": {
"name": tool_call.get("action_name"),
"response": {"result": tool_call.get("result")},
"call_id": call_id,
}
}
messages.append(
{"role": "assistant", "content": [function_call_dict]}
args = tool_call.get("arguments")
args_str = (
json.dumps(args)
if isinstance(args, dict)
else (args or "{}")
)
messages.append(
{"role": "tool", "content": [function_response_dict]}
messages.append({
"role": "assistant",
"content": None,
"tool_calls": [{
"id": call_id,
"type": "function",
"function": {
"name": tool_call.get("action_name", ""),
"arguments": args_str,
},
}],
})
result = tool_call.get("result")
result_str = (
json.dumps(result)
if not isinstance(result, str)
else (result or "")
)
messages.append({
"role": "tool",
"tool_call_id": call_id,
"content": result_str,
})
messages.append({"role": "user", "content": query})
return messages

View File

@@ -593,16 +593,22 @@ class ResearchAgent(BaseAgent):
)
result = result_str
function_call_content = {
"function_call": {
"name": call.name,
"args": call.arguments,
"call_id": call_id,
}
}
messages.append(
{"role": "assistant", "content": [function_call_content]}
import json as _json
args_str = (
_json.dumps(call.arguments)
if isinstance(call.arguments, dict)
else call.arguments
)
messages.append({
"role": "assistant",
"content": None,
"tool_calls": [{
"id": call_id,
"type": "function",
"function": {"name": call.name, "arguments": args_str},
}],
})
tool_message = self.llm_handler.create_tool_message(call, result)
messages.append(tool_message)

View File

@@ -1,14 +1,14 @@
import logging
import uuid
from typing import Dict, List, Optional
from bson.objectid import ObjectId
from collections import Counter
from typing import Dict, List, Optional, Tuple
from application.agents.tools.tool_action_parser import ToolActionParser
from application.agents.tools.tool_manager import ToolManager
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.security.encryption import decrypt_credentials
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.repositories.user_tools import UserToolsRepository
from application.storage.db.session import db_readonly
logger = logging.getLogger(__name__)
@@ -31,63 +31,166 @@ class ToolExecutor:
self.tool_calls: List[Dict] = []
self._loaded_tools: Dict[str, object] = {}
self.conversation_id: Optional[str] = None
self.client_tools: Optional[List[Dict]] = None
self._name_to_tool: Dict[str, Tuple[str, str]] = {}
self._tool_to_name: Dict[Tuple[str, str], str] = {}
def get_tools(self) -> Dict[str, Dict]:
"""Load tool configs from DB based on user context."""
"""Load tool configs from DB based on user context.
If *client_tools* have been set on this executor, they are
automatically merged into the returned dict.
"""
if self.user_api_key:
return self._get_tools_by_api_key(self.user_api_key)
return self._get_user_tools(self.user or "local")
tools = self._get_tools_by_api_key(self.user_api_key)
else:
tools = self._get_user_tools(self.user or "local")
if self.client_tools:
self.merge_client_tools(tools, self.client_tools)
return tools
def _get_tools_by_api_key(self, api_key: str) -> Dict[str, Dict]:
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
agents_collection = db["agents"]
tools_collection = db["user_tools"]
agent_data = agents_collection.find_one({"key": api_key})
tool_ids = agent_data.get("tools", []) if agent_data else []
tools = (
tools_collection.find(
{"_id": {"$in": [ObjectId(tool_id) for tool_id in tool_ids]}}
)
if tool_ids
else []
)
tools = list(tools)
return {str(tool["_id"]): tool for tool in tools} if tools else {}
# Per-operation session: the answer pipeline spans a long-lived
# generator; wrapping it in a single connection would pin a PG
# conn for the whole stream. Open, fetch, close.
with db_readonly() as conn:
agent_data = AgentsRepository(conn).find_by_key(api_key)
tool_ids = agent_data.get("tools", []) if agent_data else []
if not tool_ids:
return {}
tools_repo = UserToolsRepository(conn)
tools: List[Dict] = []
owner = (agent_data.get("user_id") or agent_data.get("user")) if agent_data else None
for tid in tool_ids:
row = None
if owner:
row = tools_repo.get_any(str(tid), owner)
if row is not None:
tools.append(row)
return {str(tool["id"]): tool for tool in tools} if tools else {}
def _get_user_tools(self, user: str = "local") -> Dict[str, Dict]:
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
user_tools_collection = db["user_tools"]
user_tools = user_tools_collection.find({"user": user, "status": True})
user_tools = list(user_tools)
with db_readonly() as conn:
user_tools = UserToolsRepository(conn).list_active_for_user(user)
return {str(i): tool for i, tool in enumerate(user_tools)}
def prepare_tools_for_llm(self, tools_dict: Dict) -> List[Dict]:
"""Convert tool configs to LLM function schemas."""
return [
{
"type": "function",
"function": {
"name": f"{action['name']}_{tool_id}",
"description": action["description"],
"parameters": self._build_tool_parameters(action),
},
def merge_client_tools(
self, tools_dict: Dict, client_tools: List[Dict]
) -> Dict:
"""Merge client-provided tool definitions into tools_dict.
Client tools use the standard function-calling format::
[{"type": "function", "function": {"name": "get_weather",
"description": "...", "parameters": {...}}}]
They are stored in *tools_dict* with ``client_side: True`` so that
:meth:`check_pause` returns a pause signal instead of trying to
execute them server-side.
Args:
tools_dict: The mutable server tools dict (will be modified in place).
client_tools: List of tool definitions in function-calling format.
Returns:
The updated *tools_dict* (same reference, for convenience).
"""
for i, ct in enumerate(client_tools):
func = ct.get("function", ct) # tolerate bare {"name":..} too
name = func.get("name", f"clienttool{i}")
tool_id = f"ct{i}"
tools_dict[tool_id] = {
"name": name,
"client_side": True,
"actions": [
{
"name": name,
"description": func.get("description", ""),
"active": True,
"parameters": func.get("parameters", {}),
}
],
}
for tool_id, tool in tools_dict.items()
if (
(tool["name"] == "api_tool" and "actions" in tool.get("config", {}))
or (tool["name"] != "api_tool" and "actions" in tool)
)
for action in (
return tools_dict
def prepare_tools_for_llm(self, tools_dict: Dict) -> List[Dict]:
"""Convert tool configs to LLM function schemas.
Action names are kept clean for the LLM:
- Unique action names appear as-is (e.g. ``get_weather``).
- Duplicate action names get numbered suffixes (e.g. ``search_1``,
``search_2``).
A reverse mapping is stored in ``_name_to_tool`` so that tool calls
can be routed back to the correct ``(tool_id, action_name)`` without
brittle string splitting.
"""
# Pass 1: collect entries and count action name occurrences
entries: List[Tuple[str, str, Dict, bool]] = [] # (tool_id, action_name, action, is_client)
name_counts: Counter = Counter()
for tool_id, tool in tools_dict.items():
is_api = tool["name"] == "api_tool"
is_client = tool.get("client_side", False)
if is_api and "actions" not in tool.get("config", {}):
continue
if not is_api and "actions" not in tool:
continue
actions = (
tool["config"]["actions"].values()
if tool["name"] == "api_tool"
if is_api
else tool["actions"]
)
if action.get("active", True)
]
for action in actions:
if not action.get("active", True):
continue
entries.append((tool_id, action["name"], action, is_client))
name_counts[action["name"]] += 1
# Pass 2: assign LLM-visible names and build mappings
self._name_to_tool = {}
self._tool_to_name = {}
collision_counters: Dict[str, int] = {}
all_llm_names: set = set()
result = []
for tool_id, action_name, action, is_client in entries:
if name_counts[action_name] == 1:
llm_name = action_name
else:
counter = collision_counters.get(action_name, 1)
candidate = f"{action_name}_{counter}"
# Skip if candidate collides with a unique action name
while candidate in all_llm_names or (
candidate in name_counts and name_counts[candidate] == 1
):
counter += 1
candidate = f"{action_name}_{counter}"
collision_counters[action_name] = counter + 1
llm_name = candidate
all_llm_names.add(llm_name)
self._name_to_tool[llm_name] = (tool_id, action_name)
self._tool_to_name[(tool_id, action_name)] = llm_name
if is_client:
params = action.get("parameters", {})
else:
params = self._build_tool_parameters(action)
result.append({
"type": "function",
"function": {
"name": llm_name,
"description": action.get("description", ""),
"parameters": params,
},
})
return result
def _build_tool_parameters(self, action: Dict) -> Dict:
params = {"type": "object", "properties": {}, "required": []}
@@ -104,23 +207,81 @@ class ToolExecutor:
params["required"].append(k)
return params
def check_pause(
self, tools_dict: Dict, call, llm_class_name: str
) -> Optional[Dict]:
"""Check if a tool call requires pausing for approval or client execution.
Returns a dict describing the pending action if pause is needed, None otherwise.
"""
parser = ToolActionParser(llm_class_name, name_mapping=self._name_to_tool)
tool_id, action_name, call_args = parser.parse_args(call)
call_id = getattr(call, "id", None) or str(uuid.uuid4())
llm_name = getattr(call, "name", "")
if tool_id is None or action_name is None or tool_id not in tools_dict:
return None # Will be handled as error by execute()
tool_data = tools_dict[tool_id]
# Client-side tools
if tool_data.get("client_side"):
return {
"call_id": call_id,
"name": llm_name,
"tool_name": tool_data.get("name", "unknown"),
"tool_id": tool_id,
"action_name": action_name,
"llm_name": llm_name,
"arguments": call_args if isinstance(call_args, dict) else {},
"pause_type": "requires_client_execution",
"thought_signature": getattr(call, "thought_signature", None),
}
# Approval required
if tool_data["name"] == "api_tool":
action_data = tool_data.get("config", {}).get("actions", {}).get(
action_name, {}
)
else:
action_data = next(
(a for a in tool_data.get("actions", []) if a["name"] == action_name),
{},
)
if action_data.get("require_approval"):
return {
"call_id": call_id,
"name": llm_name,
"tool_name": tool_data.get("name", "unknown"),
"tool_id": tool_id,
"action_name": action_name,
"llm_name": llm_name,
"arguments": call_args if isinstance(call_args, dict) else {},
"pause_type": "awaiting_approval",
"thought_signature": getattr(call, "thought_signature", None),
}
return None
def execute(self, tools_dict: Dict, call, llm_class_name: str):
"""Execute a tool call. Yields status events, returns (result, call_id)."""
parser = ToolActionParser(llm_class_name)
parser = ToolActionParser(llm_class_name, name_mapping=self._name_to_tool)
tool_id, action_name, call_args = parser.parse_args(call)
llm_name = getattr(call, "name", "unknown")
call_id = getattr(call, "id", None) or str(uuid.uuid4())
if tool_id is None or action_name is None:
error_message = f"Error: Failed to parse LLM tool call. Tool name: {getattr(call, 'name', 'unknown')}"
error_message = f"Error: Failed to parse LLM tool call. Tool name: {llm_name}"
logger.error(error_message)
tool_call_data = {
"tool_name": "unknown",
"call_id": call_id,
"action_name": getattr(call, "name", "unknown"),
"action_name": llm_name,
"arguments": call_args or {},
"result": f"Failed to parse tool call. Invalid tool name format: {getattr(call, 'name', 'unknown')}",
"result": f"Failed to parse tool call. Invalid tool name format: {llm_name}",
}
yield {"type": "tool_call", "data": {**tool_call_data, "status": "error"}}
self.tool_calls.append(tool_call_data)
@@ -133,7 +294,7 @@ class ToolExecutor:
tool_call_data = {
"tool_name": "unknown",
"call_id": call_id,
"action_name": f"{action_name}_{tool_id}",
"action_name": llm_name,
"arguments": call_args,
"result": f"Tool with ID {tool_id} not found. Available tools: {list(tools_dict.keys())}",
}
@@ -144,7 +305,7 @@ class ToolExecutor:
tool_call_data = {
"tool_name": tools_dict[tool_id]["name"],
"call_id": call_id,
"action_name": f"{action_name}_{tool_id}",
"action_name": llm_name,
"arguments": call_args,
}
yield {"type": "tool_call", "data": {**tool_call_data, "status": "pending"}}
@@ -185,7 +346,21 @@ class ToolExecutor:
target_dict[param] = value
# Load tool (with caching)
tool = self._get_or_load_tool(tool_data, tool_id, action_name)
tool = self._get_or_load_tool(
tool_data, tool_id, action_name,
headers=headers, query_params=query_params,
)
if tool is None:
error_message = (
f"Failed to load tool '{tool_data.get('name')}' (tool_id key={tool_id}): "
"missing 'id' on tool row."
)
logger.error(error_message)
tool_call_data["result"] = error_message
yield {"type": "tool_call", "data": {**tool_call_data, "status": "error"}}
self.tool_calls.append(tool_call_data)
return error_message, call_id
resolved_arguments = (
{"query_params": query_params, "headers": headers, "body": body}
@@ -238,7 +413,10 @@ class ToolExecutor:
return result, call_id
def _get_or_load_tool(self, tool_data: Dict, tool_id: str, action_name: str):
def _get_or_load_tool(
self, tool_data: Dict, tool_id: str, action_name: str,
headers: Optional[Dict] = None, query_params: Optional[Dict] = None,
):
"""Load a tool, using cache when possible."""
cache_key = f"{tool_data['name']}:{tool_id}:{self.user or ''}"
if cache_key in self._loaded_tools:
@@ -251,8 +429,8 @@ class ToolExecutor:
tool_config = {
"url": action_config["url"],
"method": action_config["method"],
"headers": {},
"query_params": {},
"headers": headers or {},
"query_params": query_params or {},
}
if "body_content_type" in action_config:
tool_config["body_content_type"] = action_config.get(
@@ -270,7 +448,16 @@ class ToolExecutor:
tool_config.update(decrypted)
tool_config["auth_credentials"] = decrypted
tool_config.pop("encrypted_credentials", None)
tool_config["tool_id"] = str(tool_data.get("_id", tool_id))
row_id = tool_data.get("id")
if not row_id:
logger.error(
"Tool data missing 'id' for tool name=%s (enumerate-key tool_id=%s); "
"skipping load to avoid binding a non-UUID downstream.",
tool_data.get("name"),
tool_id,
)
return None
tool_config["tool_id"] = str(row_id)
if self.conversation_id:
tool_config["conversation_id"] = self.conversation_id
if tool_data["name"] == "mcp_tool":

View File

@@ -2,6 +2,8 @@ from abc import ABC, abstractmethod
class Tool(ABC):
internal: bool = False
@abstractmethod
def execute_action(self, action_name: str, **kwargs):
pass

View File

@@ -73,7 +73,7 @@ class BraveSearchTool(Tool):
"X-Subscription-Token": self.token,
}
response = requests.get(url, params=params, headers=headers)
response = requests.get(url, params=params, headers=headers, timeout=100)
if response.status_code == 200:
return {
@@ -118,7 +118,7 @@ class BraveSearchTool(Tool):
"X-Subscription-Token": self.token,
}
response = requests.get(url, params=params, headers=headers)
response = requests.get(url, params=params, headers=headers, timeout=100)
if response.status_code == 200:
return {

View File

@@ -28,7 +28,7 @@ class CryptoPriceTool(Tool):
returns price in USD.
"""
url = f"https://min-api.cryptocompare.com/data/price?fsym={symbol.upper()}&tsyms={currency.upper()}"
response = requests.get(url)
response = requests.get(url, timeout=100)
if response.status_code == 200:
data = response.json()
if currency.upper() in data:

View File

@@ -20,6 +20,8 @@ class InternalSearchTool(Tool):
- list_files action: browse the file/folder structure
"""
internal = True
def __init__(self, config: Dict):
self.config = config
self.retrieved_docs: List[Dict] = []
@@ -46,7 +48,7 @@ class InternalSearchTool(Tool):
return self._retriever
def _get_directory_structure(self) -> Optional[Dict]:
"""Load directory structure from MongoDB for the configured sources."""
"""Load directory structure from Postgres for the configured sources."""
if self._dir_structure_loaded:
return self._directory_structure
@@ -57,35 +59,39 @@ class InternalSearchTool(Tool):
return None
try:
from bson.objectid import ObjectId
from application.core.mongo_db import MongoDB
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
sources_collection = db["sources"]
# Per-operation session: this tool runs inside the answer
# generator hot path, so we open a short-lived read
# connection for the batch lookup and release immediately.
from application.storage.db.repositories.sources import (
SourcesRepository,
)
from application.storage.db.session import db_readonly
if isinstance(active_docs, str):
active_docs = [active_docs]
decoded_token = self.config.get("decoded_token") or {}
user_id = decoded_token.get("sub") if decoded_token else None
merged_structure = {}
for doc_id in active_docs:
try:
source_doc = sources_collection.find_one(
{"_id": ObjectId(doc_id)}
)
if not source_doc:
continue
dir_str = source_doc.get("directory_structure")
if dir_str:
if isinstance(dir_str, str):
dir_str = json.loads(dir_str)
source_name = source_doc.get("name", doc_id)
if len(active_docs) > 1:
merged_structure[source_name] = dir_str
else:
merged_structure = dir_str
except Exception as e:
logger.debug(f"Could not load dir structure for {doc_id}: {e}")
with db_readonly() as conn:
repo = SourcesRepository(conn)
for doc_id in active_docs:
try:
source_doc = repo.get_any(str(doc_id), user_id) if user_id else None
if not source_doc:
continue
dir_str = source_doc.get("directory_structure")
if dir_str:
if isinstance(dir_str, str):
dir_str = json.loads(dir_str)
source_name = source_doc.get("name", doc_id)
if len(active_docs) > 1:
merged_structure[source_name] = dir_str
else:
merged_structure = dir_str
except Exception as e:
logger.debug(f"Could not load dir structure for {doc_id}: {e}")
self._directory_structure = merged_structure if merged_structure else None
except Exception as e:
@@ -355,32 +361,48 @@ INTERNAL_TOOL_ENTRY = build_internal_tool_entry(has_directory_structure=False)
def sources_have_directory_structure(source: Dict) -> bool:
"""Check if any of the active sources have directory_structure in MongoDB."""
"""Check if any of the active sources have a ``directory_structure`` row."""
active_docs = source.get("active_docs", [])
if not active_docs:
return False
try:
from bson.objectid import ObjectId
from application.core.mongo_db import MongoDB
# TODO(pg-cutover): SourcesRepository.get_any requires ``user_id``
# scoping, but callers in the agent build path don't always
# thread the decoded token through here. Use a direct
# short-lived SQL lookup instead of the repo until the call
# sites are updated to propagate user context.
from sqlalchemy import text as _text
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
sources_collection = db["sources"]
from application.storage.db.session import db_readonly
if isinstance(active_docs, str):
active_docs = [active_docs]
for doc_id in active_docs:
try:
source_doc = sources_collection.find_one(
{"_id": ObjectId(doc_id)},
{"directory_structure": 1},
)
if source_doc and source_doc.get("directory_structure"):
return True
except Exception:
continue
with db_readonly() as conn:
for doc_id in active_docs:
try:
value = str(doc_id)
if len(value) == 36 and "-" in value:
row = conn.execute(
_text(
"SELECT directory_structure FROM sources "
"WHERE id = CAST(:id AS uuid)"
),
{"id": value},
).fetchone()
else:
row = conn.execute(
_text(
"SELECT directory_structure FROM sources "
"WHERE legacy_mongo_id = :lid"
),
{"lid": value},
).fetchone()
if row is not None and row[0]:
return True
except Exception:
continue
except Exception as e:
logger.debug(f"Could not check directory structure: {e}")

View File

@@ -22,15 +22,12 @@ from redis import Redis
from application.agents.tools.base import Tool
from application.api.user.tasks import mcp_oauth_status_task, mcp_oauth_task
from application.cache import get_redis_instance
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.core.url_validation import SSRFError, validate_url
from application.security.encryption import decrypt_credentials
logger = logging.getLogger(__name__)
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
_mcp_clients_cache = {}
@@ -61,7 +58,8 @@ class MCPTool(Tool):
"""
self.config = config
self.user_id = user_id
self.server_url = config.get("server_url", "")
raw_url = config.get("server_url", "")
self.server_url = self._validate_server_url(raw_url) if raw_url else ""
self.transport_type = config.get("transport_type", "auto")
self.auth_type = config.get("auth_type", "none")
self.timeout = config.get("timeout", 30)
@@ -87,6 +85,18 @@ class MCPTool(Tool):
if self.server_url and self.auth_type != "oauth":
self._setup_client()
@staticmethod
def _validate_server_url(server_url: str) -> str:
"""Validate server_url to prevent SSRF to internal networks.
Raises:
ValueError: If the URL points to a private/internal address.
"""
try:
return validate_url(server_url)
except SSRFError as exc:
raise ValueError(f"Invalid MCP server URL: {exc}") from exc
def _resolve_redirect_uri(self, configured_redirect_uri: Optional[str]) -> str:
if configured_redirect_uri:
return configured_redirect_uri.rstrip("/")
@@ -108,8 +118,9 @@ class MCPTool(Tool):
auth_key = ""
if self.auth_type == "oauth":
scopes_str = ",".join(self.oauth_scopes) if self.oauth_scopes else "none"
oauth_identity = self.user_id or self.oauth_task_id or "anonymous"
auth_key = (
f"oauth:{self.oauth_client_name}:{scopes_str}:{self.redirect_uri}"
f"oauth:{oauth_identity}:{self.oauth_client_name}:{scopes_str}:{self.redirect_uri}"
)
elif self.auth_type in ["bearer"]:
token = self.auth_credentials.get(
@@ -146,7 +157,6 @@ class MCPTool(Tool):
scopes=self.oauth_scopes,
redis_client=redis_client,
redirect_uri=self.redirect_uri,
db=db,
user_id=self.user_id,
)
else:
@@ -156,7 +166,6 @@ class MCPTool(Tool):
redis_client=redis_client,
redirect_uri=self.redirect_uri,
task_id=self.oauth_task_id,
db=db,
user_id=self.user_id,
)
elif self.auth_type == "bearer":
@@ -476,7 +485,7 @@ class MCPTool(Tool):
def _test_oauth_connection(self) -> Dict:
storage = DBTokenStorage(
server_url=self.server_url, user_id=self.user_id, db_client=db
server_url=self.server_url, user_id=self.user_id,
)
loop = asyncio.new_event_loop()
try:
@@ -668,7 +677,6 @@ class DocsGPTOAuth(OAuthClientProvider):
scopes: str | list[str] | None = None,
client_name: str = "DocsGPT-MCP",
user_id=None,
db=None,
additional_client_metadata: dict[str, Any] | None = None,
skip_redirect_validation: bool = False,
):
@@ -677,7 +685,6 @@ class DocsGPTOAuth(OAuthClientProvider):
self.redis_prefix = redis_prefix
self.task_id = task_id
self.user_id = user_id
self.db = db
parsed_url = urlparse(mcp_url)
self.server_base_url = f"{parsed_url.scheme}://{parsed_url.netloc}"
@@ -696,7 +703,6 @@ class DocsGPTOAuth(OAuthClientProvider):
storage = DBTokenStorage(
server_url=self.server_base_url,
user_id=self.user_id,
db_client=self.db,
expected_redirect_uri=None if skip_redirect_validation else redirect_uri,
)
@@ -838,54 +844,95 @@ class DBTokenStorage(TokenStorage):
self,
server_url: str,
user_id: str,
db_client,
expected_redirect_uri: Optional[str] = None,
):
self.server_url = server_url
self.user_id = user_id
self.db_client = db_client
self.expected_redirect_uri = expected_redirect_uri
self.collection = db_client["connector_sessions"]
@staticmethod
def get_base_url(url: str) -> str:
parsed = urlparse(url)
return f"{parsed.scheme}://{parsed.netloc}"
def get_db_key(self) -> dict:
return {
"server_url": self.get_base_url(self.server_url),
"user_id": self.user_id,
}
def _pg_provider(self) -> str:
return f"mcp:{self.get_base_url(self.server_url)}"
def _fetch_session_data(self) -> dict:
"""Read the JSONB ``session_data`` blob for this MCP server row."""
from application.storage.db.repositories.connector_sessions import (
ConnectorSessionsRepository,
)
from application.storage.db.session import db_readonly
base_url = self.get_base_url(self.server_url)
with db_readonly() as conn:
row = ConnectorSessionsRepository(conn).get_by_user_and_server_url(
self.user_id, base_url,
)
if not row:
return {}
data = row.get("session_data") or {}
if isinstance(data, str):
try:
data = json.loads(data)
except ValueError:
return {}
return data if isinstance(data, dict) else {}
async def get_tokens(self) -> OAuthToken | None:
doc = await asyncio.to_thread(self.collection.find_one, self.get_db_key())
if not doc or "tokens" not in doc:
data = await asyncio.to_thread(self._fetch_session_data)
if not data or "tokens" not in data:
return None
try:
return OAuthToken.model_validate(doc["tokens"])
return OAuthToken.model_validate(data["tokens"])
except ValidationError as e:
logger.error("Could not load tokens: %s", e)
return None
async def set_tokens(self, tokens: OAuthToken) -> None:
await asyncio.to_thread(
self.collection.update_one,
self.get_db_key(),
{"$set": {"tokens": tokens.model_dump()}},
True,
def _merge(self, patch: dict) -> None:
"""Shallow-merge ``patch`` into this row's ``session_data``.
Threads ``server_url`` through to the repository so it lands in
the scalar column — ``get_by_user_and_server_url`` needs that to
resolve the row (``NULL = 'https://...'`` is UNKNOWN in SQL).
"""
from application.storage.db.repositories.connector_sessions import (
ConnectorSessionsRepository,
)
logger.info("Saved tokens for %s", self.get_base_url(self.server_url))
from application.storage.db.session import db_session
base_url = self.get_base_url(self.server_url)
with db_session() as conn:
ConnectorSessionsRepository(conn).merge_session_data(
self.user_id, self._pg_provider(), base_url, patch,
)
def _delete(self) -> None:
from application.storage.db.repositories.connector_sessions import (
ConnectorSessionsRepository,
)
from application.storage.db.session import db_session
with db_session() as conn:
ConnectorSessionsRepository(conn).delete(
self.user_id, self._pg_provider(),
)
async def set_tokens(self, tokens: OAuthToken) -> None:
base_url = self.get_base_url(self.server_url)
token_dump = tokens.model_dump()
await asyncio.to_thread(self._merge, {"tokens": token_dump})
logger.info("Saved tokens for %s", base_url)
async def get_client_info(self) -> OAuthClientInformationFull | None:
doc = await asyncio.to_thread(self.collection.find_one, self.get_db_key())
if not doc or "client_info" not in doc:
logger.debug(
"No client_info in DB for %s", self.get_base_url(self.server_url)
)
data = await asyncio.to_thread(self._fetch_session_data)
base_url = self.get_base_url(self.server_url)
if not data or "client_info" not in data:
logger.debug("No client_info in DB for %s", base_url)
return None
try:
client_info = OAuthClientInformationFull.model_validate(doc["client_info"])
client_info = OAuthClientInformationFull.model_validate(data["client_info"])
if self.expected_redirect_uri:
stored_uris = [
str(uri).rstrip("/") for uri in client_info.redirect_uris
@@ -894,14 +941,16 @@ class DBTokenStorage(TokenStorage):
if expected_uri not in stored_uris:
logger.warning(
"Redirect URI mismatch for %s: expected=%s stored=%s — clearing.",
self.get_base_url(self.server_url),
base_url,
expected_uri,
stored_uris,
)
# Drop ``tokens`` and ``client_info`` from the JSONB
# blob via merge_session_data's ``None``-drops-key
# semantics — preserves the row + any other keys.
await asyncio.to_thread(
self.collection.update_one,
self.get_db_key(),
{"$unset": {"client_info": "", "tokens": ""}},
self._merge,
{"tokens": None, "client_info": None},
)
return None
return client_info
@@ -916,22 +965,37 @@ class DBTokenStorage(TokenStorage):
async def set_client_info(self, client_info: OAuthClientInformationFull) -> None:
serialized_info = self._serialize_client_info(client_info.model_dump())
base_url = self.get_base_url(self.server_url)
await asyncio.to_thread(
self.collection.update_one,
self.get_db_key(),
{"$set": {"client_info": serialized_info}},
True,
self._merge, {"client_info": serialized_info},
)
logger.info("Saved client info for %s", self.get_base_url(self.server_url))
logger.info("Saved client info for %s", base_url)
async def clear(self) -> None:
await asyncio.to_thread(self.collection.delete_one, self.get_db_key())
await asyncio.to_thread(self._delete)
logger.info("Cleared OAuth cache for %s", self.get_base_url(self.server_url))
@classmethod
async def clear_all(cls, db_client) -> None:
collection = db_client["connector_sessions"]
await asyncio.to_thread(collection.delete_many, {})
async def clear_all(cls, db_client=None) -> None:
"""Delete every MCP-tagged connector session row.
``db_client`` retained for call-site compatibility but unused —
storage is Postgres-only now.
"""
from sqlalchemy import text
from application.storage.db.session import db_session
def _delete_all() -> None:
with db_session() as conn:
conn.execute(
text(
"DELETE FROM connector_sessions "
"WHERE provider LIKE 'mcp:%'"
)
)
await asyncio.to_thread(_delete_all)
logger.info("Cleared all OAuth client cache data.")

View File

@@ -1,12 +1,14 @@
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List, Optional
import re
import logging
import uuid
from .base import Tool
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.storage.db.repositories.memories import MemoriesRepository
from application.storage.db.session import db_readonly, db_session
logger = logging.getLogger(__name__)
class MemoryTool(Tool):
@@ -27,7 +29,7 @@ class MemoryTool(Tool):
self.user_id: Optional[str] = user_id
# Get tool_id from configuration (passed from user_tools._id in production)
# In production, tool_id is the MongoDB ObjectId string from user_tools collection
# In production, tool_id is the UUID string from user_tools.id.
if tool_config and "tool_id" in tool_config:
self.tool_id = tool_config["tool_id"]
elif user_id:
@@ -37,8 +39,35 @@ class MemoryTool(Tool):
# Last resort fallback (shouldn't happen in normal use)
self.tool_id = str(uuid.uuid4())
db = MongoDB.get_client()[settings.MONGO_DB_NAME]
self.collection = db["memories"]
def _pg_enabled(self) -> bool:
"""Return True if this MemoryTool's tool_id is a real ``user_tools.id``.
The ``memories`` PG table has a UUID foreign key to ``user_tools``.
The sentinel ``default_{uid}`` fallback tool_id is not a UUID and
has no row in ``user_tools``, so any storage operation would fail
the foreign-key check. After the Postgres cutover Postgres is the
only store, so for the sentinel case there is nowhere to read or
write — operations become no-ops and the tool returns an
explanatory error to the caller.
"""
tool_id = getattr(self, "tool_id", None)
if not tool_id or not isinstance(tool_id, str):
return False
if tool_id.startswith("default_"):
logger.debug(
"Skipping Postgres operation for MemoryTool with sentinel tool_id=%s",
tool_id,
)
return False
from application.storage.db.base_repository import looks_like_uuid
if not looks_like_uuid(tool_id):
logger.debug(
"Skipping Postgres operation for MemoryTool with non-UUID tool_id=%s",
tool_id,
)
return False
return True
# -----------------------------
# Action implementations
@@ -56,6 +85,12 @@ class MemoryTool(Tool):
if not self.user_id:
return "Error: MemoryTool requires a valid user_id."
if not self._pg_enabled():
return (
"Error: MemoryTool is not configured with a persistent tool_id; "
"memory storage is unavailable for this session."
)
if action_name == "view":
return self._view(
kwargs.get("path", "/"),
@@ -282,14 +317,10 @@ class MemoryTool(Tool):
# Ensure path ends with / for proper prefix matching
search_path = path if path.endswith("/") else path + "/"
# Find all files that start with this directory path
query = {
"user_id": self.user_id,
"tool_id": self.tool_id,
"path": {"$regex": f"^{re.escape(search_path)}"}
}
docs = list(self.collection.find(query, {"path": 1}))
with db_readonly() as conn:
docs = MemoriesRepository(conn).list_by_prefix(
self.user_id, self.tool_id, search_path
)
if not docs:
return f"Directory: {path}\n(empty)"
@@ -310,7 +341,10 @@ class MemoryTool(Tool):
def _view_file(self, path: str, view_range: Optional[List[int]] = None) -> str:
"""View file contents with optional line range."""
doc = self.collection.find_one({"user_id": self.user_id, "tool_id": self.tool_id, "path": path})
with db_readonly() as conn:
doc = MemoriesRepository(conn).get_by_path(
self.user_id, self.tool_id, path
)
if not doc or not doc.get("content"):
return f"Error: File not found: {path}"
@@ -344,16 +378,10 @@ class MemoryTool(Tool):
if validated_path == "/" or validated_path.endswith("/"):
return "Error: Cannot create a file at directory path."
self.collection.update_one(
{"user_id": self.user_id, "tool_id": self.tool_id, "path": validated_path},
{
"$set": {
"content": file_text,
"updated_at": datetime.now()
}
},
upsert=True
)
with db_session() as conn:
MemoriesRepository(conn).upsert(
self.user_id, self.tool_id, validated_path, file_text
)
return f"File created: {validated_path}"
@@ -366,30 +394,29 @@ class MemoryTool(Tool):
if not old_str:
return "Error: old_str is required."
doc = self.collection.find_one({"user_id": self.user_id, "tool_id": self.tool_id, "path": validated_path})
with db_session() as conn:
repo = MemoriesRepository(conn)
doc = repo.get_by_path(self.user_id, self.tool_id, validated_path)
if not doc or not doc.get("content"):
return f"Error: File not found: {validated_path}"
if not doc or not doc.get("content"):
return f"Error: File not found: {validated_path}"
current_content = str(doc["content"])
current_content = str(doc["content"])
# Check if old_str exists (case-insensitive)
if old_str.lower() not in current_content.lower():
return f"Error: String '{old_str}' not found in file."
# Check if old_str exists (case-insensitive)
if old_str.lower() not in current_content.lower():
return f"Error: String '{old_str}' not found in file."
# Replace the string (case-insensitive)
import re as regex_module
updated_content = regex_module.sub(regex_module.escape(old_str), new_str, current_content, flags=regex_module.IGNORECASE)
# Case-insensitive replace
import re as regex_module
updated_content = regex_module.sub(
regex_module.escape(old_str),
new_str,
current_content,
flags=regex_module.IGNORECASE,
)
self.collection.update_one(
{"user_id": self.user_id, "tool_id": self.tool_id, "path": validated_path},
{
"$set": {
"content": updated_content,
"updated_at": datetime.now()
}
}
)
repo.upsert(self.user_id, self.tool_id, validated_path, updated_content)
return f"File updated: {validated_path}"
@@ -402,31 +429,25 @@ class MemoryTool(Tool):
if not insert_text:
return "Error: insert_text is required."
doc = self.collection.find_one({"user_id": self.user_id, "tool_id": self.tool_id, "path": validated_path})
with db_session() as conn:
repo = MemoriesRepository(conn)
doc = repo.get_by_path(self.user_id, self.tool_id, validated_path)
if not doc or not doc.get("content"):
return f"Error: File not found: {validated_path}"
if not doc or not doc.get("content"):
return f"Error: File not found: {validated_path}"
current_content = str(doc["content"])
lines = current_content.split("\n")
current_content = str(doc["content"])
lines = current_content.split("\n")
# Convert to 0-indexed
index = insert_line - 1
if index < 0 or index > len(lines):
return f"Error: Invalid line number. File has {len(lines)} lines."
# Convert to 0-indexed
index = insert_line - 1
if index < 0 or index > len(lines):
return f"Error: Invalid line number. File has {len(lines)} lines."
lines.insert(index, insert_text)
updated_content = "\n".join(lines)
lines.insert(index, insert_text)
updated_content = "\n".join(lines)
self.collection.update_one(
{"user_id": self.user_id, "tool_id": self.tool_id, "path": validated_path},
{
"$set": {
"content": updated_content,
"updated_at": datetime.now()
}
}
)
repo.upsert(self.user_id, self.tool_id, validated_path, updated_content)
return f"Text inserted at line {insert_line} in {validated_path}"
@@ -438,39 +459,36 @@ class MemoryTool(Tool):
if validated_path == "/":
# Delete all files for this user and tool
result = self.collection.delete_many({"user_id": self.user_id, "tool_id": self.tool_id})
return f"Deleted {result.deleted_count} file(s) from memory."
with db_session() as conn:
deleted = MemoriesRepository(conn).delete_all(
self.user_id, self.tool_id
)
return f"Deleted {deleted} file(s) from memory."
# Check if it's a directory (ends with /)
if validated_path.endswith("/"):
# Delete all files in directory
result = self.collection.delete_many({
"user_id": self.user_id,
"tool_id": self.tool_id,
"path": {"$regex": f"^{re.escape(validated_path)}"}
})
return f"Deleted directory and {result.deleted_count} file(s)."
with db_session() as conn:
deleted = MemoriesRepository(conn).delete_by_prefix(
self.user_id, self.tool_id, validated_path
)
return f"Deleted directory and {deleted} file(s)."
# Try to delete as directory first (without trailing slash)
# Check if any files start with this path + /
# Try as directory first (without trailing slash)
search_path = validated_path + "/"
directory_result = self.collection.delete_many({
"user_id": self.user_id,
"tool_id": self.tool_id,
"path": {"$regex": f"^{re.escape(search_path)}"}
})
with db_session() as conn:
repo = MemoriesRepository(conn)
directory_deleted = repo.delete_by_prefix(
self.user_id, self.tool_id, search_path
)
if directory_deleted > 0:
return f"Deleted directory and {directory_deleted} file(s)."
if directory_result.deleted_count > 0:
return f"Deleted directory and {directory_result.deleted_count} file(s)."
# Otherwise delete a single file
file_deleted = repo.delete_by_path(
self.user_id, self.tool_id, validated_path
)
# Delete single file
result = self.collection.delete_one({
"user_id": self.user_id,
"tool_id": self.tool_id,
"path": validated_path
})
if result.deleted_count:
if file_deleted:
return f"Deleted: {validated_path}"
return f"Error: File not found: {validated_path}"
@@ -485,62 +503,46 @@ class MemoryTool(Tool):
if validated_old == "/" or validated_new == "/":
return "Error: Cannot rename root directory."
# Check if renaming a directory
# Directory rename: do all path updates inside one transaction so
# the rename is atomic from the caller's perspective.
if validated_old.endswith("/"):
# Ensure validated_new also ends with / for proper path replacement
if not validated_new.endswith("/"):
validated_new = validated_new + "/"
# Find all files in the old directory
docs = list(self.collection.find({
"user_id": self.user_id,
"tool_id": self.tool_id,
"path": {"$regex": f"^{re.escape(validated_old)}"}
}))
if not docs:
return f"Error: Directory not found: {validated_old}"
# Update paths for all files
for doc in docs:
old_file_path = doc["path"]
new_file_path = old_file_path.replace(validated_old, validated_new, 1)
self.collection.update_one(
{"_id": doc["_id"]},
{"$set": {"path": new_file_path, "updated_at": datetime.now()}}
with db_session() as conn:
repo = MemoriesRepository(conn)
docs = repo.list_by_prefix(
self.user_id, self.tool_id, validated_old
)
if not docs:
return f"Error: Directory not found: {validated_old}"
for doc in docs:
old_file_path = doc["path"]
new_file_path = old_file_path.replace(
validated_old, validated_new, 1
)
repo.update_path(
self.user_id, self.tool_id, old_file_path, new_file_path
)
return f"Renamed directory: {validated_old} -> {validated_new} ({len(docs)} files)"
# Rename single file
doc = self.collection.find_one({
"user_id": self.user_id,
"tool_id": self.tool_id,
"path": validated_old
})
# Single-file rename: lookup, collision check, and update in one txn.
with db_session() as conn:
repo = MemoriesRepository(conn)
doc = repo.get_by_path(self.user_id, self.tool_id, validated_old)
if not doc:
return f"Error: File not found: {validated_old}"
if not doc:
return f"Error: File not found: {validated_old}"
existing = repo.get_by_path(self.user_id, self.tool_id, validated_new)
if existing:
return f"Error: File already exists at {validated_new}"
# Check if new path already exists
existing = self.collection.find_one({
"user_id": self.user_id,
"tool_id": self.tool_id,
"path": validated_new
})
if existing:
return f"Error: File already exists at {validated_new}"
# Delete the old document and create a new one with the new path
self.collection.delete_one({"user_id": self.user_id, "tool_id": self.tool_id, "path": validated_old})
self.collection.insert_one({
"user_id": self.user_id,
"tool_id": self.tool_id,
"path": validated_new,
"content": doc.get("content", ""),
"updated_at": datetime.now()
})
repo.update_path(
self.user_id, self.tool_id, validated_old, validated_new
)
return f"Renamed: {validated_old} -> {validated_new}"

View File

@@ -1,10 +1,16 @@
from datetime import datetime
from typing import Any, Dict, List, Optional
import uuid
from .base import Tool
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.storage.db.repositories.notes import NotesRepository
from application.storage.db.session import db_readonly, db_session
# Stable synthetic title used in the Postgres ``notes.title`` column.
# The notes tool stores one note per (user_id, tool_id); there is no
# user-facing title. PG requires ``title`` NOT NULL, so we write a stable
# constant alongside the actual note body in ``content``.
_NOTE_TITLE = "note"
class NotesTool(Tool):
@@ -25,7 +31,6 @@ class NotesTool(Tool):
self.user_id: Optional[str] = user_id
# Get tool_id from configuration (passed from user_tools._id in production)
# In production, tool_id is the MongoDB ObjectId string from user_tools collection
if tool_config and "tool_id" in tool_config:
self.tool_id = tool_config["tool_id"]
elif user_id:
@@ -35,11 +40,25 @@ class NotesTool(Tool):
# Last resort fallback (shouldn't happen in normal use)
self.tool_id = str(uuid.uuid4())
db = MongoDB.get_client()[settings.MONGO_DB_NAME]
self.collection = db["notes"]
self._last_artifact_id: Optional[str] = None
def _pg_enabled(self) -> bool:
"""Return True only when ``tool_id`` is a real ``user_tools.id`` UUID.
``notes.tool_id`` is a UUID FK to ``user_tools``; repo queries
``CAST(:tool_id AS uuid)``. The sentinel ``default_{uid}``
fallback is neither a UUID nor a ``user_tools`` row, so any DB
operation would crash. Mirror MemoryTool's guard and no-op.
"""
tool_id = getattr(self, "tool_id", None)
if not tool_id or not isinstance(tool_id, str):
return False
if tool_id.startswith("default_"):
return False
from application.storage.db.base_repository import looks_like_uuid
return looks_like_uuid(tool_id)
# -----------------------------
# Action implementations
# -----------------------------
@@ -54,7 +73,13 @@ class NotesTool(Tool):
A human-readable string result.
"""
if not self.user_id:
return "Error: NotesTool requires a valid user_id."
return "Error: NotesTool requires a valid user_id."
if not self._pg_enabled():
return (
"Error: NotesTool is not configured with a persistent "
"tool_id; note storage is unavailable for this session."
)
self._last_artifact_id = None
@@ -135,37 +160,45 @@ class NotesTool(Tool):
# -----------------------------
# Internal helpers (single-note)
# -----------------------------
def _fetch_note(self) -> Optional[dict]:
"""Read the note row for this (user, tool) from Postgres."""
with db_readonly() as conn:
return NotesRepository(conn).get_for_user_tool(self.user_id, self.tool_id)
def _get_note(self) -> str:
doc = self.collection.find_one({"user_id": self.user_id, "tool_id": self.tool_id})
if not doc or not doc.get("note"):
doc = self._fetch_note()
# ``content`` is the PG column; expose as ``note`` to callers via the
# textual return value. Frontends that read the artifact via the
# repo dict get ``content`` (PG-native) plus the artifact id below.
body = (doc or {}).get("content")
if not doc or not body:
return "No note found."
if doc.get("_id") is not None:
self._last_artifact_id = str(doc.get("_id"))
return str(doc["note"])
if doc.get("id") is not None:
self._last_artifact_id = str(doc.get("id"))
return str(body)
def _overwrite_note(self, content: str) -> str:
content = (content or "").strip()
if not content:
return "Note content required."
result = self.collection.find_one_and_update(
{"user_id": self.user_id, "tool_id": self.tool_id},
{"$set": {"note": content, "updated_at": datetime.utcnow()}},
upsert=True,
return_document=True,
)
if result and result.get("_id") is not None:
self._last_artifact_id = str(result.get("_id"))
with db_session() as conn:
row = NotesRepository(conn).upsert(
self.user_id, self.tool_id, _NOTE_TITLE, content
)
if row and row.get("id") is not None:
self._last_artifact_id = str(row.get("id"))
return "Note saved."
def _str_replace(self, old_str: str, new_str: str) -> str:
if not old_str:
return "old_str is required."
doc = self.collection.find_one({"user_id": self.user_id, "tool_id": self.tool_id})
if not doc or not doc.get("note"):
doc = self._fetch_note()
existing = (doc or {}).get("content")
if not doc or not existing:
return "No note found."
current_note = str(doc["note"])
current_note = str(existing)
# Case-insensitive search
if old_str.lower() not in current_note.lower():
@@ -175,24 +208,24 @@ class NotesTool(Tool):
import re
updated_note = re.sub(re.escape(old_str), new_str, current_note, flags=re.IGNORECASE)
result = self.collection.find_one_and_update(
{"user_id": self.user_id, "tool_id": self.tool_id},
{"$set": {"note": updated_note, "updated_at": datetime.utcnow()}},
return_document=True,
)
if result and result.get("_id") is not None:
self._last_artifact_id = str(result.get("_id"))
with db_session() as conn:
row = NotesRepository(conn).upsert(
self.user_id, self.tool_id, _NOTE_TITLE, updated_note
)
if row and row.get("id") is not None:
self._last_artifact_id = str(row.get("id"))
return "Note updated."
def _insert(self, line_number: int, text: str) -> str:
if not text:
return "Text is required."
doc = self.collection.find_one({"user_id": self.user_id, "tool_id": self.tool_id})
if not doc or not doc.get("note"):
doc = self._fetch_note()
existing = (doc or {}).get("content")
if not doc or not existing:
return "No note found."
current_note = str(doc["note"])
current_note = str(existing)
lines = current_note.split("\n")
# Convert to 0-indexed and validate
@@ -203,21 +236,23 @@ class NotesTool(Tool):
lines.insert(index, text)
updated_note = "\n".join(lines)
result = self.collection.find_one_and_update(
{"user_id": self.user_id, "tool_id": self.tool_id},
{"$set": {"note": updated_note, "updated_at": datetime.utcnow()}},
return_document=True,
)
if result and result.get("_id") is not None:
self._last_artifact_id = str(result.get("_id"))
with db_session() as conn:
row = NotesRepository(conn).upsert(
self.user_id, self.tool_id, _NOTE_TITLE, updated_note
)
if row and row.get("id") is not None:
self._last_artifact_id = str(row.get("id"))
return "Text inserted."
def _delete_note(self) -> str:
doc = self.collection.find_one_and_delete(
{"user_id": self.user_id, "tool_id": self.tool_id}
)
if not doc:
# Capture the id (for artifact tracking) before deleting.
existing = self._fetch_note()
if not existing:
return "No note found to delete."
if doc.get("_id") is not None:
self._last_artifact_id = str(doc.get("_id"))
with db_session() as conn:
deleted = NotesRepository(conn).delete(self.user_id, self.tool_id)
if not deleted:
return "No note found to delete."
if existing.get("id") is not None:
self._last_artifact_id = str(existing.get("id"))
return "Note deleted."

View File

@@ -71,7 +71,7 @@ class NtfyTool(Tool):
if self.token:
headers["Authorization"] = f"Basic {self.token}"
data = message.encode("utf-8")
response = requests.post(url, headers=headers, data=data)
response = requests.post(url, headers=headers, data=data, timeout=100)
return {"status_code": response.status_code, "message": "Message sent"}
def get_actions_metadata(self):

View File

@@ -1,6 +1,6 @@
import logging
import psycopg2
import psycopg
from application.agents.tools.base import Tool
@@ -33,7 +33,7 @@ class PostgresTool(Tool):
"""
conn = None
try:
conn = psycopg2.connect(self.connection_string)
conn = psycopg.connect(self.connection_string)
cur = conn.cursor()
cur.execute(sql_query)
conn.commit()
@@ -60,7 +60,7 @@ class PostgresTool(Tool):
"response_data": response_data,
}
except psycopg2.Error as e:
except psycopg.Error as e:
error_message = f"Database error: {e}"
logger.error("PostgreSQL execute_sql error: %s", e)
return {
@@ -78,7 +78,7 @@ class PostgresTool(Tool):
"""
conn = None
try:
conn = psycopg2.connect(self.connection_string)
conn = psycopg.connect(self.connection_string)
cur = conn.cursor()
cur.execute(
@@ -120,7 +120,7 @@ class PostgresTool(Tool):
"schema": schema_data,
}
except psycopg2.Error as e:
except psycopg.Error as e:
error_message = f"Database error: {e}"
logger.error("PostgreSQL get_schema error: %s", e)
return {

View File

@@ -31,14 +31,14 @@ class TelegramTool(Tool):
logger.debug("Sending Telegram message to chat_id=%s", chat_id)
url = f"https://api.telegram.org/bot{self.token}/sendMessage"
payload = {"chat_id": chat_id, "text": text}
response = requests.post(url, data=payload)
response = requests.post(url, data=payload, timeout=100)
return {"status_code": response.status_code, "message": "Message sent"}
def _send_image(self, image_url, chat_id):
logger.debug("Sending Telegram image to chat_id=%s", chat_id)
url = f"https://api.telegram.org/bot{self.token}/sendPhoto"
payload = {"chat_id": chat_id, "photo": image_url}
response = requests.post(url, data=payload)
response = requests.post(url, data=payload, timeout=100)
return {"status_code": response.status_code, "message": "Image sent"}
def get_actions_metadata(self):

View File

@@ -36,6 +36,8 @@ class ThinkTool(Tool):
The reasoning content is captured in tool_call data for transparency.
"""
internal = True
def __init__(self, config=None):
pass

View File

@@ -1,10 +1,19 @@
from datetime import datetime
from typing import Any, Dict, List, Optional
import uuid
from .base import Tool
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.storage.db.repositories.todos import TodosRepository
from application.storage.db.session import db_readonly, db_session
def _status_from_completed(completed: Any) -> str:
"""Translate the PG ``completed`` boolean to the legacy status string.
The frontend (and prior LLM-facing tool output) expects
``"open"`` / ``"completed"``. Keeping that contract at the tool
boundary insulates callers from the schema change.
"""
return "completed" if bool(completed) else "open"
class TodoListTool(Tool):
@@ -25,7 +34,6 @@ class TodoListTool(Tool):
self.user_id: Optional[str] = user_id
# Get tool_id from configuration (passed from user_tools._id in production)
# In production, tool_id is the MongoDB ObjectId string from user_tools collection
if tool_config and "tool_id" in tool_config:
self.tool_id = tool_config["tool_id"]
elif user_id:
@@ -35,11 +43,27 @@ class TodoListTool(Tool):
# Last resort fallback (shouldn't happen in normal use)
self.tool_id = str(uuid.uuid4())
db = MongoDB.get_client()[settings.MONGO_DB_NAME]
self.collection = db["todos"]
self._last_artifact_id: Optional[str] = None
def _pg_enabled(self) -> bool:
"""Return True only when ``tool_id`` is a real ``user_tools.id`` UUID.
The ``todos`` PG table has a UUID foreign key to ``user_tools`` and
the repo queries ``CAST(:tool_id AS uuid)``. The sentinel
``default_{uid}`` fallback is neither a UUID nor a row in
``user_tools`` — binding it would crash ``invalid input syntax for
type uuid`` and even if it didn't the FK would reject it. Mirror
the MemoryTool guard and no-op in that case.
"""
tool_id = getattr(self, "tool_id", None)
if not tool_id or not isinstance(tool_id, str):
return False
if tool_id.startswith("default_"):
return False
from application.storage.db.base_repository import looks_like_uuid
return looks_like_uuid(tool_id)
# -----------------------------
# Action implementations
# -----------------------------
@@ -56,6 +80,12 @@ class TodoListTool(Tool):
if not self.user_id:
return "Error: TodoListTool requires a valid user_id."
if not self._pg_enabled():
return (
"Error: TodoListTool is not configured with a persistent "
"tool_id; todo storage is unavailable for this session."
)
self._last_artifact_id = None
if action_name == "list":
@@ -191,28 +221,10 @@ class TodoListTool(Tool):
return None
def _get_next_todo_id(self) -> int:
"""Get the next sequential todo_id for this user and tool.
Returns a simple integer (1, 2, 3, ...) scoped to this user/tool.
With 5-10 todos max, scanning is negligible.
"""
query = {"user_id": self.user_id, "tool_id": self.tool_id}
todos = list(self.collection.find(query, {"todo_id": 1}))
# Find the maximum todo_id
max_id = 0
for todo in todos:
todo_id = self._coerce_todo_id(todo.get("todo_id"))
if todo_id is not None:
max_id = max(max_id, todo_id)
return max_id + 1
def _list(self) -> str:
"""List all todos for the user."""
query = {"user_id": self.user_id, "tool_id": self.tool_id}
todos = list(self.collection.find(query))
with db_readonly() as conn:
todos = TodosRepository(conn).list_for_tool(self.user_id, self.tool_id)
if not todos:
return "No todos found."
@@ -221,7 +233,7 @@ class TodoListTool(Tool):
for doc in todos:
todo_id = doc.get("todo_id")
title = doc.get("title", "Untitled")
status = doc.get("status", "open")
status = _status_from_completed(doc.get("completed"))
line = f"[{todo_id}] {title} ({status})"
result_lines.append(line)
@@ -229,27 +241,23 @@ class TodoListTool(Tool):
return "\n".join(result_lines)
def _create(self, title: str) -> str:
"""Create a new todo item."""
"""Create a new todo item.
``TodosRepository.create`` allocates the per-tool monotonic
``todo_id`` inside the same transaction (``COALESCE(MAX(todo_id),0)+1``
scoped to ``tool_id``), so we no longer need a separate read-then-
write step here.
"""
title = (title or "").strip()
if not title:
return "Error: Title is required."
now = datetime.now()
todo_id = self._get_next_todo_id()
with db_session() as conn:
row = TodosRepository(conn).create(self.user_id, self.tool_id, title)
doc = {
"todo_id": todo_id,
"user_id": self.user_id,
"tool_id": self.tool_id,
"title": title,
"status": "open",
"created_at": now,
"updated_at": now,
}
insert_result = self.collection.insert_one(doc)
inserted_id = getattr(insert_result, "inserted_id", None) or doc.get("_id")
if inserted_id is not None:
self._last_artifact_id = str(inserted_id)
todo_id = row.get("todo_id")
if row.get("id") is not None:
self._last_artifact_id = str(row.get("id"))
return f"Todo created with ID {todo_id}: {title}"
def _get(self, todo_id: Optional[Any]) -> str:
@@ -258,21 +266,21 @@ class TodoListTool(Tool):
if parsed_todo_id is None:
return "Error: todo_id must be a positive integer."
query = {"user_id": self.user_id, "tool_id": self.tool_id, "todo_id": parsed_todo_id}
doc = self.collection.find_one(query)
with db_readonly() as conn:
doc = TodosRepository(conn).get_by_tool_and_todo_id(
self.user_id, self.tool_id, parsed_todo_id
)
if not doc:
return f"Error: Todo with ID {parsed_todo_id} not found."
if doc.get("_id") is not None:
self._last_artifact_id = str(doc.get("_id"))
if doc.get("id") is not None:
self._last_artifact_id = str(doc.get("id"))
title = doc.get("title", "Untitled")
status = doc.get("status", "open")
status = _status_from_completed(doc.get("completed"))
result = f"Todo [{parsed_todo_id}]:\nTitle: {title}\nStatus: {status}"
return result
return f"Todo [{parsed_todo_id}]:\nTitle: {title}\nStatus: {status}"
def _update(self, todo_id: Optional[Any], title: str) -> str:
"""Update a todo's title by ID."""
@@ -284,16 +292,19 @@ class TodoListTool(Tool):
if not title:
return "Error: Title is required."
query = {"user_id": self.user_id, "tool_id": self.tool_id, "todo_id": parsed_todo_id}
doc = self.collection.find_one_and_update(
query,
{"$set": {"title": title, "updated_at": datetime.now()}},
)
if not doc:
return f"Error: Todo with ID {parsed_todo_id} not found."
with db_session() as conn:
repo = TodosRepository(conn)
existing = repo.get_by_tool_and_todo_id(
self.user_id, self.tool_id, parsed_todo_id
)
if not existing:
return f"Error: Todo with ID {parsed_todo_id} not found."
repo.update_title_by_tool_and_todo_id(
self.user_id, self.tool_id, parsed_todo_id, title
)
if doc.get("_id") is not None:
self._last_artifact_id = str(doc.get("_id"))
if existing.get("id") is not None:
self._last_artifact_id = str(existing.get("id"))
return f"Todo {parsed_todo_id} updated to: {title}"
@@ -303,16 +314,17 @@ class TodoListTool(Tool):
if parsed_todo_id is None:
return "Error: todo_id must be a positive integer."
query = {"user_id": self.user_id, "tool_id": self.tool_id, "todo_id": parsed_todo_id}
doc = self.collection.find_one_and_update(
query,
{"$set": {"status": "completed", "updated_at": datetime.now()}},
)
if not doc:
return f"Error: Todo with ID {parsed_todo_id} not found."
with db_session() as conn:
repo = TodosRepository(conn)
existing = repo.get_by_tool_and_todo_id(
self.user_id, self.tool_id, parsed_todo_id
)
if not existing:
return f"Error: Todo with ID {parsed_todo_id} not found."
repo.set_completed(self.user_id, self.tool_id, parsed_todo_id, True)
if doc.get("_id") is not None:
self._last_artifact_id = str(doc.get("_id"))
if existing.get("id") is not None:
self._last_artifact_id = str(existing.get("id"))
return f"Todo {parsed_todo_id} marked as completed."
@@ -322,12 +334,18 @@ class TodoListTool(Tool):
if parsed_todo_id is None:
return "Error: todo_id must be a positive integer."
query = {"user_id": self.user_id, "tool_id": self.tool_id, "todo_id": parsed_todo_id}
doc = self.collection.find_one_and_delete(query)
if not doc:
return f"Error: Todo with ID {parsed_todo_id} not found."
with db_session() as conn:
repo = TodosRepository(conn)
existing = repo.get_by_tool_and_todo_id(
self.user_id, self.tool_id, parsed_todo_id
)
if not existing:
return f"Error: Todo with ID {parsed_todo_id} not found."
repo.delete_by_tool_and_todo_id(
self.user_id, self.tool_id, parsed_todo_id
)
if doc.get("_id") is not None:
self._last_artifact_id = str(doc.get("_id"))
if existing.get("id") is not None:
self._last_artifact_id = str(existing.get("id"))
return f"Todo {parsed_todo_id} deleted."

View File

@@ -5,8 +5,9 @@ logger = logging.getLogger(__name__)
class ToolActionParser:
def __init__(self, llm_type):
def __init__(self, llm_type, name_mapping=None):
self.llm_type = llm_type
self.name_mapping = name_mapping
self.parsers = {
"OpenAILLM": self._parse_openai_llm,
"GoogleLLM": self._parse_google_llm,
@@ -16,22 +17,33 @@ class ToolActionParser:
parser = self.parsers.get(self.llm_type, self._parse_openai_llm)
return parser(call)
def _resolve_via_mapping(self, call_name):
"""Look up (tool_id, action_name) from the name mapping if available."""
if self.name_mapping and call_name in self.name_mapping:
return self.name_mapping[call_name]
return None
def _parse_openai_llm(self, call):
try:
call_args = json.loads(call.arguments)
resolved = self._resolve_via_mapping(call.name)
if resolved:
return resolved[0], resolved[1], call_args
# Fallback: legacy split on "_" for backward compatibility
tool_parts = call.name.split("_")
# If the tool name doesn't contain an underscore, it's likely a hallucinated tool
if len(tool_parts) < 2:
logger.warning(
f"Invalid tool name format: {call.name}. Expected format: action_name_tool_id"
f"Invalid tool name format: {call.name}. "
"Could not resolve via mapping or legacy parsing."
)
return None, None, None
tool_id = tool_parts[-1]
action_name = "_".join(tool_parts[:-1])
# Validate that tool_id looks like a numerical ID
if not tool_id.isdigit():
logger.warning(
f"Tool ID '{tool_id}' is not numerical. This might be a hallucinated tool call."
@@ -45,19 +57,24 @@ class ToolActionParser:
def _parse_google_llm(self, call):
try:
call_args = call.arguments
resolved = self._resolve_via_mapping(call.name)
if resolved:
return resolved[0], resolved[1], call_args
# Fallback: legacy split on "_" for backward compatibility
tool_parts = call.name.split("_")
# If the tool name doesn't contain an underscore, it's likely a hallucinated tool
if len(tool_parts) < 2:
logger.warning(
f"Invalid tool name format: {call.name}. Expected format: action_name_tool_id"
f"Invalid tool name format: {call.name}. "
"Could not resolve via mapping or legacy parsing."
)
return None, None, None
tool_id = tool_parts[-1]
action_name = "_".join(tool_parts[:-1])
# Validate that tool_id looks like a numerical ID
if not tool_id.isdigit():
logger.warning(
f"Tool ID '{tool_id}' is not numerical. This might be a hallucinated tool call."

View File

@@ -19,7 +19,7 @@ class ToolManager:
continue
module = importlib.import_module(f"application.agents.tools.{name}")
for member_name, obj in inspect.getmembers(module, inspect.isclass):
if issubclass(obj, Tool) and obj is not Tool:
if issubclass(obj, Tool) and obj is not Tool and not obj.internal:
tool_config = self.config.get(name, {})
self.tools[name] = obj(tool_config)

View File

@@ -12,9 +12,13 @@ from application.agents.workflows.schemas import (
WorkflowRun,
)
from application.agents.workflows.workflow_engine import WorkflowEngine
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.logging import log_activity, LogContext
from application.storage.db.base_repository import looks_like_uuid
from application.storage.db.repositories.workflow_edges import WorkflowEdgesRepository
from application.storage.db.repositories.workflow_nodes import WorkflowNodesRepository
from application.storage.db.repositories.workflow_runs import WorkflowRunsRepository
from application.storage.db.repositories.workflows import WorkflowsRepository
from application.storage.db.session import db_readonly, db_session
logger = logging.getLogger(__name__)
@@ -103,10 +107,8 @@ class WorkflowAgent(BaseAgent):
def _load_from_database(self) -> Optional[WorkflowGraph]:
try:
from bson.objectid import ObjectId
if not self.workflow_id or not ObjectId.is_valid(self.workflow_id):
logger.error(f"Invalid workflow ID: {self.workflow_id}")
if not self.workflow_id:
logger.error("Missing workflow ID for load")
return None
owner_id = self.workflow_owner
if not owner_id and isinstance(self.decoded_token, dict):
@@ -117,61 +119,61 @@ class WorkflowAgent(BaseAgent):
)
return None
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
workflows_coll = db["workflows"]
workflow_nodes_coll = db["workflow_nodes"]
workflow_edges_coll = db["workflow_edges"]
workflow_doc = workflows_coll.find_one(
{"_id": ObjectId(self.workflow_id), "user": owner_id}
)
if not workflow_doc:
logger.error(
f"Workflow {self.workflow_id} not found or inaccessible for user {owner_id}"
)
return None
workflow = Workflow(**workflow_doc)
graph_version = workflow_doc.get("current_graph_version", 1)
try:
graph_version = int(graph_version)
if graph_version <= 0:
with db_readonly() as conn:
wf_repo = WorkflowsRepository(conn)
if looks_like_uuid(self.workflow_id):
workflow_row = wf_repo.get(self.workflow_id, owner_id)
else:
workflow_row = wf_repo.get_by_legacy_id(self.workflow_id, owner_id)
if workflow_row is None:
logger.error(
f"Workflow {self.workflow_id} not found or inaccessible "
f"for user {owner_id}"
)
return None
pg_workflow_id = str(workflow_row["id"])
graph_version = workflow_row.get("current_graph_version", 1)
try:
graph_version = int(graph_version)
if graph_version <= 0:
graph_version = 1
except (ValueError, TypeError):
graph_version = 1
except (ValueError, TypeError):
graph_version = 1
nodes_docs = list(
workflow_nodes_coll.find(
{"workflow_id": self.workflow_id, "graph_version": graph_version}
node_rows = WorkflowNodesRepository(conn).find_by_version(
pg_workflow_id, graph_version,
)
)
if not nodes_docs and graph_version == 1:
nodes_docs = list(
workflow_nodes_coll.find(
{
"workflow_id": self.workflow_id,
"graph_version": {"$exists": False},
}
)
edge_rows = WorkflowEdgesRepository(conn).find_by_version(
pg_workflow_id, graph_version,
)
nodes = [WorkflowNode(**doc) for doc in nodes_docs]
edges_docs = list(
workflow_edges_coll.find(
{"workflow_id": self.workflow_id, "graph_version": graph_version}
)
workflow = Workflow(
name=workflow_row.get("name"),
description=workflow_row.get("description"),
)
if not edges_docs and graph_version == 1:
edges_docs = list(
workflow_edges_coll.find(
{
"workflow_id": self.workflow_id,
"graph_version": {"$exists": False},
}
)
nodes = [
WorkflowNode(
id=n["node_id"],
workflow_id=pg_workflow_id,
type=n["node_type"],
title=n.get("title") or "Node",
description=n.get("description"),
position=n.get("position") or {"x": 0, "y": 0},
config=n.get("config") or {},
)
edges = [WorkflowEdge(**doc) for doc in edges_docs]
for n in node_rows
]
edges = [
WorkflowEdge(
id=e["edge_id"],
workflow_id=pg_workflow_id,
source=e.get("source_id"),
target=e.get("target_id"),
sourceHandle=e.get("source_handle"),
targetHandle=e.get("target_handle"),
)
for e in edge_rows
]
return WorkflowGraph(workflow=workflow, nodes=nodes, edges=edges)
except Exception as e:
@@ -181,13 +183,13 @@ class WorkflowAgent(BaseAgent):
def _save_workflow_run(self, query: str) -> None:
if not self._engine:
return
owner_id = self.workflow_owner
if not owner_id and isinstance(self.decoded_token, dict):
owner_id = self.decoded_token.get("sub")
try:
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
workflow_runs_coll = db["workflow_runs"]
run = WorkflowRun(
workflow_id=self.workflow_id or "unknown",
user=owner_id,
status=self._determine_run_status(),
inputs={"query": query},
outputs=self._serialize_state(self._engine.state),
@@ -196,7 +198,28 @@ class WorkflowAgent(BaseAgent):
completed_at=datetime.now(timezone.utc),
)
workflow_runs_coll.insert_one(run.to_mongo_doc())
if not self.workflow_id or not owner_id:
return
with db_session() as conn:
wf_repo = WorkflowsRepository(conn)
if looks_like_uuid(self.workflow_id):
workflow_row = wf_repo.get(self.workflow_id, owner_id)
else:
workflow_row = wf_repo.get_by_legacy_id(
self.workflow_id, owner_id,
)
if workflow_row is None:
return
WorkflowRunsRepository(conn).create(
str(workflow_row["id"]),
owner_id,
run.status.value,
inputs=run.inputs,
result=run.outputs,
steps=[step.model_dump(mode="json") for step in run.steps],
started_at=run.created_at,
ended_at=run.completed_at,
)
except Exception as e:
logger.error(f"Failed to save workflow run: {e}")

View File

@@ -2,7 +2,6 @@ from datetime import datetime, timezone
from enum import Enum
from typing import Any, Dict, List, Literal, Optional, Union
from bson import ObjectId
from pydantic import BaseModel, ConfigDict, Field, field_validator
@@ -81,24 +80,7 @@ class WorkflowEdgeCreate(BaseModel):
class WorkflowEdge(WorkflowEdgeCreate):
mongo_id: Optional[str] = Field(None, alias="_id")
@field_validator("mongo_id", mode="before")
@classmethod
def convert_objectid(cls, v: Any) -> Optional[str]:
if isinstance(v, ObjectId):
return str(v)
return v
def to_mongo_doc(self) -> Dict[str, Any]:
return {
"id": self.id,
"workflow_id": self.workflow_id,
"source_id": self.source_id,
"target_id": self.target_id,
"source_handle": self.source_handle,
"target_handle": self.target_handle,
}
pass
class WorkflowNodeCreate(BaseModel):
@@ -120,25 +102,7 @@ class WorkflowNodeCreate(BaseModel):
class WorkflowNode(WorkflowNodeCreate):
mongo_id: Optional[str] = Field(None, alias="_id")
@field_validator("mongo_id", mode="before")
@classmethod
def convert_objectid(cls, v: Any) -> Optional[str]:
if isinstance(v, ObjectId):
return str(v)
return v
def to_mongo_doc(self) -> Dict[str, Any]:
return {
"id": self.id,
"workflow_id": self.workflow_id,
"type": self.type.value,
"title": self.title,
"description": self.description,
"position": self.position.model_dump(),
"config": self.config,
}
pass
class WorkflowCreate(BaseModel):
@@ -149,26 +113,10 @@ class WorkflowCreate(BaseModel):
class Workflow(WorkflowCreate):
id: Optional[str] = Field(None, alias="_id")
id: Optional[str] = None
created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
updated_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
@field_validator("id", mode="before")
@classmethod
def convert_objectid(cls, v: Any) -> Optional[str]:
if isinstance(v, ObjectId):
return str(v)
return v
def to_mongo_doc(self) -> Dict[str, Any]:
return {
"name": self.name,
"description": self.description,
"user": self.user,
"created_at": self.created_at,
"updated_at": self.updated_at,
}
class WorkflowGraph(BaseModel):
workflow: Workflow
@@ -209,29 +157,12 @@ class WorkflowRunCreate(BaseModel):
class WorkflowRun(BaseModel):
model_config = ConfigDict(extra="allow")
id: Optional[str] = Field(None, alias="_id")
id: Optional[str] = None
workflow_id: str
user: Optional[str] = None
status: ExecutionStatus = ExecutionStatus.PENDING
inputs: Dict[str, str] = Field(default_factory=dict)
outputs: Dict[str, Any] = Field(default_factory=dict)
steps: List[NodeExecutionLog] = Field(default_factory=list)
created_at: datetime = Field(default_factory=lambda: datetime.now(timezone.utc))
completed_at: Optional[datetime] = None
@field_validator("id", mode="before")
@classmethod
def convert_objectid(cls, v: Any) -> Optional[str]:
if isinstance(v, ObjectId):
return str(v)
return v
def to_mongo_doc(self) -> Dict[str, Any]:
return {
"workflow_id": self.workflow_id,
"status": self.status.value,
"inputs": self.inputs,
"outputs": self.outputs,
"steps": [step.model_dump() for step in self.steps],
"created_at": self.created_at,
"completed_at": self.completed_at,
}

View File

@@ -200,6 +200,9 @@ class WorkflowEngine:
node_config = AgentNodeConfig(**node.config.get("config", node.config))
if node_config.sources:
self._retrieve_node_sources(node_config)
if node_config.prompt_template:
formatted_prompt = self._format_template(node_config.prompt_template)
else:
@@ -455,6 +458,29 @@ class WorkflowEngine:
docs_together = "\n\n".join(docs_together_parts) if docs_together_parts else None
return docs, docs_together
def _retrieve_node_sources(self, node_config: AgentNodeConfig) -> None:
"""Retrieve documents from the node's sources for template resolution."""
from application.retriever.retriever_creator import RetrieverCreator
query = self.state.get("query", "")
if not query:
return
try:
retriever = RetrieverCreator.create_retriever(
node_config.retriever or "classic",
source={"active_docs": node_config.sources},
chat_history=[],
prompt="",
chunks=int(node_config.chunks) if node_config.chunks else 2,
decoded_token=self.agent.decoded_token,
)
docs = retriever.search(query)
if docs:
self.agent.retrieved_docs = docs
except Exception:
logger.exception("Failed to retrieve docs for workflow node")
def get_execution_summary(self) -> List[NodeExecutionLog]:
return [
NodeExecutionLog(

52
application/alembic.ini Normal file
View File

@@ -0,0 +1,52 @@
# Alembic configuration for the DocsGPT user-data Postgres database.
#
# The SQLAlchemy URL is deliberately NOT set here — env.py reads it from
# ``application.core.settings.settings.POSTGRES_URI`` so the same config
# source serves the running app and migrations. To run from the project
# root::
#
# alembic -c application/alembic.ini upgrade head
[alembic]
script_location = %(here)s/alembic
prepend_sys_path = ..
version_path_separator = os
# sqlalchemy.url is intentionally left blank — env.py supplies it.
sqlalchemy.url =
[post_write_hooks]
[loggers]
keys = root,sqlalchemy,alembic
[handlers]
keys = console
[formatters]
keys = generic
[logger_root]
level = WARNING
handlers = console
qualname =
[logger_sqlalchemy]
level = WARNING
handlers =
qualname = sqlalchemy.engine
[logger_alembic]
level = INFO
handlers =
qualname = alembic
[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic
[formatter_generic]
format = %(levelname)-5.5s [%(name)s] %(message)s
datefmt = %H:%M:%S

View File

@@ -0,0 +1,82 @@
"""Alembic environment for the DocsGPT user-data Postgres database.
The URL is pulled from ``application.core.settings`` rather than
``alembic.ini`` so that a single ``POSTGRES_URI`` env var drives both the
running app and ``alembic`` CLI invocations.
"""
import sys
from logging.config import fileConfig
from pathlib import Path
# Make the project root importable regardless of cwd. env.py lives at
# <repo>/application/alembic/env.py, so parents[2] is the repo root.
_PROJECT_ROOT = Path(__file__).resolve().parents[2]
if str(_PROJECT_ROOT) not in sys.path:
sys.path.insert(0, str(_PROJECT_ROOT))
from alembic import context # noqa: E402
from sqlalchemy import engine_from_config, pool # noqa: E402
from application.core.settings import settings # noqa: E402
from application.storage.db.models import metadata as target_metadata # noqa: E402
config = context.config
# Populate the runtime URL from settings.
if settings.POSTGRES_URI:
config.set_main_option("sqlalchemy.url", settings.POSTGRES_URI)
if config.config_file_name is not None:
fileConfig(config.config_file_name)
def run_migrations_offline() -> None:
"""Run migrations in 'offline' mode (emits SQL without a live DB)."""
url = config.get_main_option("sqlalchemy.url")
if not url:
raise RuntimeError(
"POSTGRES_URI is not configured. Set it in your .env to a "
"psycopg3 URI such as "
"'postgresql+psycopg://user:pass@host:5432/docsgpt'."
)
context.configure(
url=url,
target_metadata=target_metadata,
literal_binds=True,
dialect_opts={"paramstyle": "named"},
compare_type=True,
)
with context.begin_transaction():
context.run_migrations()
def run_migrations_online() -> None:
"""Run migrations in 'online' mode against a live connection."""
if not config.get_main_option("sqlalchemy.url"):
raise RuntimeError(
"POSTGRES_URI is not configured. Set it in your .env to a "
"psycopg3 URI such as "
"'postgresql+psycopg://user:pass@host:5432/docsgpt'."
)
connectable = engine_from_config(
config.get_section(config.config_ini_section, {}),
prefix="sqlalchemy.",
poolclass=pool.NullPool,
future=True,
)
with connectable.connect() as connection:
context.configure(
connection=connection,
target_metadata=target_metadata,
compare_type=True,
)
with context.begin_transaction():
context.run_migrations()
if context.is_offline_mode():
run_migrations_offline()
else:
run_migrations_online()

View File

@@ -0,0 +1,26 @@
"""${message}
Revision ID: ${up_revision}
Revises: ${down_revision | comma,n}
Create Date: ${create_date}
"""
from typing import Sequence, Union
from alembic import op
import sqlalchemy as sa
${imports if imports else ""}
# revision identifiers, used by Alembic.
revision: str = ${repr(up_revision)}
down_revision: Union[str, None] = ${repr(down_revision)}
branch_labels: Union[str, Sequence[str], None] = ${repr(branch_labels)}
depends_on: Union[str, Sequence[str], None] = ${repr(depends_on)}
def upgrade() -> None:
${upgrades if upgrades else "pass"}
def downgrade() -> None:
${downgrades if downgrades else "pass"}

View File

@@ -0,0 +1,927 @@
"""0001 initial schema — consolidated Phase-1..3 baseline.
Revision ID: 0001_initial
Revises:
Create Date: 2026-04-13
"""
from typing import Sequence, Union
from alembic import op
revision: str = "0001_initial"
down_revision: Union[str, None] = None
branch_labels: Union[str, Sequence[str], None] = None
depends_on: Union[str, Sequence[str], None] = None
def upgrade() -> None:
# ------------------------------------------------------------------
# Extensions
# ------------------------------------------------------------------
op.execute('CREATE EXTENSION IF NOT EXISTS "pgcrypto";')
op.execute('CREATE EXTENSION IF NOT EXISTS "citext";')
# ------------------------------------------------------------------
# Trigger functions
# ------------------------------------------------------------------
op.execute(
"""
CREATE FUNCTION set_updated_at() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
NEW.updated_at = now();
RETURN NEW;
END;
$$;
"""
)
op.execute(
"""
CREATE FUNCTION ensure_user_exists() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
IF NEW.user_id IS NOT NULL THEN
INSERT INTO users (user_id) VALUES (NEW.user_id)
ON CONFLICT (user_id) DO NOTHING;
END IF;
RETURN NEW;
END;
$$;
"""
)
op.execute(
"""
CREATE FUNCTION cleanup_message_attachment_refs() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
UPDATE conversation_messages
SET attachments = array_remove(attachments, OLD.id)
WHERE OLD.id = ANY(attachments);
RETURN OLD;
END;
$$;
"""
)
op.execute(
"""
CREATE FUNCTION cleanup_agent_extra_source_refs() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
UPDATE agents
SET extra_source_ids = array_remove(extra_source_ids, OLD.id)
WHERE OLD.id = ANY(extra_source_ids);
RETURN OLD;
END;
$$;
"""
)
op.execute(
"""
CREATE FUNCTION cleanup_user_agent_prefs() RETURNS trigger
LANGUAGE plpgsql AS $$
DECLARE
agent_id_text text := OLD.id::text;
BEGIN
UPDATE users
SET agent_preferences = jsonb_set(
jsonb_set(
agent_preferences,
'{pinned}',
COALESCE((
SELECT jsonb_agg(e)
FROM jsonb_array_elements(
COALESCE(agent_preferences->'pinned', '[]'::jsonb)
) e
WHERE (e #>> '{}') <> agent_id_text
), '[]'::jsonb)
),
'{shared_with_me}',
COALESCE((
SELECT jsonb_agg(e)
FROM jsonb_array_elements(
COALESCE(agent_preferences->'shared_with_me', '[]'::jsonb)
) e
WHERE (e #>> '{}') <> agent_id_text
), '[]'::jsonb)
)
WHERE agent_preferences->'pinned' @> to_jsonb(agent_id_text)
OR agent_preferences->'shared_with_me' @> to_jsonb(agent_id_text);
RETURN OLD;
END;
$$;
"""
)
op.execute(
"""
CREATE FUNCTION conversation_messages_fill_user_id() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
IF NEW.user_id IS NULL THEN
SELECT user_id INTO NEW.user_id
FROM conversations
WHERE id = NEW.conversation_id;
END IF;
RETURN NEW;
END;
$$;
"""
)
# ------------------------------------------------------------------
# Tables
# ------------------------------------------------------------------
op.execute(
"""
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL UNIQUE,
agent_preferences JSONB NOT NULL
DEFAULT '{"pinned": [], "shared_with_me": []}'::jsonb,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
"""
)
op.execute(
"""
CREATE TABLE prompts (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
name TEXT NOT NULL,
content TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
legacy_mongo_id TEXT
);
"""
)
op.execute(
"""
CREATE TABLE user_tools (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
name TEXT NOT NULL,
custom_name TEXT,
display_name TEXT,
description TEXT,
config JSONB NOT NULL DEFAULT '{}'::jsonb,
config_requirements JSONB NOT NULL DEFAULT '{}'::jsonb,
actions JSONB NOT NULL DEFAULT '[]'::jsonb,
status BOOLEAN NOT NULL DEFAULT true,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
legacy_mongo_id TEXT
);
"""
)
op.execute(
"""
CREATE TABLE token_usage (
id BIGSERIAL PRIMARY KEY,
user_id TEXT,
api_key TEXT,
agent_id UUID,
prompt_tokens INTEGER NOT NULL DEFAULT 0,
generated_tokens INTEGER NOT NULL DEFAULT 0,
timestamp TIMESTAMPTZ NOT NULL DEFAULT now(),
mongo_id TEXT
);
"""
)
op.execute(
"ALTER TABLE token_usage ADD CONSTRAINT token_usage_attribution_chk "
"CHECK (user_id IS NOT NULL OR api_key IS NOT NULL) NOT VALID;"
)
op.execute(
"""
CREATE TABLE user_logs (
id BIGSERIAL PRIMARY KEY,
user_id TEXT,
endpoint TEXT,
timestamp TIMESTAMPTZ NOT NULL DEFAULT now(),
data JSONB,
mongo_id TEXT
);
"""
)
op.execute(
"""
CREATE TABLE stack_logs (
id BIGSERIAL PRIMARY KEY,
activity_id TEXT NOT NULL,
endpoint TEXT,
level TEXT,
user_id TEXT,
api_key TEXT,
query TEXT,
stacks JSONB NOT NULL DEFAULT '[]'::jsonb,
timestamp TIMESTAMPTZ NOT NULL DEFAULT now(),
mongo_id TEXT
);
"""
)
op.execute(
"""
CREATE TABLE agent_folders (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
name TEXT NOT NULL,
description TEXT,
parent_id UUID REFERENCES agent_folders(id) ON DELETE SET NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
legacy_mongo_id TEXT
);
"""
)
op.execute(
"""
CREATE TABLE sources (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
name TEXT NOT NULL,
language TEXT,
date TIMESTAMPTZ NOT NULL DEFAULT now(),
model TEXT,
type TEXT,
metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
retriever TEXT,
sync_frequency TEXT,
tokens TEXT,
file_path TEXT,
remote_data JSONB,
directory_structure JSONB,
file_name_map JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
legacy_mongo_id TEXT
);
"""
)
op.execute(
"""
CREATE TABLE agents (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
name TEXT NOT NULL,
description TEXT,
agent_type TEXT,
status TEXT NOT NULL,
key CITEXT UNIQUE,
image TEXT,
source_id UUID REFERENCES sources(id) ON DELETE SET NULL,
extra_source_ids UUID[] NOT NULL DEFAULT '{}',
chunks INTEGER,
retriever TEXT,
prompt_id UUID REFERENCES prompts(id) ON DELETE SET NULL,
tools JSONB NOT NULL DEFAULT '[]'::jsonb,
json_schema JSONB,
models JSONB,
default_model_id TEXT,
folder_id UUID REFERENCES agent_folders(id) ON DELETE SET NULL,
workflow_id UUID,
limited_token_mode BOOLEAN NOT NULL DEFAULT false,
token_limit INTEGER,
limited_request_mode BOOLEAN NOT NULL DEFAULT false,
request_limit INTEGER,
allow_system_prompt_override BOOLEAN NOT NULL DEFAULT false,
shared BOOLEAN NOT NULL DEFAULT false,
shared_token CITEXT UNIQUE,
shared_metadata JSONB,
incoming_webhook_token CITEXT UNIQUE,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
last_used_at TIMESTAMPTZ,
legacy_mongo_id TEXT
);
"""
)
op.execute(
"ALTER TABLE token_usage ADD CONSTRAINT token_usage_agent_fk "
"FOREIGN KEY (agent_id) REFERENCES agents(id) ON DELETE SET NULL;"
)
op.execute(
"""
CREATE TABLE attachments (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
filename TEXT NOT NULL,
upload_path TEXT NOT NULL,
mime_type TEXT,
size BIGINT,
content TEXT,
token_count INTEGER,
openai_file_id TEXT,
google_file_uri TEXT,
metadata JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
legacy_mongo_id TEXT
);
"""
)
op.execute(
"""
CREATE TABLE memories (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
tool_id UUID REFERENCES user_tools(id) ON DELETE CASCADE,
path TEXT NOT NULL,
content TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
"""
)
op.execute(
"""
CREATE TABLE todos (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
tool_id UUID REFERENCES user_tools(id) ON DELETE CASCADE,
todo_id INTEGER,
title TEXT NOT NULL,
completed BOOLEAN NOT NULL DEFAULT false,
metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
legacy_mongo_id TEXT
);
"""
)
op.execute(
"""
CREATE TABLE notes (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
tool_id UUID REFERENCES user_tools(id) ON DELETE CASCADE,
title TEXT NOT NULL,
content TEXT NOT NULL,
metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
legacy_mongo_id TEXT
);
"""
)
op.execute(
"""
CREATE TABLE connector_sessions (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
provider TEXT NOT NULL,
server_url TEXT,
session_token TEXT UNIQUE,
user_email TEXT,
status TEXT,
token_info JSONB,
session_data JSONB NOT NULL DEFAULT '{}'::jsonb,
expires_at TIMESTAMPTZ,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
legacy_mongo_id TEXT
);
"""
)
op.execute(
"""
CREATE TABLE conversations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
agent_id UUID REFERENCES agents(id) ON DELETE SET NULL,
name TEXT,
api_key TEXT,
is_shared_usage BOOLEAN NOT NULL DEFAULT false,
shared_token TEXT,
date TIMESTAMPTZ NOT NULL DEFAULT now(),
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
shared_with TEXT[] NOT NULL DEFAULT '{}'::text[],
compression_metadata JSONB,
legacy_mongo_id TEXT,
CONSTRAINT conversations_api_key_nonempty_chk
CHECK (api_key IS NULL OR api_key <> '')
);
"""
)
op.execute(
"""
CREATE TABLE conversation_messages (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
conversation_id UUID NOT NULL REFERENCES conversations(id) ON DELETE CASCADE,
position INTEGER NOT NULL,
prompt TEXT,
response TEXT,
thought TEXT,
sources JSONB NOT NULL DEFAULT '[]'::jsonb,
tool_calls JSONB NOT NULL DEFAULT '[]'::jsonb,
attachments UUID[] NOT NULL DEFAULT '{}'::uuid[],
model_id TEXT,
message_metadata JSONB NOT NULL DEFAULT '{}'::jsonb,
feedback JSONB,
timestamp TIMESTAMPTZ NOT NULL DEFAULT now(),
user_id TEXT NOT NULL,
updated_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
"""
)
op.execute(
"""
CREATE TABLE shared_conversations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
conversation_id UUID NOT NULL REFERENCES conversations(id) ON DELETE CASCADE,
user_id TEXT NOT NULL,
is_promptable BOOLEAN NOT NULL DEFAULT false,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
uuid UUID NOT NULL,
first_n_queries INTEGER NOT NULL DEFAULT 0,
api_key TEXT,
prompt_id UUID REFERENCES prompts(id) ON DELETE SET NULL,
chunks INTEGER,
CONSTRAINT shared_conversations_api_key_nonempty_chk
CHECK (api_key IS NULL OR api_key <> '')
);
"""
)
op.execute(
"""
CREATE TABLE pending_tool_state (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
conversation_id UUID NOT NULL REFERENCES conversations(id) ON DELETE CASCADE,
user_id TEXT NOT NULL,
messages JSONB NOT NULL,
pending_tool_calls JSONB NOT NULL,
tools_dict JSONB NOT NULL,
tool_schemas JSONB NOT NULL,
agent_config JSONB NOT NULL,
client_tools JSONB,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
expires_at TIMESTAMPTZ NOT NULL
);
"""
)
op.execute(
"""
CREATE TABLE workflows (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id TEXT NOT NULL,
name TEXT NOT NULL,
description TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
updated_at TIMESTAMPTZ NOT NULL DEFAULT now(),
current_graph_version INTEGER NOT NULL DEFAULT 1,
legacy_mongo_id TEXT
);
"""
)
# Backfill the agents.workflow_id FK now that workflows exists.
# The column was created without a FK (forward reference to a table
# that hadn't been declared yet); add the constraint here so workflow
# deletion still cascades through to agent unset.
op.execute(
"ALTER TABLE agents ADD CONSTRAINT agents_workflow_fk "
"FOREIGN KEY (workflow_id) REFERENCES workflows(id) ON DELETE SET NULL;"
)
op.execute(
"""
CREATE TABLE workflow_nodes (
id UUID DEFAULT gen_random_uuid() NOT NULL,
workflow_id UUID NOT NULL REFERENCES workflows(id) ON DELETE CASCADE,
graph_version INTEGER NOT NULL,
node_type TEXT NOT NULL,
config JSONB NOT NULL DEFAULT '{}'::jsonb,
node_id TEXT NOT NULL,
title TEXT,
description TEXT,
position JSONB NOT NULL DEFAULT '{"x": 0, "y": 0}'::jsonb,
legacy_mongo_id TEXT,
PRIMARY KEY (id),
CONSTRAINT workflow_nodes_id_wf_ver_key
UNIQUE (id, workflow_id, graph_version)
);
"""
)
op.execute(
"""
CREATE TABLE workflow_edges (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
workflow_id UUID NOT NULL REFERENCES workflows(id) ON DELETE CASCADE,
graph_version INTEGER NOT NULL,
from_node_id UUID NOT NULL,
to_node_id UUID NOT NULL,
config JSONB NOT NULL DEFAULT '{}'::jsonb,
edge_id TEXT NOT NULL,
source_handle TEXT,
target_handle TEXT,
CONSTRAINT workflow_edges_from_node_fk
FOREIGN KEY (from_node_id, workflow_id, graph_version)
REFERENCES workflow_nodes(id, workflow_id, graph_version) ON DELETE CASCADE,
CONSTRAINT workflow_edges_to_node_fk
FOREIGN KEY (to_node_id, workflow_id, graph_version)
REFERENCES workflow_nodes(id, workflow_id, graph_version) ON DELETE CASCADE
);
"""
)
op.execute(
"""
CREATE TABLE workflow_runs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
workflow_id UUID NOT NULL REFERENCES workflows(id) ON DELETE CASCADE,
user_id TEXT NOT NULL,
status TEXT NOT NULL,
started_at TIMESTAMPTZ NOT NULL DEFAULT now(),
ended_at TIMESTAMPTZ,
result JSONB,
inputs JSONB,
steps JSONB NOT NULL DEFAULT '[]'::jsonb,
legacy_mongo_id TEXT,
CONSTRAINT workflow_runs_status_chk
CHECK (status IN ('pending', 'running', 'completed', 'failed'))
);
"""
)
# ------------------------------------------------------------------
# Indexes
# ------------------------------------------------------------------
op.execute("CREATE INDEX agent_folders_user_idx ON agent_folders (user_id);")
op.execute("CREATE INDEX agents_user_idx ON agents (user_id);")
op.execute("CREATE INDEX agents_shared_idx ON agents (shared) WHERE shared = true;")
op.execute("CREATE INDEX agents_status_idx ON agents (status);")
op.execute("CREATE INDEX agents_source_id_idx ON agents (source_id);")
op.execute("CREATE INDEX agents_prompt_id_idx ON agents (prompt_id);")
op.execute("CREATE INDEX agents_folder_id_idx ON agents (folder_id);")
op.execute(
"CREATE UNIQUE INDEX agents_legacy_mongo_id_uidx "
"ON agents (legacy_mongo_id) WHERE legacy_mongo_id IS NOT NULL;"
)
op.execute("CREATE INDEX attachments_user_idx ON attachments (user_id);")
op.execute(
"CREATE UNIQUE INDEX attachments_legacy_mongo_id_uidx "
"ON attachments (legacy_mongo_id) WHERE legacy_mongo_id IS NOT NULL;"
)
op.execute(
# MCP and OAuth connectors share the ``provider`` slot, so the
# dedup key is ``(user_id, server_url, provider)``: MCP rows
# differentiate by server_url (one per MCP server), OAuth rows
# have server_url = NULL and differentiate by provider alone.
# COALESCE lets NULL server_url participate in the constraint.
"CREATE UNIQUE INDEX connector_sessions_user_endpoint_uidx "
"ON connector_sessions (user_id, COALESCE(server_url, ''), provider);"
)
op.execute(
"CREATE INDEX connector_sessions_expiry_idx "
"ON connector_sessions (expires_at) WHERE expires_at IS NOT NULL;"
)
op.execute(
"CREATE INDEX connector_sessions_server_url_idx "
"ON connector_sessions (server_url) WHERE server_url IS NOT NULL;"
)
op.execute(
"CREATE UNIQUE INDEX connector_sessions_legacy_mongo_id_uidx "
"ON connector_sessions (legacy_mongo_id) WHERE legacy_mongo_id IS NOT NULL;"
)
op.execute(
"CREATE UNIQUE INDEX conversation_messages_conv_pos_uidx "
"ON conversation_messages (conversation_id, position);"
)
op.execute(
"CREATE INDEX conversation_messages_user_ts_idx "
"ON conversation_messages (user_id, timestamp DESC);"
)
op.execute("CREATE INDEX conversations_user_date_idx ON conversations (user_id, date DESC);")
op.execute("CREATE INDEX conversations_agent_idx ON conversations (agent_id);")
op.execute(
"CREATE UNIQUE INDEX conversations_shared_token_uidx "
"ON conversations (shared_token) WHERE shared_token IS NOT NULL;"
)
op.execute(
"CREATE INDEX conversations_api_key_date_idx "
"ON conversations (api_key, date DESC) WHERE api_key IS NOT NULL;"
)
op.execute(
"CREATE UNIQUE INDEX conversations_legacy_mongo_id_uidx "
"ON conversations (legacy_mongo_id) WHERE legacy_mongo_id IS NOT NULL;"
)
op.execute(
"CREATE UNIQUE INDEX memories_user_tool_path_uidx "
"ON memories (user_id, tool_id, path);"
)
op.execute(
"CREATE UNIQUE INDEX memories_user_path_null_tool_uidx "
"ON memories (user_id, path) WHERE tool_id IS NULL;"
)
op.execute(
"CREATE INDEX memories_path_prefix_idx "
"ON memories (user_id, tool_id, path text_pattern_ops);"
)
op.execute("CREATE INDEX memories_tool_id_idx ON memories (tool_id);")
op.execute("CREATE UNIQUE INDEX notes_user_tool_uidx ON notes (user_id, tool_id);")
op.execute("CREATE INDEX notes_tool_id_idx ON notes (tool_id);")
op.execute(
"CREATE UNIQUE INDEX notes_legacy_mongo_id_uidx "
"ON notes (legacy_mongo_id) WHERE legacy_mongo_id IS NOT NULL;"
)
op.execute(
"CREATE UNIQUE INDEX pending_tool_state_conv_user_uidx "
"ON pending_tool_state (conversation_id, user_id);"
)
op.execute(
"CREATE INDEX pending_tool_state_expires_idx ON pending_tool_state (expires_at);"
)
op.execute("CREATE INDEX prompts_user_id_idx ON prompts (user_id);")
op.execute(
"CREATE UNIQUE INDEX prompts_legacy_mongo_id_uidx "
"ON prompts (legacy_mongo_id) WHERE legacy_mongo_id IS NOT NULL;"
)
op.execute("CREATE INDEX shared_conversations_user_idx ON shared_conversations (user_id);")
op.execute("CREATE INDEX shared_conversations_conv_idx ON shared_conversations (conversation_id);")
op.execute(
"CREATE INDEX shared_conversations_prompt_id_idx ON shared_conversations (prompt_id);"
)
op.execute(
"CREATE UNIQUE INDEX shared_conversations_uuid_uidx ON shared_conversations (uuid);"
)
op.execute(
"CREATE UNIQUE INDEX shared_conversations_dedup_uidx "
"ON shared_conversations (conversation_id, user_id, is_promptable, first_n_queries, COALESCE(api_key, ''));"
)
op.execute("CREATE INDEX sources_user_idx ON sources (user_id);")
op.execute(
"CREATE UNIQUE INDEX sources_legacy_mongo_id_uidx "
"ON sources (legacy_mongo_id) WHERE legacy_mongo_id IS NOT NULL;"
)
op.execute(
"CREATE UNIQUE INDEX user_tools_legacy_mongo_id_uidx "
"ON user_tools (legacy_mongo_id) WHERE legacy_mongo_id IS NOT NULL;"
)
op.execute(
"CREATE UNIQUE INDEX agent_folders_legacy_mongo_id_uidx "
"ON agent_folders (legacy_mongo_id) WHERE legacy_mongo_id IS NOT NULL;"
)
op.execute("CREATE INDEX agent_folders_parent_idx ON agent_folders (parent_id);")
op.execute("CREATE INDEX agents_workflow_idx ON agents (workflow_id);")
op.execute('CREATE INDEX stack_logs_timestamp_idx ON stack_logs ("timestamp" DESC);')
op.execute('CREATE INDEX stack_logs_user_ts_idx ON stack_logs (user_id, "timestamp" DESC);')
op.execute('CREATE INDEX stack_logs_level_ts_idx ON stack_logs (level, "timestamp" DESC);')
op.execute("CREATE INDEX stack_logs_activity_idx ON stack_logs (activity_id);")
op.execute(
"CREATE UNIQUE INDEX stack_logs_mongo_id_uidx "
"ON stack_logs (mongo_id) WHERE mongo_id IS NOT NULL;"
)
op.execute("CREATE INDEX todos_user_tool_idx ON todos (user_id, tool_id);")
op.execute("CREATE INDEX todos_tool_id_idx ON todos (tool_id);")
op.execute(
"CREATE UNIQUE INDEX todos_legacy_mongo_id_uidx "
"ON todos (legacy_mongo_id) WHERE legacy_mongo_id IS NOT NULL;"
)
op.execute(
"CREATE UNIQUE INDEX todos_tool_todo_id_uidx "
"ON todos (tool_id, todo_id) WHERE todo_id IS NOT NULL;"
)
op.execute('CREATE INDEX token_usage_user_ts_idx ON token_usage (user_id, "timestamp" DESC);')
op.execute('CREATE INDEX token_usage_key_ts_idx ON token_usage (api_key, "timestamp" DESC);')
op.execute('CREATE INDEX token_usage_agent_ts_idx ON token_usage (agent_id, "timestamp" DESC);')
op.execute(
"CREATE UNIQUE INDEX token_usage_mongo_id_uidx "
"ON token_usage (mongo_id) WHERE mongo_id IS NOT NULL;"
)
op.execute('CREATE INDEX user_logs_user_ts_idx ON user_logs (user_id, "timestamp" DESC);')
op.execute(
"CREATE UNIQUE INDEX user_logs_mongo_id_uidx "
"ON user_logs (mongo_id) WHERE mongo_id IS NOT NULL;"
)
op.execute("CREATE INDEX user_tools_user_id_idx ON user_tools (user_id);")
op.execute("CREATE INDEX workflow_edges_from_node_idx ON workflow_edges (from_node_id);")
op.execute("CREATE INDEX workflow_edges_to_node_idx ON workflow_edges (to_node_id);")
op.execute(
"CREATE UNIQUE INDEX workflow_edges_wf_ver_eid_uidx "
"ON workflow_edges (workflow_id, graph_version, edge_id);"
)
op.execute(
"CREATE UNIQUE INDEX workflow_nodes_wf_ver_nid_uidx "
"ON workflow_nodes (workflow_id, graph_version, node_id);"
)
op.execute(
"CREATE UNIQUE INDEX workflow_nodes_legacy_mongo_id_uidx "
"ON workflow_nodes (legacy_mongo_id) WHERE legacy_mongo_id IS NOT NULL;"
)
op.execute("CREATE INDEX workflow_runs_workflow_idx ON workflow_runs (workflow_id);")
op.execute("CREATE INDEX workflow_runs_user_idx ON workflow_runs (user_id);")
op.execute(
"CREATE INDEX workflow_runs_status_started_idx "
"ON workflow_runs (status, started_at DESC);"
)
op.execute(
"CREATE UNIQUE INDEX workflow_runs_legacy_mongo_id_uidx "
"ON workflow_runs (legacy_mongo_id) WHERE legacy_mongo_id IS NOT NULL;"
)
op.execute("CREATE INDEX workflows_user_idx ON workflows (user_id);")
op.execute(
"CREATE UNIQUE INDEX workflows_legacy_mongo_id_uidx "
"ON workflows (legacy_mongo_id) WHERE legacy_mongo_id IS NOT NULL;"
)
# ------------------------------------------------------------------
# user_id foreign keys (deferrable so backfills can stage rows)
# ------------------------------------------------------------------
user_fk_tables = (
"agent_folders",
"agents",
"attachments",
"connector_sessions",
"conversation_messages",
"conversations",
"memories",
"notes",
"pending_tool_state",
"prompts",
"shared_conversations",
"sources",
"stack_logs",
"todos",
"token_usage",
"user_logs",
"user_tools",
"workflow_runs",
"workflows",
)
for table in user_fk_tables:
op.execute(
f"ALTER TABLE {table} "
f"ADD CONSTRAINT {table}_user_id_fk "
f"FOREIGN KEY (user_id) REFERENCES users(user_id) "
f"ON DELETE RESTRICT DEFERRABLE INITIALLY IMMEDIATE;"
)
# ------------------------------------------------------------------
# Triggers
# ------------------------------------------------------------------
updated_at_tables = (
"agent_folders",
"agents",
"conversation_messages",
"conversations",
"memories",
"notes",
"prompts",
"sources",
"todos",
"user_tools",
"users",
"workflows",
)
for table in updated_at_tables:
op.execute(
f"CREATE TRIGGER {table}_set_updated_at "
f"BEFORE UPDATE ON {table} "
f"FOR EACH ROW WHEN (OLD.* IS DISTINCT FROM NEW.*) "
f"EXECUTE FUNCTION set_updated_at();"
)
ensure_user_tables = (
"agent_folders",
"agents",
"attachments",
"connector_sessions",
"conversation_messages",
"conversations",
"memories",
"notes",
"pending_tool_state",
"prompts",
"shared_conversations",
"sources",
"stack_logs",
"todos",
"token_usage",
"user_logs",
"user_tools",
"workflow_runs",
"workflows",
)
for table in ensure_user_tables:
op.execute(
f"CREATE TRIGGER {table}_ensure_user "
f"BEFORE INSERT OR UPDATE OF user_id ON {table} "
f"FOR EACH ROW EXECUTE FUNCTION ensure_user_exists();"
)
op.execute(
"CREATE TRIGGER conversation_messages_fill_user "
"BEFORE INSERT ON conversation_messages "
"FOR EACH ROW EXECUTE FUNCTION conversation_messages_fill_user_id();"
)
op.execute(
"CREATE TRIGGER attachments_cleanup_message_refs "
"AFTER DELETE ON attachments "
"FOR EACH ROW EXECUTE FUNCTION cleanup_message_attachment_refs();"
)
op.execute(
"CREATE TRIGGER agents_cleanup_user_prefs "
"AFTER DELETE ON agents "
"FOR EACH ROW EXECUTE FUNCTION cleanup_user_agent_prefs();"
)
op.execute(
"CREATE TRIGGER sources_cleanup_agent_extra_refs "
"AFTER DELETE ON sources "
"FOR EACH ROW EXECUTE FUNCTION cleanup_agent_extra_source_refs();"
)
# ------------------------------------------------------------------
# Seed sentinel __system__ user (system/template sources attribute here)
# ------------------------------------------------------------------
op.execute(
"INSERT INTO users (user_id) VALUES ('__system__') "
"ON CONFLICT (user_id) DO NOTHING;"
)
def downgrade() -> None:
# Nuclear downgrade: drop everything this migration created. The
# ordering drops FK-bearing children before parents; CASCADE would
# also work but explicit ordering is easier to reason about in code
# review.
tables_in_drop_order = (
"workflow_edges",
"workflow_runs",
"workflow_nodes",
"workflows",
"pending_tool_state",
"shared_conversations",
"conversation_messages",
"conversations",
"connector_sessions",
"notes",
"todos",
"memories",
"attachments",
"agents",
"sources",
"agent_folders",
"stack_logs",
"user_logs",
"token_usage",
"user_tools",
"prompts",
"users",
)
for table in tables_in_drop_order:
op.execute(f"DROP TABLE IF EXISTS {table} CASCADE;")
for fn in (
"conversation_messages_fill_user_id",
"cleanup_user_agent_prefs",
"cleanup_agent_extra_source_refs",
"cleanup_message_attachment_refs",
"ensure_user_exists",
"set_updated_at",
):
op.execute(f"DROP FUNCTION IF EXISTS {fn}();")

View File

@@ -74,57 +74,76 @@ class AnswerResource(Resource, BaseAnswerResource):
decoded_token = getattr(request, "decoded_token", None)
processor = StreamProcessor(data, decoded_token)
try:
agent = processor.build_agent(data.get("question", ""))
if not processor.decoded_token:
return make_response({"error": "Unauthorized"}, 401)
# ---- Continuation mode ----
if data.get("tool_actions"):
(
agent,
messages,
tools_dict,
pending_tool_calls,
tool_actions,
) = processor.resume_from_tool_actions(
data["tool_actions"], data["conversation_id"]
)
if not processor.decoded_token:
return make_response({"error": "Unauthorized"}, 401)
if error := self.check_usage(processor.agent_config):
return error
stream = self.complete_stream(
question="",
agent=agent,
conversation_id=processor.conversation_id,
user_api_key=processor.agent_config.get("user_api_key"),
decoded_token=processor.decoded_token,
agent_id=processor.agent_id,
model_id=processor.model_id,
_continuation={
"messages": messages,
"tools_dict": tools_dict,
"pending_tool_calls": pending_tool_calls,
"tool_actions": tool_actions,
},
)
else:
# ---- Normal mode ----
agent = processor.build_agent(data.get("question", ""))
if not processor.decoded_token:
return make_response({"error": "Unauthorized"}, 401)
if error := self.check_usage(processor.agent_config):
return error
if error := self.check_usage(processor.agent_config):
return error
stream = self.complete_stream(
question=data["question"],
agent=agent,
conversation_id=processor.conversation_id,
user_api_key=processor.agent_config.get("user_api_key"),
decoded_token=processor.decoded_token,
isNoneDoc=data.get("isNoneDoc"),
index=None,
should_save_conversation=data.get("save_conversation", True),
agent_id=processor.agent_id,
is_shared_usage=processor.is_shared_usage,
shared_token=processor.shared_token,
model_id=processor.model_id,
)
stream = self.complete_stream(
question=data["question"],
agent=agent,
conversation_id=processor.conversation_id,
user_api_key=processor.agent_config.get("user_api_key"),
decoded_token=processor.decoded_token,
isNoneDoc=data.get("isNoneDoc"),
index=None,
should_save_conversation=data.get("save_conversation", True),
agent_id=processor.agent_id,
is_shared_usage=processor.is_shared_usage,
shared_token=processor.shared_token,
model_id=processor.model_id,
)
stream_result = self.process_response_stream(stream)
if len(stream_result) == 7:
(
conversation_id,
response,
sources,
tool_calls,
thought,
error,
structured_info,
) = stream_result
else:
conversation_id, response, sources, tool_calls, thought, error = (
stream_result
)
structured_info = None
if stream_result["error"]:
return make_response({"error": stream_result["error"]}, 400)
if error:
return make_response({"error": error}, 400)
result = {
"conversation_id": conversation_id,
"answer": response,
"sources": sources,
"tool_calls": tool_calls,
"thought": thought,
"conversation_id": stream_result["conversation_id"],
"answer": stream_result["answer"],
"sources": stream_result["sources"],
"tool_calls": stream_result["tool_calls"],
"thought": stream_result["thought"],
}
if structured_info:
result.update(structured_info)
extra_info = stream_result.get("extra")
if extra_info:
result.update(extra_info)
except Exception as e:
logger.error(
f"/api/answer - error: {str(e)} - traceback: {traceback.format_exc()}",

View File

@@ -6,6 +6,7 @@ from typing import Any, Dict, Generator, List, Optional
from flask import jsonify, make_response, Response
from flask_restx import Namespace
from application.api.answer.services.continuation_service import ContinuationService
from application.api.answer.services.conversation_service import ConversationService
from application.core.model_utils import (
get_api_key_for_provider,
@@ -13,10 +14,13 @@ from application.core.model_utils import (
get_provider_from_model_id,
)
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.error import sanitize_api_error
from application.llm.llm_creator import LLMCreator
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.repositories.token_usage import TokenUsageRepository
from application.storage.db.repositories.user_logs import UserLogsRepository
from application.storage.db.session import db_readonly, db_session
from application.utils import check_required_fields
logger = logging.getLogger(__name__)
@@ -29,17 +33,22 @@ class BaseAnswerResource:
"""Shared base class for answer endpoints"""
def __init__(self):
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
self.db = db
self.user_logs_collection = db["user_logs"]
self.default_model_id = get_default_model_id()
self.conversation_service = ConversationService()
def validate_request(
self, data: Dict[str, Any], require_conversation_id: bool = False
) -> Optional[Response]:
"""Common request validation"""
"""Common request validation.
Continuation requests (``tool_actions`` present) require
``conversation_id`` but not ``question``.
"""
if data.get("tool_actions"):
# Continuation mode — question is not required
if missing := check_required_fields(data, ["conversation_id"]):
return missing
return None
required_fields = ["question"]
if require_conversation_id:
required_fields.append("conversation_id")
@@ -81,8 +90,8 @@ class BaseAnswerResource:
api_key = agent_config.get("user_api_key")
if not api_key:
return None
agents_collection = self.db["agents"]
agent = agents_collection.find_one({"key": api_key})
with db_readonly() as conn:
agent = AgentsRepository(conn).find_by_key(api_key)
if not agent:
return make_response(
@@ -103,41 +112,32 @@ class BaseAnswerResource:
)
token_limit = int(
agent.get("token_limit", settings.DEFAULT_AGENT_LIMITS["token_limit"])
agent.get("token_limit") or settings.DEFAULT_AGENT_LIMITS["token_limit"]
)
request_limit = int(
agent.get("request_limit", settings.DEFAULT_AGENT_LIMITS["request_limit"])
agent.get("request_limit") or settings.DEFAULT_AGENT_LIMITS["request_limit"]
)
token_usage_collection = self.db["token_usage"]
end_date = datetime.datetime.now()
end_date = datetime.datetime.now(datetime.timezone.utc)
start_date = end_date - datetime.timedelta(hours=24)
match_query = {
"timestamp": {"$gte": start_date, "$lte": end_date},
"api_key": api_key,
}
if limited_token_mode:
token_pipeline = [
{"$match": match_query},
{
"$group": {
"_id": None,
"total_tokens": {
"$sum": {"$add": ["$prompt_tokens", "$generated_tokens"]}
},
}
},
]
token_result = list(token_usage_collection.aggregate(token_pipeline))
daily_token_usage = token_result[0]["total_tokens"] if token_result else 0
if limited_token_mode or limited_request_mode:
with db_readonly() as conn:
token_repo = TokenUsageRepository(conn)
if limited_token_mode:
daily_token_usage = token_repo.sum_tokens_in_range(
start=start_date, end=end_date, api_key=api_key,
)
else:
daily_token_usage = 0
if limited_request_mode:
daily_request_usage = token_repo.count_in_range(
start=start_date, end=end_date, api_key=api_key,
)
else:
daily_request_usage = 0
else:
daily_token_usage = 0
if limited_request_mode:
daily_request_usage = token_usage_collection.count_documents(match_query)
else:
daily_request_usage = 0
if not limited_token_mode and not limited_request_mode:
return None
@@ -177,6 +177,7 @@ class BaseAnswerResource:
is_shared_usage: bool = False,
shared_token: Optional[str] = None,
model_id: Optional[str] = None,
_continuation: Optional[Dict] = None,
) -> Generator[str, None, None]:
"""
Generator function that streams the complete conversation response.
@@ -207,8 +208,19 @@ class BaseAnswerResource:
schema_info = None
structured_chunks = []
query_metadata = {}
paused = False
for line in agent.gen(query=question):
if _continuation:
gen_iter = agent.gen_continuation(
messages=_continuation["messages"],
tools_dict=_continuation["tools_dict"],
pending_tool_calls=_continuation["pending_tool_calls"],
tool_actions=_continuation["tool_actions"],
)
else:
gen_iter = agent.gen(query=question)
for line in gen_iter:
if "metadata" in line:
query_metadata.update(line["metadata"])
elif "answer" in line:
@@ -244,15 +256,21 @@ class BaseAnswerResource:
data = json.dumps({"type": "thought", "thought": line["thought"]})
yield f"data: {data}\n\n"
elif "type" in line:
if line.get("type") == "error":
if line.get("type") == "tool_calls_pending":
# Save continuation state and end the stream
paused = True
data = json.dumps(line)
yield f"data: {data}\n\n"
elif line.get("type") == "error":
sanitized_error = {
"type": "error",
"error": sanitize_api_error(line.get("error", "An error occurred"))
}
data = json.dumps(sanitized_error)
yield f"data: {data}\n\n"
else:
data = json.dumps(line)
yield f"data: {data}\n\n"
yield f"data: {data}\n\n"
if is_structured and structured_chunks:
structured_data = {
"type": "structured_answer",
@@ -262,6 +280,93 @@ class BaseAnswerResource:
}
data = json.dumps(structured_data)
yield f"data: {data}\n\n"
# ---- Paused: save continuation state and end stream early ----
if paused:
continuation = getattr(agent, "_pending_continuation", None)
if continuation:
# Ensure we have a conversation_id — create a partial
# conversation if this is the first turn.
if not conversation_id and should_save_conversation:
try:
provider = (
get_provider_from_model_id(model_id)
if model_id
else settings.LLM_PROVIDER
)
sys_api_key = get_api_key_for_provider(
provider or settings.LLM_PROVIDER
)
llm = LLMCreator.create_llm(
provider or settings.LLM_PROVIDER,
api_key=sys_api_key,
user_api_key=user_api_key,
decoded_token=decoded_token,
model_id=model_id,
agent_id=agent_id,
)
conversation_id = (
self.conversation_service.save_conversation(
None,
question,
response_full,
thought,
source_log_docs,
tool_calls,
llm,
model_id or self.default_model_id,
decoded_token,
api_key=user_api_key,
agent_id=agent_id,
is_shared_usage=is_shared_usage,
shared_token=shared_token,
)
)
except Exception as e:
logger.error(
f"Failed to create conversation for continuation: {e}",
exc_info=True,
)
if conversation_id:
try:
cont_service = ContinuationService()
cont_service.save_state(
conversation_id=str(conversation_id),
user=decoded_token.get("sub", "local"),
messages=continuation["messages"],
pending_tool_calls=continuation["pending_tool_calls"],
tools_dict=continuation["tools_dict"],
tool_schemas=getattr(agent, "tools", []),
agent_config={
"model_id": model_id or self.default_model_id,
"llm_name": getattr(agent, "llm_name", settings.LLM_PROVIDER),
"api_key": getattr(agent, "api_key", None),
"user_api_key": user_api_key,
"agent_id": agent_id,
"agent_type": agent.__class__.__name__,
"prompt": getattr(agent, "prompt", ""),
"json_schema": getattr(agent, "json_schema", None),
"retriever_config": getattr(agent, "retriever_config", None),
},
client_tools=getattr(
agent.tool_executor, "client_tools", None
),
)
except Exception as e:
logger.error(
f"Failed to save continuation state: {str(e)}",
exc_info=True,
)
id_data = {"type": "id", "id": str(conversation_id)}
data = json.dumps(id_data)
yield f"data: {data}\n\n"
data = json.dumps({"type": "end"})
yield f"data: {data}\n\n"
return
if isNoneDoc:
for doc in source_log_docs:
doc["source"] = "None"
@@ -352,7 +457,18 @@ class BaseAnswerResource:
for key, value in log_data.items():
if isinstance(value, str) and len(value) > 10000:
log_data[key] = value[:10000]
self.user_logs_collection.insert_one(log_data)
try:
with db_session() as conn:
UserLogsRepository(conn).insert(
user_id=log_data.get("user"),
endpoint="stream_answer",
data=log_data,
)
except Exception as log_err:
logger.error(
f"Failed to persist stream_answer user log: {log_err}",
exc_info=True,
)
data = json.dumps({"type": "end"})
yield f"data: {data}\n\n"
@@ -425,8 +541,13 @@ class BaseAnswerResource:
yield f"data: {data}\n\n"
return
def process_response_stream(self, stream):
"""Process the stream response for non-streaming endpoint"""
def process_response_stream(self, stream) -> Dict[str, Any]:
"""Process the stream response for non-streaming endpoint.
Returns:
Dict with keys: conversation_id, answer, sources, tool_calls,
thought, error, and optional extra.
"""
conversation_id = ""
response_full = ""
source_log_docs = []
@@ -435,6 +556,7 @@ class BaseAnswerResource:
stream_ended = False
is_structured = False
schema_info = None
pending_tool_calls = None
for line in stream:
try:
@@ -453,11 +575,22 @@ class BaseAnswerResource:
source_log_docs = event["source"]
elif event["type"] == "tool_calls":
tool_calls = event["tool_calls"]
elif event["type"] == "tool_calls_pending":
pending_tool_calls = event.get("data", {}).get(
"pending_tool_calls", []
)
elif event["type"] == "thought":
thought = event["thought"]
elif event["type"] == "error":
logger.error(f"Error from stream: {event['error']}")
return None, None, None, None, event["error"], None
return {
"conversation_id": None,
"answer": None,
"sources": None,
"tool_calls": None,
"thought": None,
"error": event["error"],
}
elif event["type"] == "end":
stream_ended = True
except (json.JSONDecodeError, KeyError) as e:
@@ -465,18 +598,30 @@ class BaseAnswerResource:
continue
if not stream_ended:
logger.error("Stream ended unexpectedly without an 'end' event.")
return None, None, None, None, "Stream ended unexpectedly", None
result = (
conversation_id,
response_full,
source_log_docs,
tool_calls,
thought,
None,
)
return {
"conversation_id": None,
"answer": None,
"sources": None,
"tool_calls": None,
"thought": None,
"error": "Stream ended unexpectedly",
}
result: Dict[str, Any] = {
"conversation_id": conversation_id,
"answer": response_full,
"sources": source_log_docs,
"tool_calls": tool_calls,
"thought": thought,
"error": None,
}
if pending_tool_calls is not None:
result["extra"] = {"pending_tool_calls": pending_tool_calls}
if is_structured:
result = result + ({"structured": True, "schema": schema_info},)
result["extra"] = {"structured": True, "schema": schema_info}
return result
def error_stream_generate(self, err_response):

View File

@@ -4,11 +4,10 @@ from typing import Any, Dict, List
from flask import make_response, request
from flask_restx import fields, Resource
from bson.dbref import DBRef
from application.api.answer.routes.base import answer_ns
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.session import db_readonly
from application.vectorstore.vector_creator import VectorCreator
logger = logging.getLogger(__name__)
@@ -18,12 +17,6 @@ logger = logging.getLogger(__name__)
class SearchResource(Resource):
"""Fast search endpoint for retrieving relevant documents"""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
mongo = MongoDB.get_client()
self.db = mongo[settings.MONGO_DB_NAME]
self.agents_collection = self.db["agents"]
search_model = answer_ns.model(
"SearchModel",
{
@@ -40,37 +33,23 @@ class SearchResource(Resource):
)
def _get_sources_from_api_key(self, api_key: str) -> List[str]:
"""Get source IDs connected to the API key/agent.
"""
agent_data = self.agents_collection.find_one({"key": api_key})
"""Get source IDs connected to the API key/agent."""
with db_readonly() as conn:
agent_data = AgentsRepository(conn).find_by_key(api_key)
if not agent_data:
return []
source_ids = []
source_ids: List[str] = []
# extra_source_ids is a PG ARRAY(UUID) of source UUIDs.
extra = agent_data.get("extra_source_ids") or []
for src in extra:
if src:
source_ids.append(str(src))
# Handle multiple sources (only if non-empty)
sources = agent_data.get("sources", [])
if sources and isinstance(sources, list) and len(sources) > 0:
for source_ref in sources:
# Skip "default" - it's a placeholder, not an actual vectorstore
if source_ref == "default":
continue
elif isinstance(source_ref, DBRef):
source_doc = self.db.dereference(source_ref)
if source_doc:
source_ids.append(str(source_doc["_id"]))
# Handle single source (legacy) - check if sources was empty or didn't yield results
if not source_ids:
source = agent_data.get("source")
if isinstance(source, DBRef):
source_doc = self.db.dereference(source)
if source_doc:
source_ids.append(str(source_doc["_id"]))
# Skip "default" - it's a placeholder, not an actual vectorstore
elif source and source != "default":
source_ids.append(source)
single = agent_data.get("source_id")
if single:
source_ids.append(str(single))
return source_ids
@@ -161,7 +140,8 @@ class SearchResource(Resource):
return make_response({"error": "api_key is required"}, 400)
# Validate API key
agent = self.agents_collection.find_one({"key": api_key})
with db_readonly() as conn:
agent = AgentsRepository(conn).find_by_key(api_key)
if not agent:
return make_response({"error": "Invalid API key"}, 401)

View File

@@ -79,7 +79,47 @@ class StreamResource(Resource, BaseAnswerResource):
return error
decoded_token = getattr(request, "decoded_token", None)
processor = StreamProcessor(data, decoded_token)
try:
# ---- Continuation mode ----
if data.get("tool_actions"):
(
agent,
messages,
tools_dict,
pending_tool_calls,
tool_actions,
) = processor.resume_from_tool_actions(
data["tool_actions"], data["conversation_id"]
)
if not processor.decoded_token:
return Response(
self.error_stream_generate("Unauthorized"),
status=401,
mimetype="text/event-stream",
)
if error := self.check_usage(processor.agent_config):
return error
return Response(
self.complete_stream(
question="",
agent=agent,
conversation_id=processor.conversation_id,
user_api_key=processor.agent_config.get("user_api_key"),
decoded_token=processor.decoded_token,
agent_id=processor.agent_id,
model_id=processor.model_id,
_continuation={
"messages": messages,
"tools_dict": tools_dict,
"pending_tool_calls": pending_tool_calls,
"tool_actions": tool_actions,
},
),
mimetype="text/event-stream",
)
# ---- Normal mode ----
agent = processor.build_agent(data["question"])
if not processor.decoded_token:
return Response(

View File

@@ -1,5 +1,6 @@
"""Message reconstruction utilities for compression."""
import json
import logging
import uuid
from typing import Dict, List, Optional
@@ -49,28 +50,35 @@ class MessageBuilder:
if include_tool_calls and "tool_calls" in query:
for tool_call in query["tool_calls"]:
call_id = tool_call.get("call_id") or str(uuid.uuid4())
function_call_dict = {
"function_call": {
"name": tool_call.get("action_name"),
"args": tool_call.get("arguments"),
"call_id": call_id,
}
}
function_response_dict = {
"function_response": {
"name": tool_call.get("action_name"),
"response": {"result": tool_call.get("result")},
"call_id": call_id,
}
}
messages.append(
{"role": "assistant", "content": [function_call_dict]}
args = tool_call.get("arguments")
args_str = (
json.dumps(args)
if isinstance(args, dict)
else (args or "{}")
)
messages.append(
{"role": "tool", "content": [function_response_dict]}
messages.append({
"role": "assistant",
"content": None,
"tool_calls": [{
"id": call_id,
"type": "function",
"function": {
"name": tool_call.get("action_name", ""),
"arguments": args_str,
},
}],
})
result = tool_call.get("result")
result_str = (
json.dumps(result)
if not isinstance(result, str)
else (result or "")
)
messages.append({
"role": "tool",
"tool_call_id": call_id,
"content": result_str,
})
# If no recent queries (everything was compressed), add a continuation user message
if len(recent_queries) == 0 and compressed_summary:
@@ -180,28 +188,35 @@ class MessageBuilder:
if include_tool_calls and "tool_calls" in query:
for tool_call in query["tool_calls"]:
call_id = tool_call.get("call_id") or str(uuid.uuid4())
function_call_dict = {
"function_call": {
"name": tool_call.get("action_name"),
"args": tool_call.get("arguments"),
"call_id": call_id,
}
}
function_response_dict = {
"function_response": {
"name": tool_call.get("action_name"),
"response": {"result": tool_call.get("result")},
"call_id": call_id,
}
}
rebuilt_messages.append(
{"role": "assistant", "content": [function_call_dict]}
args = tool_call.get("arguments")
args_str = (
json.dumps(args)
if isinstance(args, dict)
else (args or "{}")
)
rebuilt_messages.append(
{"role": "tool", "content": [function_response_dict]}
rebuilt_messages.append({
"role": "assistant",
"content": None,
"tool_calls": [{
"id": call_id,
"type": "function",
"function": {
"name": tool_call.get("action_name", ""),
"arguments": args_str,
},
}],
})
result = tool_call.get("result")
result_str = (
json.dumps(result)
if not isinstance(result, str)
else (result or "")
)
rebuilt_messages.append({
"role": "tool",
"tool_call_id": call_id,
"content": result_str,
})
# If no recent queries (everything was compressed), add a continuation user message
if len(recent_queries) == 0 and compressed_summary:

View File

@@ -0,0 +1,157 @@
"""Service for saving and restoring tool-call continuation state.
When a stream pauses (tool needs approval or client-side execution),
the full execution state is persisted to Postgres so the client can
resume later by sending tool_actions.
"""
import logging
from typing import Any, Dict, List, Optional
from uuid import UUID
from application.storage.db.base_repository import looks_like_uuid
from application.storage.db.repositories.conversations import ConversationsRepository
from application.storage.db.repositories.pending_tool_state import (
PendingToolStateRepository,
)
from application.storage.db.session import db_readonly, db_session
logger = logging.getLogger(__name__)
# TTL for pending states — auto-cleaned after this period
PENDING_STATE_TTL_SECONDS = 30 * 60 # 30 minutes
def _make_serializable(obj: Any) -> Any:
"""Recursively coerce non-JSON values into JSON-safe forms.
Handles ``uuid.UUID`` (from PG columns), ``bytes``, and recurses into
dicts/lists. Post-Mongo-cutover the ObjectId branch is gone — none of
our writers produce them anymore.
"""
if isinstance(obj, UUID):
return str(obj)
if isinstance(obj, dict):
return {str(k): _make_serializable(v) for k, v in obj.items()}
if isinstance(obj, list):
return [_make_serializable(v) for v in obj]
if isinstance(obj, bytes):
return obj.decode("utf-8", errors="replace")
return obj
class ContinuationService:
"""Manages pending tool-call state in Postgres."""
def __init__(self):
# No-op constructor retained for call-site compatibility. State
# lives in Postgres now; each operation opens its own short-lived
# session rather than holding a connection on the service.
pass
def save_state(
self,
conversation_id: str,
user: str,
messages: List[Dict],
pending_tool_calls: List[Dict],
tools_dict: Dict,
tool_schemas: List[Dict],
agent_config: Dict,
client_tools: Optional[List[Dict]] = None,
) -> str:
"""Save execution state for later continuation.
``conversation_id`` may be a Postgres UUID or the legacy Mongo
``ObjectId`` string — the latter is resolved via
``conversations.legacy_mongo_id`` to find the matching row.
Args:
conversation_id: The conversation this state belongs to.
user: Owner user ID.
messages: Full messages array at the pause point.
pending_tool_calls: Tool calls awaiting client action.
tools_dict: Serializable tools configuration dict.
tool_schemas: LLM-formatted tool schemas (agent.tools).
agent_config: Config needed to recreate the agent on resume.
client_tools: Client-provided tool schemas for client-side execution.
Returns:
The string ID (conversation_id as provided) of the saved state.
"""
with db_session() as conn:
conv = ConversationsRepository(conn).get_by_legacy_id(conversation_id)
if conv is not None:
pg_conv_id = conv["id"]
elif looks_like_uuid(conversation_id):
pg_conv_id = conversation_id
else:
# Unresolvable legacy ObjectId — downstream ``CAST AS uuid``
# would raise and poison the save. Surface the mismatch so
# the caller can decide (the stream loop in routes/base.py
# already wraps this in try/except).
raise ValueError(
f"Cannot save continuation state: conversation_id "
f"{conversation_id!r} is neither a PG UUID nor a "
f"backfilled legacy Mongo id."
)
PendingToolStateRepository(conn).save_state(
pg_conv_id,
user,
messages=_make_serializable(messages),
pending_tool_calls=_make_serializable(pending_tool_calls),
tools_dict=_make_serializable(tools_dict),
tool_schemas=_make_serializable(tool_schemas),
agent_config=_make_serializable(agent_config),
client_tools=_make_serializable(client_tools) if client_tools else None,
)
logger.info(
f"Saved continuation state for conversation {conversation_id} "
f"with {len(pending_tool_calls)} pending tool call(s)"
)
return conversation_id
def load_state(
self, conversation_id: str, user: str
) -> Optional[Dict[str, Any]]:
"""Load pending continuation state.
Returns:
The state dict, or None if no pending state exists.
"""
with db_readonly() as conn:
conv = ConversationsRepository(conn).get_by_legacy_id(conversation_id)
if conv is not None:
pg_conv_id = conv["id"]
elif looks_like_uuid(conversation_id):
pg_conv_id = conversation_id
else:
# Unresolvable legacy ObjectId → no state can exist for it.
return None
doc = PendingToolStateRepository(conn).load_state(pg_conv_id, user)
if not doc:
return None
return doc
def delete_state(self, conversation_id: str, user: str) -> bool:
"""Delete pending state after successful resumption.
Returns:
True if a row was deleted.
"""
with db_session() as conn:
conv = ConversationsRepository(conn).get_by_legacy_id(conversation_id)
if conv is not None:
pg_conv_id = conv["id"]
elif looks_like_uuid(conversation_id):
pg_conv_id = conversation_id
else:
# Unresolvable legacy ObjectId → nothing to delete.
return False
deleted = PendingToolStateRepository(conn).delete_state(pg_conv_id, user)
if deleted:
logger.info(
f"Deleted continuation state for conversation {conversation_id}"
)
return deleted

View File

@@ -1,44 +1,51 @@
"""Conversation persistence service backed by Postgres.
Handles create / append / update / compression for conversations during
the answer-streaming path. Connections are opened per-operation rather
than held for the duration of a stream.
"""
import logging
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional
from application.core.mongo_db import MongoDB
from sqlalchemy import text as sql_text
from application.core.settings import settings
from bson import ObjectId
from application.storage.db.base_repository import looks_like_uuid
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.repositories.conversations import ConversationsRepository
from application.storage.db.session import db_readonly, db_session
logger = logging.getLogger(__name__)
class ConversationService:
def __init__(self):
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
self.conversations_collection = db["conversations"]
self.agents_collection = db["agents"]
def get_conversation(
self, conversation_id: str, user_id: str
) -> Optional[Dict[str, Any]]:
"""Retrieve a conversation with proper access control"""
"""Retrieve a conversation with owner-or-shared access control.
Returns a dict in the legacy Mongo shape — ``queries`` is a list
of message dicts (prompt/response/...) — for compatibility with
the streaming pipeline that consumes this shape.
"""
if not conversation_id or not user_id:
return None
try:
conversation = self.conversations_collection.find_one(
{
"_id": ObjectId(conversation_id),
"$or": [{"user": user_id}, {"shared_with": user_id}],
}
)
if not conversation:
logger.warning(
f"Conversation not found or unauthorized - ID: {conversation_id}, User: {user_id}"
)
return None
conversation["_id"] = str(conversation["_id"])
return conversation
with db_readonly() as conn:
repo = ConversationsRepository(conn)
conv = repo.get_any(conversation_id, user_id)
if conv is None:
logger.warning(
f"Conversation not found or unauthorized - ID: {conversation_id}, User: {user_id}"
)
return None
messages = repo.get_messages(str(conv["id"]))
conv["queries"] = messages
conv["_id"] = str(conv["id"])
return conv
except Exception as e:
logger.error(f"Error fetching conversation: {str(e)}", exc_info=True)
return None
@@ -62,7 +69,11 @@ class ConversationService:
attachment_ids: Optional[List[str]] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> str:
"""Save or update a conversation in the database"""
"""Save or update a conversation in Postgres.
Returns the string conversation id (PG UUID as string, or the
caller-provided id if it was already a UUID).
"""
if decoded_token is None:
raise ValueError("Invalid or missing authentication token")
user_id = decoded_token.get("sub")
@@ -70,78 +81,47 @@ class ConversationService:
raise ValueError("User ID not found in token")
current_time = datetime.now(timezone.utc)
# clean up in sources array such that we save max 1k characters for text part
# Trim huge inline source text to a reasonable max before persist.
for source in sources:
if "text" in source and isinstance(source["text"], str):
source["text"] = source["text"][:1000]
message_payload = {
"prompt": question,
"response": response,
"thought": thought,
"sources": sources,
"tool_calls": tool_calls,
"attachments": attachment_ids,
"model_id": model_id,
"timestamp": current_time,
}
if metadata:
message_payload["metadata"] = metadata
if conversation_id is not None and index is not None:
# Update existing conversation with new query
result = self.conversations_collection.update_one(
{
"_id": ObjectId(conversation_id),
"user": user_id,
f"queries.{index}": {"$exists": True},
},
{
"$set": {
f"queries.{index}.prompt": question,
f"queries.{index}.response": response,
f"queries.{index}.thought": thought,
f"queries.{index}.sources": sources,
f"queries.{index}.tool_calls": tool_calls,
f"queries.{index}.timestamp": current_time,
f"queries.{index}.attachments": attachment_ids,
f"queries.{index}.model_id": model_id,
**(
{f"queries.{index}.metadata": metadata}
if metadata
else {}
),
}
},
)
if result.matched_count == 0:
raise ValueError("Conversation not found or unauthorized")
self.conversations_collection.update_one(
{
"_id": ObjectId(conversation_id),
"user": user_id,
f"queries.{index}": {"$exists": True},
},
{"$push": {"queries": {"$each": [], "$slice": index + 1}}},
)
with db_session() as conn:
repo = ConversationsRepository(conn)
conv = repo.get_any(conversation_id, user_id)
if conv is None:
raise ValueError("Conversation not found or unauthorized")
conv_pg_id = str(conv["id"])
repo.update_message_at(conv_pg_id, index, message_payload)
repo.truncate_after(conv_pg_id, index)
return conversation_id
elif conversation_id:
# Append new message to existing conversation
result = self.conversations_collection.update_one(
{"_id": ObjectId(conversation_id), "user": user_id},
{
"$push": {
"queries": {
"prompt": question,
"response": response,
"thought": thought,
"sources": sources,
"tool_calls": tool_calls,
"timestamp": current_time,
"attachments": attachment_ids,
"model_id": model_id,
**({"metadata": metadata} if metadata else {}),
}
}
},
)
if result.matched_count == 0:
raise ValueError("Conversation not found or unauthorized")
with db_session() as conn:
repo = ConversationsRepository(conn)
conv = repo.get_any(conversation_id, user_id)
if conv is None:
raise ValueError("Conversation not found or unauthorized")
conv_pg_id = str(conv["id"])
# append_message expects 'metadata' key either way; normalise.
append_payload = dict(message_payload)
append_payload.setdefault("metadata", metadata or {})
repo.append_message(conv_pg_id, append_payload)
return conversation_id
else:
# Create new conversation
messages_summary = [
{
"role": "system",
@@ -163,70 +143,64 @@ class ConversationService:
if not completion or not completion.strip():
completion = question[:50] if question else "New Conversation"
query_doc = {
"prompt": question,
"response": response,
"thought": thought,
"sources": sources,
"tool_calls": tool_calls,
"timestamp": current_time,
"attachments": attachment_ids,
"model_id": model_id,
}
if metadata:
query_doc["metadata"] = metadata
conversation_data = {
"user": user_id,
"date": current_time,
"name": completion,
"queries": [query_doc],
}
resolved_api_key: Optional[str] = None
resolved_agent_id: Optional[str] = None
if api_key:
if agent_id:
conversation_data["agent_id"] = agent_id
if is_shared_usage:
conversation_data["is_shared_usage"] = is_shared_usage
conversation_data["shared_token"] = shared_token
agent = self.agents_collection.find_one({"key": api_key})
with db_readonly() as conn:
agent = AgentsRepository(conn).find_by_key(api_key)
if agent:
conversation_data["api_key"] = agent["key"]
result = self.conversations_collection.insert_one(conversation_data)
return str(result.inserted_id)
resolved_api_key = agent.get("key")
if agent_id:
resolved_agent_id = agent_id
with db_session() as conn:
repo = ConversationsRepository(conn)
conv = repo.create(
user_id,
completion,
agent_id=resolved_agent_id,
api_key=resolved_api_key,
is_shared_usage=bool(resolved_agent_id and is_shared_usage),
shared_token=(
shared_token
if (resolved_agent_id and is_shared_usage)
else None
),
)
conv_pg_id = str(conv["id"])
append_payload = dict(message_payload)
append_payload.setdefault("metadata", metadata or {})
repo.append_message(conv_pg_id, append_payload)
return conv_pg_id
def update_compression_metadata(
self, conversation_id: str, compression_metadata: Dict[str, Any]
) -> None:
"""
Update conversation with compression metadata.
"""Persist compression flags and append a compression point.
Uses $push with $slice to keep only the most recent compression points,
preventing unbounded array growth. Since each compression incorporates
previous compressions, older points become redundant.
Args:
conversation_id: Conversation ID
compression_metadata: Compression point data
Mirrors the Mongo-era ``$set`` + ``$push $slice`` on
``compression_metadata`` but goes through the PG repo API.
"""
try:
self.conversations_collection.update_one(
{"_id": ObjectId(conversation_id)},
{
"$set": {
"compression_metadata.is_compressed": True,
"compression_metadata.last_compression_at": compression_metadata.get(
"timestamp"
),
},
"$push": {
"compression_metadata.compression_points": {
"$each": [compression_metadata],
"$slice": -settings.COMPRESSION_MAX_HISTORY_POINTS,
}
},
},
)
with db_session() as conn:
repo = ConversationsRepository(conn)
# conversation_id here comes from the streaming pipeline
# which has already resolved it; accept either UUID or
# legacy id for safety.
conv = repo.get_by_legacy_id(conversation_id)
conv_pg_id = (
str(conv["id"]) if conv is not None else conversation_id
)
repo.set_compression_flags(
conv_pg_id,
is_compressed=True,
last_compression_at=compression_metadata.get("timestamp"),
)
repo.append_compression_point(
conv_pg_id,
compression_metadata,
max_points=settings.COMPRESSION_MAX_HISTORY_POINTS,
)
logger.info(
f"Updated compression metadata for conversation {conversation_id}"
)
@@ -239,34 +213,34 @@ class ConversationService:
def append_compression_message(
self, conversation_id: str, compression_metadata: Dict[str, Any]
) -> None:
"""
Append a synthetic compression summary entry into the conversation history.
This makes the summary visible in the DB alongside normal queries.
"""
"""Append a synthetic compression summary message to the conversation."""
try:
summary = compression_metadata.get("compressed_summary", "")
if not summary:
return
timestamp = compression_metadata.get("timestamp", datetime.now(timezone.utc))
self.conversations_collection.update_one(
{"_id": ObjectId(conversation_id)},
{
"$push": {
"queries": {
"prompt": "[Context Compression Summary]",
"response": summary,
"thought": "",
"sources": [],
"tool_calls": [],
"timestamp": timestamp,
"attachments": [],
"model_id": compression_metadata.get("model_used"),
}
}
},
timestamp = compression_metadata.get(
"timestamp", datetime.now(timezone.utc)
)
with db_session() as conn:
repo = ConversationsRepository(conn)
conv = repo.get_by_legacy_id(conversation_id)
conv_pg_id = (
str(conv["id"]) if conv is not None else conversation_id
)
repo.append_message(conv_pg_id, {
"prompt": "[Context Compression Summary]",
"response": summary,
"thought": "",
"sources": [],
"tool_calls": [],
"attachments": [],
"model_id": compression_metadata.get("model_used"),
"timestamp": timestamp,
})
logger.info(
f"Appended compression summary to conversation {conversation_id}"
)
logger.info(f"Appended compression summary to conversation {conversation_id}")
except Exception as e:
logger.error(
f"Error appending compression summary: {str(e)}", exc_info=True
@@ -275,20 +249,30 @@ class ConversationService:
def get_compression_metadata(
self, conversation_id: str
) -> Optional[Dict[str, Any]]:
"""
Get compression metadata for a conversation.
Args:
conversation_id: Conversation ID
Returns:
Compression metadata dict or None
"""
"""Fetch the stored compression metadata JSONB blob for a conversation."""
try:
conversation = self.conversations_collection.find_one(
{"_id": ObjectId(conversation_id)}, {"compression_metadata": 1}
)
return conversation.get("compression_metadata") if conversation else None
with db_readonly() as conn:
repo = ConversationsRepository(conn)
conv = repo.get_by_legacy_id(conversation_id)
if conv is None:
# Fallback to UUID lookup without user scoping — the
# caller already holds an authenticated conversation
# id from the streaming path. Gate on id shape so a
# non-UUID (legacy ObjectId that wasn't backfilled)
# doesn't reach CAST — the cast raises and spams the
# logs with a stack trace on every call.
if not looks_like_uuid(conversation_id):
return None
result = conn.execute(
sql_text(
"SELECT compression_metadata FROM conversations "
"WHERE id = CAST(:id AS uuid)"
),
{"id": conversation_id},
)
row = result.fetchone()
return row[0] if row is not None else None
return conv.get("compression_metadata") if conv else None
except Exception as e:
logger.error(
f"Error getting compression metadata: {str(e)}", exc_info=True

View File

@@ -5,10 +5,6 @@ import os
from pathlib import Path
from typing import Any, Dict, Optional, Set
from bson.dbref import DBRef
from bson.objectid import ObjectId
from application.agents.agent_creator import AgentCreator
from application.api.answer.services.compression import CompressionOrchestrator
from application.api.answer.services.compression.token_counter import TokenCounter
@@ -20,8 +16,16 @@ from application.core.model_utils import (
get_provider_from_model_id,
validate_model_id,
)
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from sqlalchemy import text as sql_text
from application.storage.db.base_repository import looks_like_uuid, row_to_dict
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.repositories.attachments import AttachmentsRepository
from application.storage.db.repositories.prompts import PromptsRepository
from application.storage.db.repositories.sources import SourcesRepository
from application.storage.db.repositories.user_tools import UserToolsRepository
from application.storage.db.session import db_readonly, db_session
from application.retriever.retriever_creator import RetrieverCreator
from application.utils import (
calculate_doc_token_budget,
@@ -32,28 +36,41 @@ logger = logging.getLogger(__name__)
def get_prompt(prompt_id: str, prompts_collection=None) -> str:
"""Get a prompt by preset name or Postgres ID (UUID or legacy ObjectId).
The ``prompts_collection`` parameter is retained for backwards
compatibility with call sites that still pass it positionally; it is
ignored post-cutover.
"""
Get a prompt by preset name or MongoDB ID
"""
del prompts_collection # unused — retained for call-site compatibility
# Callers may pass a ``uuid.UUID`` (from a PG ``prompt_id`` column) or a
# plain string ("default"/"creative"/legacy ObjectId). Normalise to str
# so both the preset lookup and the UUID-vs-legacy branching work.
# ``None`` / empty means "use the default prompt" — agents that never
# set a custom prompt land here (PG ``agents.prompt_id`` is NULL).
if prompt_id is None or prompt_id == "":
prompt_id = "default"
elif not isinstance(prompt_id, str):
prompt_id = str(prompt_id)
current_dir = Path(__file__).resolve().parents[3]
prompts_dir = current_dir / "prompts"
# Maps for classic agent types
CLASSIC_PRESETS = {
"default": "chat_combine_default.txt",
"creative": "chat_combine_creative.txt",
"strict": "chat_combine_strict.txt",
"reduce": "chat_reduce_prompt.txt",
}
# Agentic counterparts — same styles, but with search tool instructions
AGENTIC_PRESETS = {
"default": "agentic/default.txt",
"creative": "agentic/creative.txt",
"strict": "agentic/strict.txt",
}
preset_mapping = {**CLASSIC_PRESETS, **{f"agentic_{k}": v for k, v in AGENTIC_PRESETS.items()}}
preset_mapping = {
**CLASSIC_PRESETS,
**{f"agentic_{k}": v for k, v in AGENTIC_PRESETS.items()},
}
if prompt_id in preset_mapping:
file_path = os.path.join(prompts_dir, preset_mapping[prompt_id])
@@ -63,14 +80,18 @@ def get_prompt(prompt_id: str, prompts_collection=None) -> str:
except FileNotFoundError:
raise FileNotFoundError(f"Prompt file not found: {file_path}")
try:
if prompts_collection is None:
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
prompts_collection = db["prompts"]
prompt_doc = prompts_collection.find_one({"_id": ObjectId(prompt_id)})
with db_readonly() as conn:
repo = PromptsRepository(conn)
prompt_doc = None
if looks_like_uuid(prompt_id):
prompt_doc = repo.get_for_rendering(prompt_id)
if prompt_doc is None:
prompt_doc = repo.get_by_legacy_id(prompt_id)
if not prompt_doc:
raise ValueError(f"Prompt with ID {prompt_id} not found")
return prompt_doc["content"]
except ValueError:
raise
except Exception as e:
raise ValueError(f"Invalid prompt ID: {prompt_id}") from e
@@ -79,12 +100,9 @@ class StreamProcessor:
def __init__(
self, request_data: Dict[str, Any], decoded_token: Optional[Dict[str, Any]]
):
mongo = MongoDB.get_client()
self.db = mongo[settings.MONGO_DB_NAME]
self.agents_collection = self.db["agents"]
self.attachments_collection = self.db["attachments"]
self.prompts_collection = self.db["prompts"]
# Legacy attribute retained as None for any external callers that
# introspect the processor; all DB access uses per-op connections.
self.prompts_collection = None
self.data = request_data
self.decoded_token = decoded_token
self.initial_user_id = (
@@ -112,6 +130,7 @@ class StreamProcessor:
self._required_tool_actions: Optional[Dict[str, Set[Optional[str]]]] = None
self.compressed_summary: Optional[str] = None
self.compressed_summary_tokens: int = 0
self._agent_data: Optional[Dict[str, Any]] = None
def initialize(self):
"""Initialize all required components for processing"""
@@ -243,17 +262,21 @@ class StreamProcessor:
if not attachment_ids:
return []
attachments = []
for attachment_id in attachment_ids:
try:
attachment_doc = self.attachments_collection.find_one(
{"_id": ObjectId(attachment_id), "user": user_id}
)
if attachment_doc:
attachments.append(attachment_doc)
except Exception as e:
logger.error(
f"Error retrieving attachment {attachment_id}: {e}", exc_info=True
)
try:
with db_readonly() as conn:
repo = AttachmentsRepository(conn)
for attachment_id in attachment_ids:
try:
attachment_doc = repo.get_any(str(attachment_id), user_id)
if attachment_doc:
attachments.append(attachment_doc)
except Exception as e:
logger.error(
f"Error retrieving attachment {attachment_id}: {e}",
exc_info=True,
)
except Exception as e:
logger.error(f"Error opening attachments connection: {e}", exc_info=True)
return attachments
def _validate_and_set_model(self):
@@ -284,97 +307,127 @@ class StreamProcessor:
self.model_id = get_default_model_id()
def _get_agent_key(self, agent_id: Optional[str], user_id: Optional[str]) -> tuple:
"""Get API key for agent with access control"""
"""Get API key for agent with access control."""
if not agent_id:
return None, False, None
try:
agent = self.agents_collection.find_one({"_id": ObjectId(agent_id)})
with db_readonly() as conn:
# Lookup without user scoping — access control is done
# against ``user_id`` / ``shared_with`` / ``shared`` flags
# right below, matching the legacy Mongo semantics.
repo = AgentsRepository(conn)
agent = None
if looks_like_uuid(str(agent_id)):
result = conn.execute(
sql_text(
"SELECT * FROM agents WHERE id = CAST(:id AS uuid)"
),
{"id": str(agent_id)},
)
row = result.fetchone()
if row is not None:
agent = row_to_dict(row)
if agent is None:
agent = repo.get_by_legacy_id(str(agent_id))
if agent is None:
raise Exception("Agent not found")
is_owner = agent.get("user") == user_id
is_shared_with_user = agent.get(
"shared_publicly", False
) or user_id in agent.get("shared_with", [])
agent_owner = agent.get("user_id")
is_owner = agent_owner == user_id
is_shared_with_user = bool(agent.get("shared", False))
if not (is_owner or is_shared_with_user):
raise Exception("Unauthorized access to the agent")
if is_owner:
self.agents_collection.update_one(
{"_id": ObjectId(agent_id)},
{
"$set": {
"lastUsedAt": datetime.datetime.now(datetime.timezone.utc)
}
},
)
return str(agent["key"]), not is_owner, agent.get("shared_token")
now = datetime.datetime.now(datetime.timezone.utc)
try:
with db_session() as conn:
AgentsRepository(conn).update(
str(agent["id"]), agent_owner,
{"last_used_at": now},
)
except Exception:
logger.warning(
"Failed to update last_used_at for agent",
exc_info=True,
)
return (
str(agent["key"]) if agent.get("key") else None,
not is_owner,
agent.get("shared_token"),
)
except Exception as e:
logger.error(f"Error in get_agent_key: {str(e)}", exc_info=True)
raise
def _get_data_from_api_key(self, api_key: str) -> Dict[str, Any]:
data = self.agents_collection.find_one({"key": api_key})
if not data:
raise Exception("Invalid API Key, please generate a new key", 401)
source = data.get("source")
if isinstance(source, DBRef):
source_doc = self.db.dereference(source)
if source_doc:
data["source"] = str(source_doc["_id"])
data["retriever"] = source_doc.get("retriever", data.get("retriever"))
data["chunks"] = source_doc.get("chunks", data.get("chunks"))
with db_readonly() as conn:
agent = AgentsRepository(conn).find_by_key(api_key)
if not agent:
raise Exception("Invalid API Key, please generate a new key", 401)
sources_repo = SourcesRepository(conn)
# The repo dict uses "user_id" — the streaming path expects
# a "user" key (legacy Mongo shape) for identity propagation.
data: Dict[str, Any] = dict(agent)
data["user"] = agent.get("user_id")
# Resolve the primary source row (if any) for retriever/chunks.
source_id = agent.get("source_id")
if source_id:
source_doc = sources_repo.get(str(source_id), agent.get("user_id"))
if source_doc:
data["source"] = str(source_doc["id"])
data["retriever"] = source_doc.get(
"retriever", data.get("retriever")
)
data["chunks"] = source_doc.get("chunks", data.get("chunks"))
else:
data["source"] = None
else:
data["source"] = None
elif source == "default":
data["source"] = "default"
else:
data["source"] = None
sources = data.get("sources", [])
if sources and isinstance(sources, list):
sources_list = []
for i, source_ref in enumerate(sources):
if source_ref == "default":
processed_source = {
"id": "default",
"retriever": "classic",
"chunks": data.get("chunks", "2"),
}
sources_list.append(processed_source)
elif isinstance(source_ref, DBRef):
source_doc = self.db.dereference(source_ref)
extra = agent.get("extra_source_ids") or []
if extra:
for sid in extra:
source_doc = sources_repo.get(str(sid), agent.get("user_id"))
if source_doc:
processed_source = {
"id": str(source_doc["_id"]),
"retriever": source_doc.get("retriever", "classic"),
"chunks": source_doc.get("chunks", data.get("chunks", "2")),
}
sources_list.append(processed_source)
data["sources"] = sources_list
else:
data["sources"] = []
sources_list.append(
{
"id": str(source_doc["id"]),
"retriever": source_doc.get("retriever", "classic"),
"chunks": source_doc.get(
"chunks", data.get("chunks", "2")
),
}
)
data["sources"] = sources_list
data["default_model_id"] = data.get("default_model_id", "")
return data
def _configure_source(self):
"""Configure the source based on agent data"""
api_key = self.data.get("api_key") or self.agent_key
"""Configure the source based on agent data.
if api_key:
agent_data = self._get_data_from_api_key(api_key)
The literal string ``"default"`` is a placeholder meaning "no
ingested source" and is normalized to an empty source so that no
retrieval is attempted.
"""
if self._agent_data:
agent_data = self._agent_data
if agent_data.get("sources") and len(agent_data["sources"]) > 0:
source_ids = [
source["id"] for source in agent_data["sources"] if source.get("id")
source["id"]
for source in agent_data["sources"]
if source.get("id") and source["id"] != "default"
]
if source_ids:
self.source = {"active_docs": source_ids}
else:
self.source = {}
self.all_sources = agent_data["sources"]
elif agent_data.get("source"):
self.all_sources = [
s for s in agent_data["sources"] if s.get("id") != "default"
]
elif agent_data.get("source") and agent_data["source"] != "default":
self.source = {"active_docs": agent_data["source"]}
self.all_sources = [
{
@@ -387,11 +440,24 @@ class StreamProcessor:
self.all_sources = []
return
if "active_docs" in self.data:
self.source = {"active_docs": self.data["active_docs"]}
active_docs = self.data["active_docs"]
if active_docs and active_docs != "default":
self.source = {"active_docs": active_docs}
else:
self.source = {}
return
self.source = {}
self.all_sources = []
def _has_active_docs(self) -> bool:
"""Return True if a real document source is configured for retrieval."""
active_docs = self.source.get("active_docs") if self.source else None
if not active_docs:
return False
if active_docs == "default":
return False
return True
def _resolve_agent_id(self) -> Optional[str]:
"""Resolve agent_id from request, then fall back to conversation context."""
request_agent_id = self.data.get("agent_id")
@@ -433,48 +499,45 @@ class StreamProcessor:
effective_key = self.data.get("api_key") or self.agent_key
if effective_key:
data_key = self._get_data_from_api_key(effective_key)
if data_key.get("_id"):
self.agent_id = str(data_key.get("_id"))
self._agent_data = self._get_data_from_api_key(effective_key)
if self._agent_data.get("_id"):
self.agent_id = str(self._agent_data.get("_id"))
self.agent_config.update(
{
"prompt_id": data_key.get("prompt_id", "default"),
"agent_type": data_key.get("agent_type", settings.AGENT_NAME),
"prompt_id": self._agent_data.get("prompt_id", "default"),
"agent_type": self._agent_data.get("agent_type", settings.AGENT_NAME),
"user_api_key": effective_key,
"json_schema": data_key.get("json_schema"),
"default_model_id": data_key.get("default_model_id", ""),
"models": data_key.get("models", []),
"json_schema": self._agent_data.get("json_schema"),
"default_model_id": self._agent_data.get("default_model_id", ""),
"models": self._agent_data.get("models", []),
"allow_system_prompt_override": self._agent_data.get(
"allow_system_prompt_override", False
),
}
)
# Set identity context
if self.data.get("api_key"):
# External API key: use the key owner's identity
self.initial_user_id = data_key.get("user")
self.decoded_token = {"sub": data_key.get("user")}
self.initial_user_id = self._agent_data.get("user")
self.decoded_token = {"sub": self._agent_data.get("user")}
elif self.is_shared_usage:
# Shared agent: keep the caller's identity
pass
else:
# Owner using their own agent
self.decoded_token = {"sub": data_key.get("user")}
self.decoded_token = {"sub": self._agent_data.get("user")}
if data_key.get("source"):
self.source = {"active_docs": data_key["source"]}
if data_key.get("workflow"):
self.agent_config["workflow"] = data_key["workflow"]
self.agent_config["workflow_owner"] = data_key.get("user")
if data_key.get("retriever"):
self.retriever_config["retriever_name"] = data_key["retriever"]
if data_key.get("chunks") is not None:
try:
self.retriever_config["chunks"] = int(data_key["chunks"])
except (ValueError, TypeError):
logger.warning(
f"Invalid chunks value: {data_key['chunks']}, using default value 2"
)
self.retriever_config["chunks"] = 2
# PG row exposes the workflow as ``workflow_id`` (UUID column);
# legacy Mongo shape used the key ``workflow``. Accept either so
# API-key-invoked workflow agents bind correctly downstream.
wf_ref = self._agent_data.get("workflow") or self._agent_data.get(
"workflow_id"
)
if wf_ref:
self.agent_config["workflow"] = str(wf_ref)
self.agent_config["workflow_owner"] = self._agent_data.get("user")
else:
# No API key — default/workflow configuration
agent_type = settings.AGENT_NAME
@@ -497,14 +560,45 @@ class StreamProcessor:
)
def _configure_retriever(self):
"""Assemble retriever config with precedence: request > agent > default."""
doc_token_limit = calculate_doc_token_budget(model_id=self.model_id)
# Start with defaults
retriever_name = "classic"
chunks = 2
# Layer agent-level config (if present)
if self._agent_data:
if self._agent_data.get("retriever"):
retriever_name = self._agent_data["retriever"]
if self._agent_data.get("chunks") is not None:
try:
chunks = int(self._agent_data["chunks"])
except (ValueError, TypeError):
logger.warning(
f"Invalid agent chunks value: {self._agent_data['chunks']}, "
"using default value 2"
)
# Explicit request values win over agent config
if "retriever" in self.data:
retriever_name = self.data["retriever"]
if "chunks" in self.data:
try:
chunks = int(self.data["chunks"])
except (ValueError, TypeError):
logger.warning(
f"Invalid request chunks value: {self.data['chunks']}, "
"using default value 2"
)
self.retriever_config = {
"retriever_name": self.data.get("retriever", "classic"),
"chunks": int(self.data.get("chunks", 2)),
"retriever_name": retriever_name,
"chunks": chunks,
"doc_token_limit": doc_token_limit,
}
# isNoneDoc without an API key forces no retrieval
api_key = self.data.get("api_key") or self.agent_key
if not api_key and "isNoneDoc" in self.data and self.data["isNoneDoc"]:
self.retriever_config["chunks"] = 0
@@ -528,6 +622,9 @@ class StreamProcessor:
if self.data.get("isNoneDoc", False) and not self.agent_id:
logger.info("Pre-fetch skipped: isNoneDoc=True")
return None, None
if not self._has_active_docs():
logger.info("Pre-fetch skipped: no active docs configured")
return None, None
try:
retriever = self.create_retriever()
logger.info(
@@ -574,12 +671,9 @@ class StreamProcessor:
filtering_enabled = required_tool_actions is not None
try:
user_tools_collection = self.db["user_tools"]
user_id = self.initial_user_id or "local"
user_tools = list(
user_tools_collection.find({"user": user_id, "status": True})
)
with db_readonly() as conn:
user_tools = UserToolsRepository(conn).list_active_for_user(user_id)
if not user_tools:
return None
@@ -771,6 +865,121 @@ class StreamProcessor:
logger.warning(f"Failed to fetch memory tool data: {str(e)}")
return None
def resume_from_tool_actions(
self,
tool_actions: list,
conversation_id: str,
):
"""Resume a paused agent from saved continuation state.
Loads the pending state from MongoDB, recreates the agent with
the saved configuration, and returns an agent ready to call
``gen_continuation()``.
Args:
tool_actions: Client-provided actions (approvals / results).
conversation_id: The conversation being resumed.
Returns:
Tuple of (agent, messages, tools_dict, pending_tool_calls, tool_actions).
"""
from application.api.answer.services.continuation_service import (
ContinuationService,
)
from application.agents.agent_creator import AgentCreator
from application.agents.tool_executor import ToolExecutor
from application.llm.handlers.handler_creator import LLMHandlerCreator
from application.llm.llm_creator import LLMCreator
cont_service = ContinuationService()
state = cont_service.load_state(conversation_id, self.initial_user_id)
if not state:
raise ValueError("No pending tool state found for this conversation")
messages = state["messages"]
pending_tool_calls = state["pending_tool_calls"]
tools_dict = state["tools_dict"]
tool_schemas = state.get("tool_schemas", [])
agent_config = state["agent_config"]
model_id = agent_config.get("model_id")
llm_name = agent_config.get("llm_name", settings.LLM_PROVIDER)
api_key = agent_config.get("api_key")
user_api_key = agent_config.get("user_api_key")
agent_id = agent_config.get("agent_id")
prompt = agent_config.get("prompt", "")
json_schema = agent_config.get("json_schema")
retriever_config = agent_config.get("retriever_config")
# Recreate dependencies
system_api_key = api_key or get_api_key_for_provider(llm_name)
llm = LLMCreator.create_llm(
llm_name,
api_key=system_api_key,
user_api_key=user_api_key,
decoded_token=self.decoded_token,
model_id=model_id,
agent_id=agent_id,
)
llm_handler = LLMHandlerCreator.create_handler(llm_name or "default")
tool_executor = ToolExecutor(
user_api_key=user_api_key,
user=self.initial_user_id,
decoded_token=self.decoded_token,
)
tool_executor.conversation_id = conversation_id
# Restore client tools so they stay available for subsequent LLM calls
saved_client_tools = state.get("client_tools")
if saved_client_tools:
tool_executor.client_tools = saved_client_tools
# Re-merge into tools_dict (they may have been stripped during serialization)
tool_executor.merge_client_tools(tools_dict, saved_client_tools)
agent_type = agent_config.get("agent_type", "ClassicAgent")
# Map class names back to agent creator keys
type_map = {
"ClassicAgent": "classic",
"AgenticAgent": "agentic",
"ResearchAgent": "research",
"WorkflowAgent": "workflow",
}
agent_key = type_map.get(agent_type, "classic")
agent_kwargs = {
"endpoint": "stream",
"llm_name": llm_name,
"model_id": model_id,
"api_key": system_api_key,
"agent_id": agent_id,
"user_api_key": user_api_key,
"prompt": prompt,
"chat_history": [],
"decoded_token": self.decoded_token,
"json_schema": json_schema,
"llm": llm,
"llm_handler": llm_handler,
"tool_executor": tool_executor,
}
if agent_key in ("agentic", "research") and retriever_config:
agent_kwargs["retriever_config"] = retriever_config
agent = AgentCreator.create_agent(agent_key, **agent_kwargs)
agent.conversation_id = conversation_id
agent.initial_user_id = self.initial_user_id
agent.tools = tool_schemas
# Store config for the route layer
self.model_id = model_id
self.agent_id = agent_id
self.agent_config["user_api_key"] = user_api_key
self.conversation_id = conversation_id
# Delete state so it can't be replayed
cont_service.delete_state(conversation_id, self.initial_user_id)
return agent, messages, tools_dict, pending_tool_calls, tool_actions
def create_agent(
self,
docs_together: Optional[str] = None,
@@ -795,15 +1004,23 @@ class StreamProcessor:
raw_prompt = get_prompt(prompt_id, self.prompts_collection)
self._prompt_content = raw_prompt
rendered_prompt = self.prompt_renderer.render_prompt(
prompt_content=raw_prompt,
user_id=self.initial_user_id,
request_id=self.data.get("request_id"),
passthrough_data=self.data.get("passthrough"),
docs=docs,
docs_together=docs_together,
tools_data=tools_data,
)
# Allow API callers to override the system prompt when the agent
# has opted in via allow_system_prompt_override.
if (
self.agent_config.get("allow_system_prompt_override", False)
and self.data.get("system_prompt_override")
):
rendered_prompt = self.data["system_prompt_override"]
else:
rendered_prompt = self.prompt_renderer.render_prompt(
prompt_content=raw_prompt,
user_id=self.initial_user_id,
request_id=self.data.get("request_id"),
passthrough_data=self.data.get("passthrough"),
docs=docs,
docs_together=docs_together,
tools_data=tools_data,
)
provider = (
get_provider_from_model_id(self.model_id)
@@ -817,8 +1034,10 @@ class StreamProcessor:
from application.llm.handlers.handler_creator import LLMHandlerCreator
from application.agents.tool_executor import ToolExecutor
# Compute backup models: agent's configured models minus the active one
agent_models = self.agent_config.get("models", [])
# Compute backup models: agent's configured models minus the active one.
# PG agents may carry an explicit ``models: NULL`` (not absent), so
# ``.get("models", [])`` isn't enough — coerce None → [].
agent_models = self.agent_config.get("models") or []
backup_models = [m for m in agent_models if m != self.model_id]
llm = LLMCreator.create_llm(
@@ -841,6 +1060,10 @@ class StreamProcessor:
decoded_token=self.decoded_token,
)
tool_executor.conversation_id = self.conversation_id
# Pass client-side tools so they get merged in get_tools()
client_tools = self.data.get("client_tools")
if client_tools:
tool_executor.client_tools = client_tools
# Base agent kwargs
agent_kwargs = {

View File

@@ -1,12 +1,10 @@
import base64
import datetime
import html
import json
import uuid
from urllib.parse import urlencode
from bson.objectid import ObjectId
from flask import (
Blueprint,
current_app,
@@ -17,22 +15,18 @@ from flask import (
from flask_restx import fields, Namespace, Resource
from application.api import api
from application.api.user.tasks import (
ingest_connector_task,
)
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.api import api
from application.parser.connectors.connector_creator import ConnectorCreator
from application.storage.db.repositories.connector_sessions import (
ConnectorSessionsRepository,
)
from application.storage.db.repositories.sources import SourcesRepository
from application.storage.db.session import db_readonly, db_session
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
sources_collection = db["sources"]
sessions_collection = db["connector_sessions"]
connector = Blueprint("connector", __name__)
connectors_ns = Namespace("connectors", description="Connector operations", path="/")
api.add_namespace(connectors_ns)
@@ -68,16 +62,14 @@ class ConnectorAuth(Resource):
return make_response(jsonify({"success": False, "error": "Unauthorized"}), 401)
user_id = decoded_token.get('sub')
now = datetime.datetime.now(datetime.timezone.utc)
result = sessions_collection.insert_one({
"provider": provider,
"user": user_id,
"status": "pending",
"created_at": now
})
with db_session() as conn:
session_row = ConnectorSessionsRepository(conn).upsert(
user_id, provider, status="pending",
)
session_pg_id = str(session_row["id"])
state_dict = {
"provider": provider,
"object_id": str(result.inserted_id)
"object_id": session_pg_id,
}
state = base64.urlsafe_b64encode(json.dumps(state_dict).encode()).decode()
@@ -160,17 +152,25 @@ class ConnectorsCallback(Resource):
sanitized_token_info = auth.sanitize_token_info(token_info)
sessions_collection.find_one_and_update(
{"_id": ObjectId(state_object_id), "provider": provider},
{
"$set": {
"session_token": session_token,
"token_info": sanitized_token_info,
"user_email": user_email,
"status": "authorized"
}
}
)
# ``object_id`` in the OAuth state is the PG session row
# UUID (new flow) or a legacy Mongo ObjectId (pre-cutover
# issued state). Try UUID update first; fall back to
# legacy id path.
patch = {
"session_token": session_token,
"token_info": sanitized_token_info,
"user_email": user_email,
"status": "authorized",
}
with db_session() as conn:
repo = ConnectorSessionsRepository(conn)
if state_object_id:
value = str(state_object_id)
updated = False
if len(value) == 36 and "-" in value:
updated = repo.update(value, patch)
if not updated:
repo.update_by_legacy_id(value, patch)
# Redirect to success page with session token and user email
return redirect(build_callback_redirect({
@@ -222,8 +222,11 @@ class ConnectorFiles(Resource):
if not decoded_token:
return make_response(jsonify({"success": False, "error": "Unauthorized"}), 401)
user = decoded_token.get('sub')
session = sessions_collection.find_one({"session_token": session_token, "user": user})
if not session:
with db_readonly() as conn:
session = ConnectorSessionsRepository(conn).get_by_session_token(
session_token,
)
if not session or session.get("user_id") != user:
return make_response(jsonify({"success": False, "error": "Invalid or unauthorized session"}), 401)
loader = ConnectorCreator.create_connector(provider, session_token)
@@ -288,8 +291,11 @@ class ConnectorValidateSession(Resource):
return make_response(jsonify({"success": False, "error": "Unauthorized"}), 401)
user = decoded_token.get('sub')
session = sessions_collection.find_one({"session_token": session_token, "user": user})
if not session or "token_info" not in session:
with db_readonly() as conn:
session = ConnectorSessionsRepository(conn).get_by_session_token(
session_token,
)
if not session or session.get("user_id") != user or not session.get("token_info"):
return make_response(jsonify({"success": False, "error": "Invalid or expired session"}), 401)
token_info = session["token_info"]
@@ -300,10 +306,11 @@ class ConnectorValidateSession(Resource):
try:
refreshed_token_info = auth.refresh_access_token(token_info.get('refresh_token'))
sanitized_token_info = auth.sanitize_token_info(refreshed_token_info)
sessions_collection.update_one(
{"session_token": session_token},
{"$set": {"token_info": sanitized_token_info}}
)
with db_session() as conn:
repo = ConnectorSessionsRepository(conn)
row = repo.get_by_session_token(session_token)
if row:
repo.update(str(row["id"]), {"token_info": sanitized_token_info})
token_info = sanitized_token_info
is_expired = False
except Exception as refresh_error:
@@ -347,8 +354,11 @@ class ConnectorDisconnect(Resource):
if session_token:
sessions_collection.delete_one({"session_token": session_token})
with db_session() as conn:
ConnectorSessionsRepository(conn).delete_by_session_token(
session_token,
)
return make_response(jsonify({"success": True}), 200)
except Exception as e:
current_app.logger.error(f"Error disconnecting connector session: {e}", exc_info=True)
@@ -385,32 +395,28 @@ class ConnectorSync(Resource):
}),
400
)
source = sources_collection.find_one({"_id": ObjectId(source_id)})
user_id = decoded_token.get('sub')
with db_readonly() as conn:
source = SourcesRepository(conn).get_any(source_id, user_id)
if not source:
return make_response(
jsonify({
"success": False,
"error": "Source not found"
}),
}),
404
)
if source.get('user') != decoded_token.get('sub'):
return make_response(
jsonify({
"success": False,
"error": "Unauthorized access to source"
}),
403
)
# ``get_any`` already scopes by ``user_id``; an extra guard
# here would be dead code.
remote_data = {}
try:
if source.get('remote_data'):
remote_data = json.loads(source.get('remote_data'))
except json.JSONDecodeError:
current_app.logger.error(f"Invalid remote_data format for source {source_id}")
remote_data = {}
remote_data = source.get('remote_data') or {}
if isinstance(remote_data, str):
try:
remote_data = json.loads(remote_data)
except json.JSONDecodeError:
current_app.logger.error(f"Invalid remote_data format for source {source_id}")
remote_data = {}
source_type = remote_data.get('provider')
if not source_type:
@@ -438,7 +444,7 @@ class ConnectorSync(Resource):
recursive=recursive,
retriever=source.get('retriever', 'classic'),
operation_mode="sync",
doc_id=source_id,
doc_id=str(source.get('id') or source_id),
sync_frequency=source.get('sync_frequency', 'never')
)

View File

@@ -3,18 +3,16 @@ import datetime
import json
from flask import Blueprint, request, send_from_directory, jsonify
from werkzeug.utils import secure_filename
from bson.objectid import ObjectId
import logging
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.storage.db.base_repository import looks_like_uuid
from application.storage.db.repositories.sources import SourcesRepository
from application.storage.db.session import db_session
from application.storage.storage_creator import StorageCreator
logger = logging.getLogger(__name__)
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
conversations_collection = db["conversations"]
sources_collection = db["sources"]
current_dir = os.path.dirname(
os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
@@ -26,12 +24,20 @@ internal = Blueprint("internal", __name__)
@internal.before_request
def verify_internal_key():
"""Verify INTERNAL_KEY for all internal endpoint requests."""
if settings.INTERNAL_KEY:
internal_key = request.headers.get("X-Internal-Key")
if not internal_key or internal_key != settings.INTERNAL_KEY:
logger.warning(f"Unauthorized internal API access attempt from {request.remote_addr}")
return jsonify({"error": "Unauthorized", "message": "Invalid or missing internal key"}), 401
"""Verify INTERNAL_KEY for all internal endpoint requests.
Deny by default: if INTERNAL_KEY is not configured, reject all requests.
"""
if not settings.INTERNAL_KEY:
logger.warning(
f"Internal API request rejected from {request.remote_addr}: "
"INTERNAL_KEY is not configured"
)
return jsonify({"error": "Unauthorized", "message": "Internal API is not configured"}), 401
internal_key = request.headers.get("X-Internal-Key")
if not internal_key or internal_key != settings.INTERNAL_KEY:
logger.warning(f"Unauthorized internal API access attempt from {request.remote_addr}")
return jsonify({"error": "Unauthorized", "message": "Invalid or missing internal key"}), 401
@internal.route("/api/download", methods=["get"])
@@ -48,21 +54,21 @@ def upload_index_files():
"""Upload two files(index.faiss, index.pkl) to the user's folder."""
if "user" not in request.form:
return {"status": "no user"}
user = request.form["user"]
user = request.form["user"]
if "name" not in request.form:
return {"status": "no name"}
job_name = request.form["name"]
tokens = request.form["tokens"]
retriever = request.form["retriever"]
id = request.form["id"]
source_id = request.form["id"]
type = request.form["type"]
remote_data = request.form["remote_data"] if "remote_data" in request.form else None
sync_frequency = request.form["sync_frequency"] if "sync_frequency" in request.form else None
file_path = request.form.get("file_path")
directory_structure = request.form.get("directory_structure")
file_name_map = request.form.get("file_name_map")
if directory_structure:
try:
directory_structure = json.loads(directory_structure)
@@ -81,8 +87,8 @@ def upload_index_files():
file_name_map = None
storage = StorageCreator.get_storage()
index_base_path = f"indexes/{id}"
index_base_path = f"indexes/{source_id}"
if settings.VECTOR_STORE == "faiss":
if "file_faiss" not in request.files:
logger.error("No file_faiss part")
@@ -103,46 +109,48 @@ def upload_index_files():
storage.save_file(file_faiss, faiss_storage_path)
storage.save_file(file_pkl, pkl_storage_path)
now = datetime.datetime.now(datetime.timezone.utc)
update_fields = {
"name": job_name,
"type": type,
"language": job_name,
"date": now,
"model": settings.EMBEDDINGS_NAME,
"tokens": tokens,
"retriever": retriever,
"remote_data": remote_data,
"sync_frequency": sync_frequency,
"file_path": file_path,
"directory_structure": directory_structure,
}
if file_name_map is not None:
update_fields["file_name_map"] = file_name_map
existing_entry = sources_collection.find_one({"_id": ObjectId(id)})
if existing_entry:
update_fields = {
"user": user,
"name": job_name,
"language": job_name,
"date": datetime.datetime.now(),
"model": settings.EMBEDDINGS_NAME,
"type": type,
"tokens": tokens,
"retriever": retriever,
"remote_data": remote_data,
"sync_frequency": sync_frequency,
"file_path": file_path,
"directory_structure": directory_structure,
}
if file_name_map is not None:
update_fields["file_name_map"] = file_name_map
sources_collection.update_one(
{"_id": ObjectId(id)},
{"$set": update_fields},
)
else:
insert_doc = {
"_id": ObjectId(id),
"user": user,
"name": job_name,
"language": job_name,
"date": datetime.datetime.now(),
"model": settings.EMBEDDINGS_NAME,
"type": type,
"tokens": tokens,
"retriever": retriever,
"remote_data": remote_data,
"sync_frequency": sync_frequency,
"file_path": file_path,
"directory_structure": directory_structure,
}
if file_name_map is not None:
insert_doc["file_name_map"] = file_name_map
sources_collection.insert_one(insert_doc)
with db_session() as conn:
repo = SourcesRepository(conn)
existing = None
if looks_like_uuid(source_id):
existing = repo.get(source_id, user)
if existing is None:
existing = repo.get_by_legacy_id(source_id, user)
if existing is not None:
repo.update(str(existing["id"]), user, update_fields)
else:
repo.create(
job_name,
source_id=source_id if looks_like_uuid(source_id) else None,
user_id=user,
type=type,
tokens=tokens,
retriever=retriever,
remote_data=remote_data,
sync_frequency=sync_frequency,
file_path=file_path,
directory_structure=directory_structure,
file_name_map=file_name_map,
language=job_name,
model=settings.EMBEDDINGS_NAME,
date=now,
legacy_mongo_id=None if looks_like_uuid(source_id) else str(source_id),
)
return {"status": "ok"}

View File

@@ -3,27 +3,50 @@ Agent folders management routes.
Provides virtual folder organization for agents (Google Drive-like structure).
"""
import datetime
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, request
from flask_restx import Namespace, Resource, fields
from sqlalchemy import text as _sql_text
from application.api import api
from application.api.user.base import (
agent_folders_collection,
agents_collection,
)
from application.storage.db.base_repository import looks_like_uuid
from application.storage.db.repositories.agent_folders import AgentFoldersRepository
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.session import db_readonly, db_session
agents_folders_ns = Namespace(
"agents_folders", description="Agent folder management", path="/api/agents/folders"
)
def _resolve_folder_id(repo: AgentFoldersRepository, folder_id: str, user: str):
"""Resolve a folder id that may be either a UUID or legacy Mongo ObjectId."""
if not folder_id:
return None
if looks_like_uuid(folder_id):
row = repo.get(folder_id, user)
if row is not None:
return row
return repo.get_by_legacy_id(folder_id, user)
def _folder_error_response(message: str, err: Exception):
current_app.logger.error(f"{message}: {err}", exc_info=True)
return make_response(jsonify({"success": False, "message": message}), 400)
def _serialize_folder(f: dict) -> dict:
created_at = f.get("created_at")
updated_at = f.get("updated_at")
return {
"id": str(f["id"]),
"name": f.get("name"),
"parent_id": str(f["parent_id"]) if f.get("parent_id") else None,
"created_at": created_at.isoformat() if hasattr(created_at, "isoformat") else created_at,
"updated_at": updated_at.isoformat() if hasattr(updated_at, "isoformat") else updated_at,
}
@agents_folders_ns.route("/")
class AgentFolders(Resource):
@api.doc(description="Get all folders for the user")
@@ -33,17 +56,9 @@ class AgentFolders(Resource):
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
try:
folders = list(agent_folders_collection.find({"user": user}))
result = [
{
"id": str(f["_id"]),
"name": f["name"],
"parent_id": f.get("parent_id"),
"created_at": f.get("created_at", "").isoformat() if f.get("created_at") else None,
"updated_at": f.get("updated_at", "").isoformat() if f.get("updated_at") else None,
}
for f in folders
]
with db_readonly() as conn:
folders = AgentFoldersRepository(conn).list_for_user(user)
result = [_serialize_folder(f) for f in folders]
return make_response(jsonify({"folders": result}), 200)
except Exception as err:
return _folder_error_response("Failed to fetch folders", err)
@@ -67,24 +82,34 @@ class AgentFolders(Resource):
if not data or not data.get("name"):
return make_response(jsonify({"success": False, "message": "Folder name is required"}), 400)
parent_id = data.get("parent_id")
if parent_id:
parent = agent_folders_collection.find_one({"_id": ObjectId(parent_id), "user": user})
if not parent:
return make_response(jsonify({"success": False, "message": "Parent folder not found"}), 404)
parent_id_input = data.get("parent_id")
description = data.get("description")
try:
now = datetime.datetime.now(datetime.timezone.utc)
folder = {
"user": user,
"name": data["name"],
"parent_id": parent_id,
"created_at": now,
"updated_at": now,
}
result = agent_folders_collection.insert_one(folder)
with db_session() as conn:
repo = AgentFoldersRepository(conn)
pg_parent_id = None
if parent_id_input:
parent = _resolve_folder_id(repo, parent_id_input, user)
if not parent:
return make_response(
jsonify({"success": False, "message": "Parent folder not found"}),
404,
)
pg_parent_id = str(parent["id"])
folder = repo.create(
user, data["name"],
description=description,
parent_id=pg_parent_id,
)
return make_response(
jsonify({"id": str(result.inserted_id), "name": data["name"], "parent_id": parent_id}),
jsonify(
{
"id": str(folder["id"]),
"name": folder["name"],
"parent_id": pg_parent_id,
}
),
201,
)
except Exception as err:
@@ -100,26 +125,51 @@ class AgentFolder(Resource):
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
try:
folder = agent_folders_collection.find_one({"_id": ObjectId(folder_id), "user": user})
if not folder:
return make_response(jsonify({"success": False, "message": "Folder not found"}), 404)
agents = list(agents_collection.find({"user": user, "folder_id": folder_id}))
agents_list = [
{"id": str(a["_id"]), "name": a["name"], "description": a.get("description", "")}
for a in agents
]
subfolders = list(agent_folders_collection.find({"user": user, "parent_id": folder_id}))
subfolders_list = [{"id": str(sf["_id"]), "name": sf["name"]} for sf in subfolders]
with db_readonly() as conn:
folders_repo = AgentFoldersRepository(conn)
folder = _resolve_folder_id(folders_repo, folder_id, user)
if not folder:
return make_response(
jsonify({"success": False, "message": "Folder not found"}),
404,
)
pg_folder_id = str(folder["id"])
agents_rows = conn.execute(
_sql_text(
"SELECT id, name, description FROM agents "
"WHERE user_id = :user_id AND folder_id = CAST(:fid AS uuid) "
"ORDER BY created_at DESC"
),
{"user_id": user, "fid": pg_folder_id},
).fetchall()
agents_list = [
{
"id": str(row._mapping["id"]),
"name": row._mapping["name"],
"description": row._mapping.get("description", "") or "",
}
for row in agents_rows
]
subfolders = folders_repo.list_children(pg_folder_id, user)
subfolders_list = [
{"id": str(sf["id"]), "name": sf["name"]}
for sf in subfolders
]
return make_response(
jsonify({
"id": str(folder["_id"]),
"name": folder["name"],
"parent_id": folder.get("parent_id"),
"agents": agents_list,
"subfolders": subfolders_list,
}),
jsonify(
{
"id": pg_folder_id,
"name": folder["name"],
"parent_id": (
str(folder["parent_id"]) if folder.get("parent_id") else None
),
"agents": agents_list,
"subfolders": subfolders_list,
}
),
200,
)
except Exception as err:
@@ -136,19 +186,57 @@ class AgentFolder(Resource):
return make_response(jsonify({"success": False, "message": "No data provided"}), 400)
try:
update_fields = {"updated_at": datetime.datetime.now(datetime.timezone.utc)}
if "name" in data:
update_fields["name"] = data["name"]
if "parent_id" in data:
if data["parent_id"] == folder_id:
return make_response(jsonify({"success": False, "message": "Cannot set folder as its own parent"}), 400)
update_fields["parent_id"] = data["parent_id"]
with db_session() as conn:
repo = AgentFoldersRepository(conn)
folder = _resolve_folder_id(repo, folder_id, user)
if not folder:
return make_response(
jsonify({"success": False, "message": "Folder not found"}),
404,
)
pg_folder_id = str(folder["id"])
update_fields: dict = {}
if "name" in data:
update_fields["name"] = data["name"]
if "description" in data:
update_fields["description"] = data["description"]
if "parent_id" in data:
parent_input = data.get("parent_id")
if parent_input:
if parent_input == folder_id or parent_input == pg_folder_id:
return make_response(
jsonify(
{
"success": False,
"message": "Cannot set folder as its own parent",
}
),
400,
)
parent = _resolve_folder_id(repo, parent_input, user)
if not parent:
return make_response(
jsonify({"success": False, "message": "Parent folder not found"}),
404,
)
if str(parent["id"]) == pg_folder_id:
return make_response(
jsonify(
{
"success": False,
"message": "Cannot set folder as its own parent",
}
),
400,
)
update_fields["parent_id"] = str(parent["id"])
else:
update_fields["parent_id"] = None
if update_fields:
repo.update(pg_folder_id, user, update_fields)
result = agent_folders_collection.update_one(
{"_id": ObjectId(folder_id), "user": user}, {"$set": update_fields}
)
if result.matched_count == 0:
return make_response(jsonify({"success": False, "message": "Folder not found"}), 404)
return make_response(jsonify({"success": True}), 200)
except Exception as err:
return _folder_error_response("Failed to update folder", err)
@@ -160,15 +248,24 @@ class AgentFolder(Resource):
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
try:
agents_collection.update_many(
{"user": user, "folder_id": folder_id}, {"$unset": {"folder_id": ""}}
)
agent_folders_collection.update_many(
{"user": user, "parent_id": folder_id}, {"$unset": {"parent_id": ""}}
)
result = agent_folders_collection.delete_one({"_id": ObjectId(folder_id), "user": user})
if result.deleted_count == 0:
return make_response(jsonify({"success": False, "message": "Folder not found"}), 404)
with db_session() as conn:
repo = AgentFoldersRepository(conn)
folder = _resolve_folder_id(repo, folder_id, user)
if not folder:
return make_response(
jsonify({"success": False, "message": "Folder not found"}),
404,
)
pg_folder_id = str(folder["id"])
# Clear folder assignments from agents; self-FK
# ``ON DELETE SET NULL`` handles child folders.
AgentsRepository(conn).clear_folder_for_all(pg_folder_id, user)
deleted = repo.delete(pg_folder_id, user)
if not deleted:
return make_response(
jsonify({"success": False, "message": "Folder not found"}),
404,
)
return make_response(jsonify({"success": True}), 200)
except Exception as err:
return _folder_error_response("Failed to delete folder", err)
@@ -195,26 +292,29 @@ class MoveAgentToFolder(Resource):
if not data or not data.get("agent_id"):
return make_response(jsonify({"success": False, "message": "Agent ID is required"}), 400)
agent_id = data["agent_id"]
folder_id = data.get("folder_id")
agent_id_input = data["agent_id"]
folder_id_input = data.get("folder_id")
try:
agent = agents_collection.find_one({"_id": ObjectId(agent_id), "user": user})
if not agent:
return make_response(jsonify({"success": False, "message": "Agent not found"}), 404)
if folder_id:
folder = agent_folders_collection.find_one({"_id": ObjectId(folder_id), "user": user})
if not folder:
return make_response(jsonify({"success": False, "message": "Folder not found"}), 404)
agents_collection.update_one(
{"_id": ObjectId(agent_id)}, {"$set": {"folder_id": folder_id}}
)
else:
agents_collection.update_one(
{"_id": ObjectId(agent_id)}, {"$unset": {"folder_id": ""}}
)
with db_session() as conn:
agents_repo = AgentsRepository(conn)
agent = agents_repo.get_any(agent_id_input, user)
if not agent:
return make_response(
jsonify({"success": False, "message": "Agent not found"}),
404,
)
pg_folder_id = None
if folder_id_input:
folders_repo = AgentFoldersRepository(conn)
folder = _resolve_folder_id(folders_repo, folder_id_input, user)
if not folder:
return make_response(
jsonify({"success": False, "message": "Folder not found"}),
404,
)
pg_folder_id = str(folder["id"])
agents_repo.set_folder(str(agent["id"]), user, pg_folder_id)
return make_response(jsonify({"success": True}), 200)
except Exception as err:
return _folder_error_response("Failed to move agent", err)
@@ -242,25 +342,25 @@ class BulkMoveAgents(Resource):
return make_response(jsonify({"success": False, "message": "Agent IDs are required"}), 400)
agent_ids = data["agent_ids"]
folder_id = data.get("folder_id")
folder_id_input = data.get("folder_id")
try:
if folder_id:
folder = agent_folders_collection.find_one({"_id": ObjectId(folder_id), "user": user})
if not folder:
return make_response(jsonify({"success": False, "message": "Folder not found"}), 404)
object_ids = [ObjectId(aid) for aid in agent_ids]
if folder_id:
agents_collection.update_many(
{"_id": {"$in": object_ids}, "user": user},
{"$set": {"folder_id": folder_id}},
)
else:
agents_collection.update_many(
{"_id": {"$in": object_ids}, "user": user},
{"$unset": {"folder_id": ""}},
)
with db_session() as conn:
agents_repo = AgentsRepository(conn)
pg_folder_id = None
if folder_id_input:
folders_repo = AgentFoldersRepository(conn)
folder = _resolve_folder_id(folders_repo, folder_id_input, user)
if not folder:
return make_response(
jsonify({"success": False, "message": "Folder not found"}),
404,
)
pg_folder_id = str(folder["id"])
for agent_id_input in agent_ids:
agent = agents_repo.get_any(agent_id_input, user)
if agent is not None:
agents_repo.set_folder(str(agent["id"]), user, pg_folder_id)
return make_response(jsonify({"success": True}), 200)
except Exception as err:
return _folder_error_response("Failed to move agents", err)

File diff suppressed because it is too large Load Diff

View File

@@ -3,21 +3,17 @@
import datetime
import secrets
from bson import DBRef
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, request
from flask_restx import fields, Namespace, Resource
from sqlalchemy import text as _sql_text
from application.api import api
from application.core.settings import settings
from application.api.user.base import (
agents_collection,
db,
ensure_user_doc,
resolve_tool_details,
user_tools_collection,
users_collection,
)
from application.api.user.base import resolve_tool_details
from application.storage.db.base_repository import looks_like_uuid
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.repositories.users import UsersRepository
from application.storage.db.session import db_readonly, db_session
from application.utils import generate_image_url
agents_sharing_ns = Namespace(
@@ -25,6 +21,38 @@ agents_sharing_ns = Namespace(
)
def _serialize_agent_basic(agent: dict) -> dict:
"""Shape a PG agent row into the API response dict."""
source_id = agent.get("source_id")
return {
"id": str(agent["id"]),
"user": agent.get("user_id", ""),
"name": agent.get("name", ""),
"image": (
generate_image_url(agent["image"]) if agent.get("image") else ""
),
"description": agent.get("description", ""),
"source": str(source_id) if source_id else "",
"chunks": str(agent["chunks"]) if agent.get("chunks") is not None else "0",
"retriever": agent.get("retriever", "classic") or "classic",
"prompt_id": str(agent["prompt_id"]) if agent.get("prompt_id") else "default",
"tools": agent.get("tools", []) or [],
"tool_details": resolve_tool_details(agent.get("tools", []) or []),
"agent_type": agent.get("agent_type", "") or "",
"status": agent.get("status", "") or "",
"json_schema": agent.get("json_schema"),
"limited_token_mode": agent.get("limited_token_mode", False),
"token_limit": agent.get("token_limit") or settings.DEFAULT_AGENT_LIMITS["token_limit"],
"limited_request_mode": agent.get("limited_request_mode", False),
"request_limit": agent.get("request_limit") or settings.DEFAULT_AGENT_LIMITS["request_limit"],
"created_at": agent.get("created_at", ""),
"updated_at": agent.get("updated_at", ""),
"shared": bool(agent.get("shared", False)),
"shared_token": agent.get("shared_token", "") or "",
"shared_metadata": agent.get("shared_metadata", {}) or {},
}
@agents_sharing_ns.route("/shared_agent")
class SharedAgent(Resource):
@api.doc(
@@ -41,70 +69,33 @@ class SharedAgent(Resource):
jsonify({"success": False, "message": "Token or ID is required"}), 400
)
try:
query = {
"shared_publicly": True,
"shared_token": shared_token,
}
shared_agent = agents_collection.find_one(query)
with db_readonly() as conn:
shared_agent = AgentsRepository(conn).find_by_shared_token(
shared_token,
)
if not shared_agent:
return make_response(
jsonify({"success": False, "message": "Shared agent not found"}),
404,
)
agent_id = str(shared_agent["_id"])
data = {
"id": agent_id,
"user": shared_agent.get("user", ""),
"name": shared_agent.get("name", ""),
"image": (
generate_image_url(shared_agent["image"])
if shared_agent.get("image")
else ""
),
"description": shared_agent.get("description", ""),
"source": (
str(source_doc["_id"])
if isinstance(shared_agent.get("source"), DBRef)
and (source_doc := db.dereference(shared_agent.get("source")))
else ""
),
"chunks": shared_agent.get("chunks", "0"),
"retriever": shared_agent.get("retriever", "classic"),
"prompt_id": shared_agent.get("prompt_id", "default"),
"tools": shared_agent.get("tools", []),
"tool_details": resolve_tool_details(shared_agent.get("tools", [])),
"agent_type": shared_agent.get("agent_type", ""),
"status": shared_agent.get("status", ""),
"json_schema": shared_agent.get("json_schema"),
"limited_token_mode": shared_agent.get("limited_token_mode", False),
"token_limit": shared_agent.get("token_limit", settings.DEFAULT_AGENT_LIMITS["token_limit"]),
"limited_request_mode": shared_agent.get("limited_request_mode", False),
"request_limit": shared_agent.get("request_limit", settings.DEFAULT_AGENT_LIMITS["request_limit"]),
"created_at": shared_agent.get("createdAt", ""),
"updated_at": shared_agent.get("updatedAt", ""),
"shared": shared_agent.get("shared_publicly", False),
"shared_token": shared_agent.get("shared_token", ""),
"shared_metadata": shared_agent.get("shared_metadata", {}),
}
agent_id = str(shared_agent["id"])
data = _serialize_agent_basic(shared_agent)
if data["tools"]:
enriched_tools = []
for tool in data["tools"]:
tool_data = user_tools_collection.find_one({"_id": ObjectId(tool)})
if tool_data:
enriched_tools.append(tool_data.get("name", ""))
for detail in data["tool_details"]:
enriched_tools.append(detail.get("name", ""))
data["tools"] = enriched_tools
decoded_token = getattr(request, "decoded_token", None)
if decoded_token:
user_id = decoded_token.get("sub")
owner_id = shared_agent.get("user")
owner_id = shared_agent.get("user_id")
if user_id != owner_id:
ensure_user_doc(user_id)
users_collection.update_one(
{"user_id": user_id},
{"$addToSet": {"agent_preferences.shared_with_me": agent_id}},
)
with db_session() as conn:
users_repo = UsersRepository(conn)
users_repo.upsert(user_id)
users_repo.add_shared(user_id, agent_id)
return make_response(jsonify(data), 200)
except Exception as err:
current_app.logger.error(f"Error retrieving shared agent: {err}")
@@ -121,52 +112,73 @@ class SharedAgents(Resource):
return make_response(jsonify({"success": False}), 401)
user_id = decoded_token.get("sub")
user_doc = ensure_user_doc(user_id)
shared_with_ids = user_doc.get("agent_preferences", {}).get(
"shared_with_me", []
)
shared_object_ids = [ObjectId(id) for id in shared_with_ids]
shared_agents_cursor = agents_collection.find(
{"_id": {"$in": shared_object_ids}, "shared_publicly": True}
)
shared_agents = list(shared_agents_cursor)
found_ids_set = {str(agent["_id"]) for agent in shared_agents}
stale_ids = [id for id in shared_with_ids if id not in found_ids_set]
if stale_ids:
users_collection.update_one(
{"user_id": user_id},
{"$pullAll": {"agent_preferences.shared_with_me": stale_ids}},
with db_session() as conn:
users_repo = UsersRepository(conn)
user_doc = users_repo.upsert(user_id)
shared_with_ids = (
user_doc.get("agent_preferences", {}).get("shared_with_me", [])
if isinstance(user_doc.get("agent_preferences"), dict)
else []
)
pinned_ids = set(user_doc.get("agent_preferences", {}).get("pinned", []))
# Keep only UUID-shaped ids; ObjectId leftovers are stripped below.
uuid_ids = [sid for sid in shared_with_ids if looks_like_uuid(sid)]
non_uuid_ids = [sid for sid in shared_with_ids if not looks_like_uuid(sid)]
list_shared_agents = [
{
"id": str(agent["_id"]),
"name": agent.get("name", ""),
"description": agent.get("description", ""),
"image": (
generate_image_url(agent["image"]) if agent.get("image") else ""
),
"tools": agent.get("tools", []),
"tool_details": resolve_tool_details(agent.get("tools", [])),
"agent_type": agent.get("agent_type", ""),
"status": agent.get("status", ""),
"json_schema": agent.get("json_schema"),
"limited_token_mode": agent.get("limited_token_mode", False),
"token_limit": agent.get("token_limit", settings.DEFAULT_AGENT_LIMITS["token_limit"]),
"limited_request_mode": agent.get("limited_request_mode", False),
"request_limit": agent.get("request_limit", settings.DEFAULT_AGENT_LIMITS["request_limit"]),
"created_at": agent.get("createdAt", ""),
"updated_at": agent.get("updatedAt", ""),
"pinned": str(agent["_id"]) in pinned_ids,
"shared": agent.get("shared_publicly", False),
"shared_token": agent.get("shared_token", ""),
"shared_metadata": agent.get("shared_metadata", {}),
}
for agent in shared_agents
]
if uuid_ids:
result = conn.execute(
_sql_text(
"SELECT * FROM agents "
"WHERE id = ANY(CAST(:ids AS uuid[])) "
"AND shared = true"
),
{"ids": uuid_ids},
)
shared_agents = [dict(row._mapping) for row in result.fetchall()]
else:
shared_agents = []
found_ids_set = {str(agent["id"]) for agent in shared_agents}
stale_ids = [sid for sid in uuid_ids if sid not in found_ids_set]
stale_ids.extend(non_uuid_ids)
if stale_ids:
users_repo.remove_shared_bulk(user_id, stale_ids)
pinned_ids = set(
user_doc.get("agent_preferences", {}).get("pinned", [])
if isinstance(user_doc.get("agent_preferences"), dict)
else []
)
list_shared_agents = []
for agent in shared_agents:
agent_id_str = str(agent["id"])
list_shared_agents.append(
{
"id": agent_id_str,
"name": agent.get("name", ""),
"description": agent.get("description", ""),
"image": (
generate_image_url(agent["image"]) if agent.get("image") else ""
),
"tools": agent.get("tools", []) or [],
"tool_details": resolve_tool_details(
agent.get("tools", []) or []
),
"agent_type": agent.get("agent_type", "") or "",
"status": agent.get("status", "") or "",
"json_schema": agent.get("json_schema"),
"limited_token_mode": agent.get("limited_token_mode", False),
"token_limit": agent.get("token_limit") or settings.DEFAULT_AGENT_LIMITS["token_limit"],
"limited_request_mode": agent.get("limited_request_mode", False),
"request_limit": agent.get("request_limit") or settings.DEFAULT_AGENT_LIMITS["request_limit"],
"created_at": agent.get("created_at", ""),
"updated_at": agent.get("updated_at", ""),
"pinned": agent_id_str in pinned_ids,
"shared": bool(agent.get("shared", False)),
"shared_token": agent.get("shared_token", "") or "",
"shared_metadata": agent.get("shared_metadata", {}) or {},
}
)
return make_response(jsonify(list_shared_agents), 200)
except Exception as err:
@@ -220,44 +232,43 @@ class ShareAgent(Resource):
),
400,
)
shared_token = None
try:
try:
agent_oid = ObjectId(agent_id)
except Exception:
return make_response(
jsonify({"success": False, "message": "Invalid agent ID"}), 400
)
agent = agents_collection.find_one({"_id": agent_oid, "user": user})
if not agent:
return make_response(
jsonify({"success": False, "message": "Agent not found"}), 404
)
if shared:
shared_metadata = {
"shared_by": username,
"shared_at": datetime.datetime.now(datetime.timezone.utc),
}
shared_token = secrets.token_urlsafe(32)
agents_collection.update_one(
{"_id": agent_oid, "user": user},
{
"$set": {
"shared_publicly": shared,
"shared_metadata": shared_metadata,
with db_session() as conn:
repo = AgentsRepository(conn)
agent = repo.get_any(agent_id, user)
if not agent:
return make_response(
jsonify({"success": False, "message": "Agent not found"}), 404
)
if shared:
shared_metadata = {
"shared_by": username,
"shared_at": datetime.datetime.now(
datetime.timezone.utc
).isoformat(),
}
shared_token = secrets.token_urlsafe(32)
repo.update(
str(agent["id"]), user,
{
"shared": True,
"shared_token": shared_token,
}
},
)
else:
agents_collection.update_one(
{"_id": agent_oid, "user": user},
{"$set": {"shared_publicly": shared, "shared_token": None}},
{"$unset": {"shared_metadata": ""}},
)
"shared_metadata": shared_metadata,
},
)
else:
repo.update(
str(agent["id"]), user,
{
"shared": False,
"shared_token": None,
"shared_metadata": None,
},
)
except Exception as err:
current_app.logger.error(f"Error sharing/unsharing agent: {err}", exc_info=True)
return make_response(jsonify({"success": False, "error": "Failed to update agent sharing status"}), 400)
shared_token = shared_token if shared else None
return make_response(
jsonify({"success": True, "shared_token": shared_token}), 200
)

View File

@@ -2,14 +2,15 @@
import secrets
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, request
from flask_restx import Namespace, Resource
from application.api import api
from application.api.user.base import agents_collection, require_agent
from application.api.user.base import require_agent
from application.api.user.tasks import process_agent_webhook
from application.core.settings import settings
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.session import db_readonly, db_session
agents_webhooks_ns = Namespace(
@@ -34,9 +35,8 @@ class AgentWebhook(Resource):
jsonify({"success": False, "message": "ID is required"}), 400
)
try:
agent = agents_collection.find_one(
{"_id": ObjectId(agent_id), "user": user}
)
with db_readonly() as conn:
agent = AgentsRepository(conn).get_any(agent_id, user)
if not agent:
return make_response(
jsonify({"success": False, "message": "Agent not found"}), 404
@@ -44,10 +44,11 @@ class AgentWebhook(Resource):
webhook_token = agent.get("incoming_webhook_token")
if not webhook_token:
webhook_token = secrets.token_urlsafe(32)
agents_collection.update_one(
{"_id": ObjectId(agent_id), "user": user},
{"$set": {"incoming_webhook_token": webhook_token}},
)
with db_session() as conn:
AgentsRepository(conn).update(
str(agent["id"]), user,
{"incoming_webhook_token": webhook_token},
)
base_url = settings.API_URL.rstrip("/")
full_webhook_url = f"{base_url}/api/webhooks/agents/{webhook_token}"
except Exception as err:

View File

@@ -2,26 +2,84 @@
import datetime
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, request
from flask_restx import fields, Namespace, Resource
from sqlalchemy import text as _sql_text
from application.api import api
from application.api.user.base import (
agents_collection,
conversations_collection,
generate_date_range,
generate_hourly_range,
generate_minute_range,
token_usage_collection,
user_logs_collection,
)
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.repositories.token_usage import TokenUsageRepository
from application.storage.db.repositories.user_logs import UserLogsRepository
from application.storage.db.session import db_readonly
analytics_ns = Namespace(
"analytics", description="Analytics and reporting operations", path="/api"
)
_FILTER_BUCKETS = {
"last_hour": ("minute", "%Y-%m-%d %H:%M:00", "YYYY-MM-DD HH24:MI:00"),
"last_24_hour": ("hour", "%Y-%m-%d %H:00", "YYYY-MM-DD HH24:00"),
"last_7_days": ("day", "%Y-%m-%d", "YYYY-MM-DD"),
"last_15_days": ("day", "%Y-%m-%d", "YYYY-MM-DD"),
"last_30_days": ("day", "%Y-%m-%d", "YYYY-MM-DD"),
}
def _range_for_filter(filter_option: str):
"""Return ``(start_date, end_date, bucket_unit, pg_fmt)`` for the filter.
Returns ``None`` on invalid filter.
"""
if filter_option not in _FILTER_BUCKETS:
return None
end_date = datetime.datetime.now(datetime.timezone.utc)
bucket_unit, _py_fmt, pg_fmt = _FILTER_BUCKETS[filter_option]
if filter_option == "last_hour":
start_date = end_date - datetime.timedelta(hours=1)
elif filter_option == "last_24_hour":
start_date = end_date - datetime.timedelta(hours=24)
else:
days = {
"last_7_days": 6,
"last_15_days": 14,
"last_30_days": 29,
}[filter_option]
start_date = end_date - datetime.timedelta(days=days)
start_date = start_date.replace(hour=0, minute=0, second=0, microsecond=0)
end_date = end_date.replace(
hour=23, minute=59, second=59, microsecond=999999
)
return start_date, end_date, bucket_unit, pg_fmt
def _intervals_for_filter(filter_option, start_date, end_date):
if filter_option == "last_hour":
return generate_minute_range(start_date, end_date)
if filter_option == "last_24_hour":
return generate_hourly_range(start_date, end_date)
return generate_date_range(start_date, end_date)
def _resolve_api_key(conn, api_key_id, user_id):
"""Look up the ``agents.key`` value for a given agent id.
Scoped by ``user_id`` so an authenticated caller can't probe another
user's agents. Accepts either UUID or legacy Mongo ObjectId shape.
"""
if not api_key_id:
return None
agent = AgentsRepository(conn).get_any(api_key_id, user_id)
return (agent or {}).get("key") if agent else None
@analytics_ns.route("/get_message_analytics")
class GetMessageAnalytics(Resource):
get_message_analytics_model = api.model(
@@ -32,13 +90,7 @@ class GetMessageAnalytics(Resource):
required=False,
description="Filter option for analytics",
default="last_30_days",
enum=[
"last_hour",
"last_24_hour",
"last_7_days",
"last_15_days",
"last_30_days",
],
enum=list(_FILTER_BUCKETS.keys()),
),
},
)
@@ -50,88 +102,54 @@ class GetMessageAnalytics(Resource):
if not decoded_token:
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
data = request.get_json()
data = request.get_json() or {}
api_key_id = data.get("api_key_id")
filter_option = data.get("filter_option", "last_30_days")
window = _range_for_filter(filter_option)
if window is None:
return make_response(
jsonify({"success": False, "message": "Invalid option"}), 400
)
start_date, end_date, _bucket_unit, pg_fmt = window
try:
api_key = (
agents_collection.find_one({"_id": ObjectId(api_key_id), "user": user})[
"key"
with db_readonly() as conn:
api_key = _resolve_api_key(conn, api_key_id, user)
# Count messages per bucket, filtered by the conversation's
# owner (user_id) and optionally the agent api_key. The
# ``user_id`` filter is always applied post-cutover to
# prevent cross-tenant leakage on admin dashboards.
clauses = [
"c.user_id = :user_id",
"m.timestamp >= :start",
"m.timestamp <= :end",
]
if api_key_id
else None
)
except Exception as err:
current_app.logger.error(f"Error getting API key: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
end_date = datetime.datetime.now(datetime.timezone.utc)
if filter_option == "last_hour":
start_date = end_date - datetime.timedelta(hours=1)
group_format = "%Y-%m-%d %H:%M:00"
elif filter_option == "last_24_hour":
start_date = end_date - datetime.timedelta(hours=24)
group_format = "%Y-%m-%d %H:00"
else:
if filter_option in ["last_7_days", "last_15_days", "last_30_days"]:
filter_days = (
6
if filter_option == "last_7_days"
else 14 if filter_option == "last_15_days" else 29
)
else:
return make_response(
jsonify({"success": False, "message": "Invalid option"}), 400
)
start_date = end_date - datetime.timedelta(days=filter_days)
start_date = start_date.replace(hour=0, minute=0, second=0, microsecond=0)
end_date = end_date.replace(
hour=23, minute=59, second=59, microsecond=999999
)
group_format = "%Y-%m-%d"
try:
match_stage = {
"$match": {
"user": user,
params: dict = {
"user_id": user,
"start": start_date,
"end": end_date,
"fmt": pg_fmt,
}
}
if api_key:
match_stage["$match"]["api_key"] = api_key
pipeline = [
match_stage,
{"$unwind": "$queries"},
{
"$match": {
"queries.timestamp": {"$gte": start_date, "$lte": end_date}
}
},
{
"$group": {
"_id": {
"$dateToString": {
"format": group_format,
"date": "$queries.timestamp",
}
},
"count": {"$sum": 1},
}
},
{"$sort": {"_id": 1}},
]
if api_key:
clauses.append("c.api_key = :api_key")
params["api_key"] = api_key
where = " AND ".join(clauses)
sql = (
"SELECT to_char(m.timestamp AT TIME ZONE 'UTC', :fmt) AS bucket, "
"COUNT(*) AS count "
"FROM conversation_messages m "
"JOIN conversations c ON c.id = m.conversation_id "
f"WHERE {where} "
"GROUP BY bucket ORDER BY bucket ASC"
)
rows = conn.execute(_sql_text(sql), params).fetchall()
message_data = conversations_collection.aggregate(pipeline)
if filter_option == "last_hour":
intervals = generate_minute_range(start_date, end_date)
elif filter_option == "last_24_hour":
intervals = generate_hourly_range(start_date, end_date)
else:
intervals = generate_date_range(start_date, end_date)
intervals = _intervals_for_filter(filter_option, start_date, end_date)
daily_messages = {interval: 0 for interval in intervals}
for entry in message_data:
daily_messages[entry["_id"]] = entry["count"]
for row in rows:
daily_messages[row._mapping["bucket"]] = int(row._mapping["count"])
except Exception as err:
current_app.logger.error(
f"Error getting message analytics: {err}", exc_info=True
@@ -152,13 +170,7 @@ class GetTokenAnalytics(Resource):
required=False,
description="Filter option for analytics",
default="last_30_days",
enum=[
"last_hour",
"last_24_hour",
"last_7_days",
"last_15_days",
"last_30_days",
],
enum=list(_FILTER_BUCKETS.keys()),
),
},
)
@@ -170,123 +182,36 @@ class GetTokenAnalytics(Resource):
if not decoded_token:
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
data = request.get_json()
data = request.get_json() or {}
api_key_id = data.get("api_key_id")
filter_option = data.get("filter_option", "last_30_days")
try:
api_key = (
agents_collection.find_one({"_id": ObjectId(api_key_id), "user": user})[
"key"
]
if api_key_id
else None
window = _range_for_filter(filter_option)
if window is None:
return make_response(
jsonify({"success": False, "message": "Invalid option"}), 400
)
except Exception as err:
current_app.logger.error(f"Error getting API key: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
end_date = datetime.datetime.now(datetime.timezone.utc)
start_date, end_date, bucket_unit, _pg_fmt = window
if filter_option == "last_hour":
start_date = end_date - datetime.timedelta(hours=1)
group_format = "%Y-%m-%d %H:%M:00"
group_stage = {
"$group": {
"_id": {
"minute": {
"$dateToString": {
"format": group_format,
"date": "$timestamp",
}
}
},
"total_tokens": {
"$sum": {"$add": ["$prompt_tokens", "$generated_tokens"]}
},
}
}
elif filter_option == "last_24_hour":
start_date = end_date - datetime.timedelta(hours=24)
group_format = "%Y-%m-%d %H:00"
group_stage = {
"$group": {
"_id": {
"hour": {
"$dateToString": {
"format": group_format,
"date": "$timestamp",
}
}
},
"total_tokens": {
"$sum": {"$add": ["$prompt_tokens", "$generated_tokens"]}
},
}
}
else:
if filter_option in ["last_7_days", "last_15_days", "last_30_days"]:
filter_days = (
6
if filter_option == "last_7_days"
else (14 if filter_option == "last_15_days" else 29)
)
else:
return make_response(
jsonify({"success": False, "message": "Invalid option"}), 400
)
start_date = end_date - datetime.timedelta(days=filter_days)
start_date = start_date.replace(hour=0, minute=0, second=0, microsecond=0)
end_date = end_date.replace(
hour=23, minute=59, second=59, microsecond=999999
)
group_format = "%Y-%m-%d"
group_stage = {
"$group": {
"_id": {
"day": {
"$dateToString": {
"format": group_format,
"date": "$timestamp",
}
}
},
"total_tokens": {
"$sum": {"$add": ["$prompt_tokens", "$generated_tokens"]}
},
}
}
try:
match_stage = {
"$match": {
"user_id": user,
"timestamp": {"$gte": start_date, "$lte": end_date},
}
}
if api_key:
match_stage["$match"]["api_key"] = api_key
token_usage_data = token_usage_collection.aggregate(
[
match_stage,
group_stage,
{"$sort": {"_id": 1}},
]
)
with db_readonly() as conn:
api_key = _resolve_api_key(conn, api_key_id, user)
# ``bucketed_totals`` applies user_id / api_key filters
# directly — no need to reshape a Mongo pipeline.
rows = TokenUsageRepository(conn).bucketed_totals(
bucket_unit=bucket_unit,
user_id=user,
api_key=api_key,
timestamp_gte=start_date,
timestamp_lt=end_date,
)
if filter_option == "last_hour":
intervals = generate_minute_range(start_date, end_date)
elif filter_option == "last_24_hour":
intervals = generate_hourly_range(start_date, end_date)
else:
intervals = generate_date_range(start_date, end_date)
intervals = _intervals_for_filter(filter_option, start_date, end_date)
daily_token_usage = {interval: 0 for interval in intervals}
for entry in token_usage_data:
if filter_option == "last_hour":
daily_token_usage[entry["_id"]["minute"]] = entry["total_tokens"]
elif filter_option == "last_24_hour":
daily_token_usage[entry["_id"]["hour"]] = entry["total_tokens"]
else:
daily_token_usage[entry["_id"]["day"]] = entry["total_tokens"]
for entry in rows:
daily_token_usage[entry["bucket"]] = int(
entry["prompt_tokens"] + entry["generated_tokens"]
)
except Exception as err:
current_app.logger.error(
f"Error getting token analytics: {err}", exc_info=True
@@ -307,13 +232,7 @@ class GetFeedbackAnalytics(Resource):
required=False,
description="Filter option for analytics",
default="last_30_days",
enum=[
"last_hour",
"last_24_hour",
"last_7_days",
"last_15_days",
"last_30_days",
],
enum=list(_FILTER_BUCKETS.keys()),
),
},
)
@@ -325,128 +244,64 @@ class GetFeedbackAnalytics(Resource):
if not decoded_token:
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
data = request.get_json()
data = request.get_json() or {}
api_key_id = data.get("api_key_id")
filter_option = data.get("filter_option", "last_30_days")
window = _range_for_filter(filter_option)
if window is None:
return make_response(
jsonify({"success": False, "message": "Invalid option"}), 400
)
start_date, end_date, _bucket_unit, pg_fmt = window
try:
api_key = (
agents_collection.find_one({"_id": ObjectId(api_key_id), "user": user})[
"key"
with db_readonly() as conn:
api_key = _resolve_api_key(conn, api_key_id, user)
# Feedback lives inside the ``conversation_messages.feedback``
# JSONB as ``{"text": "like"|"dislike", "timestamp": "..."}``.
# There is no scalar ``feedback_timestamp`` column — extract
# the timestamp from the JSONB and cast it to timestamptz for
# the range filter + bucket grouping.
clauses = [
"c.user_id = :user_id",
"m.feedback IS NOT NULL",
"(m.feedback->>'timestamp')::timestamptz >= :start",
"(m.feedback->>'timestamp')::timestamptz <= :end",
]
if api_key_id
else None
)
except Exception as err:
current_app.logger.error(f"Error getting API key: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
end_date = datetime.datetime.now(datetime.timezone.utc)
if filter_option == "last_hour":
start_date = end_date - datetime.timedelta(hours=1)
group_format = "%Y-%m-%d %H:%M:00"
date_field = {
"$dateToString": {
"format": group_format,
"date": "$queries.feedback_timestamp",
params: dict = {
"user_id": user,
"start": start_date,
"end": end_date,
"fmt": pg_fmt,
}
}
elif filter_option == "last_24_hour":
start_date = end_date - datetime.timedelta(hours=24)
group_format = "%Y-%m-%d %H:00"
date_field = {
"$dateToString": {
"format": group_format,
"date": "$queries.feedback_timestamp",
}
}
else:
if filter_option in ["last_7_days", "last_15_days", "last_30_days"]:
filter_days = (
6
if filter_option == "last_7_days"
else (14 if filter_option == "last_15_days" else 29)
if api_key:
clauses.append("c.api_key = :api_key")
params["api_key"] = api_key
where = " AND ".join(clauses)
sql = (
"SELECT to_char("
"(m.feedback->>'timestamp')::timestamptz AT TIME ZONE 'UTC', :fmt"
") AS bucket, "
"SUM(CASE WHEN m.feedback->>'text' = 'like' THEN 1 ELSE 0 END) AS positive, "
"SUM(CASE WHEN m.feedback->>'text' = 'dislike' THEN 1 ELSE 0 END) AS negative "
"FROM conversation_messages m "
"JOIN conversations c ON c.id = m.conversation_id "
f"WHERE {where} "
"GROUP BY bucket ORDER BY bucket ASC"
)
else:
return make_response(
jsonify({"success": False, "message": "Invalid option"}), 400
)
start_date = end_date - datetime.timedelta(days=filter_days)
start_date = start_date.replace(hour=0, minute=0, second=0, microsecond=0)
end_date = end_date.replace(
hour=23, minute=59, second=59, microsecond=999999
)
group_format = "%Y-%m-%d"
date_field = {
"$dateToString": {
"format": group_format,
"date": "$queries.feedback_timestamp",
}
}
try:
match_stage = {
"$match": {
"queries.feedback_timestamp": {
"$gte": start_date,
"$lte": end_date,
},
"queries.feedback": {"$exists": True},
}
}
if api_key:
match_stage["$match"]["api_key"] = api_key
pipeline = [
match_stage,
{"$unwind": "$queries"},
{"$match": {"queries.feedback": {"$exists": True}}},
{
"$group": {
"_id": {"time": date_field, "feedback": "$queries.feedback"},
"count": {"$sum": 1},
}
},
{
"$group": {
"_id": "$_id.time",
"positive": {
"$sum": {
"$cond": [
{"$eq": ["$_id.feedback", "LIKE"]},
"$count",
0,
]
}
},
"negative": {
"$sum": {
"$cond": [
{"$eq": ["$_id.feedback", "DISLIKE"]},
"$count",
0,
]
}
},
}
},
{"$sort": {"_id": 1}},
]
rows = conn.execute(_sql_text(sql), params).fetchall()
feedback_data = conversations_collection.aggregate(pipeline)
if filter_option == "last_hour":
intervals = generate_minute_range(start_date, end_date)
elif filter_option == "last_24_hour":
intervals = generate_hourly_range(start_date, end_date)
else:
intervals = generate_date_range(start_date, end_date)
intervals = _intervals_for_filter(filter_option, start_date, end_date)
daily_feedback = {
interval: {"positive": 0, "negative": 0} for interval in intervals
}
for entry in feedback_data:
daily_feedback[entry["_id"]] = {
"positive": entry["positive"],
"negative": entry["negative"],
for row in rows:
bucket = row._mapping["bucket"]
daily_feedback[bucket] = {
"positive": int(row._mapping["positive"] or 0),
"negative": int(row._mapping["negative"] or 0),
}
except Exception as err:
current_app.logger.error(
@@ -484,47 +339,89 @@ class GetUserLogs(Resource):
if not decoded_token:
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
data = request.get_json()
data = request.get_json() or {}
page = int(data.get("page", 1))
api_key_id = data.get("api_key_id")
page_size = int(data.get("page_size", 10))
skip = (page - 1) * page_size
try:
api_key = (
agents_collection.find_one({"_id": ObjectId(api_key_id)})["key"]
if api_key_id
else None
)
with db_readonly() as conn:
api_key = _resolve_api_key(conn, api_key_id, user)
logs_repo = UserLogsRepository(conn)
if api_key:
# ``find_by_api_key`` filters on ``data->>'api_key'``
# — the PG shape of the legacy top-level ``api_key``
# filter. Paginate client-side using offset/limit.
all_rows = logs_repo.find_by_api_key(api_key)
offset = (page - 1) * page_size
window = all_rows[offset: offset + page_size + 1]
items = window
else:
items, has_more_flag = logs_repo.list_paginated(
user_id=user,
page=page,
page_size=page_size,
)
# list_paginated already trims to page_size and
# returns has_more separately.
results = [
{
"id": str(item.get("id") or item.get("_id")),
"action": (item.get("data") or {}).get("action"),
"level": (item.get("data") or {}).get("level"),
"user": item.get("user_id"),
"question": (item.get("data") or {}).get("question"),
"sources": (item.get("data") or {}).get("sources"),
"retriever_params": (item.get("data") or {}).get(
"retriever_params"
),
"timestamp": (
item["timestamp"].isoformat()
if hasattr(item.get("timestamp"), "isoformat")
else item.get("timestamp")
),
}
for item in items
]
return make_response(
jsonify(
{
"success": True,
"logs": results,
"page": page,
"page_size": page_size,
"has_more": has_more_flag,
}
),
200,
)
has_more = len(items) > page_size
items = items[:page_size]
results = [
{
"id": str(item.get("id") or item.get("_id")),
"action": (item.get("data") or {}).get("action"),
"level": (item.get("data") or {}).get("level"),
"user": item.get("user_id"),
"question": (item.get("data") or {}).get("question"),
"sources": (item.get("data") or {}).get("sources"),
"retriever_params": (item.get("data") or {}).get(
"retriever_params"
),
"timestamp": (
item["timestamp"].isoformat()
if hasattr(item.get("timestamp"), "isoformat")
else item.get("timestamp")
),
}
for item in items
]
except Exception as err:
current_app.logger.error(f"Error getting API key: {err}", exc_info=True)
current_app.logger.error(
f"Error getting user logs: {err}", exc_info=True
)
return make_response(jsonify({"success": False}), 400)
query = {"user": user}
if api_key:
query = {"api_key": api_key}
items_cursor = (
user_logs_collection.find(query)
.sort("timestamp", -1)
.skip(skip)
.limit(page_size + 1)
)
items = list(items_cursor)
results = [
{
"id": str(item.get("_id")),
"action": item.get("action"),
"level": item.get("level"),
"user": item.get("user"),
"question": item.get("question"),
"sources": item.get("sources"),
"retriever_params": item.get("retriever_params"),
"timestamp": item.get("timestamp"),
}
for item in items[:page_size]
]
has_more = len(items) > page_size
return make_response(
jsonify(

View File

@@ -4,13 +4,16 @@ import os
import tempfile
from pathlib import Path
from bson.objectid import ObjectId
import uuid
from flask import current_app, jsonify, make_response, request
from flask_restx import fields, Namespace, Resource
from application.api import api
from application.cache import get_redis_instance
from application.core.settings import settings
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.session import db_readonly
from application.stt.constants import (
SUPPORTED_AUDIO_EXTENSIONS,
SUPPORTED_AUDIO_MIME_TYPES,
@@ -48,14 +51,13 @@ def _resolve_authenticated_user():
return safe_filename(decoded_token.get("sub"))
if api_key:
from application.api.user.base import agents_collection
agent = agents_collection.find_one({"key": api_key})
with db_readonly() as conn:
agent = AgentsRepository(conn).find_by_key(api_key)
if not agent:
return make_response(
jsonify({"success": False, "message": "Invalid API key"}), 401
)
return safe_filename(agent.get("user"))
return safe_filename(agent.get("user_id"))
return None
@@ -157,7 +159,7 @@ class StoreAttachment(Resource):
for idx, file in enumerate(files):
try:
attachment_id = ObjectId()
attachment_id = uuid.uuid4()
original_filename = safe_filename(os.path.basename(file.filename))
_enforce_uploaded_audio_size_limit(file, original_filename)
relative_path = f"{settings.UPLOAD_FOLDER}/{user}/attachments/{str(attachment_id)}/{original_filename}"
@@ -612,6 +614,10 @@ class LiveSpeechToTextFinish(Resource):
class ServeImage(Resource):
@api.doc(description="Serve an image from storage")
def get(self, image_path):
if ".." in image_path or image_path.startswith("/") or "\x00" in image_path:
return make_response(
jsonify({"success": False, "message": "Invalid image path"}), 400
)
try:
from application.api.user.base import storage
@@ -629,6 +635,10 @@ class ServeImage(Resource):
return make_response(
jsonify({"success": False, "message": "Image not found"}), 404
)
except ValueError:
return make_response(
jsonify({"success": False, "message": "Invalid image path"}), 400
)
except Exception as e:
current_app.logger.error(f"Error serving image: {e}")
return make_response(

View File

@@ -8,13 +8,15 @@ import uuid
from functools import wraps
from typing import Optional, Tuple
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, Response
from pymongo import ReturnDocument
from werkzeug.utils import secure_filename
from application.core.mongo_db import MongoDB
from sqlalchemy import text as _sql_text
from application.core.settings import settings
from application.storage.db.base_repository import looks_like_uuid, row_to_dict
from application.storage.db.repositories.users import UsersRepository
from application.storage.db.session import db_readonly, db_session
from application.storage.storage_creator import StorageCreator
from application.vectorstore.vector_creator import VectorCreator
@@ -22,56 +24,6 @@ from application.vectorstore.vector_creator import VectorCreator
storage = StorageCreator.get_storage()
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
conversations_collection = db["conversations"]
sources_collection = db["sources"]
prompts_collection = db["prompts"]
feedback_collection = db["feedback"]
agents_collection = db["agents"]
agent_folders_collection = db["agent_folders"]
token_usage_collection = db["token_usage"]
shared_conversations_collections = db["shared_conversations"]
users_collection = db["users"]
user_logs_collection = db["user_logs"]
user_tools_collection = db["user_tools"]
attachments_collection = db["attachments"]
workflow_runs_collection = db["workflow_runs"]
workflows_collection = db["workflows"]
workflow_nodes_collection = db["workflow_nodes"]
workflow_edges_collection = db["workflow_edges"]
try:
agents_collection.create_index(
[("shared", 1)],
name="shared_index",
background=True,
)
users_collection.create_index("user_id", unique=True)
workflows_collection.create_index(
[("user", 1)], name="workflow_user_index", background=True
)
workflow_nodes_collection.create_index(
[("workflow_id", 1)], name="node_workflow_index", background=True
)
workflow_nodes_collection.create_index(
[("workflow_id", 1), ("graph_version", 1)],
name="node_workflow_graph_version_index",
background=True,
)
workflow_edges_collection.create_index(
[("workflow_id", 1)], name="edge_workflow_index", background=True
)
workflow_edges_collection.create_index(
[("workflow_id", 1), ("graph_version", 1)],
name="edge_workflow_graph_version_index",
background=True,
)
except Exception as e:
print("Error creating indexes:", e)
current_dir = os.path.dirname(
os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
)
@@ -103,66 +55,95 @@ def generate_date_range(start_date, end_date):
def ensure_user_doc(user_id):
"""
Ensure user document exists with proper agent preferences structure.
Ensure a Postgres ``users`` row exists for ``user_id``.
Returns the row as a dict with the shape legacy callers expect — in
particular ``user_id`` and ``agent_preferences`` (with ``pinned`` and
``shared_with_me`` list keys always present).
Args:
user_id: The user ID to ensure
Returns:
The user document
The user document as a dict.
"""
default_prefs = {
"pinned": [],
"shared_with_me": [],
}
with db_session() as conn:
user_doc = UsersRepository(conn).upsert(user_id)
user_doc = users_collection.find_one_and_update(
{"user_id": user_id},
{"$setOnInsert": {"agent_preferences": default_prefs}},
upsert=True,
return_document=ReturnDocument.AFTER,
)
prefs = user_doc.get("agent_preferences", {})
updates = {}
if "pinned" not in prefs:
updates["agent_preferences.pinned"] = []
if "shared_with_me" not in prefs:
updates["agent_preferences.shared_with_me"] = []
if updates:
users_collection.update_one({"user_id": user_id}, {"$set": updates})
user_doc = users_collection.find_one({"user_id": user_id})
prefs = user_doc.get("agent_preferences") or {}
if not isinstance(prefs, dict):
prefs = {}
prefs.setdefault("pinned", [])
prefs.setdefault("shared_with_me", [])
user_doc["agent_preferences"] = prefs
return user_doc
def resolve_tool_details(tool_ids):
"""
Resolve tool IDs to their details.
Resolve tool IDs to their display details.
Accepts either Postgres UUIDs or legacy Mongo ObjectId strings (mixed
lists are supported — each id is looked up via ``get_any``, which
resolves to whichever column matches). Unknown ids are silently
skipped.
Args:
tool_ids: List of tool IDs
tool_ids: List of tool IDs (UUIDs or legacy Mongo ObjectId strings).
Returns:
List of tool details with id, name, and display_name
List of tool details with ``id``, ``name``, and ``display_name``.
"""
valid_ids = []
if not tool_ids:
return []
uuid_ids: list[str] = []
legacy_ids: list[str] = []
for tid in tool_ids:
try:
valid_ids.append(ObjectId(tid))
except Exception:
if not tid:
continue
tools = user_tools_collection.find(
{"_id": {"$in": valid_ids}}
) if valid_ids else []
tid_str = str(tid)
if looks_like_uuid(tid_str):
uuid_ids.append(tid_str)
else:
legacy_ids.append(tid_str)
if not uuid_ids and not legacy_ids:
return []
rows: list[dict] = []
with db_readonly() as conn:
if uuid_ids:
result = conn.execute(
_sql_text(
"SELECT * FROM user_tools "
"WHERE id = ANY(CAST(:ids AS uuid[]))"
),
{"ids": uuid_ids},
)
rows.extend(row_to_dict(r) for r in result.fetchall())
if legacy_ids:
result = conn.execute(
_sql_text(
"SELECT * FROM user_tools "
"WHERE legacy_mongo_id = ANY(:ids)"
),
{"ids": legacy_ids},
)
rows.extend(row_to_dict(r) for r in result.fetchall())
return [
{
"id": str(tool["_id"]),
"name": tool.get("name", ""),
"display_name": tool.get("customName")
or tool.get("displayName")
or tool.get("name", ""),
"id": str(tool.get("id") or tool.get("legacy_mongo_id") or ""),
"name": tool.get("name", "") or "",
"display_name": (
tool.get("custom_name")
or tool.get("display_name")
or tool.get("name", "")
or ""
),
}
for tool in tools
for tool in rows
]
@@ -232,14 +213,15 @@ def require_agent(func):
@wraps(func)
def wrapper(*args, **kwargs):
from application.storage.db.repositories.agents import AgentsRepository
webhook_token = kwargs.get("webhook_token")
if not webhook_token:
return make_response(
jsonify({"success": False, "message": "Webhook token missing"}), 400
)
agent = agents_collection.find_one(
{"incoming_webhook_token": webhook_token}, {"_id": 1}
)
with db_readonly() as conn:
agent = AgentsRepository(conn).find_by_webhook_token(webhook_token)
if not agent:
current_app.logger.warning(
f"Webhook attempt with invalid token: {webhook_token}"
@@ -248,7 +230,7 @@ def require_agent(func):
jsonify({"success": False, "message": "Agent not found"}), 404
)
kwargs["agent"] = agent
kwargs["agent_id_str"] = str(agent["_id"])
kwargs["agent_id_str"] = str(agent["id"])
return func(*args, **kwargs)
return wrapper

View File

@@ -2,12 +2,13 @@
import datetime
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, request
from flask_restx import fields, Namespace, Resource
from application.api import api
from application.api.user.base import attachments_collection, conversations_collection
from application.storage.db.repositories.attachments import AttachmentsRepository
from application.storage.db.repositories.conversations import ConversationsRepository
from application.storage.db.session import db_readonly, db_session
from application.utils import check_required_fields
conversations_ns = Namespace(
@@ -30,10 +31,13 @@ class DeleteConversation(Resource):
return make_response(
jsonify({"success": False, "message": "ID is required"}), 400
)
user_id = decoded_token["sub"]
try:
conversations_collection.delete_one(
{"_id": ObjectId(conversation_id), "user": decoded_token["sub"]}
)
with db_session() as conn:
repo = ConversationsRepository(conn)
conv = repo.get_any(conversation_id, user_id)
if conv is not None:
repo.delete(str(conv["id"]), user_id)
except Exception as err:
current_app.logger.error(
f"Error deleting conversation: {err}", exc_info=True
@@ -53,7 +57,8 @@ class DeleteAllConversations(Resource):
return make_response(jsonify({"success": False}), 401)
user_id = decoded_token.get("sub")
try:
conversations_collection.delete_many({"user": user_id})
with db_session() as conn:
ConversationsRepository(conn).delete_all_for_user(user_id)
except Exception as err:
current_app.logger.error(
f"Error deleting all conversations: {err}", exc_info=True
@@ -71,26 +76,21 @@ class GetConversations(Resource):
decoded_token = request.decoded_token
if not decoded_token:
return make_response(jsonify({"success": False}), 401)
user_id = decoded_token.get("sub")
try:
conversations = (
conversations_collection.find(
{
"$or": [
{"api_key": {"$exists": False}},
{"agent_id": {"$exists": True}},
],
"user": decoded_token.get("sub"),
}
with db_readonly() as conn:
conversations = ConversationsRepository(conn).list_for_user(
user_id, limit=30
)
.sort("date", -1)
.limit(30)
)
list_conversations = [
{
"id": str(conversation["_id"]),
"id": str(conversation["id"]),
"name": conversation["name"],
"agent_id": conversation.get("agent_id", None),
"agent_id": (
str(conversation["agent_id"])
if conversation.get("agent_id")
else None
),
"is_shared_usage": conversation.get("is_shared_usage", False),
"shared_token": conversation.get("shared_token", None),
}
@@ -119,38 +119,67 @@ class GetSingleConversation(Resource):
return make_response(
jsonify({"success": False, "message": "ID is required"}), 400
)
user_id = decoded_token.get("sub")
try:
conversation = conversations_collection.find_one(
{"_id": ObjectId(conversation_id), "user": decoded_token.get("sub")}
)
if not conversation:
return make_response(jsonify({"status": "not found"}), 404)
# Process queries to include attachment names
with db_readonly() as conn:
repo = ConversationsRepository(conn)
conversation = repo.get_any(conversation_id, user_id)
if not conversation:
return make_response(jsonify({"status": "not found"}), 404)
conv_pg_id = str(conversation["id"])
messages = repo.get_messages(conv_pg_id)
queries = conversation["queries"]
for query in queries:
if "attachments" in query and query["attachments"]:
attachment_details = []
for attachment_id in query["attachments"]:
try:
attachment = attachments_collection.find_one(
{"_id": ObjectId(attachment_id)}
)
if attachment:
attachment_details.append(
{
"id": str(attachment["_id"]),
"fileName": attachment.get(
"filename", "Unknown file"
),
}
# Resolve attachment details (id, fileName) for each message.
attachments_repo = AttachmentsRepository(conn)
queries = []
for msg in messages:
query = {
"prompt": msg.get("prompt"),
"response": msg.get("response"),
"thought": msg.get("thought"),
"sources": msg.get("sources") or [],
"tool_calls": msg.get("tool_calls") or [],
"timestamp": msg.get("timestamp"),
"model_id": msg.get("model_id"),
}
if msg.get("metadata"):
query["metadata"] = msg["metadata"]
# Feedback on conversation_messages is a JSONB blob with
# shape {"text": <str>, "timestamp": <iso>}. The legacy
# frontend consumed a flat scalar feedback string, so
# unwrap the ``text`` field for compat.
feedback = msg.get("feedback")
if feedback is not None:
if isinstance(feedback, dict):
query["feedback"] = feedback.get("text")
if feedback.get("timestamp"):
query["feedback_timestamp"] = feedback["timestamp"]
else:
query["feedback"] = feedback
attachments = msg.get("attachments") or []
if attachments:
attachment_details = []
for attachment_id in attachments:
try:
att = attachments_repo.get_any(
str(attachment_id), user_id
)
except Exception as e:
current_app.logger.error(
f"Error retrieving attachment {attachment_id}: {e}",
exc_info=True,
)
query["attachments"] = attachment_details
if att:
attachment_details.append(
{
"id": str(att["id"]),
"fileName": att.get(
"filename", "Unknown file"
),
}
)
except Exception as e:
current_app.logger.error(
f"Error retrieving attachment {attachment_id}: {e}",
exc_info=True,
)
query["attachments"] = attachment_details
queries.append(query)
except Exception as err:
current_app.logger.error(
f"Error retrieving conversation: {err}", exc_info=True
@@ -158,7 +187,9 @@ class GetSingleConversation(Resource):
return make_response(jsonify({"success": False}), 400)
data = {
"queries": queries,
"agent_id": conversation.get("agent_id"),
"agent_id": (
str(conversation["agent_id"]) if conversation.get("agent_id") else None
),
"is_shared_usage": conversation.get("is_shared_usage", False),
"shared_token": conversation.get("shared_token", None),
}
@@ -190,11 +221,13 @@ class UpdateConversationName(Resource):
missing_fields = check_required_fields(data, required_fields)
if missing_fields:
return missing_fields
user_id = decoded_token.get("sub")
try:
conversations_collection.update_one(
{"_id": ObjectId(data["id"]), "user": decoded_token.get("sub")},
{"$set": {"name": data["name"]}},
)
with db_session() as conn:
repo = ConversationsRepository(conn)
conv = repo.get_any(data["id"], user_id)
if conv is not None:
repo.rename(str(conv["id"]), user_id, data["name"])
except Exception as err:
current_app.logger.error(
f"Error updating conversation name: {err}", exc_info=True
@@ -237,43 +270,33 @@ class SubmitFeedback(Resource):
missing_fields = check_required_fields(data, required_fields)
if missing_fields:
return missing_fields
user_id = decoded_token.get("sub")
feedback_value = data["feedback"]
question_index = int(data["question_index"])
# Normalize string feedback to lowercase so analytics queries
# (which match 'like'/'dislike') count rows correctly. Tolerate
# legacy uppercase clients on ingest. Non-string values pass through.
if isinstance(feedback_value, str):
feedback_value = feedback_value.lower()
feedback_payload = (
None
if feedback_value is None
else {
"text": feedback_value,
"timestamp": datetime.datetime.now(
datetime.timezone.utc
).isoformat(),
}
)
try:
if data["feedback"] is None:
# Remove feedback and feedback_timestamp if feedback is null
conversations_collection.update_one(
{
"_id": ObjectId(data["conversation_id"]),
"user": decoded_token.get("sub"),
f"queries.{data['question_index']}": {"$exists": True},
},
{
"$unset": {
f"queries.{data['question_index']}.feedback": "",
f"queries.{data['question_index']}.feedback_timestamp": "",
}
},
)
else:
# Set feedback and feedback_timestamp if feedback has a value
conversations_collection.update_one(
{
"_id": ObjectId(data["conversation_id"]),
"user": decoded_token.get("sub"),
f"queries.{data['question_index']}": {"$exists": True},
},
{
"$set": {
f"queries.{data['question_index']}.feedback": data[
"feedback"
],
f"queries.{data['question_index']}.feedback_timestamp": datetime.datetime.now(
datetime.timezone.utc
),
}
},
)
with db_session() as conn:
repo = ConversationsRepository(conn)
conv = repo.get_any(data["conversation_id"], user_id)
if conv is None:
return make_response(
jsonify({"success": False, "message": "Not found"}), 404
)
repo.set_feedback(str(conv["id"]), question_index, feedback_payload)
except Exception as err:
current_app.logger.error(f"Error submitting feedback: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)

View File

@@ -2,12 +2,13 @@
import os
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, request
from flask_restx import fields, Namespace, Resource
from application.api import api
from application.api.user.base import current_dir, prompts_collection
from application.api.user.base import current_dir
from application.storage.db.repositories.prompts import PromptsRepository
from application.storage.db.session import db_readonly, db_session
from application.utils import check_required_fields
prompts_ns = Namespace(
@@ -40,15 +41,9 @@ class CreatePrompt(Resource):
return missing_fields
user = decoded_token.get("sub")
try:
resp = prompts_collection.insert_one(
{
"name": data["name"],
"content": data["content"],
"user": user,
}
)
new_id = str(resp.inserted_id)
with db_session() as conn:
prompt = PromptsRepository(conn).create(user, data["name"], data["content"])
new_id = str(prompt["id"])
except Exception as err:
current_app.logger.error(f"Error creating prompt: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
@@ -64,17 +59,17 @@ class GetPrompts(Resource):
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
try:
prompts = prompts_collection.find({"user": user})
with db_readonly() as conn:
prompts = PromptsRepository(conn).list_for_user(user)
list_prompts = [
{"id": "default", "name": "default", "type": "public"},
{"id": "creative", "name": "creative", "type": "public"},
{"id": "strict", "name": "strict", "type": "public"},
]
for prompt in prompts:
list_prompts.append(
{
"id": str(prompt["_id"]),
"id": str(prompt["id"]),
"name": prompt["name"],
"type": "private",
}
@@ -119,9 +114,12 @@ class GetSinglePrompt(Resource):
) as f:
chat_reduce_strict = f.read()
return make_response(jsonify({"content": chat_reduce_strict}), 200)
prompt = prompts_collection.find_one(
{"_id": ObjectId(prompt_id), "user": user}
)
with db_readonly() as conn:
prompt = PromptsRepository(conn).get_any(prompt_id, user)
if not prompt:
return make_response(
jsonify({"success": False, "message": "Prompt not found"}), 404
)
except Exception as err:
current_app.logger.error(f"Error retrieving prompt: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
@@ -148,7 +146,15 @@ class DeletePrompt(Resource):
if missing_fields:
return missing_fields
try:
prompts_collection.delete_one({"_id": ObjectId(data["id"]), "user": user})
with db_session() as conn:
repo = PromptsRepository(conn)
prompt = repo.get_any(data["id"], user)
if not prompt:
return make_response(
jsonify({"success": False, "message": "Prompt not found"}),
404,
)
repo.delete(str(prompt["id"]), user)
except Exception as err:
current_app.logger.error(f"Error deleting prompt: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
@@ -181,10 +187,15 @@ class UpdatePrompt(Resource):
if missing_fields:
return missing_fields
try:
prompts_collection.update_one(
{"_id": ObjectId(data["id"]), "user": user},
{"$set": {"name": data["name"], "content": data["content"]}},
)
with db_session() as conn:
repo = PromptsRepository(conn)
prompt = repo.get_any(data["id"], user)
if not prompt:
return make_response(
jsonify({"success": False, "message": "Prompt not found"}),
404,
)
repo.update(str(prompt["id"]), user, data["name"], data["content"])
except Exception as err:
current_app.logger.error(f"Error updating prompt: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)

View File

@@ -2,26 +2,126 @@
import uuid
from bson.binary import Binary, UuidRepresentation
from bson.dbref import DBRef
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, request
from flask_restx import fields, inputs, Namespace, Resource
from sqlalchemy import text as _sql_text
from application.api import api
from application.api.user.base import (
agents_collection,
attachments_collection,
conversations_collection,
shared_conversations_collections,
from application.storage.db.base_repository import looks_like_uuid
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.repositories.attachments import AttachmentsRepository
from application.storage.db.repositories.conversations import ConversationsRepository
from application.storage.db.repositories.shared_conversations import (
SharedConversationsRepository,
)
from application.storage.db.session import db_readonly, db_session
from application.utils import check_required_fields
sharing_ns = Namespace(
"sharing", description="Conversation sharing operations", path="/api"
)
def _resolve_prompt_pg_id(conn, prompt_id_raw, user_id):
"""Translate an incoming prompt id (UUID or legacy Mongo ObjectId) to a PG UUID.
Scoped by ``user_id`` so a caller can't link another user's prompt
into their share record. Returns ``None`` for sentinel values
(``"default"``) or unresolved ids.
"""
if not prompt_id_raw or prompt_id_raw == "default":
return None
value = str(prompt_id_raw)
# Already UUID — trust it but still require ownership. A shape-gate
# (rather than a loose ``len == 36 and '-' in value`` check) keeps
# non-UUID input out of ``CAST(:pid AS uuid)``; the cast would raise
# and poison the readonly transaction otherwise.
if looks_like_uuid(value):
row = conn.execute(
_sql_text(
"SELECT id FROM prompts WHERE id = CAST(:pid AS uuid) "
"AND user_id = :uid"
),
{"pid": value, "uid": user_id},
).fetchone()
return str(row[0]) if row else None
# Legacy Mongo ObjectId fallback.
row = conn.execute(
_sql_text(
"SELECT id FROM prompts WHERE legacy_mongo_id = :pid "
"AND user_id = :uid"
),
{"pid": value, "uid": user_id},
).fetchone()
return str(row[0]) if row else None
def _resolve_source_pg_id(conn, source_raw):
"""Translate a source id (UUID or legacy Mongo ObjectId) to a PG UUID."""
if not source_raw:
return None
value = str(source_raw)
# See ``_resolve_prompt_pg_id`` for the shape-gate rationale.
if looks_like_uuid(value):
row = conn.execute(
_sql_text(
"SELECT id FROM sources WHERE id = CAST(:sid AS uuid)"
),
{"sid": value},
).fetchone()
return str(row[0]) if row else None
row = conn.execute(
_sql_text("SELECT id FROM sources WHERE legacy_mongo_id = :sid"),
{"sid": value},
).fetchone()
return str(row[0]) if row else None
def _find_reusable_share_agent(
conn, user_id, *, prompt_pg_id, chunks, source_pg_id, retriever,
):
"""Find an existing share-as-agent key row matching these parameters.
Mirrors the legacy Mongo ``agents_collection.find_one`` pre-existence
check. Used to reuse an api key across repeated shares of the same
conversation with the same prompt/chunks/source/retriever.
"""
clauses = ["user_id = :uid", "key IS NOT NULL"]
params: dict = {"uid": user_id}
if prompt_pg_id is None:
clauses.append("prompt_id IS NULL")
else:
clauses.append("prompt_id = CAST(:pid AS uuid)")
params["pid"] = prompt_pg_id
if chunks is None:
clauses.append("chunks IS NULL")
else:
clauses.append("chunks = :chunks")
params["chunks"] = int(chunks)
if source_pg_id is None:
clauses.append("source_id IS NULL")
else:
clauses.append("source_id = CAST(:sid AS uuid)")
params["sid"] = source_pg_id
if retriever is None:
clauses.append("retriever IS NULL")
else:
clauses.append("retriever = :retr")
params["retr"] = retriever
sql = (
"SELECT * FROM agents WHERE "
+ " AND ".join(clauses)
+ " LIMIT 1"
)
row = conn.execute(_sql_text(sql), params).fetchone()
if row is None:
return None
mapping = dict(row._mapping)
mapping["id"] = str(mapping["id"]) if mapping.get("id") else None
return mapping
@sharing_ns.route("/share")
class ShareConversation(Resource):
share_conversation_model = api.model(
@@ -56,146 +156,94 @@ class ShareConversation(Resource):
conversation_id = data["conversation_id"]
try:
conversation = conversations_collection.find_one(
{"_id": ObjectId(conversation_id)}
)
if conversation is None:
return make_response(
jsonify(
{
"status": "error",
"message": "Conversation does not exist",
}
),
404,
)
current_n_queries = len(conversation["queries"])
explicit_binary = Binary.from_uuid(
uuid.uuid4(), UuidRepresentation.STANDARD
)
with db_session() as conn:
conv_repo = ConversationsRepository(conn)
shared_repo = SharedConversationsRepository(conn)
agents_repo = AgentsRepository(conn)
if is_promptable:
prompt_id = data.get("prompt_id", "default")
chunks = data.get("chunks", "2")
name = conversation["name"] + "(shared)"
new_api_key_data = {
"prompt_id": prompt_id,
"chunks": chunks,
"user": user,
}
if "source" in data and ObjectId.is_valid(data["source"]):
new_api_key_data["source"] = DBRef(
"sources", ObjectId(data["source"])
)
if "retriever" in data:
new_api_key_data["retriever"] = data["retriever"]
pre_existing_api_document = agents_collection.find_one(new_api_key_data)
if pre_existing_api_document:
api_uuid = pre_existing_api_document["key"]
pre_existing = shared_conversations_collections.find_one(
{
"conversation_id": ObjectId(conversation_id),
"isPromptable": is_promptable,
"first_n_queries": current_n_queries,
"user": user,
"api_key": api_uuid,
}
)
if pre_existing is not None:
return make_response(
jsonify(
{
"success": True,
"identifier": str(pre_existing["uuid"].as_uuid()),
}
),
200,
)
else:
shared_conversations_collections.insert_one(
conversation = conv_repo.get_any(conversation_id, user)
if conversation is None:
return make_response(
jsonify(
{
"uuid": explicit_binary,
"conversation_id": ObjectId(conversation_id),
"isPromptable": is_promptable,
"first_n_queries": current_n_queries,
"user": user,
"api_key": api_uuid,
"status": "error",
"message": "Conversation does not exist",
}
)
return make_response(
jsonify(
{
"success": True,
"identifier": str(explicit_binary.as_uuid()),
}
),
201,
)
else:
api_uuid = str(uuid.uuid4())
new_api_key_data["key"] = api_uuid
new_api_key_data["name"] = name
),
404,
)
conv_pg_id = str(conversation["id"])
current_n_queries = conv_repo.message_count(conv_pg_id)
if "source" in data and ObjectId.is_valid(data["source"]):
new_api_key_data["source"] = DBRef(
"sources", ObjectId(data["source"])
if is_promptable:
prompt_id_raw = data.get("prompt_id", "default")
chunks_raw = data.get("chunks", "2")
try:
chunks_int = int(chunks_raw) if chunks_raw not in (None, "") else None
except (TypeError, ValueError):
chunks_int = None
prompt_pg_id = _resolve_prompt_pg_id(conn, prompt_id_raw, user)
source_pg_id = _resolve_source_pg_id(conn, data.get("source"))
retriever = data.get("retriever")
reusable = _find_reusable_share_agent(
conn, user,
prompt_pg_id=prompt_pg_id,
chunks=chunks_int,
source_pg_id=source_pg_id,
retriever=retriever,
)
if reusable:
api_uuid = reusable.get("key")
else:
api_uuid = str(uuid.uuid4())
name = (conversation.get("name") or "") + "(shared)"
agents_repo.create(
user,
name,
"published",
key=api_uuid,
retriever=retriever,
chunks=chunks_int,
prompt_id=prompt_pg_id,
source_id=source_pg_id,
)
if "retriever" in data:
new_api_key_data["retriever"] = data["retriever"]
agents_collection.insert_one(new_api_key_data)
shared_conversations_collections.insert_one(
{
"uuid": explicit_binary,
"conversation_id": ObjectId(conversation_id),
"isPromptable": is_promptable,
"first_n_queries": current_n_queries,
"user": user,
"api_key": api_uuid,
}
share = shared_repo.get_or_create(
conv_pg_id,
user,
is_promptable=True,
first_n_queries=current_n_queries,
api_key=api_uuid,
prompt_id=prompt_pg_id,
chunks=chunks_int,
)
return make_response(
jsonify(
{
"success": True,
"identifier": str(explicit_binary.as_uuid()),
"identifier": str(share["uuid"]),
}
),
201,
201 if reusable is None else 200,
)
pre_existing = shared_conversations_collections.find_one(
{
"conversation_id": ObjectId(conversation_id),
"isPromptable": is_promptable,
"first_n_queries": current_n_queries,
"user": user,
}
)
if pre_existing is not None:
# Non-promptable share path.
share = shared_repo.get_or_create(
conv_pg_id,
user,
is_promptable=False,
first_n_queries=current_n_queries,
api_key=None,
)
return make_response(
jsonify(
{
"success": True,
"identifier": str(pre_existing["uuid"].as_uuid()),
"identifier": str(share["uuid"]),
}
),
200,
)
else:
shared_conversations_collections.insert_one(
{
"uuid": explicit_binary,
"conversation_id": ObjectId(conversation_id),
"isPromptable": is_promptable,
"first_n_queries": current_n_queries,
"user": user,
}
)
return make_response(
jsonify(
{"success": True, "identifier": str(explicit_binary.as_uuid())}
),
201,
)
except Exception as err:
@@ -210,37 +258,13 @@ class GetPubliclySharedConversations(Resource):
@api.doc(description="Get publicly shared conversations by identifier")
def get(self, identifier: str):
try:
query_uuid = Binary.from_uuid(
uuid.UUID(identifier), UuidRepresentation.STANDARD
)
shared = shared_conversations_collections.find_one({"uuid": query_uuid})
conversation_queries = []
with db_readonly() as conn:
shared_repo = SharedConversationsRepository(conn)
conv_repo = ConversationsRepository(conn)
attach_repo = AttachmentsRepository(conn)
if (
shared
and "conversation_id" in shared
):
# Handle DBRef (legacy), ObjectId, dict, and string formats for conversation_id
conversation_id = shared["conversation_id"]
if isinstance(conversation_id, DBRef):
conversation_id = conversation_id.id
elif isinstance(conversation_id, dict):
# Handle dict representation of DBRef (e.g., {"$ref": "...", "$id": "..."})
if "$id" in conversation_id:
conv_id = conversation_id["$id"]
# $id might be a dict like {"$oid": "..."} or a string
if isinstance(conv_id, dict) and "$oid" in conv_id:
conversation_id = ObjectId(conv_id["$oid"])
else:
conversation_id = ObjectId(conv_id)
elif "_id" in conversation_id:
conversation_id = ObjectId(conversation_id["_id"])
elif isinstance(conversation_id, str):
conversation_id = ObjectId(conversation_id)
conversation = conversations_collection.find_one(
{"_id": conversation_id}
)
if conversation is None:
shared = shared_repo.find_by_uuid(identifier)
if not shared or not shared.get("conversation_id"):
return make_response(
jsonify(
{
@@ -250,22 +274,60 @@ class GetPubliclySharedConversations(Resource):
),
404,
)
conversation_queries = conversation["queries"][
: (shared["first_n_queries"])
]
conv_pg_id = str(shared["conversation_id"])
owner_user = shared.get("user_id")
for query in conversation_queries:
if "attachments" in query and query["attachments"]:
conversation = conv_repo.get_owned(conv_pg_id, owner_user) if owner_user else None
if conversation is None:
# Fall back to any-user lookup in case shared row's
# user_id is missing — still keyed by PG UUID.
row = conn.execute(
_sql_text(
"SELECT * FROM conversations WHERE id = CAST(:id AS uuid)"
),
{"id": conv_pg_id},
).fetchone()
if row is None:
return make_response(
jsonify(
{
"success": False,
"error": "might have broken url or the conversation does not exist",
}
),
404,
)
conversation = dict(row._mapping)
messages = conv_repo.get_messages(conv_pg_id)
first_n = shared.get("first_n_queries") or 0
conversation_queries = []
for msg in messages[:first_n]:
query = {
"prompt": msg.get("prompt"),
"response": msg.get("response"),
"thought": msg.get("thought"),
"sources": msg.get("sources") or [],
"tool_calls": msg.get("tool_calls") or [],
"timestamp": (
msg["timestamp"].isoformat()
if hasattr(msg.get("timestamp"), "isoformat")
else msg.get("timestamp")
),
"feedback": msg.get("feedback"),
}
attachments = msg.get("attachments") or []
if attachments:
attachment_details = []
for attachment_id in query["attachments"]:
for attachment_id in attachments:
try:
attachment = attachments_collection.find_one(
{"_id": ObjectId(attachment_id)}
)
attachment = attach_repo.get_any(
str(attachment_id), owner_user,
) if owner_user else None
if attachment:
attachment_details.append(
{
"id": str(attachment["_id"]),
"id": str(attachment["id"]),
"fileName": attachment.get(
"filename", "Unknown file"
),
@@ -277,26 +339,23 @@ class GetPubliclySharedConversations(Resource):
exc_info=True,
)
query["attachments"] = attachment_details
else:
return make_response(
jsonify(
{
"success": False,
"error": "might have broken url or the conversation does not exist",
}
),
404,
conversation_queries.append(query)
created = conversation.get("created_at") or conversation.get("date")
date_iso = (
created.isoformat()
if hasattr(created, "isoformat")
else (str(created) if created is not None else None)
)
date = conversation["_id"].generation_time.isoformat()
res = {
"success": True,
"queries": conversation_queries,
"title": conversation["name"],
"timestamp": date,
}
if shared["isPromptable"] and "api_key" in shared:
res["api_key"] = shared["api_key"]
return make_response(jsonify(res), 200)
res = {
"success": True,
"queries": conversation_queries,
"title": conversation.get("name"),
"timestamp": date_iso,
}
if shared.get("is_promptable") and shared.get("api_key"):
res["api_key"] = shared["api_key"]
return make_response(jsonify(res), 200)
except Exception as err:
current_app.logger.error(
f"Error getting shared conversation: {err}", exc_info=True

View File

@@ -1,11 +1,12 @@
"""Source document management chunk management."""
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, request
from flask_restx import fields, Namespace, Resource
from application.api import api
from application.api.user.base import get_vector_store, sources_collection
from application.api.user.base import get_vector_store
from application.storage.db.repositories.sources import SourcesRepository
from application.storage.db.session import db_readonly
from application.utils import check_required_fields, num_tokens_from_string
sources_chunks_ns = Namespace(
@@ -13,6 +14,15 @@ sources_chunks_ns = Namespace(
)
def _resolve_source(doc_id: str, user: str):
"""Resolve a source (UUID or legacy ObjectId) for the caller.
Returns the row dict (with PG UUID in ``id``) or ``None`` if missing.
"""
with db_readonly() as conn:
return SourcesRepository(conn).get_any(doc_id, user)
@sources_chunks_ns.route("/get_chunks")
class GetChunks(Resource):
@api.doc(
@@ -36,36 +46,34 @@ class GetChunks(Resource):
path = request.args.get("path")
search_term = request.args.get("search", "").strip().lower()
if not ObjectId.is_valid(doc_id):
if not doc_id:
return make_response(jsonify({"error": "Invalid doc_id"}), 400)
try:
doc = _resolve_source(doc_id, user)
except Exception as e:
current_app.logger.error(f"Error resolving source: {e}", exc_info=True)
return make_response(jsonify({"error": "Invalid doc_id"}), 400)
doc = sources_collection.find_one({"_id": ObjectId(doc_id), "user": user})
if not doc:
return make_response(
jsonify({"error": "Document not found or access denied"}), 404
)
resolved_id = str(doc["id"])
try:
store = get_vector_store(doc_id)
store = get_vector_store(resolved_id)
chunks = store.get_chunks()
filtered_chunks = []
for chunk in chunks:
metadata = chunk.get("metadata", {})
# Filter by path if provided
if path:
chunk_source = metadata.get("source", "")
chunk_file_path = metadata.get("file_path", "")
# Check if the chunk matches the requested path
# For file uploads: source ends with path (e.g., "inputs/.../file.pdf" ends with "file.pdf")
# For crawlers: file_path ends with path (e.g., "guides/setup.md" ends with "setup.md")
source_match = chunk_source and chunk_source.endswith(path)
file_path_match = chunk_file_path and chunk_file_path.endswith(path)
if not (source_match or file_path_match):
continue
# Filter by search term if provided
if search_term:
text_match = search_term in chunk.get("text", "").lower()
title_match = search_term in metadata.get("title", "").lower()
@@ -132,15 +140,17 @@ class AddChunk(Resource):
token_count = num_tokens_from_string(text)
metadata["token_count"] = token_count
if not ObjectId.is_valid(doc_id):
try:
doc = _resolve_source(doc_id, user)
except Exception as e:
current_app.logger.error(f"Error resolving source: {e}", exc_info=True)
return make_response(jsonify({"error": "Invalid doc_id"}), 400)
doc = sources_collection.find_one({"_id": ObjectId(doc_id), "user": user})
if not doc:
return make_response(
jsonify({"error": "Document not found or access denied"}), 404
)
try:
store = get_vector_store(doc_id)
store = get_vector_store(str(doc["id"]))
chunk_id = store.add_chunk(text, metadata)
return make_response(
jsonify({"message": "Chunk added successfully", "chunk_id": chunk_id}),
@@ -165,15 +175,17 @@ class DeleteChunk(Resource):
doc_id = request.args.get("id")
chunk_id = request.args.get("chunk_id")
if not ObjectId.is_valid(doc_id):
try:
doc = _resolve_source(doc_id, user)
except Exception as e:
current_app.logger.error(f"Error resolving source: {e}", exc_info=True)
return make_response(jsonify({"error": "Invalid doc_id"}), 400)
doc = sources_collection.find_one({"_id": ObjectId(doc_id), "user": user})
if not doc:
return make_response(
jsonify({"error": "Document not found or access denied"}), 404
)
try:
store = get_vector_store(doc_id)
store = get_vector_store(str(doc["id"]))
deleted = store.delete_chunk(chunk_id)
if deleted:
return make_response(
@@ -232,15 +244,17 @@ class UpdateChunk(Resource):
if metadata is None:
metadata = {}
metadata["token_count"] = token_count
if not ObjectId.is_valid(doc_id):
try:
doc = _resolve_source(doc_id, user)
except Exception as e:
current_app.logger.error(f"Error resolving source: {e}", exc_info=True)
return make_response(jsonify({"error": "Invalid doc_id"}), 400)
doc = sources_collection.find_one({"_id": ObjectId(doc_id), "user": user})
if not doc:
return make_response(
jsonify({"error": "Document not found or access denied"}), 404
)
try:
store = get_vector_store(doc_id)
store = get_vector_store(str(doc["id"]))
chunks = store.get_chunks()
existing_chunk = next((c for c in chunks if c["doc_id"] == chunk_id), None)

View File

@@ -3,14 +3,14 @@
import json
import math
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, redirect, request
from flask_restx import fields, Namespace, Resource
from application.api import api
from application.api.user.base import sources_collection
from application.api.user.tasks import sync_source
from application.core.settings import settings
from application.storage.db.repositories.sources import SourcesRepository
from application.storage.db.session import db_readonly, db_session
from application.storage.storage_creator import StorageCreator
from application.utils import check_required_fields
from application.vectorstore.vector_creator import VectorCreator
@@ -56,11 +56,20 @@ class CombinedJson(Resource):
]
try:
for index in sources_collection.find({"user": user}).sort("date", -1):
with db_readonly() as conn:
indexes = SourcesRepository(conn).list_for_user(user)
# list_for_user sorts by created_at DESC; legacy shape sorted by
# "date" DESC. Both are monotonic on creation so the ordering is
# equivalent for dev; re-sort defensively.
indexes = sorted(
indexes, key=lambda r: r.get("date") or r.get("created_at") or "",
reverse=True,
)
for index in indexes:
provider = _get_provider_from_remote_data(index.get("remote_data"))
data.append(
{
"id": str(index["_id"]),
"id": str(index["id"]),
"name": index.get("name"),
"date": index.get("date"),
"model": settings.EMBEDDINGS_NAME,
@@ -70,9 +79,7 @@ class CombinedJson(Resource):
"syncFrequency": index.get("sync_frequency", ""),
"provider": provider,
"is_nested": bool(index.get("directory_structure")),
"type": index.get(
"type", "file"
), # Add type field with default "file"
"type": index.get("type", "file"),
}
)
except Exception as err:
@@ -89,61 +96,55 @@ class PaginatedSources(Resource):
if not decoded_token:
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
sort_field = request.args.get("sort", "date") # Default to 'date'
sort_order = request.args.get("order", "desc") # Default to 'desc'
page = int(request.args.get("page", 1)) # Default to 1
rows_per_page = int(request.args.get("rows", 10)) # Default to 10
# add .strip() to remove leading and trailing whitespaces
search_term = request.args.get(
"search", ""
).strip() # add search for filter documents
# Prepare query for filtering
query = {"user": user}
if search_term:
query["name"] = {
"$regex": search_term,
"$options": "i", # using case-insensitive search
}
total_documents = sources_collection.count_documents(query)
total_pages = max(1, math.ceil(total_documents / rows_per_page))
page = min(
max(1, page), total_pages
) # add this to make sure page inbound is within the range
sort_order = 1 if sort_order == "asc" else -1
skip = (page - 1) * rows_per_page
sort_field = request.args.get("sort", "date")
sort_order = request.args.get("order", "desc")
page = max(1, int(request.args.get("page", 1)))
rows_per_page = max(1, int(request.args.get("rows", 10)))
search_term = request.args.get("search", "").strip() or None
try:
documents = (
sources_collection.find(query)
.sort(sort_field, sort_order)
.skip(skip)
.limit(rows_per_page)
)
with db_readonly() as conn:
repo = SourcesRepository(conn)
total_documents = repo.count_for_user(
user, search_term=search_term,
)
# Prior in-Python implementation returned ``totalPages = 1``
# for empty result sets (``max(1, ceil(0/rows))``); we
# preserve that contract so the frontend pager stays stable.
total_pages = max(1, math.ceil(total_documents / rows_per_page))
effective_page = min(page, total_pages)
offset = (effective_page - 1) * rows_per_page
window = repo.list_for_user(
user,
limit=rows_per_page,
offset=offset,
search_term=search_term,
sort_field=sort_field,
sort_order=sort_order,
)
paginated_docs = []
for doc in documents:
for doc in window:
provider = _get_provider_from_remote_data(doc.get("remote_data"))
doc_data = {
"id": str(doc["_id"]),
"name": doc.get("name", ""),
"date": doc.get("date", ""),
"model": settings.EMBEDDINGS_NAME,
"location": "local",
"tokens": doc.get("tokens", ""),
"retriever": doc.get("retriever", "classic"),
"syncFrequency": doc.get("sync_frequency", ""),
"provider": provider,
"isNested": bool(doc.get("directory_structure")),
"type": doc.get("type", "file"),
}
paginated_docs.append(doc_data)
paginated_docs.append(
{
"id": str(doc["id"]),
"name": doc.get("name", ""),
"date": doc.get("date", ""),
"model": settings.EMBEDDINGS_NAME,
"location": "local",
"tokens": doc.get("tokens", ""),
"retriever": doc.get("retriever", "classic"),
"syncFrequency": doc.get("sync_frequency", ""),
"provider": provider,
"isNested": bool(doc.get("directory_structure")),
"type": doc.get("type", "file"),
}
)
response = {
"total": total_documents,
"totalPages": total_pages,
"currentPage": page,
"currentPage": effective_page,
"paginated": paginated_docs,
}
return make_response(jsonify(response), 200)
@@ -154,28 +155,6 @@ class PaginatedSources(Resource):
return make_response(jsonify({"success": False}), 400)
@sources_ns.route("/delete_by_ids")
class DeleteByIds(Resource):
@api.doc(
description="Deletes documents from the vector store by IDs",
params={"path": "Comma-separated list of IDs"},
)
def get(self):
ids = request.args.get("path")
if not ids:
return make_response(
jsonify({"success": False, "message": "Missing required fields"}), 400
)
try:
result = sources_collection.delete_index(ids=ids)
if result:
return make_response(jsonify({"success": True}), 200)
except Exception as err:
current_app.logger.error(f"Error deleting indexes: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
return make_response(jsonify({"success": False}), 400)
@sources_ns.route("/delete_old")
class DeleteOldIndexes(Resource):
@api.doc(
@@ -186,30 +165,33 @@ class DeleteOldIndexes(Resource):
decoded_token = request.decoded_token
if not decoded_token:
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
source_id = request.args.get("source_id")
if not source_id:
return make_response(
jsonify({"success": False, "message": "Missing required fields"}), 400
)
doc = sources_collection.find_one(
{"_id": ObjectId(source_id), "user": decoded_token.get("sub")}
)
try:
with db_readonly() as conn:
doc = SourcesRepository(conn).get_any(source_id, user)
except Exception as err:
current_app.logger.error(f"Error looking up source: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
if not doc:
return make_response(jsonify({"status": "not found"}), 404)
storage = StorageCreator.get_storage()
resolved_id = str(doc["id"])
try:
# Delete vector index
if settings.VECTOR_STORE == "faiss":
index_path = f"indexes/{str(doc['_id'])}"
index_path = f"indexes/{resolved_id}"
if storage.file_exists(f"{index_path}/index.faiss"):
storage.delete_file(f"{index_path}/index.faiss")
if storage.file_exists(f"{index_path}/index.pkl"):
storage.delete_file(f"{index_path}/index.pkl")
else:
vectorstore = VectorCreator.create_vectorstore(
settings.VECTOR_STORE, source_id=str(doc["_id"])
settings.VECTOR_STORE, source_id=resolved_id
)
vectorstore.delete_index()
if "file_path" in doc and doc["file_path"]:
@@ -227,7 +209,14 @@ class DeleteOldIndexes(Resource):
f"Error deleting files and indexes: {err}", exc_info=True
)
return make_response(jsonify({"success": False}), 400)
sources_collection.delete_one({"_id": ObjectId(source_id)})
try:
with db_session() as conn:
SourcesRepository(conn).delete(resolved_id, user)
except Exception as err:
current_app.logger.error(
f"Error deleting source row: {err}", exc_info=True
)
return make_response(jsonify({"success": False}), 400)
return make_response(jsonify({"success": True}), 200)
@@ -272,15 +261,16 @@ class ManageSync(Resource):
return make_response(
jsonify({"success": False, "message": "Invalid frequency"}), 400
)
update_data = {"$set": {"sync_frequency": sync_frequency}}
try:
sources_collection.update_one(
{
"_id": ObjectId(source_id),
"user": user,
},
update_data,
)
with db_session() as conn:
repo = SourcesRepository(conn)
doc = repo.get_any(source_id, user)
if doc is None:
return make_response(
jsonify({"success": False, "message": "Source not found"}),
404,
)
repo.update(str(doc["id"]), user, {"sync_frequency": sync_frequency})
except Exception as err:
current_app.logger.error(
f"Error updating sync frequency: {err}", exc_info=True
@@ -309,19 +299,20 @@ class SyncSource(Resource):
if missing_fields:
return missing_fields
source_id = data["source_id"]
if not ObjectId.is_valid(source_id):
try:
with db_readonly() as conn:
doc = SourcesRepository(conn).get_any(source_id, user)
except Exception as err:
current_app.logger.error(f"Error looking up source: {err}", exc_info=True)
return make_response(
jsonify({"success": False, "message": "Invalid source ID"}), 400
)
doc = sources_collection.find_one(
{"_id": ObjectId(source_id), "user": user}
)
if not doc:
return make_response(
jsonify({"success": False, "message": "Source not found"}), 404
)
source_type = doc.get("type", "")
if source_type.startswith("connector"):
if source_type and source_type.startswith("connector"):
return make_response(
jsonify(
{
@@ -344,7 +335,7 @@ class SyncSource(Resource):
loader=source_type,
sync_frequency=doc.get("sync_frequency", "never"),
retriever=doc.get("retriever", "classic"),
doc_id=source_id,
doc_id=str(doc["id"]),
)
except Exception as err:
current_app.logger.error(
@@ -370,10 +361,9 @@ class DirectoryStructure(Resource):
if not doc_id:
return make_response(jsonify({"error": "Document ID is required"}), 400)
if not ObjectId.is_valid(doc_id):
return make_response(jsonify({"error": "Invalid document ID"}), 400)
try:
doc = sources_collection.find_one({"_id": ObjectId(doc_id), "user": user})
with db_readonly() as conn:
doc = SourcesRepository(conn).get_any(doc_id, user)
if not doc:
return make_response(
jsonify({"error": "Document not found or access denied"}), 404
@@ -387,6 +377,8 @@ class DirectoryStructure(Resource):
if isinstance(remote_data, str) and remote_data:
remote_data_obj = json.loads(remote_data)
provider = remote_data_obj.get("provider")
elif isinstance(remote_data, dict):
provider = remote_data.get("provider")
except Exception as e:
current_app.logger.warning(
f"Failed to parse remote_data for doc {doc_id}: {e}"
@@ -406,4 +398,7 @@ class DirectoryStructure(Resource):
current_app.logger.error(
f"Error retrieving directory structure: {e}", exc_info=True
)
return make_response(jsonify({"success": False, "error": "Failed to retrieve directory structure"}), 500)
return make_response(
jsonify({"success": False, "error": "Failed to retrieve directory structure"}),
500,
)

View File

@@ -5,16 +5,16 @@ import os
import tempfile
import zipfile
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, request
from flask_restx import fields, Namespace, Resource
from application.api import api
from application.api.user.base import sources_collection
from application.api.user.tasks import ingest, ingest_connector_task, ingest_remote
from application.core.settings import settings
from application.parser.connectors.connector_creator import ConnectorCreator
from application.parser.file.constants import SUPPORTED_SOURCE_EXTENSIONS
from application.storage.db.repositories.sources import SourcesRepository
from application.storage.db.session import db_readonly, db_session
from application.storage.storage_creator import StorageCreator
from application.stt.upload_limits import (
AudioFileTooLargeError,
@@ -329,15 +329,8 @@ class ManageSourceFiles(Resource):
400,
)
try:
ObjectId(source_id)
except Exception:
return make_response(
jsonify({"success": False, "message": "Invalid source ID format"}), 400
)
try:
source = sources_collection.find_one(
{"_id": ObjectId(source_id), "user": user}
)
with db_readonly() as conn:
source = SourcesRepository(conn).get_any(source_id, user)
if not source:
return make_response(
jsonify(
@@ -353,6 +346,7 @@ class ManageSourceFiles(Resource):
return make_response(
jsonify({"success": False, "message": "Database error"}), 500
)
resolved_source_id = str(source["id"])
try:
storage = StorageCreator.get_storage()
source_file_path = source.get("file_path", "")
@@ -411,15 +405,18 @@ class ManageSourceFiles(Resource):
map_updated = True
if map_updated:
sources_collection.update_one(
{"_id": ObjectId(source_id)},
{"$set": {"file_name_map": file_name_map}},
)
with db_session() as conn:
SourcesRepository(conn).update(
resolved_source_id, user,
{"file_name_map": dict(file_name_map)},
)
# Trigger re-ingestion pipeline
from application.api.user.tasks import reingest_source_task
task = reingest_source_task.delay(source_id=source_id, user=user)
task = reingest_source_task.delay(
source_id=resolved_source_id, user=user
)
return make_response(
jsonify(
@@ -463,6 +460,16 @@ class ManageSourceFiles(Resource):
removed_files = []
map_updated = False
for file_path in file_paths:
if ".." in str(file_path) or str(file_path).startswith("/"):
return make_response(
jsonify(
{
"success": False,
"message": "Invalid file path",
}
),
400,
)
full_path = f"{source_file_path}/{file_path}"
# Remove from storage
@@ -475,15 +482,18 @@ class ManageSourceFiles(Resource):
map_updated = True
if map_updated and isinstance(file_name_map, dict):
sources_collection.update_one(
{"_id": ObjectId(source_id)},
{"$set": {"file_name_map": file_name_map}},
)
with db_session() as conn:
SourcesRepository(conn).update(
resolved_source_id, user,
{"file_name_map": dict(file_name_map)},
)
# Trigger re-ingestion pipeline
from application.api.user.tasks import reingest_source_task
task = reingest_source_task.delay(source_id=source_id, user=user)
task = reingest_source_task.delay(
source_id=resolved_source_id, user=user
)
return make_response(
jsonify(
@@ -571,16 +581,19 @@ class ManageSourceFiles(Resource):
if keys_to_remove:
for key in keys_to_remove:
file_name_map.pop(key, None)
sources_collection.update_one(
{"_id": ObjectId(source_id)},
{"$set": {"file_name_map": file_name_map}},
)
with db_session() as conn:
SourcesRepository(conn).update(
resolved_source_id, user,
{"file_name_map": dict(file_name_map)},
)
# Trigger re-ingestion pipeline
from application.api.user.tasks import reingest_source_task
task = reingest_source_task.delay(source_id=source_id, user=user)
task = reingest_source_task.delay(
source_id=resolved_source_id, user=user
)
return make_response(
jsonify(

View File

@@ -134,6 +134,12 @@ def setup_periodic_tasks(sender, **kwargs):
timedelta(days=30),
schedule_syncs.s("monthly"),
)
# Replaces Mongo's TTL index on pending_tool_state.expires_at.
sender.add_periodic_task(
timedelta(seconds=60),
cleanup_pending_tool_state.s(),
name="cleanup-pending-tool-state",
)
@celery.task(bind=True)
@@ -146,3 +152,27 @@ def mcp_oauth_task(self, config, user):
def mcp_oauth_status_task(self, task_id):
resp = mcp_oauth_status(self, task_id)
return resp
@celery.task(bind=True)
def cleanup_pending_tool_state(self):
"""Delete pending_tool_state rows past their TTL.
Replaces Mongo's ``expireAfterSeconds=0`` TTL index — Postgres has
no native TTL, so this task runs every 60 seconds to keep
``pending_tool_state`` bounded. No-ops if ``POSTGRES_URI`` isn't
configured (keeps the task runnable in Mongo-only environments).
"""
from application.core.settings import settings
if not settings.POSTGRES_URI:
return {"deleted": 0, "skipped": "POSTGRES_URI not set"}
from application.storage.db.engine import get_engine
from application.storage.db.repositories.pending_tool_state import (
PendingToolStateRepository,
)
engine = get_engine()
with engine.begin() as conn:
deleted = PendingToolStateRepository(conn).cleanup_expired()
return {"deleted": deleted}

View File

@@ -3,26 +3,24 @@
import json
from urllib.parse import urlencode, urlparse
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, redirect, request
from flask_restx import Namespace, Resource, fields
from application.agents.tools.mcp_tool import MCPOAuthManager, MCPTool
from application.api import api
from application.api.user.base import user_tools_collection
from application.api.user.tools.routes import transform_actions
from application.cache import get_redis_instance
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.core.url_validation import SSRFError, validate_url
from application.security.encryption import decrypt_credentials, encrypt_credentials
from application.storage.db.repositories.connector_sessions import (
ConnectorSessionsRepository,
)
from application.storage.db.repositories.user_tools import UserToolsRepository
from application.storage.db.session import db_readonly, db_session
from application.utils import check_required_fields
tools_mcp_ns = Namespace("tools", description="Tool management operations", path="/api")
_mongo = MongoDB.get_client()
_db = _mongo[settings.MONGO_DB_NAME]
_connector_sessions = _db["connector_sessions"]
_ALLOWED_TRANSPORTS = {"auto", "sse", "http"}
@@ -63,6 +61,21 @@ def _extract_auth_credentials(config):
return auth_credentials
def _validate_mcp_server_url(config: dict) -> None:
"""Validate the server_url in an MCP config to prevent SSRF.
Raises:
ValueError: If the URL is missing or points to a blocked address.
"""
server_url = (config.get("server_url") or "").strip()
if not server_url:
raise ValueError("server_url is required")
try:
validate_url(server_url)
except SSRFError as exc:
raise ValueError(f"Invalid server URL: {exc}") from exc
@tools_mcp_ns.route("/mcp_server/test")
class TestMCPServerConfig(Resource):
@api.expect(
@@ -97,6 +110,8 @@ class TestMCPServerConfig(Resource):
400,
)
_validate_mcp_server_url(config)
auth_credentials = _extract_auth_credentials(config)
test_config = config.copy()
test_config["auth_credentials"] = auth_credentials
@@ -105,15 +120,41 @@ class TestMCPServerConfig(Resource):
result = mcp_tool.test_connection()
if result.get("requires_oauth"):
return make_response(jsonify(result), 200)
safe_result = {
k: v
for k, v in result.items()
if k in ("success", "requires_oauth", "auth_url")
}
return make_response(jsonify(safe_result), 200)
if not result.get("success") and "message" in result:
if not result.get("success"):
current_app.logger.error(
f"MCP connection test failed: {result.get('message')}"
)
result["message"] = "Connection test failed"
return make_response(
jsonify(
{
"success": False,
"message": "Connection test failed",
"tools_count": 0,
}
),
200,
)
return make_response(jsonify(result), 200)
safe_result = {
"success": True,
"message": result.get("message", "Connection successful"),
"tools_count": result.get("tools_count", 0),
"tools": result.get("tools", []),
}
return make_response(jsonify(safe_result), 200)
except ValueError as e:
current_app.logger.warning(f"Invalid MCP server test request: {e}")
return make_response(
jsonify({"success": False, "error": "Invalid MCP server configuration"}),
400,
)
except Exception as e:
current_app.logger.error(f"Error testing MCP server: {e}", exc_info=True)
return make_response(
@@ -165,6 +206,8 @@ class MCPServerSave(Resource):
400,
)
_validate_mcp_server_url(config)
auth_credentials = _extract_auth_credentials(config)
auth_type = config.get("auth_type", "none")
mcp_config = config.copy()
@@ -206,15 +249,18 @@ class MCPServerSave(Resource):
storage_config = config.copy()
tool_id = data.get("id")
existing_doc = None
existing_encrypted = None
if tool_id:
existing_doc = user_tools_collection.find_one(
{"_id": ObjectId(tool_id), "user": user, "name": "mcp_tool"}
)
if existing_doc:
existing_encrypted = existing_doc.get("config", {}).get(
with db_readonly() as conn:
repo = UserToolsRepository(conn)
existing_doc = repo.get_any(tool_id, user)
if existing_doc and existing_doc.get("name") == "mcp_tool":
existing_encrypted = (existing_doc.get("config") or {}).get(
"encrypted_credentials"
)
else:
existing_doc = None
if auth_credentials:
if existing_encrypted:
@@ -237,48 +283,95 @@ class MCPServerSave(Resource):
]:
storage_config.pop(field, None)
transformed_actions = transform_actions(actions_metadata)
tool_data = {
"name": "mcp_tool",
"displayName": data["displayName"],
"customName": data["displayName"],
"description": f"MCP Server: {storage_config.get('server_url', 'Unknown')}",
"config": storage_config,
"actions": transformed_actions,
"status": data.get("status", True),
"user": user,
}
if tool_id:
result = user_tools_collection.update_one(
{"_id": ObjectId(tool_id), "user": user, "name": "mcp_tool"},
{"$set": {k: v for k, v in tool_data.items() if k != "user"}},
)
if result.matched_count == 0:
return make_response(
jsonify(
{
"success": False,
"error": "Tool not found or access denied",
}
),
404,
display_name = data["displayName"]
description = f"MCP Server: {storage_config.get('server_url', 'Unknown')}"
status_bool = bool(data.get("status", True))
with db_session() as conn:
repo = UserToolsRepository(conn)
if existing_doc:
repo.update(
str(existing_doc["id"]), user,
{
"display_name": display_name,
"custom_name": display_name,
"description": description,
"config": storage_config,
"actions": transformed_actions,
"status": status_bool,
},
)
response_data = {
"success": True,
"id": tool_id,
"message": f"MCP server updated successfully! Discovered {len(transformed_actions)} tools.",
"tools_count": len(transformed_actions),
}
else:
result = user_tools_collection.insert_one(tool_data)
tool_id = str(result.inserted_id)
response_data = {
"success": True,
"id": tool_id,
"message": f"MCP server created successfully! Discovered {len(transformed_actions)} tools.",
"tools_count": len(transformed_actions),
}
saved_id = str(existing_doc["id"])
response_data = {
"success": True,
"id": saved_id,
"message": f"MCP server updated successfully! Discovered {len(transformed_actions)} tools.",
"tools_count": len(transformed_actions),
}
else:
# Fall back to find_by_user_and_name — the original
# dual-write path also ran an existence check before
# deciding between insert and update.
existing_by_name = repo.find_by_user_and_name(user, "mcp_tool")
if tool_id is None and existing_by_name and (
(existing_by_name.get("config") or {}).get("server_url")
== storage_config.get("server_url")
):
repo.update(
str(existing_by_name["id"]), user,
{
"display_name": display_name,
"custom_name": display_name,
"description": description,
"config": storage_config,
"actions": transformed_actions,
"status": status_bool,
},
)
saved_id = str(existing_by_name["id"])
response_data = {
"success": True,
"id": saved_id,
"message": f"MCP server updated successfully! Discovered {len(transformed_actions)} tools.",
"tools_count": len(transformed_actions),
}
else:
created = repo.create(
user, "mcp_tool",
config=storage_config,
custom_name=display_name,
display_name=display_name,
description=description,
config_requirements={},
actions=transformed_actions,
status=status_bool,
)
saved_id = str(created["id"])
response_data = {
"success": True,
"id": saved_id,
"message": f"MCP server created successfully! Discovered {len(transformed_actions)} tools.",
"tools_count": len(transformed_actions),
}
if tool_id and existing_doc is None:
# Client requested update on a non-existent tool id.
return make_response(
jsonify(
{
"success": False,
"error": "Tool not found or access denied",
}
),
404,
)
return make_response(jsonify(response_data), 200)
except ValueError as e:
current_app.logger.warning(f"Invalid MCP server save request: {e}")
return make_response(
jsonify({"success": False, "error": "Invalid MCP server configuration"}),
400,
)
except Exception as e:
current_app.logger.error(f"Error saving MCP server: {e}", exc_info=True)
return make_response(
@@ -407,49 +500,59 @@ class MCPAuthStatus(Resource):
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
try:
mcp_tools = list(
user_tools_collection.find(
{"user": user, "name": "mcp_tool"},
{"_id": 1, "config": 1},
)
)
if not mcp_tools:
return make_response(jsonify({"success": True, "statuses": {}}), 200)
oauth_server_urls = {}
statuses = {}
for tool in mcp_tools:
tool_id = str(tool["_id"])
config = tool.get("config", {})
auth_type = config.get("auth_type", "none")
if auth_type == "oauth":
server_url = config.get("server_url", "")
if server_url:
parsed = urlparse(server_url)
base_url = f"{parsed.scheme}://{parsed.netloc}"
oauth_server_urls[tool_id] = base_url
else:
statuses[tool_id] = "needs_auth"
else:
statuses[tool_id] = "configured"
if oauth_server_urls:
unique_urls = list(set(oauth_server_urls.values()))
sessions = list(
_connector_sessions.find(
{"user_id": user, "server_url": {"$in": unique_urls}},
{"server_url": 1, "tokens": 1},
with db_readonly() as conn:
tools_repo = UserToolsRepository(conn)
sessions_repo = ConnectorSessionsRepository(conn)
all_tools = tools_repo.list_for_user(user)
mcp_tools = [t for t in all_tools if t.get("name") == "mcp_tool"]
if not mcp_tools:
return make_response(
jsonify({"success": True, "statuses": {}}), 200
)
)
url_has_tokens = {
doc["server_url"]: bool(doc.get("tokens", {}).get("access_token"))
for doc in sessions
}
for tool_id, base_url in oauth_server_urls.items():
if url_has_tokens.get(base_url):
statuses[tool_id] = "connected"
oauth_server_urls: dict = {}
statuses: dict = {}
for tool in mcp_tools:
tool_id = str(tool["id"])
config = tool.get("config") or {}
auth_type = config.get("auth_type", "none")
if auth_type == "oauth":
server_url = config.get("server_url", "")
if server_url:
parsed = urlparse(server_url)
base_url = f"{parsed.scheme}://{parsed.netloc}"
oauth_server_urls[tool_id] = base_url
else:
statuses[tool_id] = "needs_auth"
else:
statuses[tool_id] = "needs_auth"
statuses[tool_id] = "configured"
if oauth_server_urls:
# Look up a session per distinct base URL. MCP sessions
# are stored with ``provider = "mcp:<server_url>"``
# and the URL in ``server_url``; reuse the repo's
# per-URL accessor rather than an ad-hoc $in query.
url_has_tokens: dict = {}
for base_url in set(oauth_server_urls.values()):
session = sessions_repo.get_by_user_and_server_url(
user, base_url,
)
tokens = (
(session or {}).get("session_data", {}) or {}
).get("tokens", {}) or {}
# MCP code also stashes tokens into token_info on
# the row; consider either present as "connected".
token_info = (session or {}).get("token_info") or {}
url_has_tokens[base_url] = bool(
tokens.get("access_token")
or token_info.get("access_token")
)
for tool_id, base_url in oauth_server_urls.items():
if url_has_tokens.get(base_url):
statuses[tool_id] = "connected"
else:
statuses[tool_id] = "needs_auth"
return make_response(jsonify({"success": True, "statuses": statuses}), 200)
except Exception as e:

View File

@@ -1,20 +1,59 @@
"""Tool management routes."""
from bson.objectid import ObjectId
from flask import current_app, jsonify, make_response, request
from flask_restx import fields, Namespace, Resource
from application.agents.tools.spec_parser import parse_spec
from application.agents.tools.tool_manager import ToolManager
from application.api import api
from application.api.user.base import user_tools_collection
from application.core.url_validation import SSRFError, validate_url
from application.security.encryption import decrypt_credentials, encrypt_credentials
from application.storage.db.repositories.notes import NotesRepository
from application.storage.db.repositories.todos import TodosRepository
from application.storage.db.repositories.user_tools import UserToolsRepository
from application.storage.db.session import db_readonly, db_session
from application.utils import check_required_fields, validate_function_name
tool_config = {}
tool_manager = ToolManager(config=tool_config)
# ---------------------------------------------------------------------------
# Shape translation helpers
# ---------------------------------------------------------------------------
# The frontend speaks camelCase (``displayName`` / ``customName`` /
# ``configRequirements``). The PG ``user_tools`` table stores snake_case
# (``display_name`` / ``custom_name`` / ``config_requirements``). Keep the
# translation localized to this module so repositories stay pure.
_CAMEL_TO_SNAKE = {
"displayName": "display_name",
"customName": "custom_name",
"configRequirements": "config_requirements",
}
_SNAKE_TO_CAMEL = {v: k for k, v in _CAMEL_TO_SNAKE.items()}
def _row_to_api(row: dict) -> dict:
"""Rename DB-native snake_case keys to the camelCase shape the frontend expects."""
out = dict(row)
for snake, camel in _SNAKE_TO_CAMEL.items():
if snake in out:
out[camel] = out.pop(snake)
# ``user_id`` is exposed as ``user`` in the legacy API shape.
if "user_id" in out:
out["user"] = out.pop("user_id")
return out
def _api_to_update_fields(data: dict) -> dict:
"""Rename incoming camelCase update keys to the repo's snake_case columns."""
fields_out: dict = {}
for key, value in data.items():
fields_out[_CAMEL_TO_SNAKE.get(key, key)] = value
return fields_out
def _encrypt_secret_fields(config, config_requirements, user_id):
secret_keys = [
key for key, spec in config_requirements.items()
@@ -130,6 +169,8 @@ tools_ns = Namespace("tools", description="Tool management operations", path="/a
class AvailableTools(Resource):
@api.doc(description="Get available tools for a user")
def get(self):
if not request.decoded_token:
return make_response(jsonify({"success": False}), 401)
try:
tools_metadata = []
for tool_name, tool_instance in tool_manager.tools.items():
@@ -165,12 +206,11 @@ class GetTools(Resource):
if not decoded_token:
return make_response(jsonify({"success": False}), 401)
user = decoded_token.get("sub")
tools = user_tools_collection.find({"user": user})
with db_readonly() as conn:
rows = UserToolsRepository(conn).list_for_user(user)
user_tools = []
for tool in tools:
tool_copy = {**tool}
tool_copy["id"] = str(tool["_id"])
tool_copy.pop("_id", None)
for row in rows:
tool_copy = _row_to_api(row)
config_req = tool_copy.get("configRequirements", {})
if not config_req:
@@ -236,6 +276,16 @@ class CreateTool(Resource):
if missing_fields:
return missing_fields
try:
if data["name"] == "mcp_tool":
server_url = (data.get("config", {}).get("server_url") or "").strip()
if server_url:
try:
validate_url(server_url)
except SSRFError:
return make_response(
jsonify({"success": False, "message": "Invalid server URL"}),
400,
)
tool_instance = tool_manager.tools.get(data["name"])
if not tool_instance:
return make_response(
@@ -268,19 +318,19 @@ class CreateTool(Resource):
storage_config = _encrypt_secret_fields(
data["config"], config_requirements, user
)
new_tool = {
"user": user,
"name": data["name"],
"displayName": data["displayName"],
"description": data["description"],
"customName": data.get("customName", ""),
"actions": transformed_actions,
"config": storage_config,
"configRequirements": config_requirements,
"status": data["status"],
}
resp = user_tools_collection.insert_one(new_tool)
new_id = str(resp.inserted_id)
with db_session() as conn:
created = UserToolsRepository(conn).create(
user,
data["name"],
config=storage_config,
custom_name=data.get("customName", ""),
display_name=data["displayName"],
description=data["description"],
config_requirements=config_requirements,
actions=transformed_actions,
status=bool(data.get("status", True)),
)
new_id = str(created["id"])
except Exception as err:
current_app.logger.error(f"Error creating tool: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
@@ -318,17 +368,10 @@ class UpdateTool(Resource):
if missing_fields:
return missing_fields
try:
update_data = {}
if "name" in data:
update_data["name"] = data["name"]
if "displayName" in data:
update_data["displayName"] = data["displayName"]
if "customName" in data:
update_data["customName"] = data["customName"]
if "description" in data:
update_data["description"] = data["description"]
if "actions" in data:
update_data["actions"] = data["actions"]
update_data: dict = {}
for key in ("name", "displayName", "customName", "description", "actions"):
if key in data:
update_data[key] = data[key]
if "config" in data:
if "actions" in data["config"]:
for action_name in list(data["config"]["actions"].keys()):
@@ -343,46 +386,61 @@ class UpdateTool(Resource):
),
400,
)
tool_doc = user_tools_collection.find_one(
{"_id": ObjectId(data["id"]), "user": user}
)
if not tool_doc:
return make_response(
jsonify({"success": False, "message": "Tool not found"}),
404,
)
tool_name = tool_doc.get("name", data.get("name"))
tool_instance = tool_manager.tools.get(tool_name)
config_requirements = (
tool_instance.get_config_requirements() if tool_instance else {}
)
existing_config = tool_doc.get("config", {})
has_existing_secrets = "encrypted_credentials" in existing_config
if config_requirements:
validation_errors = _validate_config(
data["config"], config_requirements,
has_existing_secrets=has_existing_secrets,
)
if validation_errors:
with db_session() as conn:
repo = UserToolsRepository(conn)
tool_doc = repo.get_any(data["id"], user)
if not tool_doc:
return make_response(
jsonify({
"success": False,
"message": "Validation failed",
"errors": validation_errors,
}),
400,
jsonify({"success": False, "message": "Tool not found"}),
404,
)
tool_name = tool_doc.get("name", data.get("name"))
tool_instance = tool_manager.tools.get(tool_name)
config_requirements = (
tool_instance.get_config_requirements()
if tool_instance
else {}
)
existing_config = tool_doc.get("config", {}) or {}
has_existing_secrets = "encrypted_credentials" in existing_config
update_data["config"] = _merge_secrets_on_update(
data["config"], existing_config, config_requirements, user
)
if "status" in data:
update_data["status"] = data["status"]
user_tools_collection.update_one(
{"_id": ObjectId(data["id"]), "user": user},
{"$set": update_data},
)
if config_requirements:
validation_errors = _validate_config(
data["config"], config_requirements,
has_existing_secrets=has_existing_secrets,
)
if validation_errors:
return make_response(
jsonify({
"success": False,
"message": "Validation failed",
"errors": validation_errors,
}),
400,
)
update_data["config"] = _merge_secrets_on_update(
data["config"], existing_config, config_requirements, user
)
if "status" in data:
update_data["status"] = bool(data["status"])
repo.update(
str(tool_doc["id"]), user, _api_to_update_fields(update_data),
)
else:
if "status" in data:
update_data["status"] = bool(data["status"])
with db_session() as conn:
repo = UserToolsRepository(conn)
tool_doc = repo.get_any(data["id"], user)
if not tool_doc:
return make_response(
jsonify({"success": False, "message": "Tool not found"}),
404,
)
repo.update(
str(tool_doc["id"]), user, _api_to_update_fields(update_data),
)
except Exception as err:
current_app.logger.error(f"Error updating tool: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
@@ -414,43 +472,50 @@ class UpdateToolConfig(Resource):
if missing_fields:
return missing_fields
try:
tool_doc = user_tools_collection.find_one(
{"_id": ObjectId(data["id"]), "user": user}
)
if not tool_doc:
return make_response(jsonify({"success": False}), 404)
with db_session() as conn:
repo = UserToolsRepository(conn)
tool_doc = repo.get_any(data["id"], user)
if not tool_doc:
return make_response(jsonify({"success": False}), 404)
tool_name = tool_doc.get("name")
tool_instance = tool_manager.tools.get(tool_name)
config_requirements = (
tool_instance.get_config_requirements() if tool_instance else {}
)
existing_config = tool_doc.get("config", {})
has_existing_secrets = "encrypted_credentials" in existing_config
if config_requirements:
validation_errors = _validate_config(
data["config"], config_requirements,
has_existing_secrets=has_existing_secrets,
tool_name = tool_doc.get("name")
if tool_name == "mcp_tool":
server_url = (data["config"].get("server_url") or "").strip()
if server_url:
try:
validate_url(server_url)
except SSRFError:
return make_response(
jsonify({"success": False, "message": "Invalid server URL"}),
400,
)
tool_instance = tool_manager.tools.get(tool_name)
config_requirements = (
tool_instance.get_config_requirements() if tool_instance else {}
)
if validation_errors:
return make_response(
jsonify({
"success": False,
"message": "Validation failed",
"errors": validation_errors,
}),
400,
existing_config = tool_doc.get("config", {}) or {}
has_existing_secrets = "encrypted_credentials" in existing_config
if config_requirements:
validation_errors = _validate_config(
data["config"], config_requirements,
has_existing_secrets=has_existing_secrets,
)
if validation_errors:
return make_response(
jsonify({
"success": False,
"message": "Validation failed",
"errors": validation_errors,
}),
400,
)
final_config = _merge_secrets_on_update(
data["config"], existing_config, config_requirements, user
)
final_config = _merge_secrets_on_update(
data["config"], existing_config, config_requirements, user
)
user_tools_collection.update_one(
{"_id": ObjectId(data["id"]), "user": user},
{"$set": {"config": final_config}},
)
repo.update(str(tool_doc["id"]), user, {"config": final_config})
except Exception as err:
current_app.logger.error(
f"Error updating tool config: {err}", exc_info=True
@@ -486,10 +551,17 @@ class UpdateToolActions(Resource):
if missing_fields:
return missing_fields
try:
user_tools_collection.update_one(
{"_id": ObjectId(data["id"]), "user": user},
{"$set": {"actions": data["actions"]}},
)
with db_session() as conn:
repo = UserToolsRepository(conn)
tool_doc = repo.get_any(data["id"], user)
if not tool_doc:
return make_response(
jsonify({"success": False, "message": "Tool not found"}),
404,
)
repo.update(
str(tool_doc["id"]), user, {"actions": data["actions"]},
)
except Exception as err:
current_app.logger.error(
f"Error updating tool actions: {err}", exc_info=True
@@ -523,10 +595,17 @@ class UpdateToolStatus(Resource):
if missing_fields:
return missing_fields
try:
user_tools_collection.update_one(
{"_id": ObjectId(data["id"]), "user": user},
{"$set": {"status": data["status"]}},
)
with db_session() as conn:
repo = UserToolsRepository(conn)
tool_doc = repo.get_any(data["id"], user)
if not tool_doc:
return make_response(
jsonify({"success": False, "message": "Tool not found"}),
404,
)
repo.update(
str(tool_doc["id"]), user, {"status": bool(data["status"])},
)
except Exception as err:
current_app.logger.error(
f"Error updating tool status: {err}", exc_info=True
@@ -555,13 +634,14 @@ class DeleteTool(Resource):
if missing_fields:
return missing_fields
try:
result = user_tools_collection.delete_one(
{"_id": ObjectId(data["id"]), "user": user}
)
if result.deleted_count == 0:
return make_response(
jsonify({"success": False, "message": "Tool not found"}), 404
)
with db_session() as conn:
repo = UserToolsRepository(conn)
tool_doc = repo.get_any(data["id"], user)
if not tool_doc:
return make_response(
jsonify({"success": False, "message": "Tool not found"}), 404
)
repo.delete(str(tool_doc["id"]), user)
except Exception as err:
current_app.logger.error(f"Error deleting tool: {err}", exc_info=True)
return make_response(jsonify({"success": False}), 400)
@@ -630,70 +710,88 @@ class GetArtifact(Resource):
user_id = decoded_token.get("sub")
try:
obj_id = ObjectId(artifact_id)
except Exception:
return make_response(
jsonify({"success": False, "message": "Invalid artifact ID"}), 400
with db_readonly() as conn:
notes_repo = NotesRepository(conn)
todos_repo = TodosRepository(conn)
# Artifact IDs may be PG UUIDs (post-cutover) or legacy
# Mongo ObjectIds embedded in older conversation history.
# Both repos' ``get_any`` handles the id-shape branching
# internally so a non-UUID input never reaches
# ``CAST(:id AS uuid)`` (which would poison the readonly
# transaction and break the fallback below).
note_doc = notes_repo.get_any(artifact_id, user_id)
if note_doc:
content = note_doc.get("note", "") or note_doc.get("content", "")
line_count = len(content.split("\n")) if content else 0
updated = note_doc.get("updated_at")
artifact = {
"artifact_type": "note",
"data": {
"content": content,
"line_count": line_count,
"updated_at": (
updated.isoformat()
if hasattr(updated, "isoformat")
else updated
),
},
}
return make_response(
jsonify({"success": True, "artifact": artifact}), 200
)
todo_doc = todos_repo.get_any(artifact_id, user_id)
if todo_doc:
tool_id = todo_doc.get("tool_id")
all_todos = todos_repo.list_for_tool(user_id, tool_id) if tool_id else []
items = []
open_count = 0
completed_count = 0
for t in all_todos:
# PG ``todos`` stores a ``completed BOOLEAN`` column;
# the legacy Mongo shape used a ``status`` string.
# Keep the response shape stable by translating here.
status = "completed" if t.get("completed") else "open"
if status == "open":
open_count += 1
else:
completed_count += 1
created = t.get("created_at")
updated = t.get("updated_at")
items.append({
"todo_id": t.get("todo_id"),
"title": t.get("title", ""),
"status": status,
"created_at": (
created.isoformat()
if hasattr(created, "isoformat")
else created
),
"updated_at": (
updated.isoformat()
if hasattr(updated, "isoformat")
else updated
),
})
artifact = {
"artifact_type": "todo_list",
"data": {
"items": items,
"total_count": len(items),
"open_count": open_count,
"completed_count": completed_count,
},
}
return make_response(
jsonify({"success": True, "artifact": artifact}), 200
)
except Exception as err:
current_app.logger.error(
f"Error retrieving artifact: {err}", exc_info=True
)
from application.core.mongo_db import MongoDB
from application.core.settings import settings
db = MongoDB.get_client()[settings.MONGO_DB_NAME]
note_doc = db["notes"].find_one({"_id": obj_id, "user_id": user_id})
if note_doc:
content = note_doc.get("note", "")
line_count = len(content.split("\n")) if content else 0
artifact = {
"artifact_type": "note",
"data": {
"content": content,
"line_count": line_count,
"updated_at": (
note_doc["updated_at"].isoformat()
if note_doc.get("updated_at")
else None
),
},
}
return make_response(jsonify({"success": True, "artifact": artifact}), 200)
todo_doc = db["todos"].find_one({"_id": obj_id, "user_id": user_id})
if todo_doc:
tool_id = todo_doc.get("tool_id")
query = {"user_id": user_id, "tool_id": tool_id}
all_todos = list(db["todos"].find(query))
items = []
open_count = 0
completed_count = 0
for t in all_todos:
status = t.get("status", "open")
if status == "open":
open_count += 1
elif status == "completed":
completed_count += 1
items.append({
"todo_id": t.get("todo_id"),
"title": t.get("title", ""),
"status": status,
"created_at": (
t["created_at"].isoformat() if t.get("created_at") else None
),
"updated_at": (
t["updated_at"].isoformat() if t.get("updated_at") else None
),
})
artifact = {
"artifact_type": "todo_list",
"data": {
"items": items,
"total_count": len(items),
"open_count": open_count,
"completed_count": completed_count,
},
}
return make_response(jsonify({"success": True, "artifact": artifact}), 200)
return make_response(jsonify({"success": False}), 400)
return make_response(
jsonify({"success": False, "message": "Artifact not found"}), 404

View File

@@ -1,290 +1,61 @@
"""Centralized utilities for API routes."""
"""Centralized utilities for API routes.
Post-Mongo-cutover slim: the old Mongo-shaped helpers (``validate_object_id``,
``check_resource_ownership``, ``paginated_response``, ``serialize_object_id``,
``safe_db_operation``, ``validate_enum``, ``extract_sort_params``) have been
removed — they carried ``bson`` / ``pymongo`` imports and had zero callers.
"""
from functools import wraps
from typing import Any, Callable, Dict, List, Optional, Tuple
from typing import Callable, Optional
from bson.errors import InvalidId
from bson.objectid import ObjectId
from flask import (
Response,
current_app,
has_app_context,
jsonify,
make_response,
request,
)
from pymongo.collection import Collection
def get_user_id() -> Optional[str]:
"""
Extract user ID from decoded JWT token.
Returns:
User ID string or None if not authenticated
"""
"""Extract user ID from decoded JWT token, or None if unauthenticated."""
decoded_token = getattr(request, "decoded_token", None)
return decoded_token.get("sub") if decoded_token else None
def require_auth(func: Callable) -> Callable:
"""
Decorator to require authentication for route handlers.
Usage:
@require_auth
def get(self):
user_id = get_user_id()
...
"""
"""Decorator to require authentication. Returns 401 when absent."""
@wraps(func)
def wrapper(*args, **kwargs):
user_id = get_user_id()
if not user_id:
return error_response("Unauthorized", 401)
return make_response(jsonify({"success": False, "error": "Unauthorized"}), 401)
return func(*args, **kwargs)
return wrapper
def success_response(
data: Optional[Dict[str, Any]] = None, status: int = 200
data=None, message: Optional[str] = None, status: int = 200
) -> Response:
"""
Create a standardized success response.
Args:
data: Optional data dictionary to include in response
status: HTTP status code (default: 200)
Returns:
Flask Response object
Example:
return success_response({"users": [...], "total": 10})
"""
response = {"success": True}
if data:
response.update(data)
return make_response(jsonify(response), status)
"""Shape a successful JSON response."""
body = {"success": True}
if data is not None:
body["data"] = data
if message is not None:
body["message"] = message
return make_response(jsonify(body), status)
def error_response(message: str, status: int = 400, **kwargs) -> Response:
"""
Create a standardized error response.
Args:
message: Error message string
status: HTTP status code (default: 400)
**kwargs: Additional fields to include in response
Returns:
Flask Response object
Example:
return error_response("Resource not found", 404)
return error_response("Invalid input", 400, errors=["field1", "field2"])
"""
response = {"success": False, "message": message}
response.update(kwargs)
return make_response(jsonify(response), status)
"""Shape an error JSON response; any kwargs are merged into the body."""
body = {"success": False, "error": message, **kwargs}
return make_response(jsonify(body), status)
def validate_object_id(
id_string: str, resource_name: str = "Resource"
) -> Tuple[Optional[ObjectId], Optional[Response]]:
"""
Validate and convert string to ObjectId.
Args:
id_string: String to convert
resource_name: Name of resource for error message
Returns:
Tuple of (ObjectId or None, error_response or None)
Example:
obj_id, error = validate_object_id(workflow_id, "Workflow")
if error:
return error
"""
try:
return ObjectId(id_string), None
except (InvalidId, TypeError):
return None, error_response(f"Invalid {resource_name} ID format")
def validate_pagination(
default_limit: int = 20, max_limit: int = 100
) -> Tuple[int, int, Optional[Response]]:
"""
Extract and validate pagination parameters from request.
Args:
default_limit: Default items per page
max_limit: Maximum allowed items per page
Returns:
Tuple of (limit, skip, error_response or None)
Example:
limit, skip, error = validate_pagination()
if error:
return error
"""
try:
limit = min(int(request.args.get("limit", default_limit)), max_limit)
skip = int(request.args.get("skip", 0))
if limit < 1 or skip < 0:
return 0, 0, error_response("Invalid pagination parameters")
return limit, skip, None
except ValueError:
return 0, 0, error_response("Invalid pagination parameters")
def check_resource_ownership(
collection: Collection,
resource_id: ObjectId,
user_id: str,
resource_name: str = "Resource",
) -> Tuple[Optional[Dict], Optional[Response]]:
"""
Check if resource exists and belongs to user.
Args:
collection: MongoDB collection
resource_id: Resource ObjectId
user_id: User ID string
resource_name: Name of resource for error messages
Returns:
Tuple of (resource_dict or None, error_response or None)
Example:
workflow, error = check_resource_ownership(
workflows_collection,
workflow_id,
user_id,
"Workflow"
)
if error:
return error
"""
resource = collection.find_one({"_id": resource_id, "user": user_id})
if not resource:
return None, error_response(f"{resource_name} not found", 404)
return resource, None
def serialize_object_id(
obj: Dict[str, Any], id_field: str = "_id", new_field: str = "id"
) -> Dict[str, Any]:
"""
Convert ObjectId to string in a dictionary.
Args:
obj: Dictionary containing ObjectId
id_field: Field name containing ObjectId
new_field: New field name for string ID
Returns:
Modified dictionary
Example:
user = serialize_object_id(user_doc)
# user["id"] = "507f1f77bcf86cd799439011"
"""
if id_field in obj:
obj[new_field] = str(obj[id_field])
if id_field != new_field:
obj.pop(id_field, None)
return obj
def serialize_list(items: List[Dict], serializer: Callable[[Dict], Dict]) -> List[Dict]:
"""
Apply serializer function to list of items.
Args:
items: List of dictionaries
serializer: Function to apply to each item
Returns:
List of serialized items
Example:
workflows = serialize_list(workflow_docs, serialize_workflow)
"""
return [serializer(item) for item in items]
def paginated_response(
collection: Collection,
query: Dict[str, Any],
serializer: Callable[[Dict], Dict],
limit: int,
skip: int,
sort_field: str = "created_at",
sort_order: int = -1,
response_key: str = "items",
) -> Response:
"""
Create paginated response for collection query.
Args:
collection: MongoDB collection
query: Query dictionary
serializer: Function to serialize each item
limit: Items per page
skip: Number of items to skip
sort_field: Field to sort by
sort_order: Sort order (1=asc, -1=desc)
response_key: Key name for items in response
Returns:
Flask Response with paginated data
Example:
return paginated_response(
workflows_collection,
{"user": user_id},
serialize_workflow,
limit, skip,
response_key="workflows"
)
"""
items = list(
collection.find(query).sort(sort_field, sort_order).skip(skip).limit(limit)
)
total = collection.count_documents(query)
return success_response(
{
response_key: serialize_list(items, serializer),
"total": total,
"limit": limit,
"skip": skip,
}
)
def require_fields(required: List[str]) -> Callable:
"""
Decorator to validate required fields in request JSON.
Args:
required: List of required field names
Returns:
Decorator function
Example:
@require_fields(["name", "description"])
def post(self):
data = request.get_json()
...
"""
def require_fields(required: list) -> Callable:
"""Decorator: return 400 if any listed field is missing/falsy in the JSON body."""
def decorator(func: Callable) -> Callable:
@wraps(func)
@@ -294,94 +65,11 @@ def require_fields(required: List[str]) -> Callable:
return error_response("Request body required")
missing = [field for field in required if not data.get(field)]
if missing:
return error_response(f"Missing required fields: {', '.join(missing)}")
return error_response(
f"Missing required fields: {', '.join(missing)}"
)
return func(*args, **kwargs)
return wrapper
return decorator
def safe_db_operation(
operation: Callable, error_message: str = "Database operation failed"
) -> Tuple[Any, Optional[Response]]:
"""
Safely execute database operation with error handling.
Args:
operation: Function to execute
error_message: Error message if operation fails
Returns:
Tuple of (result or None, error_response or None)
Example:
result, error = safe_db_operation(
lambda: collection.insert_one(doc),
"Failed to create resource"
)
if error:
return error
"""
try:
result = operation()
return result, None
except Exception as err:
if has_app_context():
current_app.logger.error(f"{error_message}: {err}", exc_info=True)
return None, error_response(error_message)
def validate_enum(
value: Any, allowed: List[Any], field_name: str
) -> Optional[Response]:
"""
Validate that value is in allowed list.
Args:
value: Value to validate
allowed: List of allowed values
field_name: Field name for error message
Returns:
error_response if invalid, None if valid
Example:
error = validate_enum(status, ["draft", "published"], "status")
if error:
return error
"""
if value not in allowed:
allowed_str = ", ".join(f"'{v}'" for v in allowed)
return error_response(f"Invalid {field_name}. Must be one of: {allowed_str}")
return None
def extract_sort_params(
default_field: str = "created_at",
default_order: str = "desc",
allowed_fields: Optional[List[str]] = None,
) -> Tuple[str, int]:
"""
Extract and validate sort parameters from request.
Args:
default_field: Default sort field
default_order: Default sort order ("asc" or "desc")
allowed_fields: List of allowed sort fields (None = no validation)
Returns:
Tuple of (sort_field, sort_order)
Example:
sort_field, sort_order = extract_sort_params(
allowed_fields=["name", "date", "status"]
)
"""
sort_field = request.args.get("sort", default_field)
sort_order_str = request.args.get("order", default_order).lower()
if allowed_fields and sort_field not in allowed_fields:
sort_field = default_field
sort_order = -1 if sort_order_str == "desc" else 1
return sort_field, sort_order

View File

@@ -1,30 +1,26 @@
"""Workflow management routes."""
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Set
from flask import current_app, request
from flask_restx import Namespace, Resource
from application.api.user.base import (
workflow_edges_collection,
workflow_nodes_collection,
workflows_collection,
)
from application.storage.db.base_repository import looks_like_uuid
from application.storage.db.repositories.workflow_edges import WorkflowEdgesRepository
from application.storage.db.repositories.workflow_nodes import WorkflowNodesRepository
from application.storage.db.repositories.workflows import WorkflowsRepository
from application.storage.db.session import db_readonly, db_session
from application.core.json_schema_utils import (
JsonSchemaValidationError,
normalize_json_schema_payload,
)
from application.core.model_utils import get_model_capabilities
from application.api.user.utils import (
check_resource_ownership,
error_response,
get_user_id,
require_auth,
require_fields,
safe_db_operation,
success_response,
validate_object_id,
)
workflows_ns = Namespace("workflows", path="/api")
@@ -35,33 +31,112 @@ def _workflow_error_response(message: str, err: Exception):
return error_response(message)
def _resolve_workflow(repo: WorkflowsRepository, workflow_id: str, user_id: str):
"""Resolve a workflow by UUID or legacy Mongo id, scoped to user."""
if not workflow_id:
return None
if looks_like_uuid(workflow_id):
row = repo.get(workflow_id, user_id)
if row is not None:
return row
return repo.get_by_legacy_id(workflow_id, user_id)
def _write_graph(
conn,
pg_workflow_id: str,
graph_version: int,
nodes_data: List[Dict],
edges_data: List[Dict],
) -> List[Dict]:
"""Bulk-create nodes + edges for one graph version. Uses ON CONFLICT upsert.
Edges arrive with source/target as user-provided node-id strings. We
insert nodes first, capture their ``node_id → UUID`` map, then
translate edges before insertion. Edges referencing missing nodes are
dropped with a warning.
"""
nodes_repo = WorkflowNodesRepository(conn)
edges_repo = WorkflowEdgesRepository(conn)
if nodes_data:
created_nodes = nodes_repo.bulk_create(
pg_workflow_id, graph_version,
[
{
"node_id": n["id"],
"node_type": n["type"],
"title": n.get("title", ""),
"description": n.get("description", ""),
"position": n.get("position", {"x": 0, "y": 0}),
"config": n.get("data", {}),
}
for n in nodes_data
],
)
node_uuid_by_str = {n["node_id"]: n["id"] for n in created_nodes}
else:
created_nodes = []
node_uuid_by_str = {}
if edges_data:
translated_edges: List[Dict] = []
for e in edges_data:
src = e.get("source")
tgt = e.get("target")
from_uuid = node_uuid_by_str.get(src)
to_uuid = node_uuid_by_str.get(tgt)
if not from_uuid or not to_uuid:
current_app.logger.warning(
"Workflow graph write: dropping edge %s; node refs unresolved "
"(source=%s, target=%s)",
e.get("id"), src, tgt,
)
continue
translated_edges.append({
"edge_id": e["id"],
"from_node_id": from_uuid,
"to_node_id": to_uuid,
"source_handle": e.get("sourceHandle"),
"target_handle": e.get("targetHandle"),
})
if translated_edges:
edges_repo.bulk_create(
pg_workflow_id, graph_version, translated_edges,
)
return created_nodes
def serialize_workflow(w: Dict) -> Dict:
"""Serialize workflow document to API response format."""
"""Serialize workflow row to API response format."""
created_at = w.get("created_at")
updated_at = w.get("updated_at")
return {
"id": str(w["_id"]),
"id": str(w["id"]),
"name": w.get("name"),
"description": w.get("description"),
"created_at": w["created_at"].isoformat() if w.get("created_at") else None,
"updated_at": w["updated_at"].isoformat() if w.get("updated_at") else None,
"created_at": created_at.isoformat() if hasattr(created_at, "isoformat") else created_at,
"updated_at": updated_at.isoformat() if hasattr(updated_at, "isoformat") else updated_at,
}
def serialize_node(n: Dict) -> Dict:
"""Serialize workflow node document to API response format."""
"""Serialize workflow node row to API response format."""
return {
"id": n["id"],
"type": n["type"],
"id": n["node_id"],
"type": n["node_type"],
"title": n.get("title"),
"description": n.get("description"),
"position": n.get("position"),
"data": n.get("config", {}),
"data": n.get("config", {}) or {},
}
def serialize_edge(e: Dict) -> Dict:
"""Serialize workflow edge document to API response format."""
"""Serialize workflow edge row to API response format."""
return {
"id": e["id"],
"id": e["edge_id"],
"source": e.get("source_id"),
"target": e.get("target_id"),
"sourceHandle": e.get("source_handle"),
@@ -70,7 +145,7 @@ def serialize_edge(e: Dict) -> Dict:
def get_workflow_graph_version(workflow: Dict) -> int:
"""Get current graph version with legacy fallback."""
"""Get current graph version with fallback."""
raw_version = workflow.get("current_graph_version", 1)
try:
version = int(raw_version)
@@ -79,22 +154,6 @@ def get_workflow_graph_version(workflow: Dict) -> int:
return 1
def fetch_graph_documents(collection, workflow_id: str, graph_version: int) -> List[Dict]:
"""Fetch graph docs for active version, with fallback for legacy unversioned data."""
docs = list(
collection.find({"workflow_id": workflow_id, "graph_version": graph_version})
)
if docs:
return docs
if graph_version == 1:
return list(
collection.find(
{"workflow_id": workflow_id, "graph_version": {"$exists": False}}
)
)
return docs
def validate_json_schema_payload(
json_schema: Any,
) -> tuple[Optional[Dict[str, Any]], Optional[str]]:
@@ -315,49 +374,6 @@ def _can_reach_end(
return any(_can_reach_end(t, edges, node_map, end_ids, visited) for t in outgoing if t)
def create_workflow_nodes(
workflow_id: str, nodes_data: List[Dict], graph_version: int
) -> None:
"""Insert workflow nodes into database."""
if nodes_data:
workflow_nodes_collection.insert_many(
[
{
"id": n["id"],
"workflow_id": workflow_id,
"graph_version": graph_version,
"type": n["type"],
"title": n.get("title", ""),
"description": n.get("description", ""),
"position": n.get("position", {"x": 0, "y": 0}),
"config": n.get("data", {}),
}
for n in nodes_data
]
)
def create_workflow_edges(
workflow_id: str, edges_data: List[Dict], graph_version: int
) -> None:
"""Insert workflow edges into database."""
if edges_data:
workflow_edges_collection.insert_many(
[
{
"id": e["id"],
"workflow_id": workflow_id,
"graph_version": graph_version,
"source_id": e.get("source"),
"target_id": e.get("target"),
"source_handle": e.get("sourceHandle"),
"target_handle": e.get("targetHandle"),
}
for e in edges_data
]
)
@workflows_ns.route("/workflows")
class WorkflowList(Resource):
@@ -369,6 +385,7 @@ class WorkflowList(Resource):
data = request.get_json()
name = data.get("name", "").strip()
description = data.get("description", "")
nodes_data = data.get("nodes", [])
edges_data = data.get("edges", [])
@@ -379,35 +396,16 @@ class WorkflowList(Resource):
)
nodes_data = normalize_agent_node_json_schemas(nodes_data)
now = datetime.now(timezone.utc)
workflow_doc = {
"name": name,
"description": data.get("description", ""),
"user": user_id,
"created_at": now,
"updated_at": now,
"current_graph_version": 1,
}
result, error = safe_db_operation(
lambda: workflows_collection.insert_one(workflow_doc),
"Failed to create workflow",
)
if error:
return error
workflow_id = str(result.inserted_id)
try:
create_workflow_nodes(workflow_id, nodes_data, 1)
create_workflow_edges(workflow_id, edges_data, 1)
with db_session() as conn:
repo = WorkflowsRepository(conn)
workflow = repo.create(user_id, name, description=description)
pg_workflow_id = str(workflow["id"])
_write_graph(conn, pg_workflow_id, 1, nodes_data, edges_data)
except Exception as err:
workflow_nodes_collection.delete_many({"workflow_id": workflow_id})
workflow_edges_collection.delete_many({"workflow_id": workflow_id})
workflows_collection.delete_one({"_id": result.inserted_id})
return _workflow_error_response("Failed to create workflow structure", err)
return _workflow_error_response("Failed to create workflow", err)
return success_response({"id": workflow_id}, 201)
return success_response({"id": pg_workflow_id}, 201)
@workflows_ns.route("/workflows/<string:workflow_id>")
@@ -417,23 +415,22 @@ class WorkflowDetail(Resource):
def get(self, workflow_id: str):
"""Get workflow details with nodes and edges."""
user_id = get_user_id()
obj_id, error = validate_object_id(workflow_id, "Workflow")
if error:
return error
workflow, error = check_resource_ownership(
workflows_collection, obj_id, user_id, "Workflow"
)
if error:
return error
graph_version = get_workflow_graph_version(workflow)
nodes = fetch_graph_documents(
workflow_nodes_collection, workflow_id, graph_version
)
edges = fetch_graph_documents(
workflow_edges_collection, workflow_id, graph_version
)
try:
with db_readonly() as conn:
repo = WorkflowsRepository(conn)
workflow = _resolve_workflow(repo, workflow_id, user_id)
if workflow is None:
return error_response("Workflow not found", 404)
pg_workflow_id = str(workflow["id"])
graph_version = get_workflow_graph_version(workflow)
nodes = WorkflowNodesRepository(conn).find_by_version(
pg_workflow_id, graph_version,
)
edges = WorkflowEdgesRepository(conn).find_by_version(
pg_workflow_id, graph_version,
)
except Exception as err:
return _workflow_error_response("Failed to fetch workflow", err)
return success_response(
{
@@ -448,18 +445,9 @@ class WorkflowDetail(Resource):
def put(self, workflow_id: str):
"""Update workflow and replace nodes/edges."""
user_id = get_user_id()
obj_id, error = validate_object_id(workflow_id, "Workflow")
if error:
return error
workflow, error = check_resource_ownership(
workflows_collection, obj_id, user_id, "Workflow"
)
if error:
return error
data = request.get_json()
name = data.get("name", "").strip()
description = data.get("description", "")
nodes_data = data.get("nodes", [])
edges_data = data.get("edges", [])
@@ -470,55 +458,36 @@ class WorkflowDetail(Resource):
)
nodes_data = normalize_agent_node_json_schemas(nodes_data)
current_graph_version = get_workflow_graph_version(workflow)
next_graph_version = current_graph_version + 1
try:
create_workflow_nodes(workflow_id, nodes_data, next_graph_version)
create_workflow_edges(workflow_id, edges_data, next_graph_version)
except Exception as err:
workflow_nodes_collection.delete_many(
{"workflow_id": workflow_id, "graph_version": next_graph_version}
)
workflow_edges_collection.delete_many(
{"workflow_id": workflow_id, "graph_version": next_graph_version}
)
return _workflow_error_response("Failed to update workflow structure", err)
with db_session() as conn:
repo = WorkflowsRepository(conn)
workflow = _resolve_workflow(repo, workflow_id, user_id)
if workflow is None:
return error_response("Workflow not found", 404)
pg_workflow_id = str(workflow["id"])
current_graph_version = get_workflow_graph_version(workflow)
next_graph_version = current_graph_version + 1
now = datetime.now(timezone.utc)
_, error = safe_db_operation(
lambda: workflows_collection.update_one(
{"_id": obj_id},
{
"$set": {
_write_graph(
conn, pg_workflow_id, next_graph_version,
nodes_data, edges_data,
)
repo.update(
pg_workflow_id, user_id,
{
"name": name,
"description": data.get("description", ""),
"updated_at": now,
"description": description,
"current_graph_version": next_graph_version,
}
},
),
"Failed to update workflow",
)
if error:
workflow_nodes_collection.delete_many(
{"workflow_id": workflow_id, "graph_version": next_graph_version}
)
workflow_edges_collection.delete_many(
{"workflow_id": workflow_id, "graph_version": next_graph_version}
)
return error
try:
workflow_nodes_collection.delete_many(
{"workflow_id": workflow_id, "graph_version": {"$ne": next_graph_version}}
)
workflow_edges_collection.delete_many(
{"workflow_id": workflow_id, "graph_version": {"$ne": next_graph_version}}
)
except Exception as cleanup_err:
current_app.logger.warning(
f"Failed to clean old workflow graph versions for {workflow_id}: {cleanup_err}"
)
},
)
WorkflowNodesRepository(conn).delete_other_versions(
pg_workflow_id, next_graph_version,
)
WorkflowEdgesRepository(conn).delete_other_versions(
pg_workflow_id, next_graph_version,
)
except Exception as err:
return _workflow_error_response("Failed to update workflow", err)
return success_response()
@@ -526,20 +495,14 @@ class WorkflowDetail(Resource):
def delete(self, workflow_id: str):
"""Delete workflow and its graph."""
user_id = get_user_id()
obj_id, error = validate_object_id(workflow_id, "Workflow")
if error:
return error
workflow, error = check_resource_ownership(
workflows_collection, obj_id, user_id, "Workflow"
)
if error:
return error
try:
workflow_nodes_collection.delete_many({"workflow_id": workflow_id})
workflow_edges_collection.delete_many({"workflow_id": workflow_id})
workflows_collection.delete_one({"_id": workflow["_id"], "user": user_id})
with db_session() as conn:
repo = WorkflowsRepository(conn)
workflow = _resolve_workflow(repo, workflow_id, user_id)
if workflow is None:
return error_response("Workflow not found", 404)
# ON DELETE CASCADE on workflow_nodes/edges cleans children.
repo.delete(str(workflow["id"]), user_id)
except Exception as err:
return _workflow_error_response("Failed to delete workflow", err)

View File

@@ -0,0 +1,3 @@
from application.api.v1.routes import v1_bp
__all__ = ["v1_bp"]

View File

@@ -0,0 +1,331 @@
"""Standard chat completions API routes.
Exposes ``/v1/chat/completions`` and ``/v1/models`` endpoints that
follow the widely-adopted chat completions protocol so external tools
(opencode, continue, etc.) can connect to DocsGPT agents.
"""
import json
import logging
import time
import traceback
from typing import Any, Dict, Generator, Optional
from flask import Blueprint, jsonify, make_response, request, Response
from application.api.answer.routes.base import BaseAnswerResource
from application.api.answer.services.stream_processor import StreamProcessor
from application.api.v1.translator import (
translate_request,
translate_response,
translate_stream_event,
)
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.session import db_readonly
logger = logging.getLogger(__name__)
v1_bp = Blueprint("v1", __name__, url_prefix="/v1")
def _extract_bearer_token() -> Optional[str]:
"""Extract API key from Authorization: Bearer header."""
auth = request.headers.get("Authorization", "")
if auth.startswith("Bearer "):
return auth[7:].strip()
return None
def _lookup_agent(api_key: str) -> Optional[Dict]:
"""Look up the agent document for this API key."""
try:
with db_readonly() as conn:
return AgentsRepository(conn).find_by_key(api_key)
except Exception:
logger.warning("Failed to look up agent for API key", exc_info=True)
return None
def _get_model_name(agent: Optional[Dict], api_key: str) -> str:
"""Return agent name for display as model name."""
if agent:
return agent.get("name", api_key)
return api_key
class _V1AnswerHelper(BaseAnswerResource):
"""Thin wrapper to access complete_stream / process_response_stream."""
pass
@v1_bp.route("/chat/completions", methods=["POST"])
def chat_completions():
"""Handle POST /v1/chat/completions."""
api_key = _extract_bearer_token()
if not api_key:
return make_response(
jsonify({"error": {"message": "Missing Authorization header", "type": "auth_error"}}),
401,
)
data = request.get_json()
if not data or not data.get("messages"):
return make_response(
jsonify({"error": {"message": "messages field is required", "type": "invalid_request"}}),
400,
)
is_stream = data.get("stream", False)
agent_doc = _lookup_agent(api_key)
model_name = _get_model_name(agent_doc, api_key)
try:
internal_data = translate_request(data, api_key)
except Exception as e:
logger.error(f"/v1/chat/completions translate error: {e}", exc_info=True)
return make_response(
jsonify({"error": {"message": "Failed to process request", "type": "invalid_request"}}),
400,
)
# Link decoded_token to the agent's owner so continuation state,
# logs, and tool execution use the correct user identity. The PG
# ``agents`` row exposes the owner via ``user_id`` (``user`` is the
# legacy Mongo field name kept in ``row_to_dict`` only for the
# mapping ``id``/``_id``).
agent_user = (
(agent_doc.get("user_id") or agent_doc.get("user"))
if agent_doc else None
)
decoded_token = {"sub": agent_user or "api_key_user"}
try:
processor = StreamProcessor(internal_data, decoded_token)
if internal_data.get("tool_actions"):
# Continuation mode
conversation_id = internal_data.get("conversation_id")
if not conversation_id:
return make_response(
jsonify({"error": {"message": "conversation_id required for tool continuation", "type": "invalid_request"}}),
400,
)
(
agent,
messages,
tools_dict,
pending_tool_calls,
tool_actions,
) = processor.resume_from_tool_actions(
internal_data["tool_actions"], conversation_id
)
continuation = {
"messages": messages,
"tools_dict": tools_dict,
"pending_tool_calls": pending_tool_calls,
"tool_actions": tool_actions,
}
question = ""
else:
# Normal mode
question = internal_data.get("question", "")
agent = processor.build_agent(question)
continuation = None
if not processor.decoded_token:
return make_response(
jsonify({"error": {"message": "Unauthorized", "type": "auth_error"}}),
401,
)
helper = _V1AnswerHelper()
usage_error = helper.check_usage(processor.agent_config)
if usage_error:
return usage_error
should_save_conversation = bool(internal_data.get("save_conversation", False))
if is_stream:
return Response(
_stream_response(
helper,
question,
agent,
processor,
model_name,
continuation,
should_save_conversation,
),
mimetype="text/event-stream",
headers={
"Cache-Control": "no-cache",
"X-Accel-Buffering": "no",
},
)
else:
return _non_stream_response(
helper,
question,
agent,
processor,
model_name,
continuation,
should_save_conversation,
)
except ValueError as e:
logger.error(
f"/v1/chat/completions error: {e} - {traceback.format_exc()}",
extra={"error": str(e)},
)
return make_response(
jsonify({"error": {"message": "Failed to process request", "type": "invalid_request"}}),
400,
)
except Exception as e:
logger.error(
f"/v1/chat/completions error: {e} - {traceback.format_exc()}",
extra={"error": str(e)},
)
return make_response(
jsonify({"error": {"message": "Internal server error", "type": "server_error"}}),
500,
)
def _stream_response(
helper: _V1AnswerHelper,
question: str,
agent: Any,
processor: StreamProcessor,
model_name: str,
continuation: Optional[Dict],
should_save_conversation: bool,
) -> Generator[str, None, None]:
"""Generate translated SSE chunks for streaming response."""
completion_id = f"chatcmpl-{int(time.time())}"
internal_stream = helper.complete_stream(
question=question,
agent=agent,
conversation_id=processor.conversation_id,
user_api_key=processor.agent_config.get("user_api_key"),
decoded_token=processor.decoded_token,
agent_id=processor.agent_id,
model_id=processor.model_id,
should_save_conversation=should_save_conversation,
_continuation=continuation,
)
for line in internal_stream:
if not line.strip():
continue
# Parse the internal SSE event
event_str = line.replace("data: ", "").strip()
try:
event_data = json.loads(event_str)
except (json.JSONDecodeError, TypeError):
continue
# Update completion_id when we get the conversation id
if event_data.get("type") == "id":
conv_id = event_data.get("id", "")
if conv_id:
completion_id = f"chatcmpl-{conv_id}"
# Translate to standard format
translated = translate_stream_event(event_data, completion_id, model_name)
for chunk in translated:
yield chunk
def _non_stream_response(
helper: _V1AnswerHelper,
question: str,
agent: Any,
processor: StreamProcessor,
model_name: str,
continuation: Optional[Dict],
should_save_conversation: bool,
) -> Response:
"""Collect full response and return as single JSON."""
stream = helper.complete_stream(
question=question,
agent=agent,
conversation_id=processor.conversation_id,
user_api_key=processor.agent_config.get("user_api_key"),
decoded_token=processor.decoded_token,
agent_id=processor.agent_id,
model_id=processor.model_id,
should_save_conversation=should_save_conversation,
_continuation=continuation,
)
result = helper.process_response_stream(stream)
if result["error"]:
return make_response(
jsonify({"error": {"message": result["error"], "type": "server_error"}}),
500,
)
extra = result.get("extra")
pending = extra.get("pending_tool_calls") if isinstance(extra, dict) else None
response = translate_response(
conversation_id=result["conversation_id"],
answer=result["answer"] or "",
sources=result["sources"],
tool_calls=result["tool_calls"],
thought=result["thought"] or "",
model_name=model_name,
pending_tool_calls=pending,
)
return make_response(jsonify(response), 200)
@v1_bp.route("/models", methods=["GET"])
def list_models():
"""Handle GET /v1/models — return agents as models."""
api_key = _extract_bearer_token()
if not api_key:
return make_response(
jsonify({"error": {"message": "Missing Authorization header", "type": "auth_error"}}),
401,
)
try:
with db_readonly() as conn:
agents_repo = AgentsRepository(conn)
agent = agents_repo.find_by_key(api_key)
if not agent:
return make_response(
jsonify({"error": {"message": "Invalid API key", "type": "auth_error"}}),
401,
)
created = agent.get("created_at") or agent.get("createdAt")
created_ts = (
int(created.timestamp()) if hasattr(created, "timestamp")
else int(time.time())
)
model_id = str(agent.get("id") or agent.get("_id") or "")
model = {
"id": model_id,
"object": "model",
"created": created_ts,
"owned_by": "docsgpt",
"name": agent.get("name", ""),
"description": agent.get("description", ""),
}
return make_response(
jsonify({"object": "list", "data": [model]}),
200,
)
except Exception as e:
logger.error(f"/v1/models error: {e}", exc_info=True)
return make_response(
jsonify({"error": {"message": "Internal server error", "type": "server_error"}}),
500,
)

View File

@@ -0,0 +1,433 @@
"""Translate between standard chat completions format and DocsGPT internals.
This module handles:
- Request translation (chat completions -> DocsGPT internal format)
- Response translation (DocsGPT response -> chat completions format)
- Streaming event translation (DocsGPT SSE -> standard SSE chunks)
"""
import json
import time
from typing import Any, Dict, List, Optional
def _get_client_tool_name(tc: Dict) -> str:
"""Return the original tool name for client-facing responses.
For client-side tools the ``tool_name`` field carries the name the
client originally registered. Fall back to ``action_name`` (which
is now the clean LLM-visible name) or ``name``.
"""
return tc.get("tool_name", tc.get("action_name", tc.get("name", "")))
# ---------------------------------------------------------------------------
# Request translation
# ---------------------------------------------------------------------------
def is_continuation(messages: List[Dict]) -> bool:
"""Check if messages represent a tool-call continuation.
A continuation is detected when the last message(s) have ``role: "tool"``
immediately after an assistant message with ``tool_calls``.
"""
if not messages:
return False
# Walk backwards: if we see tool messages before hitting a non-tool, non-assistant message
# and there's an assistant message with tool_calls, it's a continuation.
i = len(messages) - 1
while i >= 0 and messages[i].get("role") == "tool":
i -= 1
if i < 0:
return False
return (
messages[i].get("role") == "assistant"
and bool(messages[i].get("tool_calls"))
)
def extract_tool_results(messages: List[Dict]) -> List[Dict]:
"""Extract tool results from trailing tool messages for continuation.
Returns a list of ``tool_actions`` dicts with ``call_id`` and ``result``.
"""
results = []
for msg in reversed(messages):
if msg.get("role") != "tool":
break
call_id = msg.get("tool_call_id", "")
content = msg.get("content", "")
if isinstance(content, str):
try:
content = json.loads(content)
except (json.JSONDecodeError, TypeError):
pass
results.append({"call_id": call_id, "result": content})
results.reverse()
return results
def extract_conversation_id(messages: List[Dict]) -> Optional[str]:
"""Try to extract conversation_id from the assistant message before tool results.
The conversation_id may be stored in a custom field on the assistant message
from a previous response cycle.
"""
for msg in reversed(messages):
if msg.get("role") == "assistant":
# Check docsgpt extension
return msg.get("docsgpt", {}).get("conversation_id")
return None
def extract_system_prompt(messages: List[Dict]) -> Optional[str]:
"""Extract the first system message content from the messages array.
Returns None if no system message is present.
"""
for msg in messages:
if msg.get("role") == "system":
return msg.get("content", "")
return None
def convert_history(messages: List[Dict]) -> List[Dict]:
"""Convert chat completions messages array to DocsGPT history format.
DocsGPT history is a list of ``{prompt, response}`` dicts.
Excludes the last user message (that becomes the ``question``).
"""
history = []
i = 0
while i < len(messages):
msg = messages[i]
if msg.get("role") == "system":
i += 1
continue
if msg.get("role") == "user":
# Look ahead for assistant response
if i + 1 < len(messages) and messages[i + 1].get("role") == "assistant":
content = messages[i + 1].get("content") or ""
history.append({
"prompt": msg.get("content", ""),
"response": content,
})
i += 2
continue
# Last user message without response — skip (it's the question)
i += 1
continue
i += 1
return history
def translate_request(
data: Dict[str, Any], api_key: str
) -> Dict[str, Any]:
"""Translate a chat completions request to DocsGPT internal format.
Args:
data: The incoming request body.
api_key: Agent API key from the Authorization header.
Returns:
Dict suitable for passing to ``StreamProcessor``.
"""
messages = data.get("messages", [])
# Check for continuation (tool results after assistant tool_calls)
if is_continuation(messages):
tool_actions = extract_tool_results(messages)
conversation_id = extract_conversation_id(messages)
if not conversation_id:
conversation_id = data.get("conversation_id")
result = {
"conversation_id": conversation_id,
"tool_actions": tool_actions,
"api_key": api_key,
}
# Carry tools forward for next iteration
if data.get("tools"):
result["client_tools"] = data["tools"]
return result
# Normal request — extract question from last user message
question = ""
for msg in reversed(messages):
if msg.get("role") == "user":
question = msg.get("content", "")
break
history = convert_history(messages)
system_prompt_override = extract_system_prompt(messages)
docsgpt = data.get("docsgpt", {})
result = {
"question": question,
"api_key": api_key,
"history": json.dumps(history),
# Conversations are NOT persisted by default on the v1 endpoint.
# Callers opt in via ``docsgpt.save_conversation: true``.
"save_conversation": bool(docsgpt.get("save_conversation", False)),
}
if system_prompt_override is not None:
result["system_prompt_override"] = system_prompt_override
# Client tools
if data.get("tools"):
result["client_tools"] = data["tools"]
# DocsGPT extensions
if docsgpt.get("attachments"):
result["attachments"] = docsgpt["attachments"]
return result
# ---------------------------------------------------------------------------
# Response translation (non-streaming)
# ---------------------------------------------------------------------------
def translate_response(
conversation_id: str,
answer: str,
sources: Optional[List[Dict]],
tool_calls: Optional[List[Dict]],
thought: str,
model_name: str,
pending_tool_calls: Optional[List[Dict]] = None,
) -> Dict[str, Any]:
"""Translate DocsGPT response to chat completions format.
Args:
conversation_id: The DocsGPT conversation ID.
answer: The assistant's text response.
sources: RAG retrieval sources.
tool_calls: Completed tool call results.
thought: Reasoning/thinking tokens.
model_name: Model/agent identifier.
pending_tool_calls: Pending client-side tool calls (if paused).
Returns:
Dict in the standard chat completions response format.
"""
created = int(time.time())
completion_id = f"chatcmpl-{conversation_id}" if conversation_id else f"chatcmpl-{created}"
# Build message
message: Dict[str, Any] = {"role": "assistant"}
if pending_tool_calls:
# Tool calls pending — return them for client execution
message["content"] = None
message["tool_calls"] = [
{
"id": tc.get("call_id", ""),
"type": "function",
"function": {
"name": _get_client_tool_name(tc),
"arguments": (
json.dumps(tc["arguments"])
if isinstance(tc.get("arguments"), dict)
else tc.get("arguments", "{}")
),
},
}
for tc in pending_tool_calls
]
finish_reason = "tool_calls"
else:
message["content"] = answer
if thought:
message["reasoning_content"] = thought
finish_reason = "stop"
result: Dict[str, Any] = {
"id": completion_id,
"object": "chat.completion",
"created": created,
"model": model_name,
"choices": [
{
"index": 0,
"message": message,
"finish_reason": finish_reason,
}
],
"usage": {
"prompt_tokens": 0,
"completion_tokens": 0,
"total_tokens": 0,
},
}
# DocsGPT extensions
docsgpt: Dict[str, Any] = {}
if conversation_id:
docsgpt["conversation_id"] = conversation_id
if sources:
docsgpt["sources"] = sources
if tool_calls:
docsgpt["tool_calls"] = tool_calls
if docsgpt:
result["docsgpt"] = docsgpt
return result
# ---------------------------------------------------------------------------
# Streaming event translation
# ---------------------------------------------------------------------------
def _make_chunk(
completion_id: str,
model_name: str,
delta: Dict[str, Any],
finish_reason: Optional[str] = None,
) -> str:
"""Build a single SSE chunk in the standard streaming format."""
chunk = {
"id": completion_id,
"object": "chat.completion.chunk",
"created": int(time.time()),
"model": model_name,
"choices": [
{
"index": 0,
"delta": delta,
"finish_reason": finish_reason,
}
],
}
return f"data: {json.dumps(chunk)}\n\n"
def _make_docsgpt_chunk(data: Dict[str, Any]) -> str:
"""Build a DocsGPT extension SSE chunk."""
return f"data: {json.dumps({'docsgpt': data})}\n\n"
def translate_stream_event(
event_data: Dict[str, Any],
completion_id: str,
model_name: str,
) -> List[str]:
"""Translate a DocsGPT SSE event dict to standard streaming chunks.
May return 0, 1, or 2 chunks per input event. For example, a completed
tool call produces both a docsgpt extension chunk and nothing on the
standard side (since server-side tool calls aren't surfaced in standard
format).
Args:
event_data: Parsed DocsGPT event dict.
completion_id: The completion ID for this response.
model_name: Model/agent identifier.
Returns:
List of SSE-formatted strings to send to the client.
"""
event_type = event_data.get("type")
chunks: List[str] = []
if event_type == "answer":
chunks.append(
_make_chunk(completion_id, model_name, {"content": event_data.get("answer", "")})
)
elif event_type == "thought":
chunks.append(
_make_chunk(
completion_id, model_name,
{"reasoning_content": event_data.get("thought", "")},
)
)
elif event_type == "source":
chunks.append(
_make_docsgpt_chunk({
"type": "source",
"sources": event_data.get("source", []),
})
)
elif event_type == "tool_call":
tc_data = event_data.get("data", {})
status = tc_data.get("status")
if status == "requires_client_execution":
# Standard: stream as tool_calls delta
args = tc_data.get("arguments", {})
args_str = json.dumps(args) if isinstance(args, dict) else str(args)
chunks.append(
_make_chunk(completion_id, model_name, {
"tool_calls": [{
"index": 0,
"id": tc_data.get("call_id", ""),
"type": "function",
"function": {
"name": _get_client_tool_name(tc_data),
"arguments": args_str,
},
}],
})
)
elif status == "awaiting_approval":
# Extension: approval needed
chunks.append(_make_docsgpt_chunk({"type": "tool_call", "data": tc_data}))
elif status in ("completed", "pending", "error", "denied", "skipped"):
# Extension: tool call progress
chunks.append(_make_docsgpt_chunk({"type": "tool_call", "data": tc_data}))
elif event_type == "tool_calls_pending":
# Standard: finish_reason = tool_calls
chunks.append(
_make_chunk(completion_id, model_name, {}, finish_reason="tool_calls")
)
# Also emit as docsgpt extension
chunks.append(
_make_docsgpt_chunk({
"type": "tool_calls_pending",
"pending_tool_calls": event_data.get("data", {}).get("pending_tool_calls", []),
})
)
elif event_type == "end":
chunks.append(
_make_chunk(completion_id, model_name, {}, finish_reason="stop")
)
chunks.append("data: [DONE]\n\n")
elif event_type == "id":
chunks.append(
_make_docsgpt_chunk({
"type": "id",
"conversation_id": event_data.get("id", ""),
})
)
elif event_type == "error":
# Emit as standard error (non-standard but widely supported)
error_data = {
"error": {
"message": event_data.get("error", "An error occurred"),
"type": "server_error",
}
}
chunks.append(f"data: {json.dumps(error_data)}\n\n")
elif event_type == "structured_answer":
chunks.append(
_make_chunk(
completion_id, model_name,
{"content": event_data.get("answer", "")},
)
)
# Skip: tool_calls (redundant), research_plan, research_progress
return chunks

View File

@@ -1,3 +1,4 @@
import logging
import os
import platform
import uuid
@@ -17,8 +18,10 @@ from application.api.answer import answer # noqa: E402
from application.api.internal.routes import internal # noqa: E402
from application.api.user.routes import user # noqa: E402
from application.api.connector.routes import connector # noqa: E402
from application.api.v1 import v1_bp # noqa: E402
from application.celery_init import celery # noqa: E402
from application.core.settings import settings # noqa: E402
from application.storage.db.bootstrap import ensure_database_ready # noqa: E402
from application.stt.upload_limits import ( # noqa: E402
build_stt_file_size_limit_message,
should_reject_stt_request,
@@ -31,11 +34,23 @@ if platform.system() == "Windows":
pathlib.PosixPath = pathlib.WindowsPath
dotenv.load_dotenv()
# Self-bootstrap the user-data Postgres DB. Runs before any blueprint or
# repository touches the engine, so the first request can't race the
# schema being created. Gated by AUTO_CREATE_DB / AUTO_MIGRATE settings
# (default ON for dev; disable in prod if schema is managed out-of-band).
ensure_database_ready(
settings.POSTGRES_URI,
create_db=settings.AUTO_CREATE_DB,
migrate=settings.AUTO_MIGRATE,
logger=logging.getLogger("application.app"),
)
app = Flask(__name__)
app.register_blueprint(user)
app.register_blueprint(answer)
app.register_blueprint(internal)
app.register_blueprint(connector)
app.register_blueprint(v1_bp)
app.config.update(
UPLOAD_FOLDER="inputs",
CELERY_BROKER_URL=settings.CELERY_BROKER_URL,
@@ -118,6 +133,12 @@ def enforce_stt_request_size_limits():
def authenticate_request():
if request.method == "OPTIONS":
return "", 200
# OpenAI-compatible routes authenticate via opaque agent API keys in the
# Authorization header, which the JWT decoder below would reject. Defer
# auth to the route handlers (see application/api/v1/routes.py).
if request.path.startswith("/v1/"):
request.decoded_token = None
return None
decoded_token = handle_auth(request)
if not decoded_token:
request.decoded_token = None

View File

@@ -1,6 +1,6 @@
from celery import Celery
from application.core.settings import settings
from celery.signals import setup_logging
from celery.signals import setup_logging, worker_process_init
def make_celery(app_name=__name__):
@@ -20,5 +20,24 @@ def config_loggers(*args, **kwargs):
setup_logging()
@worker_process_init.connect
def _dispose_db_engine_on_fork(*args, **kwargs):
"""Dispose the SQLAlchemy engine pool in each forked Celery worker.
SQLAlchemy connection pools are not fork-safe: file descriptors shared
between the parent and a forked worker will corrupt the pool. Disposing
on ``worker_process_init`` gives every worker its own fresh pool on
first use.
Imported lazily so Celery workers that don't touch Postgres (or where
``POSTGRES_URI`` is unset) don't fail at startup.
"""
try:
from application.storage.db.engine import dispose_engine
except Exception:
return
dispose_engine()
celery = make_celery()
celery.config_from_object("application.celeryconfig")

View File

@@ -0,0 +1,89 @@
"""Normalize user-supplied Postgres URIs for different drivers.
DocsGPT has two Postgres connection strings pointing at potentially
different databases:
* ``POSTGRES_URI`` feeds SQLAlchemy, which needs the
``postgresql+psycopg://`` dialect prefix to pick the psycopg v3 driver.
* ``PGVECTOR_CONNECTION_STRING`` feeds ``psycopg.connect()`` directly
(via libpq) in ``application/vectorstore/pgvector.py``. libpq only
understands ``postgres://`` and ``postgresql://`` — the SQLAlchemy
dialect prefix is an invalid URI from its point of view.
The two fields therefore need opposite normalization so operators don't
have to know which driver a given field feeds. Each normalizer also
silently upgrades the legacy ``postgresql+psycopg2://`` prefix since
psycopg2 is no longer in the project.
This module is deliberately separate from ``application/core/settings.py``
so the Settings class stays focused on field declarations, and the
URI-rewriting logic can be unit-tested without triggering ``.env``
file loading from importing Settings.
"""
from __future__ import annotations
def _rewrite_uri_prefixes(v, rewrites):
"""Shared URI prefix rewriter used by both normalizers below.
Strips whitespace, returns ``None`` for empty / ``"none"`` values,
applies the first matching rewrite, and passes unrecognised input
through so downstream consumers (SQLAlchemy, libpq) can produce
their own error messages rather than us silently eating a
misconfiguration.
"""
if v is None:
return None
if not isinstance(v, str):
return v
v = v.strip()
if not v or v.lower() == "none":
return None
for prefix, target in rewrites:
if v.startswith(prefix):
return target + v[len(prefix):]
return v
# POSTGRES_URI feeds SQLAlchemy, which needs a ``postgresql+psycopg://``
# dialect prefix to select the psycopg v3 driver. Normalize the
# operator-friendly forms TOWARD that dialect.
_POSTGRES_URI_REWRITES = (
("postgresql+psycopg2://", "postgresql+psycopg://"),
("postgresql://", "postgresql+psycopg://"),
("postgres://", "postgresql+psycopg://"),
)
# PGVECTOR_CONNECTION_STRING feeds ``psycopg.connect()`` directly in
# application/vectorstore/pgvector.py — NOT SQLAlchemy. libpq only
# understands ``postgres://`` and ``postgresql://``; the SQLAlchemy
# dialect prefix is an invalid URI from libpq's point of view. Strip it
# if the operator accidentally copied their POSTGRES_URI value here.
_PGVECTOR_CONNECTION_STRING_REWRITES = (
("postgresql+psycopg2://", "postgresql://"),
("postgresql+psycopg://", "postgresql://"),
)
def normalize_postgres_uri(v):
"""Normalize a user-supplied POSTGRES_URI to the SQLAlchemy psycopg3 form.
Accepts the forms operators naturally write (``postgres://``,
``postgresql://``) and rewrites them to ``postgresql+psycopg://``.
Unknown schemes pass through unchanged so SQLAlchemy can produce its
own dialect-not-found error.
"""
return _rewrite_uri_prefixes(v, _POSTGRES_URI_REWRITES)
def normalize_pgvector_connection_string(v):
"""Normalize a user-supplied PGVECTOR_CONNECTION_STRING for libpq.
Strips the SQLAlchemy dialect prefix if the operator accidentally
copied their POSTGRES_URI value here — libpq can't parse it.
User-friendly forms (``postgres://``, ``postgresql://``) pass
through unchanged since libpq accepts them natively.
"""
return _rewrite_uri_prefixes(v, _PGVECTOR_CONNECTION_STRING_REWRITES)

View File

@@ -27,6 +27,8 @@ ANTHROPIC_ATTACHMENTS = IMAGE_ATTACHMENTS
OPENROUTER_ATTACHMENTS = IMAGE_ATTACHMENTS
NOVITA_ATTACHMENTS = IMAGE_ATTACHMENTS
OPENAI_MODELS = [
AvailableModel(
@@ -193,6 +195,46 @@ OPENROUTER_MODELS = [
),
]
NOVITA_MODELS = [
AvailableModel(
id="moonshotai/kimi-k2.5",
provider=ModelProvider.NOVITA,
display_name="Kimi K2.5",
description="MoE model with function calling, structured output, reasoning, and vision",
capabilities=ModelCapabilities(
supports_tools=True,
supports_structured_output=True,
supported_attachment_types=NOVITA_ATTACHMENTS,
context_window=262144,
),
),
AvailableModel(
id="zai-org/glm-5",
provider=ModelProvider.NOVITA,
display_name="GLM-5",
description="MoE model with function calling, structured output, and reasoning",
capabilities=ModelCapabilities(
supports_tools=True,
supports_structured_output=True,
supported_attachment_types=[],
context_window=202800,
),
),
AvailableModel(
id="minimax/minimax-m2.5",
provider=ModelProvider.NOVITA,
display_name="MiniMax M2.5",
description="MoE model with function calling, structured output, and reasoning",
capabilities=ModelCapabilities(
supports_tools=True,
supports_structured_output=True,
supported_attachment_types=[],
context_window=204800,
),
),
]
AZURE_OPENAI_MODELS = [
AvailableModel(
id="azure-gpt-4",

View File

@@ -114,6 +114,10 @@ class ModelRegistry:
settings.LLM_PROVIDER == "openrouter" and settings.API_KEY
):
self._add_openrouter_models(settings)
if settings.NOVITA_API_KEY or (
settings.LLM_PROVIDER == "novita" and settings.API_KEY
):
self._add_novita_models(settings)
if settings.HUGGINGFACE_API_KEY or (
settings.LLM_PROVIDER == "huggingface" and settings.API_KEY
):
@@ -245,6 +249,21 @@ class ModelRegistry:
for model in OPENROUTER_MODELS:
self.models[model.id] = model
def _add_novita_models(self, settings):
from application.core.model_configs import NOVITA_MODELS
if settings.NOVITA_API_KEY:
for model in NOVITA_MODELS:
self.models[model.id] = model
return
if settings.LLM_PROVIDER == "novita" and settings.LLM_NAME:
for model in NOVITA_MODELS:
if model.id == settings.LLM_NAME:
self.models[model.id] = model
return
for model in NOVITA_MODELS:
self.models[model.id] = model
def _add_docsgpt_models(self, settings):
model_id = "docsgpt-local"
model = AvailableModel(

View File

@@ -10,6 +10,7 @@ def get_api_key_for_provider(provider: str) -> Optional[str]:
provider_key_map = {
"openai": settings.OPENAI_API_KEY,
"openrouter": settings.OPEN_ROUTER_API_KEY,
"novita": settings.NOVITA_API_KEY,
"anthropic": settings.ANTHROPIC_API_KEY,
"google": settings.GOOGLE_API_KEY,
"groq": settings.GROQ_API_KEY,

View File

@@ -1,24 +0,0 @@
from application.core.settings import settings
from pymongo import MongoClient
class MongoDB:
_client = None
@classmethod
def get_client(cls):
"""
Get the MongoDB client instance, creating it if necessary.
"""
if cls._client is None:
cls._client = MongoClient(settings.MONGO_URI)
return cls._client
@classmethod
def close_client(cls):
"""
Close the MongoDB client connection.
"""
if cls._client is not None:
cls._client.close()
cls._client = None

View File

@@ -5,8 +5,12 @@ from typing import Optional
from pydantic import field_validator
from pydantic_settings import BaseSettings, SettingsConfigDict
current_dir = os.path.dirname(
os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
current_dir = os.path.dirname(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from application.core.db_uri import ( # noqa: E402
normalize_pgvector_connection_string,
normalize_postgres_uri,
)
@@ -15,19 +19,21 @@ class Settings(BaseSettings):
AUTH_TYPE: Optional[str] = None # simple_jwt, session_jwt, or None
LLM_PROVIDER: str = "docsgpt"
LLM_NAME: Optional[str] = (
None # if LLM_PROVIDER is openai, LLM_NAME can be gpt-4 or gpt-3.5-turbo
)
LLM_NAME: Optional[str] = None # if LLM_PROVIDER is openai, LLM_NAME can be gpt-4 or gpt-3.5-turbo
EMBEDDINGS_NAME: str = "huggingface_sentence-transformers/all-mpnet-base-v2"
EMBEDDINGS_BASE_URL: Optional[str] = None # Remote embeddings API URL (OpenAI-compatible)
EMBEDDINGS_KEY: Optional[str] = (
None # api key for embeddings (if using openai, just copy API_KEY)
)
EMBEDDINGS_KEY: Optional[str] = None # api key for embeddings (if using openai, just copy API_KEY)
CELERY_BROKER_URL: str = "redis://localhost:6379/0"
CELERY_RESULT_BACKEND: str = "redis://localhost:6379/1"
MONGO_URI: str = "mongodb://localhost:27017/docsgpt"
MONGO_DB_NAME: str = "docsgpt"
# Only consulted when VECTOR_STORE=mongodb or when running scripts/db/backfill.py; user data lives in Postgres.
MONGO_URI: Optional[str] = None
# User-data Postgres DB.
POSTGRES_URI: Optional[str] = None
# On app startup, apply pending Alembic migrations. Default ON for dev; disable in prod if you manage schema out-of-band.
AUTO_MIGRATE: bool = True
# On app startup, create the target Postgres database if it's missing (requires CREATEDB privilege). Dev-friendly default.
AUTO_CREATE_DB: bool = True
LLM_PATH: str = os.path.join(current_dir, "models/docsgpt-7b-f16.gguf")
DEFAULT_MAX_HISTORY: int = 150
DEFAULT_LLM_TOKEN_LIMIT: int = 128000 # Fallback when model not found in registry
@@ -45,9 +51,7 @@ class Settings(BaseSettings):
PARSE_IMAGE_REMOTE: bool = False
DOCLING_OCR_ENABLED: bool = False # Enable OCR for docling parsers (PDF, images)
DOCLING_OCR_ATTACHMENTS_ENABLED: bool = False # Enable OCR for docling when parsing attachments
VECTOR_STORE: str = (
"faiss" # "faiss" or "elasticsearch" or "qdrant" or "milvus" or "lancedb" or "pgvector"
)
VECTOR_STORE: str = "faiss" # "faiss" or "elasticsearch" or "qdrant" or "milvus" or "lancedb" or "pgvector"
RETRIEVERS_ENABLED: list = ["classic_rag"]
AGENT_NAME: str = "classic"
FALLBACK_LLM_PROVIDER: Optional[str] = None # provider for fallback llm
@@ -55,12 +59,8 @@ class Settings(BaseSettings):
FALLBACK_LLM_API_KEY: Optional[str] = None # api key for fallback llm
# Google Drive integration
GOOGLE_CLIENT_ID: Optional[str] = (
None # Replace with your actual Google OAuth client ID
)
GOOGLE_CLIENT_SECRET: Optional[str] = (
None # Replace with your actual Google OAuth client secret
)
GOOGLE_CLIENT_ID: Optional[str] = None # Replace with your actual Google OAuth client ID
GOOGLE_CLIENT_SECRET: Optional[str] = None # Replace with your actual Google OAuth client secret
CONNECTOR_REDIRECT_BASE_URI: Optional[str] = (
"http://127.0.0.1:7091/api/connectors/callback" ##add redirect url as it is to your provider's console(gcp)
)
@@ -71,8 +71,12 @@ class Settings(BaseSettings):
MICROSOFT_TENANT_ID: Optional[str] = "common" # Azure AD Tenant ID (or 'common' for multi-tenant)
MICROSOFT_AUTHORITY: Optional[str] = None # e.g., "https://login.microsoftonline.com/{tenant_id}"
# Confluence Cloud integration
CONFLUENCE_CLIENT_ID: Optional[str] = None
CONFLUENCE_CLIENT_SECRET: Optional[str] = None
# GitHub source
GITHUB_ACCESS_TOKEN: Optional[str] = None # PAT token with read repo access
GITHUB_ACCESS_TOKEN: Optional[str] = None # PAT token with read repo access
# LLM Cache
CACHE_REDIS_URL: str = "redis://localhost:6379/2"
@@ -90,16 +94,13 @@ class Settings(BaseSettings):
GROQ_API_KEY: Optional[str] = None
HUGGINGFACE_API_KEY: Optional[str] = None
OPEN_ROUTER_API_KEY: Optional[str] = None
NOVITA_API_KEY: Optional[str] = None
OPENAI_API_BASE: Optional[str] = None # azure openai api base url
OPENAI_API_VERSION: Optional[str] = None # azure openai api version
AZURE_DEPLOYMENT_NAME: Optional[str] = None # azure deployment name for answering
AZURE_EMBEDDINGS_DEPLOYMENT_NAME: Optional[str] = (
None # azure deployment name for embeddings
)
OPENAI_BASE_URL: Optional[str] = (
None # openai base url for open ai compatable models
)
AZURE_EMBEDDINGS_DEPLOYMENT_NAME: Optional[str] = None # azure deployment name for embeddings
OPENAI_BASE_URL: Optional[str] = None # openai base url for open ai compatable models
# elasticsearch
ELASTIC_CLOUD_ID: Optional[str] = None # cloud id for elasticsearch
@@ -132,7 +133,10 @@ class Settings(BaseSettings):
QDRANT_PATH: Optional[str] = None
QDRANT_DISTANCE_FUNC: str = "Cosine"
# PGVector vectorstore config
# PGVector vectorstore config. Write the URI in whichever form you
# prefer — ``postgres://``, ``postgresql://``, or even the SQLAlchemy
# dialect form (``postgresql+psycopg://``) are all accepted and
# normalized internally for ``psycopg.connect()``.
PGVECTOR_CONNECTION_STRING: Optional[str] = None
# Milvus vectorstore config
MILVUS_COLLECTION_NAME: Optional[str] = "docsgpt"
@@ -141,9 +145,7 @@ class Settings(BaseSettings):
# LanceDB vectorstore config
LANCEDB_PATH: str = "./data/lancedb" # Path where LanceDB stores its local data
LANCEDB_TABLE_NAME: Optional[str] = (
"docsgpts" # Name of the table to use for storing vectors
)
LANCEDB_TABLE_NAME: Optional[str] = "docsgpts" # Name of the table to use for storing vectors
FLASK_DEBUG_MODE: bool = False
STORAGE_TYPE: str = "local" # local or s3
@@ -173,6 +175,16 @@ class Settings(BaseSettings):
COMPRESSION_PROMPT_VERSION: str = "v1.0" # Track prompt iterations
COMPRESSION_MAX_HISTORY_POINTS: int = 3 # Keep only last N compression points to prevent DB bloat
@field_validator("POSTGRES_URI", mode="before")
@classmethod
def _normalize_postgres_uri_validator(cls, v):
return normalize_postgres_uri(v)
@field_validator("PGVECTOR_CONNECTION_STRING", mode="before")
@classmethod
def _normalize_pgvector_connection_string_validator(cls, v):
return normalize_pgvector_connection_string(v)
@field_validator(
"API_KEY",
"OPENAI_API_KEY",
@@ -180,6 +192,7 @@ class Settings(BaseSettings):
"GOOGLE_API_KEY",
"GROQ_API_KEY",
"HUGGINGFACE_API_KEY",
"NOVITA_API_KEY",
"EMBEDDINGS_KEY",
"FALLBACK_LLM_API_KEY",
"QDRANT_API_KEY",

View File

@@ -127,15 +127,33 @@ class GoogleLLM(BaseLLM):
).uri,
)
from application.core.mongo_db import MongoDB
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
attachments_collection = db["attachments"]
if "_id" in attachment:
attachments_collection.update_one(
{"_id": attachment["_id"]}, {"$set": {"google_file_uri": file_uri}}
# Cache the Google file URI on the attachment row so we don't
# re-upload on the next LLM call. Accept either a PG UUID
# (``id``) or a legacy Mongo ObjectId (``_id``). Opened per
# write — this runs mid-LLM-call, so we don't wrap the
# surrounding generator in a long-lived session.
attachment_id = attachment.get("id") or attachment.get("_id")
if attachment_id:
user_id = None
decoded = getattr(self, "decoded_token", None)
if isinstance(decoded, dict):
user_id = decoded.get("sub")
from application.storage.db.repositories.attachments import (
AttachmentsRepository,
)
from application.storage.db.session import db_session
try:
with db_session() as conn:
AttachmentsRepository(conn).update_any(
str(attachment_id),
user_id,
{"google_file_uri": file_uri},
)
except Exception as cache_err:
logging.warning(
f"Failed to cache google_file_uri on attachment {attachment_id}: {cache_err}"
)
return file_uri
except Exception as e:
logging.error(f"Error uploading file to Google AI: {e}", exc_info=True)
@@ -167,6 +185,8 @@ class GoogleLLM(BaseLLM):
return "\n".join(parts)
return ""
import json as _json
for message in messages:
role = message.get("role")
content = message.get("content")
@@ -180,9 +200,66 @@ class GoogleLLM(BaseLLM):
if role == "assistant":
role = "model"
elif role == "tool":
role = "model"
parts = []
# Standard format: assistant message with tool_calls array
msg_tool_calls = message.get("tool_calls")
if msg_tool_calls and role == "model":
for tc in msg_tool_calls:
func = tc.get("function", {})
args = func.get("arguments", "{}")
if isinstance(args, str):
try:
args = _json.loads(args)
except (_json.JSONDecodeError, TypeError):
args = {}
cleaned_args = self._remove_null_values(args)
thought_sig = tc.get("thought_signature")
if thought_sig:
parts.append(
types.Part(
functionCall=types.FunctionCall(
name=func.get("name", ""),
args=cleaned_args,
),
thoughtSignature=thought_sig,
)
)
else:
parts.append(
types.Part.from_function_call(
name=func.get("name", ""),
args=cleaned_args,
)
)
if parts:
cleaned_messages.append(types.Content(role=role, parts=parts))
continue
# Standard format: tool message with tool_call_id
tool_call_id = message.get("tool_call_id")
if role == "tool" and tool_call_id is not None:
result_content = content
if isinstance(result_content, str):
try:
result_content = _json.loads(result_content)
except (_json.JSONDecodeError, TypeError):
pass
# Google expects function_response name — extract from tool_call_id context
# We use a placeholder name since Google API doesn't require exact match
parts.append(
types.Part.from_function_response(
name="tool_result",
response={"result": result_content},
)
)
cleaned_messages.append(types.Content(role="model", parts=parts))
continue
if role == "tool":
role = "model"
if role and content is not None:
if isinstance(content, str):
parts = [types.Part.from_text(text=content)]
@@ -191,15 +268,11 @@ class GoogleLLM(BaseLLM):
if "text" in item:
parts.append(types.Part.from_text(text=item["text"]))
elif "function_call" in item:
# Remove null values from args to avoid API errors
# Legacy format support
cleaned_args = self._remove_null_values(
item["function_call"]["args"]
)
# Create function call part with thought_signature if present
# For Gemini 3 models, we need to include thought_signature
if "thought_signature" in item:
# Use Part constructor with functionCall and thoughtSignature
parts.append(
types.Part(
functionCall=types.FunctionCall(
@@ -210,7 +283,6 @@ class GoogleLLM(BaseLLM):
)
)
else:
# Use helper method when no thought_signature
parts.append(
types.Part.from_function_call(
name=item["function_call"]["name"],

View File

@@ -1,3 +1,4 @@
import json
import logging
import uuid
from abc import ABC, abstractmethod
@@ -315,10 +316,34 @@ class LLMHandler(ABC):
current_prompt = self._extract_text_from_content(content)
elif role in {"assistant", "model"}:
# If this assistant turn contains tool calls, collect them; otherwise commit a response.
# Standard format: tool_calls array on assistant message
msg_tool_calls = message.get("tool_calls")
if msg_tool_calls:
for tc in msg_tool_calls:
call_id = tc.get("id") or str(uuid.uuid4())
func = tc.get("function", {})
args = func.get("arguments")
if isinstance(args, str):
try:
args = json.loads(args)
except (json.JSONDecodeError, TypeError):
pass
current_tool_calls[call_id] = {
"tool_name": "unknown_tool",
"action_name": func.get("name"),
"arguments": args,
"result": None,
"status": "called",
"call_id": call_id,
}
continue
# Legacy format: function_call/function_response in content list
if isinstance(content, list):
has_fc = False
for item in content:
if "function_call" in item:
has_fc = True
fc = item["function_call"]
call_id = fc.get("call_id") or str(uuid.uuid4())
current_tool_calls[call_id] = {
@@ -329,37 +354,30 @@ class LLMHandler(ABC):
"status": "called",
"call_id": call_id,
}
elif "function_response" in item:
fr = item["function_response"]
call_id = fr.get("call_id") or str(uuid.uuid4())
current_tool_calls[call_id] = {
"tool_name": "unknown_tool",
"action_name": fr.get("name"),
"arguments": None,
"result": fr.get("response", {}).get("result"),
"status": "completed",
"call_id": call_id,
}
# No direct assistant text here; continue to next message
continue
if has_fc:
continue
response_text = self._extract_text_from_content(content)
_commit_query(response_text)
elif role == "tool":
# Attach tool outputs to the latest pending tool call if possible
# Standard format: tool_call_id on tool message
call_id = message.get("tool_call_id")
tool_text = self._extract_text_from_content(content)
# Attempt to parse function_response style
call_id = None
if isinstance(content, list):
for item in content:
if "function_response" in item and item["function_response"].get("call_id"):
call_id = item["function_response"]["call_id"]
break
if call_id and call_id in current_tool_calls:
current_tool_calls[call_id]["result"] = tool_text
current_tool_calls[call_id]["status"] = "completed"
elif queries:
# Legacy: function_response in content list
elif isinstance(content, list):
for item in content:
if "function_response" in item:
legacy_id = item["function_response"].get("call_id")
if legacy_id and legacy_id in current_tool_calls:
current_tool_calls[legacy_id]["result"] = tool_text
current_tool_calls[legacy_id]["status"] = "completed"
break
elif call_id is None and queries:
queries[-1].setdefault("tool_calls", []).append(
{
"tool_name": "unknown_tool",
@@ -648,6 +666,13 @@ class LLMHandler(ABC):
"""
Execute tool calls and update conversation history.
When a tool requires approval or client-side execution, it is
collected as a pending action instead of being executed. The
generator returns ``(updated_messages, pending_actions)`` where
*pending_actions* is ``None`` when every tool was executed
normally, or a list of dicts describing actions the client must
resolve before the LLM loop can continue.
Args:
agent: The agent instance
tool_calls: List of tool calls to execute
@@ -655,9 +680,11 @@ class LLMHandler(ABC):
messages: Current conversation history
Returns:
Updated messages list
Tuple of (updated_messages, pending_actions).
pending_actions is None if all tools executed, otherwise a list.
"""
updated_messages = messages.copy()
pending_actions: List[Dict] = []
for i, call in enumerate(tool_calls):
# Check context limit before executing tool call
@@ -763,6 +790,29 @@ class LLMHandler(ABC):
# Set flag on agent
agent.context_limit_reached = True
break
# ---- Pause check: approval / client-side execution ----
llm_class = agent.llm.__class__.__name__
pause_info = agent.tool_executor.check_pause(
tools_dict, call, llm_class
)
if pause_info:
# Yield pause event so the client knows this tool is waiting
yield {
"type": "tool_call",
"data": {
"tool_name": pause_info["tool_name"],
"call_id": pause_info["call_id"],
"action_name": pause_info.get("llm_name", pause_info["name"]),
"arguments": pause_info["arguments"],
"status": pause_info["pause_type"],
},
}
pending_actions.append(pause_info)
# Do NOT add messages for pending tools here.
# They will be added on resume to keep call/result pairs together.
continue
try:
self.tool_calls.append(call)
tool_executor_gen = agent._execute_tool_action(tools_dict, call)
@@ -772,25 +822,30 @@ class LLMHandler(ABC):
except StopIteration as e:
tool_response, call_id = e.value
break
function_call_content = {
"function_call": {
"name": call.name,
"args": call.arguments,
"call_id": call_id,
}
}
# Include thought_signature for Google Gemini 3 models
# It should be at the same level as function_call, not inside it
if call.thought_signature:
function_call_content["thought_signature"] = call.thought_signature
updated_messages.append(
{
"role": "assistant",
"content": [function_call_content],
}
)
# Standard internal format: assistant message with tool_calls array
args_str = (
json.dumps(call.arguments)
if isinstance(call.arguments, dict)
else call.arguments
)
tool_call_obj = {
"id": call_id,
"type": "function",
"function": {
"name": call.name,
"arguments": args_str,
},
}
# Preserve thought_signature for Google Gemini 3 models
if call.thought_signature:
tool_call_obj["thought_signature"] = call.thought_signature
updated_messages.append({
"role": "assistant",
"content": None,
"tool_calls": [tool_call_obj],
})
updated_messages.append(self.create_tool_message(call, tool_response))
except Exception as e:
@@ -802,16 +857,15 @@ class LLMHandler(ABC):
error_message = self.create_tool_message(error_call, error_response)
updated_messages.append(error_message)
call_parts = call.name.split("_")
if len(call_parts) >= 2:
tool_id = call_parts[-1] # Last part is tool ID (e.g., "1")
action_name = "_".join(call_parts[:-1])
tool_name = tools_dict.get(tool_id, {}).get("name", "unknown_tool")
full_action_name = f"{action_name}_{tool_id}"
mapping = agent.tool_executor._name_to_tool
if call.name in mapping:
resolved_tool_id, _ = mapping[call.name]
tool_name = tools_dict.get(resolved_tool_id, {}).get(
"name", "unknown_tool"
)
else:
tool_name = "unknown_tool"
action_name = call.name
full_action_name = call.name
full_action_name = call.name
yield {
"type": "tool_call",
"data": {
@@ -823,7 +877,7 @@ class LLMHandler(ABC):
"status": "error",
},
}
return updated_messages
return updated_messages, pending_actions if pending_actions else None
def handle_non_streaming(
self, agent, response: Any, tools_dict: Dict, messages: List[Dict]
@@ -851,8 +905,22 @@ class LLMHandler(ABC):
try:
yield next(tool_handler_gen)
except StopIteration as e:
messages = e.value
messages, pending_actions = e.value
break
# If tools need approval or client execution, pause the loop
if pending_actions:
agent._pending_continuation = {
"messages": messages,
"pending_tool_calls": pending_actions,
"tools_dict": tools_dict,
}
yield {
"type": "tool_calls_pending",
"data": {"pending_tool_calls": pending_actions},
}
return ""
response = agent.llm.gen(
model=agent.model_id, messages=messages, tools=agent.tools
)
@@ -913,10 +981,23 @@ class LLMHandler(ABC):
try:
yield next(tool_handler_gen)
except StopIteration as e:
messages = e.value
messages, pending_actions = e.value
break
tool_calls = {}
# If tools need approval or client execution, pause the loop
if pending_actions:
agent._pending_continuation = {
"messages": messages,
"pending_tool_calls": pending_actions,
"tools_dict": tools_dict,
}
yield {
"type": "tool_calls_pending",
"data": {"pending_tool_calls": pending_actions},
}
return
# Check if context limit was reached during tool execution
if hasattr(agent, 'context_limit_reached') and agent.context_limit_reached:
# Add system message warning about context limit

View File

@@ -67,18 +67,18 @@ class GoogleLLMHandler(LLMHandler):
)
def create_tool_message(self, tool_call: ToolCall, result: Any) -> Dict:
"""Create Google-style tool message."""
"""Create a tool result message in the standard internal format."""
import json as _json
content = (
_json.dumps(result)
if not isinstance(result, str)
else result
)
return {
"role": "model",
"content": [
{
"function_response": {
"name": tool_call.name,
"response": {"result": result},
}
}
],
"role": "tool",
"tool_call_id": tool_call.id,
"content": content,
}
def _iterate_stream(self, response: Any) -> Generator:

View File

@@ -7,6 +7,7 @@ class LLMHandlerCreator:
handlers = {
"openai": OpenAILLMHandler,
"google": GoogleLLMHandler,
"novita": OpenAILLMHandler, # Novita uses OpenAI-compatible API
"default": OpenAILLMHandler,
}

View File

@@ -37,18 +37,18 @@ class OpenAILLMHandler(LLMHandler):
)
def create_tool_message(self, tool_call: ToolCall, result: Any) -> Dict:
"""Create OpenAI-style tool message."""
"""Create a tool result message in the standard internal format."""
import json as _json
content = (
_json.dumps(result)
if not isinstance(result, str)
else result
)
return {
"role": "tool",
"content": [
{
"function_response": {
"name": tool_call.name,
"response": {"result": result},
"call_id": tool_call.id,
}
}
],
"tool_call_id": tool_call.id,
"content": content,
}
def _iterate_stream(self, response: Any) -> Generator:

View File

@@ -1,13 +1,13 @@
from application.core.settings import settings
from application.llm.openai import OpenAILLM
NOVITA_BASE_URL = "https://api.novita.ai/v3/openai"
NOVITA_BASE_URL = "https://api.novita.ai/openai"
class NovitaLLM(OpenAILLM):
def __init__(self, api_key=None, user_api_key=None, base_url=None, *args, **kwargs):
super().__init__(
api_key=api_key or settings.API_KEY,
api_key=api_key or settings.NOVITA_API_KEY or settings.API_KEY,
user_api_key=user_api_key,
base_url=base_url or NOVITA_BASE_URL,
*args,

View File

@@ -91,16 +91,52 @@ class OpenAILLM(BaseLLM):
if role == "model":
role = "assistant"
# Standard format: assistant message with tool_calls (passthrough)
tool_calls = message.get("tool_calls")
if tool_calls and role == "assistant":
cleaned_tcs = []
for tc in tool_calls:
func = tc.get("function", {})
args = func.get("arguments", "{}")
if isinstance(args, dict):
args = json.dumps(self._remove_null_values(args))
elif isinstance(args, str):
try:
parsed = json.loads(args)
args = json.dumps(self._remove_null_values(parsed))
except (json.JSONDecodeError, TypeError):
pass
cleaned_tcs.append({
"id": tc.get("id", ""),
"type": "function",
"function": {"name": func.get("name", ""), "arguments": args},
})
cleaned_messages.append({
"role": "assistant",
"content": None,
"tool_calls": cleaned_tcs,
})
continue
# Standard format: tool message with tool_call_id (passthrough)
tool_call_id = message.get("tool_call_id")
if role == "tool" and tool_call_id is not None:
cleaned_messages.append({
"role": "tool",
"tool_call_id": tool_call_id,
"content": content if isinstance(content, str) else json.dumps(content),
})
continue
if role and content is not None:
if isinstance(content, str):
cleaned_messages.append({"role": role, "content": content})
elif isinstance(content, list):
# Collect all content parts into a single message
content_parts = []
for item in content:
# Legacy format support: function_call / function_response
if "function_call" in item:
# Function calls need their own message
args = item["function_call"]["args"]
if isinstance(args, str):
try:
@@ -116,28 +152,20 @@ class OpenAILLM(BaseLLM):
"arguments": json.dumps(cleaned_args),
},
}
cleaned_messages.append(
{
"role": "assistant",
"content": None,
"tool_calls": [tool_call],
}
)
cleaned_messages.append({
"role": "assistant",
"content": None,
"tool_calls": [tool_call],
})
elif "function_response" in item:
# Function responses need their own message
cleaned_messages.append(
{
"role": "tool",
"tool_call_id": item["function_response"][
"call_id"
],
"content": json.dumps(
item["function_response"]["response"]["result"]
),
}
)
cleaned_messages.append({
"role": "tool",
"tool_call_id": item["function_response"]["call_id"],
"content": json.dumps(
item["function_response"]["response"]["result"]
),
})
elif isinstance(item, dict):
# Collect content parts (text, images, files) into a single message
if "type" in item and item["type"] == "text" and "text" in item:
content_parts.append(item)
elif "type" in item and item["type"] == "file" and "file" in item:
@@ -145,10 +173,7 @@ class OpenAILLM(BaseLLM):
elif "type" in item and item["type"] == "image_url" and "image_url" in item:
content_parts.append(item)
elif "text" in item and "type" not in item:
# Legacy format: {"text": "..."} without type
content_parts.append({"type": "text", "text": item["text"]})
# Add the collected content parts as a single message
if content_parts:
cleaned_messages.append({"role": role, "content": content_parts})
else:
@@ -502,15 +527,34 @@ class OpenAILLM(BaseLLM):
).id,
)
from application.core.mongo_db import MongoDB
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
attachments_collection = db["attachments"]
if "_id" in attachment:
attachments_collection.update_one(
{"_id": attachment["_id"]}, {"$set": {"openai_file_id": file_id}}
# Cache the OpenAI file id on the attachment row so we don't
# re-upload the same blob on the next LLM call. Prefer the PG
# UUID (``id``) when present; fall back to the legacy Mongo
# ObjectId string (``_id``). Opened per-write — this runs
# inside the hot LLM path, so we don't want a long-lived
# session wrapping the generator.
attachment_id = attachment.get("id") or attachment.get("_id")
if attachment_id:
user_id = None
decoded = getattr(self, "decoded_token", None)
if isinstance(decoded, dict):
user_id = decoded.get("sub")
from application.storage.db.repositories.attachments import (
AttachmentsRepository,
)
from application.storage.db.session import db_session
try:
with db_session() as conn:
AttachmentsRepository(conn).update_any(
str(attachment_id),
user_id,
{"openai_file_id": file_id},
)
except Exception as cache_err:
logging.warning(
f"Failed to cache openai_file_id on attachment {attachment_id}: {cache_err}"
)
return file_id
except Exception as e:
logging.error(f"Error uploading file to OpenAI: {e}", exc_info=True)

View File

@@ -6,8 +6,8 @@ import logging
import uuid
from typing import Any, Callable, Dict, Generator, List
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.storage.db.repositories.stack_logs import StackLogsRepository
from application.storage.db.session import db_session
logging.basicConfig(
level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s"
@@ -101,7 +101,7 @@ def _consume_and_log(generator: Generator, context: "LogContext"):
except Exception as e:
logging.exception(f"Error in {context.endpoint} - {context.activity_id}: {e}")
context.stacks.append({"component": "error", "data": {"message": str(e)}})
_log_to_mongodb(
_log_activity_to_db(
endpoint=context.endpoint,
activity_id=context.activity_id,
user=context.user,
@@ -112,7 +112,7 @@ def _consume_and_log(generator: Generator, context: "LogContext"):
)
raise
finally:
_log_to_mongodb(
_log_activity_to_db(
endpoint=context.endpoint,
activity_id=context.activity_id,
user=context.user,
@@ -123,7 +123,7 @@ def _consume_and_log(generator: Generator, context: "LogContext"):
)
def _log_to_mongodb(
def _log_activity_to_db(
endpoint: str,
activity_id: str,
user: str,
@@ -132,30 +132,26 @@ def _log_to_mongodb(
stacks: List[Dict],
level: str,
) -> None:
"""Append a per-request activity log row to Postgres (``stack_logs``)."""
try:
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
user_logs_collection = db["stack_logs"]
log_entry = {
"endpoint": endpoint,
"id": activity_id,
"level": level,
"user": user,
"api_key": api_key,
"query": query,
"stacks": stacks,
"timestamp": datetime.datetime.now(datetime.timezone.utc),
}
# clean up text fields to be no longer than 10000 characters
for key, value in log_entry.items():
if isinstance(value, str) and len(value) > 10000:
log_entry[key] = value[:10000]
user_logs_collection.insert_one(log_entry)
logging.debug(f"Logged activity to MongoDB: {activity_id}")
# Clean up text fields to be no longer than 10000 characters so a
# runaway payload can't blow up the insert.
def _truncate(val):
if isinstance(val, str) and len(val) > 10000:
return val[:10000]
return val
with db_session() as conn:
StackLogsRepository(conn).insert(
activity_id=activity_id,
endpoint=_truncate(endpoint),
level=_truncate(level),
user_id=_truncate(user),
api_key=_truncate(api_key),
query=_truncate(query),
stacks=stacks,
timestamp=datetime.datetime.now(datetime.timezone.utc),
)
logging.debug(f"Logged activity to Postgres: {activity_id}")
except Exception as e:
logging.error(f"Failed to log to MongoDB: {e}", exc_info=True)
logging.error(f"Failed to log activity to Postgres: {e}", exc_info=True)

View File

@@ -0,0 +1,37 @@
"""Shared helpers for connector auth modules.
These helpers exist so that sensitive values (session tokens, bearer
credentials) never end up interpolated into exception messages or log
lines. Exception messages frequently flow into ``stack_logs`` (Postgres)
and Sentry via ``exc_info=True``, so the raw value must never be the
thing we format.
"""
from __future__ import annotations
import hashlib
def session_token_fingerprint(session_token: str) -> str:
"""Return a short, irreversible fingerprint for a session token.
The returned string is safe to embed in exception messages and log
lines: it is a prefix of a SHA-256 digest, clearly tagged so an
operator reading the log knows it is a hash and not the token
itself. It is stable for a given input, which lets operators
correlate "which token failed" across log lines without exposing
the credential.
Args:
session_token: The raw session token. Accepts ``None`` or the
empty string for defensive callers; both yield a distinct
sentinel rather than raising.
Returns:
A string of the form ``"sha256:<6 hex chars>"``, or
``"sha256:<empty>"`` when the input is falsy.
"""
if not session_token:
return "sha256:<empty>"
digest = hashlib.sha256(session_token.encode("utf-8")).hexdigest()
return f"sha256:{digest[:6]}"

View File

@@ -0,0 +1,4 @@
from .auth import ConfluenceAuth
from .loader import ConfluenceLoader
__all__ = ["ConfluenceAuth", "ConfluenceLoader"]

View File

@@ -0,0 +1,221 @@
import datetime
import logging
from typing import Any, Dict, Optional
from urllib.parse import urlencode
import requests
from application.core.settings import settings
from application.parser.connectors._auth_utils import session_token_fingerprint
from application.parser.connectors.base import BaseConnectorAuth
logger = logging.getLogger(__name__)
class ConfluenceAuth(BaseConnectorAuth):
SCOPES = [
"read:page:confluence",
"read:space:confluence",
"read:attachment:confluence",
"read:me",
"offline_access",
]
AUTH_URL = "https://auth.atlassian.com/authorize"
TOKEN_URL = "https://auth.atlassian.com/oauth/token"
RESOURCES_URL = "https://api.atlassian.com/oauth/token/accessible-resources"
ME_URL = "https://api.atlassian.com/me"
def __init__(self):
self.client_id = settings.CONFLUENCE_CLIENT_ID
self.client_secret = settings.CONFLUENCE_CLIENT_SECRET
self.redirect_uri = settings.CONNECTOR_REDIRECT_BASE_URI
if not self.client_id or not self.client_secret:
raise ValueError(
"Confluence OAuth credentials not configured. "
"Please set CONFLUENCE_CLIENT_ID and CONFLUENCE_CLIENT_SECRET in settings."
)
def get_authorization_url(self, state: Optional[str] = None) -> str:
params = {
"audience": "api.atlassian.com",
"client_id": self.client_id,
"scope": " ".join(self.SCOPES),
"redirect_uri": self.redirect_uri,
"state": state,
"response_type": "code",
"prompt": "consent",
}
return f"{self.AUTH_URL}?{urlencode(params)}"
def exchange_code_for_tokens(self, authorization_code: str) -> Dict[str, Any]:
if not authorization_code:
raise ValueError("Authorization code is required")
response = requests.post(
self.TOKEN_URL,
json={
"grant_type": "authorization_code",
"client_id": self.client_id,
"client_secret": self.client_secret,
"code": authorization_code,
"redirect_uri": self.redirect_uri,
},
headers={"Content-Type": "application/json"},
timeout=30,
)
response.raise_for_status()
token_data = response.json()
access_token = token_data.get("access_token")
if not access_token:
raise ValueError("OAuth flow did not return an access token")
refresh_token = token_data.get("refresh_token")
if not refresh_token:
raise ValueError("OAuth flow did not return a refresh token")
expires_in = token_data.get("expires_in", 3600)
expiry = (
datetime.datetime.now(datetime.timezone.utc)
+ datetime.timedelta(seconds=expires_in)
).isoformat()
cloud_id = self._fetch_cloud_id(access_token)
user_info = self._fetch_user_info(access_token)
return {
"access_token": access_token,
"refresh_token": refresh_token,
"token_uri": self.TOKEN_URL,
"scopes": self.SCOPES,
"expiry": expiry,
"cloud_id": cloud_id,
"user_info": {
"name": user_info.get("display_name", ""),
"email": user_info.get("email", ""),
},
}
def refresh_access_token(self, refresh_token: str) -> Dict[str, Any]:
if not refresh_token:
raise ValueError("Refresh token is required")
response = requests.post(
self.TOKEN_URL,
json={
"grant_type": "refresh_token",
"client_id": self.client_id,
"client_secret": self.client_secret,
"refresh_token": refresh_token,
},
headers={"Content-Type": "application/json"},
timeout=30,
)
response.raise_for_status()
token_data = response.json()
access_token = token_data.get("access_token")
new_refresh_token = token_data.get("refresh_token", refresh_token)
expires_in = token_data.get("expires_in", 3600)
expiry = (
datetime.datetime.now(datetime.timezone.utc)
+ datetime.timedelta(seconds=expires_in)
).isoformat()
cloud_id = self._fetch_cloud_id(access_token)
return {
"access_token": access_token,
"refresh_token": new_refresh_token,
"token_uri": self.TOKEN_URL,
"scopes": self.SCOPES,
"expiry": expiry,
"cloud_id": cloud_id,
}
def is_token_expired(self, token_info: Dict[str, Any]) -> bool:
if not token_info:
return True
expiry = token_info.get("expiry")
if not expiry:
return bool(token_info.get("access_token"))
try:
expiry_dt = datetime.datetime.fromisoformat(expiry)
now = datetime.datetime.now(datetime.timezone.utc)
return now >= expiry_dt - datetime.timedelta(seconds=60)
except Exception:
return True
def get_token_info_from_session(self, session_token: str) -> Dict[str, Any]:
from application.storage.db.repositories.connector_sessions import (
ConnectorSessionsRepository,
)
from application.storage.db.session import db_readonly
with db_readonly() as conn:
session = ConnectorSessionsRepository(conn).get_by_session_token(
session_token
)
if not session:
raise ValueError(
f"Invalid session token ({session_token_fingerprint(session_token)})"
)
token_info = session.get("token_info")
if not token_info:
raise ValueError("Session missing token information")
required = ["access_token", "refresh_token", "cloud_id"]
missing = [f for f in required if not token_info.get(f)]
if missing:
raise ValueError(f"Missing required token fields: {missing}")
return token_info
def sanitize_token_info(
self, token_info: Dict[str, Any], **extra_fields
) -> Dict[str, Any]:
return super().sanitize_token_info(
token_info,
cloud_id=token_info.get("cloud_id"),
**extra_fields,
)
def _fetch_cloud_id(self, access_token: str) -> str:
response = requests.get(
self.RESOURCES_URL,
headers={
"Authorization": f"Bearer {access_token}",
"Accept": "application/json",
},
timeout=30,
)
response.raise_for_status()
resources = response.json()
if not resources:
raise ValueError("No accessible Confluence sites found for this account")
return resources[0]["id"]
def _fetch_user_info(self, access_token: str) -> Dict[str, Any]:
try:
response = requests.get(
self.ME_URL,
headers={
"Authorization": f"Bearer {access_token}",
"Accept": "application/json",
},
timeout=30,
)
response.raise_for_status()
return response.json()
except Exception as e:
logger.warning("Could not fetch user info: %s", e)
return {}

View File

@@ -0,0 +1,417 @@
import functools
import logging
import os
from typing import Any, Dict, List, Optional
import requests
from application.parser.connectors.base import BaseConnectorLoader
from application.parser.connectors.confluence.auth import ConfluenceAuth
from application.parser.schema.base import Document
logger = logging.getLogger(__name__)
API_V2 = "https://api.atlassian.com/ex/confluence/{cloud_id}/wiki/api/v2"
DOWNLOAD_BASE = "https://api.atlassian.com/ex/confluence/{cloud_id}/wiki"
SUPPORTED_ATTACHMENT_TYPES = {
"application/pdf": ".pdf",
"application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx",
"application/vnd.openxmlformats-officedocument.presentationml.presentation": ".pptx",
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": ".xlsx",
"application/msword": ".doc",
"application/vnd.ms-powerpoint": ".ppt",
"application/vnd.ms-excel": ".xls",
"text/plain": ".txt",
"text/csv": ".csv",
"text/html": ".html",
"text/markdown": ".md",
"application/json": ".json",
"application/epub+zip": ".epub",
"image/jpeg": ".jpg",
"image/png": ".png",
}
def _retry_on_auth_failure(func):
@functools.wraps(func)
def wrapper(self, *args, **kwargs):
try:
return func(self, *args, **kwargs)
except requests.exceptions.HTTPError as e:
if e.response is not None and e.response.status_code in (401, 403):
logger.info(
"Auth failure in %s, refreshing token and retrying", func.__name__
)
try:
new_token_info = self.auth.refresh_access_token(self.refresh_token)
self.access_token = new_token_info["access_token"]
self.refresh_token = new_token_info.get(
"refresh_token", self.refresh_token
)
self._persist_refreshed_tokens(new_token_info)
except Exception as refresh_err:
raise ValueError(
f"Authentication failed and could not be refreshed: {refresh_err}"
) from e
return func(self, *args, **kwargs)
raise
return wrapper
class ConfluenceLoader(BaseConnectorLoader):
def __init__(self, session_token: str):
self.auth = ConfluenceAuth()
self.session_token = session_token
token_info = self.auth.get_token_info_from_session(session_token)
self.access_token = token_info["access_token"]
self.refresh_token = token_info["refresh_token"]
self.cloud_id = token_info["cloud_id"]
self.base_url = API_V2.format(cloud_id=self.cloud_id)
self.download_base = DOWNLOAD_BASE.format(cloud_id=self.cloud_id)
self.next_page_token = None
def _headers(self) -> Dict[str, str]:
return {
"Authorization": f"Bearer {self.access_token}",
"Accept": "application/json",
}
def _persist_refreshed_tokens(self, token_info: Dict[str, Any]) -> None:
try:
from application.storage.db.repositories.connector_sessions import (
ConnectorSessionsRepository,
)
from application.storage.db.session import db_session
sanitized = self.auth.sanitize_token_info(token_info)
with db_session() as conn:
repo = ConnectorSessionsRepository(conn)
session = repo.get_by_session_token(self.session_token)
if session:
repo.update(str(session["id"]), {"token_info": sanitized})
except Exception as e:
logger.warning("Failed to persist refreshed tokens: %s", e)
@_retry_on_auth_failure
def load_data(self, inputs: Dict[str, Any]) -> List[Document]:
folder_id = inputs.get("folder_id")
file_ids = inputs.get("file_ids", [])
limit = inputs.get("limit", 100)
list_only = inputs.get("list_only", False)
page_token = inputs.get("page_token")
search_query = inputs.get("search_query")
self.next_page_token = None
if file_ids:
return self._load_pages_by_ids(file_ids, list_only, search_query)
if folder_id:
return self._list_pages_in_space(
folder_id, limit, list_only, page_token, search_query
)
return self._list_spaces(limit, page_token, search_query)
@_retry_on_auth_failure
def download_to_directory(self, local_dir: str, source_config: dict = None) -> dict:
config = source_config or getattr(self, "config", {})
file_ids = config.get("file_ids", [])
folder_ids = config.get("folder_ids", [])
files_downloaded = 0
os.makedirs(local_dir, exist_ok=True)
if isinstance(file_ids, str):
file_ids = [file_ids]
if isinstance(folder_ids, str):
folder_ids = [folder_ids]
for page_id in file_ids:
if self._download_page(page_id, local_dir):
files_downloaded += 1
files_downloaded += self._download_page_attachments(page_id, local_dir)
for space_id in folder_ids:
files_downloaded += self._download_space(space_id, local_dir)
return {
"files_downloaded": files_downloaded,
"directory_path": local_dir,
"empty_result": files_downloaded == 0,
"source_type": "confluence",
"config_used": config,
}
def _list_spaces(
self, limit: int, cursor: Optional[str], search_query: Optional[str]
) -> List[Document]:
documents: List[Document] = []
params: Dict[str, Any] = {"limit": min(limit, 250)}
if cursor:
params["cursor"] = cursor
response = requests.get(
f"{self.base_url}/spaces",
headers=self._headers(),
params=params,
timeout=30,
)
response.raise_for_status()
data = response.json()
for space in data.get("results", []):
name = space.get("name", "")
if search_query and search_query.lower() not in name.lower():
continue
documents.append(
Document(
text="",
doc_id=space["id"],
extra_info={
"file_name": name,
"mime_type": "folder",
"size": None,
"created_time": space.get("createdAt"),
"modified_time": None,
"source": "confluence",
"is_folder": True,
"space_key": space.get("key"),
},
)
)
next_link = data.get("_links", {}).get("next")
self.next_page_token = self._extract_cursor(next_link)
return documents
def _list_pages_in_space(
self,
space_id: str,
limit: int,
list_only: bool,
cursor: Optional[str],
search_query: Optional[str],
) -> List[Document]:
documents: List[Document] = []
params: Dict[str, Any] = {"limit": min(limit, 250)}
if cursor:
params["cursor"] = cursor
response = requests.get(
f"{self.base_url}/spaces/{space_id}/pages",
headers=self._headers(),
params=params,
timeout=30,
)
response.raise_for_status()
data = response.json()
for page in data.get("results", []):
title = page.get("title", "")
if search_query and search_query.lower() not in title.lower():
continue
doc = self._page_to_document(
page, load_content=not list_only, space_id=space_id
)
if doc:
documents.append(doc)
next_link = data.get("_links", {}).get("next")
self.next_page_token = self._extract_cursor(next_link)
return documents
def _load_pages_by_ids(
self, page_ids: List[str], list_only: bool, search_query: Optional[str]
) -> List[Document]:
documents: List[Document] = []
for page_id in page_ids:
try:
params: Dict[str, str] = {}
if not list_only:
params["body-format"] = "storage"
response = requests.get(
f"{self.base_url}/pages/{page_id}",
headers=self._headers(),
params=params,
timeout=30,
)
response.raise_for_status()
page = response.json()
title = page.get("title", "")
if search_query and search_query.lower() not in title.lower():
continue
doc = self._page_to_document(page, load_content=not list_only)
if doc:
documents.append(doc)
except Exception as e:
logger.error("Error loading page %s: %s", page_id, e)
return documents
def _page_to_document(
self,
page: Dict[str, Any],
load_content: bool = False,
space_id: Optional[str] = None,
) -> Optional[Document]:
page_id = page.get("id")
title = page.get("title", "Unknown")
version = page.get("version", {})
modified_time = version.get("createdAt") if isinstance(version, dict) else None
created_time = page.get("createdAt")
resolved_space_id = space_id or page.get("spaceId")
text = ""
if load_content:
body = page.get("body", {})
storage = body.get("storage", {}) if isinstance(body, dict) else {}
text = storage.get("value", "") if isinstance(storage, dict) else ""
return Document(
text=text,
doc_id=str(page_id),
extra_info={
"file_name": title,
"mime_type": "text/html",
"size": len(text) if text else None,
"created_time": created_time,
"modified_time": modified_time,
"source": "confluence",
"is_folder": False,
"page_id": str(page_id),
"space_id": resolved_space_id,
"cloud_id": self.cloud_id,
},
)
def _download_page(self, page_id: str, local_dir: str) -> bool:
try:
response = requests.get(
f"{self.base_url}/pages/{page_id}",
headers=self._headers(),
params={"body-format": "storage"},
timeout=30,
)
response.raise_for_status()
page = response.json()
title = page.get("title", page_id)
safe_name = "".join(c if c.isalnum() or c in " -_" else "_" for c in title)
body = page.get("body", {}).get("storage", {}).get("value", "")
file_path = os.path.join(local_dir, f"{safe_name}.html")
with open(file_path, "w", encoding="utf-8") as f:
f.write(body)
return True
except Exception as e:
logger.error("Error downloading page %s: %s", page_id, e)
return False
def _download_page_attachments(self, page_id: str, local_dir: str) -> int:
downloaded = 0
try:
cursor = None
while True:
params: Dict[str, Any] = {"limit": 100}
if cursor:
params["cursor"] = cursor
response = requests.get(
f"{self.base_url}/pages/{page_id}/attachments",
headers=self._headers(),
params=params,
timeout=30,
)
response.raise_for_status()
data = response.json()
for att in data.get("results", []):
media_type = att.get("mediaType", "")
if media_type not in SUPPORTED_ATTACHMENT_TYPES:
continue
download_link = att.get("_links", {}).get("download")
if not download_link:
continue
raw_name = att.get("title", att.get("id", "attachment"))
file_name = "".join(
c if c.isalnum() or c in " -_." else "_"
for c in os.path.basename(raw_name)
) or "attachment"
file_path = os.path.join(local_dir, file_name)
url = f"{self.download_base}{download_link}"
file_resp = requests.get(
url, headers=self._headers(), timeout=60, stream=True
)
file_resp.raise_for_status()
with open(file_path, "wb") as f:
for chunk in file_resp.iter_content(chunk_size=8192):
f.write(chunk)
downloaded += 1
next_link = data.get("_links", {}).get("next")
cursor = self._extract_cursor(next_link)
if not cursor:
break
except Exception as e:
logger.error("Error downloading attachments for page %s: %s", page_id, e)
return downloaded
def _download_space(self, space_id: str, local_dir: str) -> int:
downloaded = 0
cursor = None
while True:
params: Dict[str, Any] = {"limit": 250}
if cursor:
params["cursor"] = cursor
try:
response = requests.get(
f"{self.base_url}/spaces/{space_id}/pages",
headers=self._headers(),
params=params,
timeout=30,
)
response.raise_for_status()
data = response.json()
except Exception as e:
logger.error("Error listing pages in space %s: %s", space_id, e)
break
for page in data.get("results", []):
page_id = page.get("id")
if self._download_page(str(page_id), local_dir):
downloaded += 1
downloaded += self._download_page_attachments(str(page_id), local_dir)
next_link = data.get("_links", {}).get("next")
cursor = self._extract_cursor(next_link)
if not cursor:
break
return downloaded
@staticmethod
def _extract_cursor(next_link: Optional[str]) -> Optional[str]:
if not next_link:
return None
from urllib.parse import parse_qs, urlparse
parsed = urlparse(next_link)
cursors = parse_qs(parsed.query).get("cursor")
return cursors[0] if cursors else None

View File

@@ -1,5 +1,7 @@
from application.parser.connectors.google_drive.loader import GoogleDriveLoader
from application.parser.connectors.confluence.auth import ConfluenceAuth
from application.parser.connectors.confluence.loader import ConfluenceLoader
from application.parser.connectors.google_drive.auth import GoogleDriveAuth
from application.parser.connectors.google_drive.loader import GoogleDriveLoader
from application.parser.connectors.share_point.auth import SharePointAuth
from application.parser.connectors.share_point.loader import SharePointLoader
@@ -13,11 +15,13 @@ class ConnectorCreator:
"""
connectors = {
"confluence": ConfluenceLoader,
"google_drive": GoogleDriveLoader,
"share_point": SharePointLoader,
}
auth_providers = {
"confluence": ConfluenceAuth,
"google_drive": GoogleDriveAuth,
"share_point": SharePointAuth,
}

View File

@@ -8,6 +8,7 @@ from googleapiclient.discovery import build
from googleapiclient.errors import HttpError
from application.core.settings import settings
from application.parser.connectors._auth_utils import session_token_fingerprint
from application.parser.connectors.base import BaseConnectorAuth
@@ -209,23 +210,23 @@ class GoogleDriveAuth(BaseConnectorAuth):
def get_token_info_from_session(self, session_token: str) -> Dict[str, Any]:
try:
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.storage.db.repositories.connector_sessions import (
ConnectorSessionsRepository,
)
from application.storage.db.session import db_readonly
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
sessions_collection = db["connector_sessions"]
session = sessions_collection.find_one({"session_token": session_token})
with db_readonly() as conn:
session = ConnectorSessionsRepository(conn).get_by_session_token(
session_token
)
if not session:
raise ValueError(f"Invalid session token: {session_token}")
raise ValueError(
f"Invalid session token ({session_token_fingerprint(session_token)})"
)
if "token_info" not in session:
raise ValueError("Session missing token information")
token_info = session["token_info"]
token_info = session.get("token_info")
if not token_info:
raise ValueError("Invalid token information")
raise ValueError("Session missing token information")
required_fields = ["access_token", "refresh_token"]
missing_fields = [field for field in required_fields if field not in token_info or not token_info.get(field)]

View File

@@ -5,6 +5,7 @@ from typing import Optional, Dict, Any
from msal import ConfidentialClientApplication
from application.core.settings import settings
from application.parser.connectors._auth_utils import session_token_fingerprint
from application.parser.connectors.base import BaseConnectorAuth
logger = logging.getLogger(__name__)
@@ -77,24 +78,24 @@ class SharePointAuth(BaseConnectorAuth):
def get_token_info_from_session(self, session_token: str) -> Dict[str, Any]:
try:
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.storage.db.repositories.connector_sessions import (
ConnectorSessionsRepository,
)
from application.storage.db.session import db_readonly
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
sessions_collection = db["connector_sessions"]
session = sessions_collection.find_one({"session_token": session_token})
with db_readonly() as conn:
session = ConnectorSessionsRepository(conn).get_by_session_token(
session_token
)
if not session:
raise ValueError(f"Invalid session token: {session_token}")
raise ValueError(
f"Invalid session token ({session_token_fingerprint(session_token)})"
)
if "token_info" not in session:
raise ValueError("Session missing token information")
token_info = session["token_info"]
token_info = session.get("token_info")
if not token_info:
raise ValueError("Invalid token information")
raise ValueError("Session missing token information")
required_fields = ["access_token", "refresh_token"]
missing_fields = [field for field in required_fields if field not in token_info or not token_info.get(field)]

View File

@@ -205,7 +205,7 @@ class SharePointLoader(BaseConnectorLoader):
try:
url = self._get_item_url(file_id)
params = {'$select': 'id,name,file,createdDateTime,lastModifiedDateTime,size'}
response = requests.get(url, headers=self._get_headers(), params=params)
response = requests.get(url, headers=self._get_headers(), params=params, timeout=100)
response.raise_for_status()
file_metadata = response.json()
@@ -236,9 +236,9 @@ class SharePointLoader(BaseConnectorLoader):
search_url = f"{self.GRAPH_API_BASE}/drives/{drive_id}/root/search(q='{encoded_query}')"
else:
search_url = f"{self.GRAPH_API_BASE}/me/drive/search(q='{encoded_query}')"
response = requests.get(search_url, headers=self._get_headers(), params=params)
response = requests.get(search_url, headers=self._get_headers(), params=params, timeout=100)
else:
response = requests.get(url, headers=self._get_headers(), params=params)
response = requests.get(url, headers=self._get_headers(), params=params, timeout=100)
response.raise_for_status()
@@ -307,7 +307,8 @@ class SharePointLoader(BaseConnectorLoader):
response = requests.get(
f"{self.GRAPH_API_BASE}/me/drive",
headers=self._get_headers(),
params={'$select': 'webUrl'}
params={'$select': 'webUrl'},
timeout=100,
)
response.raise_for_status()
return response.json().get('webUrl')
@@ -352,7 +353,7 @@ class SharePointLoader(BaseConnectorLoader):
headers = self._get_headers()
headers["Content-Type"] = "application/json"
response = requests.post(url, headers=headers, json=body)
response = requests.post(url, headers=headers, json=body, timeout=100)
response.raise_for_status()
results = response.json()
@@ -472,7 +473,7 @@ class SharePointLoader(BaseConnectorLoader):
try:
url = f"{self._get_item_url(file_id)}/content"
response = requests.get(url, headers=self._get_headers())
response = requests.get(url, headers=self._get_headers(), timeout=100)
response.raise_for_status()
try:
@@ -491,7 +492,7 @@ class SharePointLoader(BaseConnectorLoader):
try:
url = self._get_item_url(file_id)
params = {'$select': 'id,name,file'}
response = requests.get(url, headers=self._get_headers(), params=params)
response = requests.get(url, headers=self._get_headers(), params=params, timeout=100)
response.raise_for_status()
metadata = response.json()
@@ -507,7 +508,7 @@ class SharePointLoader(BaseConnectorLoader):
full_path = os.path.join(local_dir, file_name)
download_url = f"{self._get_item_url(file_id)}/content"
download_response = requests.get(download_url, headers=self._get_headers())
download_response = requests.get(download_url, headers=self._get_headers(), timeout=100)
download_response.raise_for_status()
with open(full_path, 'wb') as f:
@@ -527,7 +528,7 @@ class SharePointLoader(BaseConnectorLoader):
params = {'$top': 1000}
while url:
response = requests.get(url, headers=self._get_headers(), params=params)
response = requests.get(url, headers=self._get_headers(), params=params, timeout=100)
response.raise_for_status()
results = response.json()
@@ -609,7 +610,7 @@ class SharePointLoader(BaseConnectorLoader):
try:
url = self._get_item_url(folder_id)
params = {'$select': 'id,name'}
response = requests.get(url, headers=self._get_headers(), params=params)
response = requests.get(url, headers=self._get_headers(), params=params, timeout=100)
response.raise_for_status()
folder_metadata = response.json()

View File

@@ -24,7 +24,7 @@ class PDFParser(BaseParser):
# alternatively you can use local vision capable LLM
with open(file, "rb") as file_loaded:
files = {'file': file_loaded}
response = requests.post(doc2md_service, files=files)
response = requests.post(doc2md_service, files=files, timeout=100)
data = response.json()["markdown"]
return data

View File

@@ -19,25 +19,10 @@ class EpubParser(BaseParser):
def parse_file(self, file: Path, errors: str = "ignore") -> str:
"""Parse file."""
try:
import ebooklib
from ebooklib import epub
from fast_ebook import epub
except ImportError:
raise ValueError("`EbookLib` is required to read Epub files.")
try:
import html2text
except ImportError:
raise ValueError("`html2text` is required to parse Epub files.")
raise ValueError("`fast-ebook` is required to read Epub files.")
text_list = []
book = epub.read_epub(file, options={"ignore_ncx": True})
# Iterate through all chapters.
for item in book.get_items():
# Chapters are typically located in epub documents items.
if item.get_type() == ebooklib.ITEM_DOCUMENT:
text_list.append(
html2text.html2text(item.get_content().decode("utf-8"))
)
text = "\n".join(text_list)
book = epub.read_epub(file)
text = book.to_markdown()
return text

View File

@@ -24,7 +24,7 @@ class ImageParser(BaseParser):
# alternatively you can use local vision capable LLM
with open(file, "rb") as file_loaded:
files = {'file': file_loaded}
response = requests.post(doc2md_service, files=files)
response = requests.post(doc2md_service, files=files, timeout=100)
data = response.json()["markdown"]
else:
data = ""

View File

@@ -77,7 +77,7 @@ class GitHubLoader(BaseRemote):
def _make_request(self, url: str, max_retries: int = 3) -> requests.Response:
"""Make a request with retry logic for rate limiting"""
for attempt in range(max_retries):
response = requests.get(url, headers=self.headers)
response = requests.get(url, headers=self.headers, timeout=100)
if response.status_code == 200:
return response

View File

@@ -4,6 +4,7 @@ import os
import tempfile
import mimetypes
from typing import List, Optional
from application.core.url_validation import SSRFError, validate_url
from application.parser.remote.base import BaseRemote
from application.parser.schema.base import Document
@@ -108,6 +109,11 @@ class S3Loader(BaseRemote):
logger.info(f"Normalized endpoint URL: {normalized_endpoint}")
logger.info(f"Bucket name: '{corrected_bucket}'")
try:
normalized_endpoint = validate_url(normalized_endpoint)
except SSRFError as e:
raise ValueError(f"Invalid S3 endpoint_url: {e}") from e
client_kwargs["endpoint_url"] = normalized_endpoint
# Use path-style addressing for S3-compatible services
# (DigitalOcean Spaces, MinIO, etc.)

View File

@@ -36,6 +36,11 @@ class SitemapLoader(BaseRemote):
if self.limit is not None and processed_urls >= self.limit:
break # Stop processing if the limit is reached
try:
url = validate_url(url)
except SSRFError as e:
logging.error(f"URL validation failed for sitemap entry {url}: {e}")
continue
try:
loader = self.loader([url])
documents.extend(loader.load())
@@ -90,6 +95,15 @@ class SitemapLoader(BaseRemote):
# Check for nested sitemaps
for sitemap in root.findall('.//sitemap/loc'):
nested_sitemap_url = sitemap.text
if not nested_sitemap_url:
continue
try:
nested_sitemap_url = validate_url(nested_sitemap_url)
except SSRFError as e:
logging.error(
f"URL validation failed for nested sitemap {nested_sitemap_url}: {e}"
)
continue
urls.extend(self._extract_urls(nested_sitemap_url))
return urls

View File

@@ -1,8 +1,8 @@
import logging
from application.core.url_validation import SSRFError, validate_url
from application.parser.remote.base import BaseRemote
from application.parser.schema.base import Document
from langchain_community.document_loaders import WebBaseLoader
from urllib.parse import urlparse
headers = {
"User-Agent": "Mozilla/5.0",
@@ -26,9 +26,13 @@ class WebLoader(BaseRemote):
urls = [urls]
documents = []
for url in urls:
# Check if the URL scheme is provided, if not, assume http
if not urlparse(url).scheme:
url = "http://" + url
try:
url = validate_url(url)
except SSRFError as e:
logging.warning(
f"Skipping URL due to SSRF validation failure: {url} - {e}"
)
continue
try:
loader = self.loader([url], header_template=headers)
loaded_docs = loader.load()

View File

@@ -1,9 +1,10 @@
anthropic==0.75.0
boto3==1.42.17
alembic>=1.13,<2
anthropic==0.88.0
boto3==1.42.83
beautifulsoup4==4.14.3
cel-python==0.5.0
celery==5.6.0
cryptography==46.0.3
celery==5.6.3
cryptography==46.0.7
dataclasses-json==0.6.7
defusedxml==0.7.1
docling>=2.16.0
@@ -11,89 +12,83 @@ rapidocr>=1.4.0
onnxruntime>=1.19.0
docx2txt==0.9
ddgs>=8.0.0
ebooklib==0.20
escodegen==1.0.11
esprima==4.0.1
esutils==1.0.1
elevenlabs==2.27.0
Flask==3.1.2
fast-ebook
elevenlabs==2.41.0
Flask==3.1.3
faiss-cpu==1.13.2
fastmcp==2.14.1
fastmcp==3.2.0
flask-restx==1.3.2
google-genai==1.54.0
google-api-python-client==2.187.0
google-auth-httplib2==0.3.0
google-auth-oauthlib==1.2.3
google-genai==1.69.0
google-api-python-client==2.193.0
google-auth-httplib2==0.3.1
google-auth-oauthlib==1.3.1
gTTS==2.5.4
gunicorn==23.0.0
html2text==2025.4.15
javalang==0.13.0
gunicorn==25.3.0
jinja2==3.1.6
jiter==0.12.0
jmespath==1.0.1
jiter==0.13.0
jmespath==1.1.0
joblib==1.5.3
jsonpatch==1.33
jsonpointer==3.0.0
kombu==5.6.1
langchain==1.2.0
jsonpointer==3.1.1
kombu==5.6.2
langchain==1.2.3
langchain-community==0.4.1
langchain-core==1.2.5
langchain-openai==1.1.6
langchain-text-splitters==1.1.0
langsmith==0.5.1
langchain-core==1.2.29
langchain-openai==1.1.12
langchain-text-splitters==1.1.1
langsmith==0.7.31
lazy-object-proxy==1.12.0
lxml==6.0.2
markupsafe==3.0.3
marshmallow>=3.18.0,<5.0.0
mpmath==1.3.0
multidict==6.7.0
msal==1.34.0
multidict==6.7.1
msal==1.35.1
mypy-extensions==1.1.0
networkx==3.6.1
numpy==2.4.0
openai==2.14.0
numpy==2.4.4
openai==2.30.0
openapi3-parser==1.1.22
orjson==3.11.5
packaging==24.2
pandas==2.3.3
orjson==3.11.7
packaging==26.0
pandas==3.0.2
openpyxl==3.1.5
pathable==0.4.4
pathable==0.5.0
pdf2image>=1.17.0
pillow
portalocker>=2.7.0,<3.0.0
prance==25.4.8.0
portalocker>=2.7.0,<4.0.0
prompt-toolkit==3.0.52
protobuf==6.33.2
psycopg2-binary==2.9.11
protobuf==7.34.1
psycopg[binary,pool]>=3.1,<4
py==1.11.0
pydantic
pydantic-core
pydantic-settings
pymongo==4.15.5
pypdf==6.5.0
pypdf==6.9.2
python-dateutil==2.9.0.post0
python-dotenv
python-jose==3.5.0
python-pptx==1.0.2
redis==7.1.0
redis==7.4.0
referencing>=0.28.0,<0.38.0
regex==2025.11.3
requests==2.32.5
regex==2026.4.4
requests==2.33.1
retry==0.9.2
sentence-transformers==5.2.0
sentence-transformers==5.3.0
sqlalchemy>=2.0,<3
tiktoken==0.12.0
tokenizers==0.22.1
torch==2.9.1
tqdm==4.67.1
transformers==4.57.3
tokenizers==0.22.2
torch==2.11.0
tqdm==4.67.3
transformers==5.4.0
typing-extensions==4.15.0
typing-inspect==0.9.0
tzdata==2025.3
tzdata==2026.1
urllib3==2.6.3
vine==5.1.0
wcwidth==0.2.14
wcwidth==0.6.0
werkzeug>=3.1.0
yarl==1.22.0
yarl==1.23.0
markdownify==1.2.2
tldextract==5.3.0
websockets==15.0.1
tldextract==5.3.1
websockets==16.0

View File

@@ -1,7 +1,5 @@
import click
from application.core.mongo_db import MongoDB
from application.core.settings import settings
from application.seed.seeder import DatabaseSeeder
@@ -15,10 +13,7 @@ def seed():
@click.option("--force", is_flag=True, help="Force reseeding even if data exists")
def init(force):
"""Initialize database with seed data"""
mongo = MongoDB.get_client()
db = mongo[settings.MONGO_DB_NAME]
seeder = DatabaseSeeder(db)
seeder = DatabaseSeeder()
seeder.seed_initial_data(force=force)

View File

@@ -1,35 +1,56 @@
"""Database seeder — Postgres-native.
Post-Part-2 cutover: writes template prompts/tools/agents/sources directly
into Postgres via the repository layer. No MongoDB dependencies.
The seeder is invoked by the ``python -m application.seed.commands init``
CLI (not at Flask app startup). All template rows are owned by the
sentinel user id ``__system__`` — kept in sync with the migration
backfill/cleanup-trigger sentinel so template ownership is predictable.
"""
import logging
import os
from datetime import datetime, timezone
from typing import Dict, List, Optional, Union
from typing import Dict, List, Optional
import yaml
from bson import ObjectId
from bson.dbref import DBRef
from dotenv import load_dotenv
from pymongo import MongoClient
from application.agents.tools.tool_manager import ToolManager
from application.api.user.tasks import ingest_remote
from application.storage.db.repositories.agents import AgentsRepository
from application.storage.db.repositories.prompts import PromptsRepository
from application.storage.db.repositories.sources import SourcesRepository
from application.storage.db.repositories.user_tools import UserToolsRepository
from application.storage.db.session import db_readonly, db_session
load_dotenv()
tool_config = {}
tool_manager = ToolManager(config=tool_config)
# Sentinel user id for template rows (agents/prompts/sources/tools).
# Kept in sync with the Postgres backfill / cleanup-trigger sentinel so
# template ownership is predictable across the cutover.
SYSTEM_USER_ID = "__system__"
class DatabaseSeeder:
def __init__(self, db):
self.db = db
self.tools_collection = self.db["user_tools"]
self.sources_collection = self.db["sources"]
self.agents_collection = self.db["agents"]
self.prompts_collection = self.db["prompts"]
self.system_user_id = "system"
"""Postgres-backed seeder.
The constructor accepts an optional positional argument for back
compatibility with legacy callers that used to pass a Mongo ``db``
handle. The value is ignored — all persistence goes through the
Postgres repositories.
"""
def __init__(self, db=None):
self._legacy_db = db # unused; retained for call-site compatibility
self.system_user_id = SYSTEM_USER_ID
self.logger = logging.getLogger(__name__)
def seed_initial_data(self, config_path: str = None, force=False):
"""Main entry point for seeding all initial data"""
"""Main entry point for seeding all initial data."""
if not force and self._is_already_seeded():
self.logger.info("Database already seeded. Use force=True to reseed.")
return
@@ -46,20 +67,18 @@ class DatabaseSeeder:
raise
def _seed_from_config(self, config: Dict):
"""Seed all data from configuration"""
self.logger.info("🌱 Starting seeding...")
"""Seed all data from configuration."""
self.logger.info("Starting seeding...")
if not config.get("agents"):
self.logger.warning("No agents found in config")
return
used_tool_ids = set()
for agent_config in config["agents"]:
try:
self.logger.info(f"Processing agent: {agent_config['name']}")
# 1. Handle Source
source_result = self._handle_source(agent_config)
if source_result is False:
self.logger.error(
@@ -67,64 +86,100 @@ class DatabaseSeeder:
)
continue
source_id = source_result
# 2. Handle Tools
# 2. Handle Tools
tool_ids = self._handle_tools(agent_config)
if len(tool_ids) == 0:
self.logger.warning(
f"No valid tools for agent {agent_config['name']}"
)
used_tool_ids.update(tool_ids)
# 3. Handle Prompt
prompt_id = self._handle_prompt(agent_config)
# 4. Create Agent
# 4. Create or update Agent
self._upsert_agent(agent_config, source_id, tool_ids, prompt_id)
agent_data = {
"user": self.system_user_id,
"name": agent_config["name"],
"description": agent_config["description"],
"image": agent_config.get("image", ""),
"source": (
DBRef("sources", ObjectId(source_id)) if source_id else ""
),
"tools": [str(tid) for tid in tool_ids],
"agent_type": agent_config["agent_type"],
"prompt_id": prompt_id or agent_config.get("prompt_id", "default"),
"chunks": agent_config.get("chunks", "0"),
"retriever": agent_config.get("retriever", ""),
"status": "template",
"createdAt": datetime.now(timezone.utc),
"updatedAt": datetime.now(timezone.utc),
}
existing = self.agents_collection.find_one(
{"user": self.system_user_id, "name": agent_config["name"]}
)
if existing:
self.logger.info(f"Updating existing agent: {agent_config['name']}")
self.agents_collection.update_one(
{"_id": existing["_id"]}, {"$set": agent_data}
)
agent_id = existing["_id"]
else:
self.logger.info(f"Creating new agent: {agent_config['name']}")
result = self.agents_collection.insert_one(agent_data)
agent_id = result.inserted_id
self.logger.info(
f"Successfully processed agent: {agent_config['name']} (ID: {agent_id})"
)
except Exception as e:
self.logger.error(
f"Error processing agent {agent_config['name']}: {str(e)}"
)
continue
self.logger.info("Database seeding completed")
self.logger.info("Database seeding completed")
def _handle_source(self, agent_config: Dict) -> Union[ObjectId, None, bool]:
"""Handle source ingestion and return source ID"""
@staticmethod
def _coerce_uuid_fk(raw) -> Optional[str]:
"""Coerce sentinel/blank values to ``None`` for nullable UUID FK columns.
Mirrors the route-side handling in ``application/api/user/agents/routes.py``:
the literal string ``"default"``, empty string, and ``None`` all map
to ``None`` so the repository layer skips the column and Postgres
keeps the FK NULL (FKs are ``ON DELETE SET NULL``).
"""
if raw in (None, "", "default"):
return None
return str(raw)
def _upsert_agent(
self,
agent_config: Dict,
source_id: Optional[str],
tool_ids: List[str],
prompt_id: Optional[str],
) -> None:
"""Create or update a template agent owned by ``__system__``."""
name = agent_config["name"]
prompt_id_val = self._coerce_uuid_fk(
prompt_id if prompt_id is not None else agent_config.get("prompt_id")
)
folder_id_val = self._coerce_uuid_fk(agent_config.get("folder_id"))
workflow_id_val = self._coerce_uuid_fk(agent_config.get("workflow_id"))
source_id_val = self._coerce_uuid_fk(source_id)
agent_fields = {
"description": agent_config["description"],
"image": agent_config.get("image", ""),
"tools": [str(tid) for tid in tool_ids],
"agent_type": agent_config["agent_type"],
"prompt_id": prompt_id_val,
"chunks": agent_config.get("chunks", "0"),
"retriever": agent_config.get("retriever", ""),
}
if folder_id_val is not None:
agent_fields["folder_id"] = folder_id_val
if workflow_id_val is not None:
agent_fields["workflow_id"] = workflow_id_val
if source_id_val is not None:
agent_fields["source_id"] = source_id_val
with db_session() as conn:
repo = AgentsRepository(conn)
existing = self._find_system_agent_by_name(repo, name)
if existing:
self.logger.info(f"Updating existing agent: {name}")
repo.update(str(existing["id"]), self.system_user_id, agent_fields)
self.logger.info(f"Successfully updated agent: {name} (ID: {existing['id']})")
else:
self.logger.info(f"Creating new agent: {name}")
created = repo.create(
user_id=self.system_user_id,
name=name,
status="template",
**agent_fields,
)
self.logger.info(
f"Successfully created agent: {name} (ID: {created.get('id')})"
)
@staticmethod
def _find_system_agent_by_name(repo: AgentsRepository, name: str) -> Optional[dict]:
"""Find a system-owned agent by name among the template rows."""
for row in repo.list_for_user(SYSTEM_USER_ID):
if row.get("name") == name:
return row
return None
def _handle_source(self, agent_config: Dict):
"""Handle source ingestion and return a source id (UUID string) or ``None``/``False``."""
if not agent_config.get("source"):
self.logger.info(
"No source provided for agent - will create agent without source"
@@ -134,14 +189,15 @@ class DatabaseSeeder:
self.logger.info(f"Ingesting source: {source_config['url']}")
try:
existing = self.sources_collection.find_one(
{"user": self.system_user_id, "remote_data": source_config["url"]}
)
with db_readonly() as conn:
existing = self._find_system_source_by_remote_url(
SourcesRepository(conn), source_config["url"]
)
if existing:
self.logger.info(f"Source already exists: {existing['_id']}")
return existing["_id"]
# Ingest new source using worker
self.logger.info(f"Source already exists: {existing['id']}")
return existing["id"]
# Ingest new source using worker
task = ingest_remote.delay(
source_data=source_config["url"],
job_name=source_config["name"],
@@ -164,9 +220,29 @@ class DatabaseSeeder:
self.logger.error(f"Failed to ingest source: {str(e)}")
return False
def _handle_tools(self, agent_config: Dict) -> List[ObjectId]:
"""Handle tool creation and return list of tool IDs"""
tool_ids = []
@staticmethod
def _find_system_source_by_remote_url(
repo: SourcesRepository, url: str
) -> Optional[dict]:
"""Scan system-owned sources for a row whose remote_data matches ``url``."""
# TODO(migration-postgres): push this into SourcesRepository once a
# remote_data search helper exists; today we keep the scan here to
# stay within this slice's boundaries.
try:
rows = repo.list_for_user(SYSTEM_USER_ID) # type: ignore[attr-defined]
except AttributeError:
return None
for row in rows:
remote = row.get("remote_data")
if remote == url:
return row
if isinstance(remote, dict) and remote.get("url") == url:
return row
return None
def _handle_tools(self, agent_config: Dict) -> List[str]:
"""Handle tool creation and return list of tool ids (UUID strings)."""
tool_ids: List[str] = []
if not agent_config.get("tools"):
return tool_ids
for tool_config in agent_config["tools"]:
@@ -175,37 +251,43 @@ class DatabaseSeeder:
processed_config = self._process_config(tool_config.get("config", {}))
self.logger.info(f"Processing tool: {tool_name}")
existing = self.tools_collection.find_one(
{
"user": self.system_user_id,
"name": tool_name,
"config": processed_config,
}
)
if existing:
self.logger.info(f"Tool already exists: {existing['_id']}")
tool_ids.append(existing["_id"])
continue
tool_data = {
"user": self.system_user_id,
"name": tool_name,
"displayName": tool_config.get("display_name", tool_name),
"description": tool_config.get("description", ""),
"actions": tool_manager.tools[tool_name].get_actions_metadata(),
"config": processed_config,
"status": True,
}
result = self.tools_collection.insert_one(tool_data)
tool_ids.append(result.inserted_id)
self.logger.info(f"Created new tool: {result.inserted_id}")
with db_session() as conn:
repo = UserToolsRepository(conn)
existing = self._find_system_tool(
repo, tool_name, processed_config
)
if existing:
self.logger.info(f"Tool already exists: {existing['id']}")
tool_ids.append(existing["id"])
continue
created = repo.create(
user_id=self.system_user_id,
name=tool_name,
display_name=tool_config.get("display_name", tool_name),
description=tool_config.get("description", ""),
actions=tool_manager.tools[tool_name].get_actions_metadata(),
config=processed_config,
status=True,
)
tool_ids.append(created["id"])
self.logger.info(f"Created new tool: {created['id']}")
except Exception as e:
self.logger.error(f"Failed to process tool {tool_name}: {str(e)}")
continue
return tool_ids
@staticmethod
def _find_system_tool(
repo: UserToolsRepository, name: str, config: dict
) -> Optional[dict]:
"""Locate a system-owned tool by (name, config) among existing rows."""
existing = repo.find_by_user_and_name(SYSTEM_USER_ID, name)
if existing and existing.get("config") == config:
return existing
return None
def _handle_prompt(self, agent_config: Dict) -> Optional[str]:
"""Handle prompt creation and return prompt ID"""
"""Handle prompt creation and return prompt id (UUID string)."""
if not agent_config.get("prompt"):
return None
@@ -222,34 +304,20 @@ class DatabaseSeeder:
self.logger.info(f"Processing prompt: {prompt_name}")
try:
existing = self.prompts_collection.find_one(
{
"user": self.system_user_id,
"name": prompt_name,
"content": prompt_content,
}
)
if existing:
self.logger.info(f"Prompt already exists: {existing['_id']}")
return str(existing["_id"])
prompt_data = {
"name": prompt_name,
"content": prompt_content,
"user": self.system_user_id,
}
result = self.prompts_collection.insert_one(prompt_data)
prompt_id = str(result.inserted_id)
self.logger.info(f"Created new prompt: {prompt_id}")
return prompt_id
with db_session() as conn:
repo = PromptsRepository(conn)
row = repo.find_or_create(
self.system_user_id, prompt_name, prompt_content
)
prompt_id = str(row["id"])
self.logger.info(f"Prompt ready: {prompt_id}")
return prompt_id
except Exception as e:
self.logger.error(f"Failed to process prompt {prompt_name}: {str(e)}")
return None
def _process_config(self, config: Dict) -> Dict:
"""Process config values to replace environment variables"""
"""Process config values to replace environment variables."""
processed = {}
for key, value in config.items():
if (
@@ -264,14 +332,18 @@ class DatabaseSeeder:
return processed
def _is_already_seeded(self) -> bool:
"""Check if premade agents already exist"""
return self.agents_collection.count_documents({"user": self.system_user_id}) > 0
"""Check if premade (system-owned) agents already exist in Postgres."""
with db_readonly() as conn:
repo = AgentsRepository(conn)
return len(repo.list_for_user(SYSTEM_USER_ID)) > 0
@classmethod
def initialize_from_env(cls, worker=None):
"""Factory method to create seeder from environment"""
mongo_uri = os.getenv("MONGO_URI", "mongodb://localhost:27017")
db_name = os.getenv("MONGO_DB_NAME", "docsgpt")
client = MongoClient(mongo_uri)
db = client[db_name]
return cls(db)
"""Factory method to create seeder from environment.
Retained for back compatibility with existing call sites. The
Postgres connection is resolved lazily via the repository layer
(``application.storage.db.engine``), so no explicit wiring is
required here.
"""
return cls()

Some files were not shown because too many files have changed in this diff Show More