mirror of
https://github.com/arc53/DocsGPT.git
synced 2026-05-07 06:30:03 +00:00
Compare commits
35 Commits
0.16.1
...
codex/draf
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
c18f85a050 | ||
|
|
5ecb174567 | ||
|
|
ed7212d016 | ||
|
|
f82acdab5d | ||
|
|
361aebc34c | ||
|
|
bf194c1a0f | ||
|
|
54c396750b | ||
|
|
9adebfec69 | ||
|
|
92c321f163 | ||
|
|
e3d36b9e52 | ||
|
|
8950e11208 | ||
|
|
5de0132a65 | ||
|
|
b92ca91512 | ||
|
|
8e0b2844a2 | ||
|
|
0969db5e30 | ||
|
|
af335a27e8 | ||
|
|
0ae3139284 | ||
|
|
7529ca3dd6 | ||
|
|
1b813320f1 | ||
|
|
02012e9a0b | ||
|
|
c2f027265a | ||
|
|
0ae615c10e | ||
|
|
881d0da344 | ||
|
|
1376de6bae | ||
|
|
362ebfcc0a | ||
|
|
bc77eed3d8 | ||
|
|
1f346588e7 | ||
|
|
2fed5c882b | ||
|
|
aa938d76d7 | ||
|
|
2940628aa6 | ||
|
|
7f23928134 | ||
|
|
20e17c84c7 | ||
|
|
389ddf6068 | ||
|
|
1e2443fb90 | ||
|
|
18e2a829c9 |
99
.github/INCIDENT_RESPONSE.md
vendored
Normal file
99
.github/INCIDENT_RESPONSE.md
vendored
Normal file
@@ -0,0 +1,99 @@
|
||||
# DocsGPT Incident Response Plan (IRP)
|
||||
|
||||
This playbook describes how maintainers respond to confirmed or suspected security incidents.
|
||||
|
||||
- Vulnerability reporting: [`SECURITY.md`](../SECURITY.md)
|
||||
- Non-security bugs/features: [`CONTRIBUTING.md`](../CONTRIBUTING.md)
|
||||
|
||||
## Severity
|
||||
|
||||
| Severity | Definition | Typical examples |
|
||||
|---|---|---|
|
||||
| **Critical** | Active exploitation, supply-chain compromise, or confirmed data breach requiring immediate user action. | Compromised release artifact/image; remote execution. |
|
||||
| **High** | Serious undisclosed vulnerability with no practical workaround, or CVSS >= 7.0. | key leakage; prompt injection enabling cross-tenant access. |
|
||||
| **Medium** | Material impact but constrained by preconditions/scope, or a practical workaround exists. | Auth-required exploit; dependency CVE with limited reachability. |
|
||||
| **Low** | Defense-in-depth or narrow availability impact with no confirmed data exposure. | Missing rate limiting; hardening gap without exploit evidence. |
|
||||
|
||||
|
||||
## Response workflow
|
||||
|
||||
### 1) Triage (target: initial response within 48 hours)
|
||||
|
||||
1. Acknowledge report.
|
||||
2. Validate on latest release and `main`.
|
||||
3. Confirm in-scope security issue vs. hardening item (per `SECURITY.md`).
|
||||
4. Assign severity and open a **draft GitHub Security Advisory (GHSA)** (no public issue).
|
||||
5. Determine whether root cause is DocsGPT code or upstream dependency/provider.
|
||||
|
||||
### 2) Investigation
|
||||
|
||||
1. Identify affected components, versions, and deployment scope (self-hosted, cloud, or both).
|
||||
2. For AI issues, explicitly evaluate prompt injection, document isolation, and output leakage.
|
||||
3. Request a CVE through GHSA for **Medium+** issues.
|
||||
|
||||
### 3) Containment, fix, and disclosure
|
||||
|
||||
1. Implement and test fix in private security workflow (GHSA private fork/branch).
|
||||
2. Merge fix to `main`, cut patched release, and verify published artifacts/images.
|
||||
3. Patch managed cloud deployment (`app.docsgpt.cloud`) and other deployments as soon as validated.
|
||||
4. Publish GHSA with CVE (if assigned), affected/fixed versions, CVSS, mitigations, and upgrade guidance.
|
||||
5. **Critical/High:** coordinate disclosure timing with reporter (goal: <= 90 days) and publish a notice.
|
||||
6. **Medium/Low:** include in next scheduled release unless risk requires immediate out-of-band patching.
|
||||
|
||||
### 4) Post-incident
|
||||
|
||||
1. Monitor support channels (GitHub/Discord) for regressions or exploitation reports.
|
||||
2. Run a short retrospective (root cause, detection, response gaps, prevention work).
|
||||
3. Track follow-up hardening actions with owners/dates.
|
||||
4. Update this IRP and related runbooks as needed.
|
||||
|
||||
## Scenario playbooks
|
||||
|
||||
### Supply-chain compromise
|
||||
|
||||
1. Freeze releases and investigate blast radius.
|
||||
2. Rotate credentials in order: Docker Hub -> GitHub tokens -> LLM provider keys -> DB credentials -> `JWT_SECRET_KEY` -> `ENCRYPTION_SECRET_KEY` -> `INTERNAL_KEY`.
|
||||
3. Replace compromised artifacts/tags with clean releases and revoke/remove bad tags where possible.
|
||||
4. Publish advisory with exact affected versions and required user actions.
|
||||
|
||||
### Data exposure
|
||||
|
||||
1. Determine scope (users, documents, keys, logs, time window).
|
||||
2. Disable affected path or hotfix immediately for managed cloud.
|
||||
3. Notify affected users with concrete remediation steps (for example, rotate keys).
|
||||
4. Continue through standard fix/disclosure workflow.
|
||||
|
||||
### Critical regression with security impact
|
||||
|
||||
1. Identify introducing change (`git bisect` if needed).
|
||||
2. Publish workaround within 24 hours (for example, pin to known-good version).
|
||||
3. Ship patch release with regression test and close incident with public summary.
|
||||
|
||||
## AI-specific guidance
|
||||
|
||||
Treat confirmed AI-specific abuse as security incidents:
|
||||
|
||||
- Prompt injection causing sensitive data exfiltration (from tools that don't belong to the agent) -> **High**
|
||||
- Cross-tenant retrieval/isolation failure -> **High**
|
||||
- API key disclosure in output -> **High**
|
||||
|
||||
## Secret rotation quick reference
|
||||
|
||||
| Secret | Standard rotation action |
|
||||
|---|---|
|
||||
| Docker Hub credentials | Revoke/replace in Docker Hub; update CI/CD secrets |
|
||||
| GitHub tokens/PATs | Revoke/replace in GitHub; update automation secrets |
|
||||
| LLM provider API keys | Rotate in provider console; update runtime/deploy secrets |
|
||||
| Database credentials | Rotate in DB platform; redeploy with new secrets |
|
||||
| `JWT_SECRET_KEY` | Rotate and redeploy (invalidates all active user sessions/tokens) |
|
||||
| `ENCRYPTION_SECRET_KEY` | Rotate and redeploy (re-encrypt stored data if possible; existing encrypted data may become inaccessible) |
|
||||
| `INTERNAL_KEY` | Rotate and redeploy (invalidates worker-to-backend authentication) |
|
||||
|
||||
## Maintenance
|
||||
|
||||
Review this document:
|
||||
|
||||
- after every **Critical/High** incident, and
|
||||
- at least annually.
|
||||
|
||||
Changes should be proposed via pull request to `main`.
|
||||
144
.github/THREAT_MODEL.md
vendored
Normal file
144
.github/THREAT_MODEL.md
vendored
Normal file
@@ -0,0 +1,144 @@
|
||||
# DocsGPT Public Threat Model
|
||||
|
||||
**Classification:** Public
|
||||
**Last updated:** 2026-04-15
|
||||
**Applies to:** Open-source and self-hosted DocsGPT deployments
|
||||
|
||||
## 1) Overview
|
||||
|
||||
DocsGPT ingests content (files/URLs/connectors), indexes it, and answers queries via LLM-backed APIs and optional tools.
|
||||
|
||||
Core components:
|
||||
- Backend API (`application/`)
|
||||
- Workers/ingestion (`application/worker.py` and related modules)
|
||||
- Datastores (MongoDB/Redis/vector stores)
|
||||
- Frontend (`frontend/`)
|
||||
- Optional extensions/integrations (`extensions/`)
|
||||
|
||||
## 2) Scope and assumptions
|
||||
|
||||
In scope:
|
||||
- Application-level threats in this repository.
|
||||
- Local and internet-exposed self-hosted deployments.
|
||||
|
||||
Assumptions:
|
||||
- Internet-facing instances enable auth and use strong secrets.
|
||||
- Datastores/internal services are not publicly exposed.
|
||||
|
||||
Out of scope:
|
||||
- Cloud hardware/provider compromise.
|
||||
- Security guarantees of external LLM vendors.
|
||||
- Full security audits of third-party systems targeted by tools (external DBs/MCP servers/code-exec APIs).
|
||||
|
||||
## 3) Security objectives
|
||||
|
||||
- Protect document/conversation confidentiality.
|
||||
- Preserve integrity of prompts, agents, tools, and indexed data.
|
||||
- Maintain API/worker availability.
|
||||
- Enforce tenant isolation in authenticated deployments.
|
||||
|
||||
## 4) Assets
|
||||
|
||||
- Documents, attachments, chunks/embeddings, summaries.
|
||||
- Conversations, agents, workflows, prompt templates.
|
||||
- Secrets (JWT secret, `INTERNAL_KEY`, provider/API/OAuth credentials).
|
||||
- Operational capacity (worker throughput, queue depth, model quota/cost).
|
||||
|
||||
## 5) Trust boundaries and untrusted input
|
||||
|
||||
Trust boundaries:
|
||||
- Internet ↔ Frontend
|
||||
- Frontend ↔ Backend API
|
||||
- Backend ↔ Workers/internal APIs
|
||||
- Backend/workers ↔ Datastores
|
||||
- Backend ↔ External LLM/connectors/remote URLs
|
||||
|
||||
Untrusted input includes API payloads, file uploads, remote URLs, OAuth/webhook data, retrieved content, and LLM/tool arguments.
|
||||
|
||||
## 6) Main attack surfaces
|
||||
|
||||
1. Auth/authz paths and sharing tokens.
|
||||
2. File upload + parsing pipeline.
|
||||
3. Remote URL fetching and connectors (SSRF risk).
|
||||
4. Agent/tool execution from LLM output.
|
||||
5. Template/workflow rendering.
|
||||
6. Frontend rendering + token storage.
|
||||
7. Internal service endpoints (`INTERNAL_KEY`).
|
||||
8. High-impact integrations (SQL tool, generic API tool, remote MCP tools).
|
||||
|
||||
## 7) Key threats and expected mitigations
|
||||
|
||||
### A. Auth/authz misconfiguration
|
||||
- Threat: weak/no auth or leaked tokens leads to broad data access.
|
||||
- Mitigations: require auth for public deployments, short-lived tokens, rotation/revocation, least-privilege sharing.
|
||||
|
||||
### B. Untrusted file ingestion
|
||||
- Threat: malicious files/archives trigger traversal, parser exploits, or resource exhaustion.
|
||||
- Mitigations: strict path checks, archive safeguards, file limits, patched parser dependencies.
|
||||
|
||||
### C. SSRF/outbound abuse
|
||||
- Threat: URL loaders/tools access private/internal/metadata endpoints.
|
||||
- Mitigations: validate URLs + redirects, block private/link-local ranges, apply egress controls/allowlists.
|
||||
|
||||
### D. Prompt injection + tool abuse
|
||||
- Threat: retrieved text manipulates model behavior and causes unsafe tool calls.
|
||||
- Threat: never rely on the model to "choose correctly" under adversarial input.
|
||||
- Mitigations: treat retrieved/model output as untrusted, enforce tool policies, only expose tools explicitly assigned by the user/admin to that agent, separate system instructions from retrieved content, audit tool calls.
|
||||
|
||||
### E. Dangerous tool capability chaining (SQL/API/MCP)
|
||||
- Threat: write-capable SQL credentials allow destructive queries.
|
||||
- Threat: API tool can trigger side effects (infra/payment/webhook/code-exec endpoints).
|
||||
- Threat: remote MCP tools may expose privileged operations.
|
||||
- Mitigations: read-only-by-default credentials, destination allowlists, explicit approval for write/exec actions, per-tool policy enforcement + logging.
|
||||
|
||||
### F. Frontend/XSS + token theft
|
||||
- Threat: XSS can steal local tokens and call APIs.
|
||||
- Mitigations: reduce unsafe rendering paths, strong CSP, scoped short-lived credentials.
|
||||
|
||||
### G. Internal endpoint exposure
|
||||
- Threat: weak/unset `INTERNAL_KEY` enables internal API abuse.
|
||||
- Mitigations: fail closed, require strong random keys, keep internal APIs private.
|
||||
|
||||
### H. DoS and cost abuse
|
||||
- Threat: request floods, large ingestion jobs, expensive prompts/crawls.
|
||||
- Mitigations: rate limits, quotas, timeouts, queue backpressure, usage budgets.
|
||||
|
||||
## 8) Example attacker stories
|
||||
|
||||
- Internet-exposed deployment runs with weak/no auth and receives unauthorized data access/abuse.
|
||||
- Intranet deployment intentionally using weak/no auth is vulnerable to insider misuse and lateral-movement abuse.
|
||||
- Crafted archive attempts path traversal during extraction.
|
||||
- Malicious URL/redirect chain targets internal services.
|
||||
- Poisoned document causes data exfiltration through tool calls.
|
||||
- Over-privileged SQL/API/MCP tool performs destructive side effects.
|
||||
|
||||
## 9) Severity calibration
|
||||
|
||||
- **Critical:** unauthenticated public data access; prompt-injection-driven exfiltration; SSRF to sensitive internal endpoints.
|
||||
- **High:** cross-tenant leakage, persistent token compromise, over-privileged destructive tools.
|
||||
- **Medium:** DoS/cost amplification and non-critical information disclosure.
|
||||
- **Low:** minor hardening gaps with limited impact.
|
||||
|
||||
## 10) Baseline controls for public deployments
|
||||
|
||||
1. Enforce authentication and secure defaults.
|
||||
2. Set/rotate strong secrets (`JWT`, `INTERNAL_KEY`, encryption keys).
|
||||
3. Restrict CORS and front API with a hardened proxy.
|
||||
4. Add rate limiting/quotas for answer/upload/crawl/token endpoints.
|
||||
5. Enforce URL+redirect SSRF protections and egress restrictions.
|
||||
6. Apply upload/archive/parsing hardening.
|
||||
7. Require least-privilege tool credentials and auditable tool execution.
|
||||
8. Monitor auth failures, tool anomalies, ingestion spikes, and cost anomalies.
|
||||
9. Keep dependencies/images patched and scanned.
|
||||
10. Validate multi-tenant isolation with explicit tests.
|
||||
|
||||
## 11) Maintenance
|
||||
|
||||
Review this model after major auth, ingestion, connector, tool, or workflow changes.
|
||||
|
||||
## References
|
||||
|
||||
- [OWASP Top 10 for LLM Applications](https://owasp.org/www-project-top-10-for-large-language-model-applications/)
|
||||
- [OWASP ASVS](https://owasp.org/www-project-application-security-verification-standard/)
|
||||
- [STRIDE overview](https://learn.microsoft.com/azure/security/develop/threat-modeling-tool-threats)
|
||||
- [DocsGPT SECURITY.md](../SECURITY.md)
|
||||
25
.github/workflows/zizmor.yml
vendored
Normal file
25
.github/workflows/zizmor.yml
vendored
Normal file
@@ -0,0 +1,25 @@
|
||||
name: GitHub Actions Security Analysis
|
||||
|
||||
on:
|
||||
push:
|
||||
branches: ["master"]
|
||||
pull_request:
|
||||
branches: ["**"]
|
||||
|
||||
permissions: {}
|
||||
|
||||
jobs:
|
||||
zizmor:
|
||||
runs-on: ubuntu-latest
|
||||
|
||||
permissions:
|
||||
security-events: write # Required for upload-sarif (used by zizmor-action) to upload SARIF files.
|
||||
|
||||
steps:
|
||||
- name: Checkout repository
|
||||
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
|
||||
with:
|
||||
persist-credentials: false
|
||||
|
||||
- name: Run zizmor 🌈
|
||||
uses: zizmorcore/zizmor-action@71321a20a9ded102f6e9ce5718a2fcec2c4f70d8 # v0.5.2
|
||||
@@ -18,5 +18,5 @@ We aim to acknowledge reports within 48 hours.
|
||||
|
||||
## Incident Handling
|
||||
|
||||
Arc53 maintains internal incident response procedures. If you believe an active exploit is occurring, include **URGENT** in your report subject line.
|
||||
For the public incident response process, see [`INCIDENT_RESPONSE.md`](./.github/INCIDENT_RESPONSE.md). If you believe an active exploit is occurring, include **URGENT** in your report subject line.
|
||||
|
||||
|
||||
@@ -4,7 +4,7 @@ boto3==1.42.83
|
||||
beautifulsoup4==4.14.3
|
||||
cel-python==0.5.0
|
||||
celery==5.6.3
|
||||
cryptography==46.0.6
|
||||
cryptography==46.0.7
|
||||
dataclasses-json==0.6.7
|
||||
defusedxml==0.7.1
|
||||
docling>=2.16.0
|
||||
@@ -29,14 +29,14 @@ jiter==0.13.0
|
||||
jmespath==1.1.0
|
||||
joblib==1.5.3
|
||||
jsonpatch==1.33
|
||||
jsonpointer==3.0.0
|
||||
jsonpointer==3.1.1
|
||||
kombu==5.6.2
|
||||
langchain==1.2.3
|
||||
langchain-community==0.4.1
|
||||
langchain-core==1.2.23
|
||||
langchain-core==1.2.29
|
||||
langchain-openai==1.1.12
|
||||
langchain-text-splitters==1.1.1
|
||||
langsmith==0.7.23
|
||||
langsmith==0.7.31
|
||||
lazy-object-proxy==1.12.0
|
||||
lxml==6.0.2
|
||||
markupsafe==3.0.3
|
||||
@@ -84,7 +84,7 @@ tqdm==4.67.3
|
||||
transformers==5.4.0
|
||||
typing-extensions==4.15.0
|
||||
typing-inspect==0.9.0
|
||||
tzdata==2025.3
|
||||
tzdata==2026.1
|
||||
urllib3==2.6.3
|
||||
vine==5.1.0
|
||||
wcwidth==0.6.0
|
||||
|
||||
1643
extensions/react-widget/package-lock.json
generated
1643
extensions/react-widget/package-lock.json
generated
File diff suppressed because it is too large
Load Diff
@@ -81,7 +81,7 @@
|
||||
"parcel": "^2.16.4",
|
||||
"prettier": "^3.8.1",
|
||||
"process": "^0.11.10",
|
||||
"svgo": "^3.3.3",
|
||||
"svgo": "^4.0.1",
|
||||
"typescript": "^5.3.3"
|
||||
},
|
||||
"publishConfig": {
|
||||
|
||||
696
frontend/package-lock.json
generated
696
frontend/package-lock.json
generated
File diff suppressed because it is too large
Load Diff
@@ -33,12 +33,12 @@
|
||||
"i18next-browser-languagedetector": "^8.2.1",
|
||||
"lodash": "^4.17.21",
|
||||
"lucide-react": "^0.562.0",
|
||||
"mermaid": "^11.12.1",
|
||||
"mermaid": "^11.14.0",
|
||||
"prop-types": "^15.8.1",
|
||||
"radix-ui": "^1.4.3",
|
||||
"react": "^19.1.0",
|
||||
"react-chartjs-2": "^5.3.0",
|
||||
"react-dom": "^19.1.1",
|
||||
"react-dom": "^19.2.5",
|
||||
"react-dropzone": "^14.3.8",
|
||||
"react-google-drive-picker": "^1.2.2",
|
||||
"react-i18next": "^17.0.2",
|
||||
@@ -53,10 +53,10 @@
|
||||
"tailwind-merge": "^3.4.0"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@tailwindcss/postcss": "^4.1.10",
|
||||
"@tailwindcss/postcss": "^4.2.2",
|
||||
"@types/lodash": "^4.17.20",
|
||||
"@types/react": "^19.1.8",
|
||||
"@types/react-dom": "^19.1.7",
|
||||
"@types/react-dom": "^19.2.3",
|
||||
"@types/react-syntax-highlighter": "^15.5.13",
|
||||
"@typescript-eslint/eslint-plugin": "^8.58.2",
|
||||
"@typescript-eslint/parser": "^8.46.3",
|
||||
@@ -64,7 +64,7 @@
|
||||
"eslint": "^9.39.1",
|
||||
"eslint-config-prettier": "^10.1.5",
|
||||
"eslint-plugin-import": "^2.31.0",
|
||||
"eslint-plugin-n": "^17.23.1",
|
||||
"eslint-plugin-n": "^17.24.0",
|
||||
"eslint-plugin-prettier": "^5.5.4",
|
||||
"eslint-plugin-promise": "^6.6.0",
|
||||
"eslint-plugin-react": "^7.37.5",
|
||||
@@ -74,7 +74,7 @@
|
||||
"postcss": "^8.4.49",
|
||||
"prettier": "^3.5.3",
|
||||
"prettier-plugin-tailwindcss": "^0.7.2",
|
||||
"tailwindcss": "^4.2.1",
|
||||
"tailwindcss": "^4.2.2",
|
||||
"tw-animate-css": "^1.4.0",
|
||||
"typescript": "^5.8.3",
|
||||
"vite": "^8.0.0",
|
||||
|
||||
Reference in New Issue
Block a user