feat(agent): implement context validation and message truncation (#2249)

Author: Alex
Date: 2026-01-05 17:49:28 +00:00
Committed by: GitHub
Parent: d3e9d66b07
Commit: 5662be12b5
3 changed files with 194 additions and 1 deletion


@@ -21,6 +21,10 @@
"title": "🏗️ Architecture",
"href": "/Guides/Architecture"
},
"compression": {
"title": "🗜️ Context Compression",
"href": "/Guides/compression"
},
"Integrations": {
"title": "🔗 Integrations"
}


@@ -0,0 +1,37 @@
# Context Compression
DocsGPT includes a context compression system for managing long conversations. It keeps the conversation history within the LLM's context window limit while preserving critical information and continuity.
## How It Works
The compression system operates on a "summarize and truncate" principle:
1. **Threshold Check**: Before each request, the system calculates the total token count of the conversation history.
2. **Trigger**: If the token count exceeds a configured threshold (default: 80% of the model's context limit), compression is triggered.
3. **Summarization**: An LLM (potentially a different, cheaper or faster one) processes the older part of the conversation, including previous summaries, user messages, agent responses, and tool outputs.
4. **Context Replacement**: The system generates a comprehensive summary of the older history. For subsequent requests, the LLM receives this **Summary + Recent Messages** instead of the full raw history.
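In code terms, steps 1–4 amount to a small piece of orchestration. Below is a minimal sketch of the flow; the function signature, message shape, and `keep_recent` cutoff are assumptions for illustration, not the actual DocsGPT API.
```python
# Minimal sketch of the summarize-and-truncate flow described above.
# All names here are illustrative, not the actual DocsGPT implementation.

def build_llm_context(history, summary, count_tokens, summarize,
                      context_limit, threshold=0.8, keep_recent=4):
    """Return the message list for the next LLM call, compressing if needed."""
    messages = ([summary] if summary else []) + history

    # 1. Threshold check: total token count of the pending context.
    total = sum(count_tokens(m["content"]) for m in messages)

    # 2. Trigger: compress once usage crosses the configured fraction.
    if total > threshold * context_limit:
        older, recent = history[:-keep_recent], history[-keep_recent:]

        # 3. Summarization: fold the previous summary and the older
        # messages (including tool outputs) into a new summary.
        prior = (summary["content"] + "\n") if summary else ""
        text = summarize(prior + "\n".join(m["content"] for m in older))
        summary = {"role": "system", "content": text}

        # 4. Context replacement: Summary + Recent Messages.
        messages = [summary] + recent

    return messages
```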
### Key Features
* **Recursive Summarization**: New summaries incorporate previous summaries, ensuring that information from the very beginning of a long chat is not lost.
* **Tool Call Support**: The compression logic explicitly handles tool calls and their outputs (e.g., file reads, search results), summarizing them so the agent retains knowledge of what it has already done.
* **"Needle in a Haystack" Preservation**: The prompts are designed to identify and preserve specific, critical details (like passwords, keys, or specific user instructions) even when compressing large amounts of text.
## Configuration
You can configure the compression behavior in your `.env` file or `application/core/settings.py`:

| Setting | Default | Description |
| :--- | :--- | :--- |
| `ENABLE_CONVERSATION_COMPRESSION` | `True` | Master switch to enable/disable the feature. |
| `COMPRESSION_THRESHOLD_PERCENTAGE` | `0.8` | The fraction of the context window (0.0 to 1.0) that triggers compression. |
| `COMPRESSION_MODEL_OVERRIDE` | `None` | (Optional) Specify a different model ID to use specifically for the summarization task (e.g., using `gpt-3.5-turbo` to compress for `gpt-4`). |
| `COMPRESSION_MAX_HISTORY_POINTS` | `3` | The number of past compression points to keep in the database (older ones are discarded as they are incorporated into newer summaries). |
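For example, a `.env` that lowers the trigger point and delegates summarization to a cheaper model might look like this (values are illustrative):
```
ENABLE_CONVERSATION_COMPRESSION=True
COMPRESSION_THRESHOLD_PERCENTAGE=0.7
COMPRESSION_MODEL_OVERRIDE=gpt-3.5-turbo
COMPRESSION_MAX_HISTORY_POINTS=3
```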
## Architecture
The system is split into three components:
* **`CompressionThresholdChecker`**: Calculates token usage and decides when to compress.
* **`CompressionService`**: Orchestrates the compression process, manages DB updates, and reconstructs the context (Summary + Recent Messages) for the LLM.
* **`CompressionPromptBuilder`**: Constructs the specific prompts used to instruct the LLM to summarize the conversation effectively.
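On each request, these components might interact roughly as follows; the method names here are assumptions for illustration, not the actual interfaces.
```python
# Illustrative wiring of the three components; method names are assumed.

def prepare_context(conversation, checker, service):
    """Run before each LLM call to keep the history within budget."""
    # CompressionThresholdChecker: token accounting and the threshold trigger.
    if checker.should_compress(conversation):
        # CompressionService: summarize older history (using prompts from
        # CompressionPromptBuilder), update the DB, and prune compression
        # points beyond COMPRESSION_MAX_HISTORY_POINTS.
        service.compress(conversation)
    # Reconstruct Summary + Recent Messages for the LLM.
    return service.build_context(conversation)
```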