feat(agent): implement context validation and message truncation (#2249)

Author: Alex
Date: 2026-01-05 17:49:28 +00:00
Committed by: GitHub
Parent: d3e9d66b07
Commit: 5662be12b5
3 changed files with 194 additions and 1 deletion


@@ -21,6 +21,10 @@
"title": "🏗️ Architecture",
"href": "/Guides/Architecture"
},
"compression": {
"title": "🗜️ Context Compression",
"href": "/Guides/compression"
},
"Integrations": {
"title": "🔗 Integrations"
}


@@ -0,0 +1,37 @@
# Context Compression
DocsGPT includes a context compression system for managing long conversations. It keeps the conversation history within the LLM's context window limit while preserving critical information and continuity.
## How It Works
The compression system operates on a "summarize and truncate" principle:
1. **Threshold Check**: Before each request, the system calculates the total token count of the conversation history.
2. **Trigger**: If the token count exceeds a configured threshold (default: 80% of the model's context limit), compression is triggered.
3. **Summarization**: An LLM (potentially a different, cheaper or faster one) processes the older part of the conversation, including previous summaries, user messages, agent responses, and tool outputs.
4. **Context Replacement**: The system generates a comprehensive summary of the older history. For subsequent requests, the LLM receives this **Summary + Recent Messages** instead of the full raw history.
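In code terms, steps 1–4 amount to a small piece of orchestration. Below is a minimal sketch of the flow; the function signature, message shape, and `keep_recent` cutoff are assumptions for illustration, not the actual DocsGPT API.
```python
# Minimal sketch of the summarize-and-truncate flow described above.
# All names here are illustrative, not the actual DocsGPT implementation.

def build_llm_context(history, summary, count_tokens, summarize,
                      context_limit, threshold=0.8, keep_recent=4):
    """Return the message list for the next LLM call, compressing if needed."""
    messages = ([summary] if summary else []) + history

    # 1. Threshold check: total token count of the pending context.
    total = sum(count_tokens(m["content"]) for m in messages)

    # 2. Trigger: compress once usage crosses the configured fraction.
    if total > threshold * context_limit:
        older, recent = history[:-keep_recent], history[-keep_recent:]

        # 3. Summarization: fold the previous summary and the older
        # messages (including tool outputs) into a new summary.
        prior = (summary["content"] + "\n") if summary else ""
        text = summarize(prior + "\n".join(m["content"] for m in older))
        summary = {"role": "system", "content": text}

        # 4. Context replacement: Summary + Recent Messages.
        messages = [summary] + recent

    return messages
```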
### Key Features
* **Recursive Summarization**: New summaries incorporate previous summaries, ensuring that information from the very beginning of a long chat is not lost.
* **Tool Call Support**: The compression logic explicitly handles tool calls and their outputs (e.g., file reads, search results), summarizing them so the agent retains knowledge of what it has already done.
* **"Needle in a Haystack" Preservation**: The prompts are designed to identify and preserve specific, critical details (like passwords, keys, or specific user instructions) even when compressing large amounts of text.
## Configuration
You can configure the compression behavior in your `.env` file or `application/core/settings.py`:

| Setting | Default | Description |
| :--- | :--- | :--- |
| `ENABLE_CONVERSATION_COMPRESSION` | `True` | Master switch to enable/disable the feature. |
| `COMPRESSION_THRESHOLD_PERCENTAGE` | `0.8` | The fraction of the context window (0.0 to 1.0) that triggers compression. |
| `COMPRESSION_MODEL_OVERRIDE` | `None` | (Optional) Specify a different model ID to use specifically for the summarization task (e.g., using `gpt-3.5-turbo` to compress for `gpt-4`). |
| `COMPRESSION_MAX_HISTORY_POINTS` | `3` | The number of past compression points to keep in the database (older ones are discarded as they are incorporated into newer summaries). |
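For example, a `.env` that lowers the trigger point and delegates summarization to a cheaper model might look like this (values are illustrative):
```
ENABLE_CONVERSATION_COMPRESSION=True
COMPRESSION_THRESHOLD_PERCENTAGE=0.7
COMPRESSION_MODEL_OVERRIDE=gpt-3.5-turbo
COMPRESSION_MAX_HISTORY_POINTS=3
```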
## Architecture
The system is split into three components:
* **`CompressionThresholdChecker`**: Calculates token usage and decides when to compress.
* **`CompressionService`**: Orchestrates the compression process, manages DB updates, and reconstructs the context (Summary + Recent Messages) for the LLM.
* **`CompressionPromptBuilder`**: Constructs the specific prompts used to instruct the LLM to summarize the conversation effectively.
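On each request, these components might interact roughly as follows; the method names here are assumptions for illustration, not the actual interfaces.
```python
# Illustrative wiring of the three components; method names are assumed.

def prepare_context(conversation, checker, service):
    """Run before each LLM call to keep the history within budget."""
    # CompressionThresholdChecker: token accounting and the threshold trigger.
    if checker.should_compress(conversation):
        # CompressionService: summarize older history (using prompts from
        # CompressionPromptBuilder), update the DB, and prune compression
        # points beyond COMPRESSION_MAX_HISTORY_POINTS.
        service.compress(conversation)
    # Reconstruct Summary + Recent Messages for the LLM.
    return service.build_context(conversation)
```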