mirror of https://github.com/moltbot/moltbot.git (synced 2026-04-20 21:23:23 +00:00)
fix: add runtime model contextTokens caps
This commit is contained in:

@@ -22,6 +22,7 @@ Docs: https://docs.openclaw.ai

### Fixes

- Providers/OpenAI: preserve native `reasoning.effort: "none"` and strict tool schemas on direct OpenAI-family endpoints, keep OpenAI-compatible proxies on the older compat shim path, and enable OpenAI WebSocket warm-up by default for native Responses routes.
- Providers/OpenAI Codex: split native `contextWindow` from runtime `contextTokens` for `openai-codex/gpt-5.4`, keep the default effective cap at `272000`, and expose a per-model config override via `models.providers.*.models[].contextTokens`.
- Skills/uv install: block workspace `.env` from overriding `UV_PYTHON` and strip related interpreter override keys from uv skill-install subprocesses so repository-controlled env files cannot steer the selected Python runtime. (#59178) Thanks @pgondhi987.
- Telegram/reactions: preserve `reactionNotifications: "own"` across gateway restarts by persisting sent-message ownership state instead of treating cold cache as a permissive fallback. (#59207) Thanks @samzong.
- Gateway/startup: detect PID recycling in gateway lock files on Windows and macOS, and add startup progress so stale lock conflicts no longer block healthy restarts. (#59843) Thanks @TonyDerek-dot.
@@ -18,6 +18,8 @@ For model selection rules, see [/concepts/models](/concepts/models).

- CLI helpers: `openclaw onboard`, `openclaw models list`, `openclaw models set <provider/model>`.
- Fallback runtime rules, cooldown probes, and session-override persistence are
  documented in [/concepts/model-failover](/concepts/model-failover).
- `models.providers.*.models[].contextWindow` is native model metadata;
  `models.providers.*.models[].contextTokens` is the effective runtime cap.
- Provider plugins can inject model catalogs via `registerProvider({ catalog })`;
  OpenClaw merges that output into `models.providers` before writing
  `models.json`.
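The split between the two fields can be sketched as follows (an illustrative TypeScript snippet, not OpenClaw's actual implementation; the `DEFAULT_CONTEXT_TOKENS` value here is an assumed placeholder):

```typescript
// Native metadata vs. effective runtime cap (illustrative only).
interface CatalogModel {
  id: string;
  contextWindow?: number; // native model metadata
  contextTokens?: number; // optional effective runtime cap
}

const DEFAULT_CONTEXT_TOKENS = 200_000; // assumed fallback for this sketch

function effectiveContextTokens(model?: CatalogModel): number {
  // An explicit runtime cap wins; otherwise fall back to native metadata,
  // then to the global default.
  const effective = model?.contextTokens ?? model?.contextWindow ?? DEFAULT_CONTEXT_TOKENS;
  return Math.max(1, Math.floor(effective));
}
```

With `contextWindow: 1050000` and `contextTokens: 272000`, the effective budget under this rule would be `272000`.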

@@ -187,6 +189,7 @@ OpenClaw ships with the pi‑ai catalog. These providers require **no**

- `params.serviceTier` is also forwarded on native Codex Responses requests (`chatgpt.com/backend-api`)
- Shares the same `/fast` toggle and `params.fastMode` config as direct `openai/*`; OpenClaw maps that to `service_tier=priority`
- `openai-codex/gpt-5.3-codex-spark` remains available when the Codex OAuth catalog exposes it; availability is entitlement-dependent
- `openai-codex/gpt-5.4` keeps native `contextWindow = 1050000` and a default runtime `contextTokens = 272000`; override the runtime cap with `models.providers.openai-codex.models[].contextTokens`
- Policy note: OpenAI Codex OAuth is explicitly supported for external tools/workflows like OpenClaw.

```json5
@@ -195,6 +198,18 @@ OpenClaw ships with the pi‑ai catalog. These providers require **no**
}
```

```json5
{
  models: {
    providers: {
      "openai-codex": {
        models: [{ id: "gpt-5.4", contextTokens: 160000 }],
      },
    },
  },
}
```

### Other subscription-style hosted options

- [Qwen / Model Studio](/providers/qwen_modelstudio): Alibaba Cloud Standard pay-as-you-go and Coding Plan subscription endpoints

@@ -2186,6 +2186,7 @@ OpenClaw uses the built-in model catalog. Add custom providers via `models.provi
        input: ["text"],
        cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
        contextWindow: 128000,
+       contextTokens: 96000,
        maxTokens: 32000,
      },
    ],

@@ -2204,6 +2205,7 @@ OpenClaw uses the built-in model catalog. Add custom providers via `models.provi

- SecretRef-managed provider header values are refreshed from source markers (`secretref-env:ENV_VAR_NAME` for env refs, `secretref-managed` for file/exec refs).
- Empty or missing agent `apiKey`/`baseUrl` fall back to `models.providers` in config.
- Matching model `contextWindow`/`maxTokens` use the higher value between explicit config and implicit catalog values.
+- Matching model `contextTokens` preserves an explicit runtime cap when present; use it to limit effective context without changing native model metadata.
- Use `models.mode: "replace"` when you want config to fully rewrite `models.json`.
- Marker persistence is source-authoritative: markers are written from the active source config snapshot (pre-resolution), not from resolved runtime secret values.
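The merge rules in the bullets above can be sketched like this (a hypothetical helper written to match the described semantics, not the shipped code):

```typescript
interface TokenLimits {
  contextWindow?: number;
  contextTokens?: number;
  maxTokens?: number;
}

// contextWindow/maxTokens: prefer the higher of explicit config vs. implicit catalog.
// contextTokens: an explicit runtime cap is preserved as-is, even when smaller.
function mergeTokenLimits(explicit: TokenLimits, implicit: TokenLimits): TokenLimits {
  const higher = (a?: number, b?: number) =>
    a === undefined ? b : b === undefined ? a : Math.max(a, b);
  return {
    contextWindow: higher(explicit.contextWindow, implicit.contextWindow),
    maxTokens: higher(explicit.maxTokens, implicit.maxTokens),
    contextTokens: explicit.contextTokens ?? implicit.contextTokens,
  };
}
```

Note how an explicit `contextTokens: 48000` survives the merge even though the implicit catalog carries a larger value.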

@@ -2219,6 +2221,8 @@ OpenClaw uses the built-in model catalog. Add custom providers via `models.provi

- `models.providers.*.baseUrl`: upstream API base URL.
- `models.providers.*.headers`: extra static headers for proxy/tenant routing.
- `models.providers.*.models`: explicit provider model catalog entries.
+- `models.providers.*.models.*.contextWindow`: native model context window metadata.
+- `models.providers.*.models.*.contextTokens`: optional runtime context cap. Use this when you want a smaller effective context budget than the model's native `contextWindow`.
- `models.providers.*.models.*.compat.supportsDeveloperRole`: optional compatibility hint. For `api: "openai-completions"` with a non-empty non-native `baseUrl` (host not `api.openai.com`), OpenClaw forces this to `false` at runtime. Empty/omitted `baseUrl` keeps default OpenAI behavior.
- `models.bedrockDiscovery`: Bedrock auto-discovery settings root.
- `models.bedrockDiscovery.enabled`: turn discovery polling on/off.
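The `compat.supportsDeveloperRole` rule above can be sketched as a host check (hypothetical helper name; the fallback for unparseable URLs is an assumption, not documented behavior):

```typescript
// Returns true when OpenClaw would force supportsDeveloperRole to false:
// api: "openai-completions" with a non-empty baseUrl whose host is not api.openai.com.
function forcesDeveloperRoleOff(baseUrl?: string): boolean {
  if (!baseUrl) {
    return false; // empty/omitted baseUrl keeps default OpenAI behavior
  }
  try {
    return new URL(baseUrl).host !== "api.openai.com";
  } catch {
    return false; // assumption: unparseable values keep default behavior
  }
}
```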

@@ -143,6 +143,41 @@ discovers it. Treat it as entitlement-dependent and experimental: Codex Spark is
separate from GPT-5.4 `/fast`, and availability depends on the signed-in Codex /
ChatGPT account.

### Codex context window cap

OpenClaw treats the Codex model metadata and the runtime context cap as separate
values.

For `openai-codex/gpt-5.4`:

- native `contextWindow`: `1050000`
- default runtime `contextTokens` cap: `272000`

That keeps model metadata truthful while preserving the smaller default runtime
window that has better latency and quality characteristics in practice.

If you want a different effective cap, set `models.providers.<provider>.models[].contextTokens`:

```json5
{
  models: {
    providers: {
      "openai-codex": {
        models: [
          {
            id: "gpt-5.4",
            contextTokens: 160000,
          },
        ],
      },
    },
  },
}
```

Use `contextWindow` only when you are declaring or overriding native model
metadata. Use `contextTokens` when you want to limit the runtime context budget.

### Transport default

OpenClaw uses `pi-ai` for model streaming. For both `openai/*` and
@@ -86,4 +86,69 @@ describe("openai codex provider", () => {
      "Deprecated profile. Run `openclaw models auth login --provider openai-codex` or `openclaw configure`.",
    );
  });
  it("resolves gpt-5.4 with native contextWindow plus default contextTokens cap", () => {
    const provider = buildOpenAICodexProviderPlugin();

    const model = provider.resolveDynamicModel?.({
      provider: "openai-codex",
      modelId: "gpt-5.4",
      modelRegistry: {
        find: vi.fn((providerId: string, modelId: string) => {
          if (providerId === "openai-codex" && modelId === "gpt-5.3-codex") {
            return {
              id: "gpt-5.3-codex",
              name: "gpt-5.3-codex",
              provider: "openai-codex",
              api: "openai-codex-responses",
              baseUrl: "https://chatgpt.com/backend-api",
              reasoning: true,
              input: ["text", "image"],
              cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
              contextWindow: 272_000,
              maxTokens: 128_000,
            };
          }
          return null;
        }),
      },
    });

    expect(model).toMatchObject({
      id: "gpt-5.4",
      contextWindow: 1_050_000,
      contextTokens: 272_000,
      maxTokens: 128_000,
    });
  });

  it("augments catalog with gpt-5.4 native contextWindow and runtime cap", () => {
    const provider = buildOpenAICodexProviderPlugin();

    const entries = provider.augmentModelCatalog?.({
      provider: "openai-codex",
      entries: [
        {
          id: "gpt-5.3-codex",
          name: "gpt-5.3-codex",
          provider: "openai-codex",
          api: "openai-codex-responses",
          baseUrl: "https://chatgpt.com/backend-api",
          reasoning: true,
          input: ["text", "image"],
          cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
          contextWindow: 272_000,
          maxTokens: 128_000,
        },
      ],
    });

    expect(entries).toContainEqual(
      expect.objectContaining({
        id: "gpt-5.4",
        contextWindow: 1_050_000,
        contextTokens: 272_000,
      }),
    );
  });
});

@@ -33,7 +33,8 @@ import { wrapOpenAICodexProviderStream } from "./stream-hooks.js";
const PROVIDER_ID = "openai-codex";
const OPENAI_CODEX_BASE_URL = "https://chatgpt.com/backend-api";
const OPENAI_CODEX_GPT_54_MODEL_ID = "gpt-5.4";
-const OPENAI_CODEX_GPT_54_CONTEXT_TOKENS = 400_000;
+const OPENAI_CODEX_GPT_54_NATIVE_CONTEXT_TOKENS = 1_050_000;
+const OPENAI_CODEX_GPT_54_DEFAULT_CONTEXT_TOKENS = 272_000;
const OPENAI_CODEX_GPT_54_MAX_TOKENS = 128_000;
const OPENAI_CODEX_GPT_54_COST = {
  input: 2.5,

@@ -100,7 +101,8 @@ function resolveCodexForwardCompatModel(
  if (lower === OPENAI_CODEX_GPT_54_MODEL_ID) {
    templateIds = OPENAI_CODEX_GPT_54_TEMPLATE_MODEL_IDS;
    patch = {
-     contextWindow: OPENAI_CODEX_GPT_54_CONTEXT_TOKENS,
+     contextWindow: OPENAI_CODEX_GPT_54_NATIVE_CONTEXT_TOKENS,
+     contextTokens: OPENAI_CODEX_GPT_54_DEFAULT_CONTEXT_TOKENS,
      maxTokens: OPENAI_CODEX_GPT_54_MAX_TOKENS,
      cost: OPENAI_CODEX_GPT_54_COST,
    };

@@ -140,6 +142,7 @@ function resolveCodexForwardCompatModel(
      input: ["text", "image"],
      cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
      contextWindow: patch?.contextWindow ?? DEFAULT_CONTEXT_TOKENS,
+     contextTokens: patch?.contextTokens,
      maxTokens: patch?.maxTokens ?? DEFAULT_CONTEXT_TOKENS,
    } as ProviderRuntimeModel)
  );

@@ -217,6 +220,7 @@ function buildSyntheticCatalogEntry(
    reasoning: boolean;
    input: readonly ("text" | "image")[];
    contextWindow: number;
+   contextTokens?: number;
  },
) {
  if (!template) {

@@ -229,6 +233,7 @@ function buildSyntheticCatalogEntry(
    reasoning: entry.reasoning,
    input: [...entry.input],
    contextWindow: entry.contextWindow,
+   ...(entry.contextTokens === undefined ? {} : { contextTokens: entry.contextTokens }),
  };
}

@@ -312,7 +317,8 @@ export function buildOpenAICodexProviderPlugin(): ProviderPlugin {
      id: OPENAI_CODEX_GPT_54_MODEL_ID,
      reasoning: true,
      input: ["text", "image"],
-     contextWindow: OPENAI_CODEX_GPT_54_CONTEXT_TOKENS,
+     contextWindow: OPENAI_CODEX_GPT_54_NATIVE_CONTEXT_TOKENS,
+     contextTokens: OPENAI_CODEX_GPT_54_DEFAULT_CONTEXT_TOKENS,
    }),
    buildSyntheticCatalogEntry(sparkTemplate, {
      id: OPENAI_CODEX_GPT_53_SPARK_MODEL_ID,

@@ -524,5 +524,7 @@ export function pruneHistoryForContextShare(params: {
}

export function resolveContextWindowTokens(model?: ExtensionContext["model"]): number {
- return Math.max(1, Math.floor(model?.contextWindow ?? DEFAULT_CONTEXT_TOKENS));
+ const effective =
+   (model as { contextTokens?: number } | undefined)?.contextTokens ?? model?.contextWindow;
+ return Math.max(1, Math.floor(effective ?? DEFAULT_CONTEXT_TOKENS));
}

@@ -85,6 +85,45 @@ describe("context-window-guard", () => {
    expect(guard.shouldBlock).toBe(true);
  });
  it("prefers models.providers.*.models[].contextTokens over contextWindow", () => {
    const cfg = {
      models: {
        providers: {
          openrouter: {
            baseUrl: "http://localhost",
            apiKey: "x",
            models: [
              {
                id: "tiny",
                name: "tiny",
                reasoning: false,
                input: ["text"],
                cost: { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 },
                contextWindow: 1_050_000,
                contextTokens: 12_000,
                maxTokens: 256,
              },
            ],
          },
        },
      },
    } satisfies OpenClawConfig;

    const info = resolveContextWindowInfo({
      cfg,
      provider: "openrouter",
      modelId: "tiny",
      modelContextWindow: 64_000,
      modelContextTokens: 48_000,
      defaultTokens: 200_000,
    });

    expect(info).toEqual({
      source: "modelsConfig",
      tokens: 12_000,
    });
  });
  it("normalizes provider aliases when reading models config context windows", () => {
    const cfg = {
      models: {

@@ -23,19 +23,25 @@ export function resolveContextWindowInfo(params: {
  cfg: OpenClawConfig | undefined;
  provider: string;
  modelId: string;
+ modelContextTokens?: number;
  modelContextWindow?: number;
  defaultTokens: number;
}): ContextWindowInfo {
  const fromModelsConfig = (() => {
    const providers = params.cfg?.models?.providers as
-     | Record<string, { models?: Array<{ id?: string; contextWindow?: number }> }>
+     | Record<
+         string,
+         { models?: Array<{ id?: string; contextTokens?: number; contextWindow?: number }> }
+       >
      | undefined;
    const providerEntry = findNormalizedProviderValue(providers, params.provider);
    const models = Array.isArray(providerEntry?.models) ? providerEntry.models : [];
    const match = models.find((m) => m?.id === params.modelId);
-   return normalizePositiveInt(match?.contextWindow);
+   return normalizePositiveInt(match?.contextTokens) ?? normalizePositiveInt(match?.contextWindow);
  })();
- const fromModel = normalizePositiveInt(params.modelContextWindow);
+ const fromModel =
+   normalizePositiveInt(params.modelContextTokens) ??
+   normalizePositiveInt(params.modelContextWindow);
  const baseInfo = fromModelsConfig
    ? { tokens: fromModelsConfig, source: "modelsConfig" as const }
    : fromModel
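The precedence this hunk implements can be condensed into a standalone sketch (simplified shapes and an assumed `normalizePositiveInt`-style helper; not the exact shipped code):

```typescript
type ContextWindowInfo = { tokens: number; source: "modelsConfig" | "model" | "default" };

// Precedence: per-model config (contextTokens first, then contextWindow),
// then the runtime model's own values, then the default.
function resolveInfoSketch(params: {
  configTokens?: number;
  configWindow?: number;
  modelTokens?: number;
  modelWindow?: number;
  defaultTokens: number;
}): ContextWindowInfo {
  const pos = (n?: number) =>
    typeof n === "number" && Number.isFinite(n) && n > 0 ? Math.floor(n) : undefined;
  const fromConfig = pos(params.configTokens) ?? pos(params.configWindow);
  const fromModel = pos(params.modelTokens) ?? pos(params.modelWindow);
  if (fromConfig !== undefined) return { tokens: fromConfig, source: "modelsConfig" };
  if (fromModel !== undefined) return { tokens: fromModel, source: "model" };
  return { tokens: params.defaultTokens, source: "default" };
}
```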

@@ -1,6 +1,6 @@
import { afterEach, beforeEach, describe, expect, it, vi } from "vitest";

-type DiscoveredModel = { id: string; contextWindow: number };
+type DiscoveredModel = { id: string; contextWindow?: number; contextTokens?: number };
type ContextModule = typeof import("./context.js");

function mockContextDeps(params: {

@@ -120,6 +120,21 @@ describe("lookupContextTokens", () => {
    );
  });
  it("prefers config contextTokens over contextWindow on first lookup", async () => {
    mockContextModuleDeps(() => ({
      models: {
        providers: {
          "openai-codex": {
            models: [{ id: "gpt-5.4", contextWindow: 1_050_000, contextTokens: 272_000 }],
          },
        },
      },
    }));

    const { lookupContextTokens } = await importContextModule();
    expect(lookupContextTokens("gpt-5.4", { allowAsyncLoad: false })).toBe(272_000);
  });

  it("rehydrates config-backed cache entries after module reload when runtime config survives", async () => {
    const firstLoadConfigMock = vi.fn(() => ({
      models: {

@@ -38,6 +38,16 @@ describe("applyDiscoveredContextWindows", () => {
    expect(cache.get("github-copilot/gemini-3.1-pro-preview")).toBe(128_000);
    expect(cache.get("google-gemini-cli/gemini-3.1-pro-preview")).toBe(1_048_576);
  });

  it("prefers discovered contextTokens over contextWindow", () => {
    const cache = new Map<string, number>();
    applyDiscoveredContextWindows({
      cache,
      models: [{ id: "gpt-5.4", contextWindow: 1_050_000, contextTokens: 272_000 }],
    });

    expect(cache.get("gpt-5.4")).toBe(272_000);
  });
});

describe("applyConfiguredContextWindows", () => {

@@ -107,6 +117,22 @@ describe("applyConfiguredContextWindows", () => {
    expect(cache.get("custom/model")).toBe(150_000);
    expect(cache.has("bad/model")).toBe(false);
  });

  it("prefers configured contextTokens over contextWindow", () => {
    const cache = new Map<string, number>();
    applyConfiguredContextWindows({
      cache,
      modelsConfig: {
        providers: {
          openrouter: {
            models: [{ id: "custom/model", contextWindow: 1_050_000, contextTokens: 200_000 }],
          },
        },
      },
    });

    expect(cache.get("custom/model")).toBe(200_000);
  });
});

describe("createSessionManagerRuntimeRegistry", () => {

@@ -192,4 +218,23 @@ describe("resolveContextTokensForModel", () => {

    expect(result).toBe(200_000);
  });

  it("prefers per-model contextTokens config over contextWindow", () => {
    const result = resolveContextTokensForModel({
      cfg: {
        models: {
          providers: {
            "openai-codex": {
              models: [{ id: "gpt-5.4", contextWindow: 1_050_000, contextTokens: 160_000 }],
            },
          },
        },
      },
      provider: "openai-codex",
      model: "gpt-5.4",
      fallbackContextTokens: 272_000,
    });

    expect(result).toBe(160_000);
  });
});

@@ -13,12 +13,12 @@ import { normalizeProviderId } from "./model-selection.js";

export { resetContextWindowCacheForTest } from "./context-runtime-state.js";

-type ModelEntry = { id: string; contextWindow?: number };
+type ModelEntry = { id: string; contextWindow?: number; contextTokens?: number };
type ModelRegistryLike = {
  getAvailable?: () => ModelEntry[];
  getAll: () => ModelEntry[];
};
-type ConfigModelEntry = { id?: string; contextWindow?: number };
+type ConfigModelEntry = { id?: string; contextWindow?: number; contextTokens?: number };
type ProviderConfigEntry = { models?: ConfigModelEntry[] };
type ModelsConfig = { providers?: Record<string, ProviderConfigEntry | undefined> };
type AgentModelEntry = { params?: Record<string, unknown> };

@@ -40,20 +40,20 @@ export function applyDiscoveredContextWindows(params: {
    if (!model?.id) {
      continue;
    }
-   const contextWindow =
-     typeof model.contextWindow === "number" ? Math.trunc(model.contextWindow) : undefined;
-   if (!contextWindow || contextWindow <= 0) {
+   const contextTokens =
+     typeof model.contextTokens === "number"
+       ? Math.trunc(model.contextTokens)
+       : typeof model.contextWindow === "number"
+         ? Math.trunc(model.contextWindow)
+         : undefined;
+   if (!contextTokens || contextTokens <= 0) {
      continue;
    }
    const existing = params.cache.get(model.id);
-   // When the same bare model id appears under multiple providers with different
-   // limits, keep the smaller window. This cache feeds both display paths and
-   // runtime paths (flush thresholds, session context-token persistence), so
-   // overestimating the limit could delay compaction and cause context overflow.
-   // Callers that know the active provider should use resolveContextTokensForModel,
-   // which tries the provider-qualified key first and falls back here.
-   if (existing === undefined || contextWindow < existing) {
-     params.cache.set(model.id, contextWindow);
+   // Cache the most conservative effective limit. Provider/runtime callers that
+   // know the active provider should still prefer qualified lookups first.
+   if (existing === undefined || contextTokens < existing) {
+     params.cache.set(model.id, contextTokens);
    }
  }
}
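The conservative caching rule in this hunk amounts to a "smaller value wins" update, which can be sketched on its own (illustrative helper, not the shipped function):

```typescript
// When the same bare model id is seen with different effective limits,
// keep the smaller one so the cache never overestimates a model's budget.
function cacheEffectiveLimit(cache: Map<string, number>, id: string, limit: number): void {
  const existing = cache.get(id);
  if (existing === undefined || limit < existing) {
    cache.set(id, limit);
  }
}
```

Underestimating is the safe direction here: an overestimated limit could delay compaction and overflow the real context window.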

@@ -72,12 +72,16 @@ export function applyConfiguredContextWindows(params: {
    }
    for (const model of provider.models) {
      const modelId = typeof model?.id === "string" ? model.id : undefined;
-     const contextWindow =
-       typeof model?.contextWindow === "number" ? model.contextWindow : undefined;
-     if (!modelId || !contextWindow || contextWindow <= 0) {
+     const contextTokens =
+       typeof model?.contextTokens === "number"
+         ? model.contextTokens
+         : typeof model?.contextWindow === "number"
+           ? model.contextWindow
+           : undefined;
+     if (!modelId || !contextTokens || contextTokens <= 0) {
        continue;
      }
-     params.cache.set(modelId, contextWindow);
+     params.cache.set(modelId, contextTokens);
    }
  }
}

@@ -307,12 +311,12 @@ function resolveProviderModelRef(params: {
  return { provider, model };
}

-// Look up an explicit contextWindow override for a specific provider+model
+// Look up an explicit runtime context cap for a specific provider+model
// directly from config, without going through the shared discovery cache.
// This avoids the cache keyspace collision where "provider/model" synthetic
// keys overlap with raw slash-containing model IDs (e.g. OpenRouter's
// "google/gemini-2.5-pro" stored as a raw catalog entry).
-function resolveConfiguredProviderContextWindow(
+function resolveConfiguredProviderContextTokens(
  cfg: OpenClawConfig | undefined,
  provider: string,
  model: string,

@@ -324,8 +328,8 @@ function resolveConfiguredProviderContextWindow(

  // Mirror the lookup order in pi-embedded-runner/model.ts: exact key first,
  // then normalized fallback. This prevents alias collisions from picking the
- // wrong contextWindow based on Object.entries iteration order.
- function findContextWindow(matchProviderId: (id: string) => boolean): number | undefined {
+ // wrong configured cap based on Object.entries iteration order.
+ function findContextTokens(matchProviderId: (id: string) => boolean): number | undefined {
    for (const [providerId, providerConfig] of Object.entries(providers!)) {
      if (!matchProviderId(providerId)) {
        continue;

@@ -334,13 +338,19 @@ function resolveConfiguredProviderContextWindow(
        continue;
      }
      for (const m of providerConfig.models) {
+       const contextTokens =
+         typeof m?.contextTokens === "number"
+           ? m.contextTokens
+           : typeof m?.contextWindow === "number"
+             ? m.contextWindow
+             : undefined;
        if (
          typeof m?.id === "string" &&
          m.id === model &&
-         typeof m?.contextWindow === "number" &&
-         m.contextWindow > 0
+         typeof contextTokens === "number" &&
+         contextTokens > 0
        ) {
-         return m.contextWindow;
+         return contextTokens;
        }
      }
    }

@@ -348,14 +358,14 @@ function resolveConfiguredProviderContextWindow(
  }

  // 1. Exact match (case-insensitive, no alias expansion).
- const exactResult = findContextWindow((id) => id.trim().toLowerCase() === provider.toLowerCase());
+ const exactResult = findContextTokens((id) => id.trim().toLowerCase() === provider.toLowerCase());
  if (exactResult !== undefined) {
    return exactResult;
  }

  // 2. Normalized fallback: covers alias keys such as "z.ai" → "zai".
  const normalizedProvider = normalizeProviderId(provider);
- return findContextWindow((id) => normalizeProviderId(id) === normalizedProvider);
+ return findContextTokens((id) => normalizeProviderId(id) === normalizedProvider);
}
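The two-pass lookup above can be sketched independently; `normalizeId` below is a stand-in for OpenClaw's `normalizeProviderId` (assumed to lowercase and strip punctuation — an assumption, not the documented behavior):

```typescript
// Hypothetical stand-in for normalizeProviderId.
function normalizeId(id: string): string {
  return id.trim().toLowerCase().replace(/[^a-z0-9-]/g, "");
}

// Pass 1: exact case-insensitive key match. Pass 2: normalized fallback,
// so alias keys such as "z.ai" still resolve for the canonical "zai".
function findProviderValue<T>(providers: Record<string, T>, provider: string): T | undefined {
  for (const [id, value] of Object.entries(providers)) {
    if (id.trim().toLowerCase() === provider.toLowerCase()) return value;
  }
  const normalized = normalizeId(provider);
  for (const [id, value] of Object.entries(providers)) {
    if (normalizeId(id) === normalized) return value;
  }
  return undefined;
}
```

Doing the exact pass first keeps the result independent of `Object.entries` iteration order when both an exact key and an alias key are present.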

function isAnthropic1MModel(provider: string, model: string): boolean {

@@ -399,7 +409,7 @@ export function resolveContextTokensForModel(params: {
  // window and misreport context limits for the OpenRouter session.
  // See status.ts log-usage fallback which calls with only { model } set.
  if (explicitProvider) {
-   const configuredWindow = resolveConfiguredProviderContextWindow(
+   const configuredWindow = resolveConfiguredProviderContextTokens(
      params.cfg,
      explicitProvider,
      ref.model,

@@ -85,6 +85,11 @@ export function mergeProviderModels(
    explicitValue: explicitModel.contextWindow,
    implicitValue: implicitModel.contextWindow,
  });
+ const contextTokens = resolvePreferredTokenLimit({
+   explicitPresent: "contextTokens" in explicitModel,
+   explicitValue: explicitModel.contextTokens,
+   implicitValue: implicitModel.contextTokens,
+ });
  const maxTokens = resolvePreferredTokenLimit({
    explicitPresent: "maxTokens" in explicitModel,
    explicitValue: explicitModel.maxTokens,

@@ -96,6 +101,7 @@ export function mergeProviderModels(
    input: implicitModel.input,
    reasoning: "reasoning" in explicitModel ? explicitModel.reasoning : implicitModel.reasoning,
    ...(contextWindow === undefined ? {} : { contextWindow }),
+   ...(contextTokens === undefined ? {} : { contextTokens }),
    ...(maxTokens === undefined ? {} : { maxTokens }),
  };
});

@@ -441,6 +441,7 @@ export async function compactEmbeddedPiSessionDirect(
    cfg: params.config,
    provider,
    modelId,
+   modelContextTokens: runtimeModel.contextTokens,
    modelContextWindow: runtimeModel.contextWindow,
    defaultTokens: DEFAULT_CONTEXT_TOKENS,
  });

@@ -1031,6 +1032,7 @@ export async function compactEmbeddedPiSession(
    cfg: params.config,
    provider: ceProvider,
    modelId: ceModelId,
+   modelContextTokens: ceModel?.contextTokens,
    modelContextWindow: ceModel?.contextWindow,
    defaultTokens: DEFAULT_CONTEXT_TOKENS,
  });

@@ -1,6 +1,6 @@
import type { Api, Model } from "@mariozechner/pi-ai";
import type { ExtensionFactory, SessionManager } from "@mariozechner/pi-coding-agent";
import type { OpenClawConfig } from "../../config/config.js";
+import type { ProviderRuntimeModel } from "../../plugins/types.js";
import { resolveContextWindowInfo } from "../context-window-guard.js";
import { DEFAULT_CONTEXT_TOKENS } from "../defaults.js";
import { setCompactionSafeguardRuntime } from "../pi-hooks/compaction-safeguard-runtime.js";

@@ -17,12 +17,13 @@ function resolveContextWindowTokens(params: {
  cfg: OpenClawConfig | undefined;
  provider: string;
  modelId: string;
- model: Model<Api> | undefined;
+ model: ProviderRuntimeModel | undefined;
}): number {
  return resolveContextWindowInfo({
    cfg: params.cfg,
    provider: params.provider,
    modelId: params.modelId,
+   modelContextTokens: params.model?.contextTokens,
    modelContextWindow: params.model?.contextWindow,
    defaultTokens: DEFAULT_CONTEXT_TOKENS,
  }).tokens;

@@ -33,7 +34,7 @@ function buildContextPruningFactory(params: {
  sessionManager: SessionManager;
  provider: string;
  modelId: string;
- model: Model<Api> | undefined;
+ model: ProviderRuntimeModel | undefined;
}): ExtensionFactory | undefined {
  const raw = params.cfg?.agents?.defaults?.contextPruning;
  if (raw?.mode !== "cache-ttl") {

@@ -73,7 +74,7 @@ export function buildEmbeddedExtensionFactories(params: {
  sessionManager: SessionManager;
  provider: string;
  modelId: string;
- model: Model<Api> | undefined;
+ model: ProviderRuntimeModel | undefined;
}): ExtensionFactory[] {
  const factories: ExtensionFactory[] = [];
  if (resolveCompactionMode(params.cfg) === "safeguard") {

@@ -83,6 +84,7 @@ export function buildEmbeddedExtensionFactories(params: {
    cfg: params.cfg,
    provider: params.provider,
    modelId: params.modelId,
+   modelContextTokens: params.model?.contextTokens,
    modelContextWindow: params.model?.contextWindow,
    defaultTokens: DEFAULT_CONTEXT_TOKENS,
  });

@@ -205,7 +205,8 @@ function buildDynamicModel(
    api: "openai-codex-responses",
    baseUrl: OPENAI_CODEX_BASE_URL,
    cost: { input: 2.5, output: 15, cacheRead: 0.25, cacheWrite: 0 },
-   contextWindow: 272_000,
+   contextWindow: 1_050_000,
+   contextTokens: 272_000,
    maxTokens: 128_000,
  },
  fallback,

@@ -70,7 +70,8 @@ export function buildOpenAICodexForwardCompatExpectation(
      : isGpt54
        ? { input: 2.5, output: 15, cacheRead: 0.25, cacheWrite: 0 }
        : OPENAI_CODEX_TEMPLATE_MODEL.cost,
-   contextWindow: isGpt54 ? 272_000 : isSpark ? 128_000 : 272000,
+   contextWindow: isGpt54 ? 1_050_000 : isSpark ? 128_000 : 272000,
+   ...(isGpt54 ? { contextTokens: 272_000 } : {}),
    maxTokens: 128000,
  };
}

@@ -12,6 +12,7 @@ import {
  runProviderDynamicModel,
  normalizeProviderResolvedModelWithPlugin,
} from "../../plugins/provider-runtime.js";
+import type { ProviderRuntimeModel } from "../../plugins/types.js";
import { resolveOpenClawAgentDir } from "../agent-paths.js";
import { DEFAULT_CONTEXT_TOKENS } from "../defaults.js";
import { buildModelAliasLines } from "../model-alias-lines.js";

@@ -294,12 +295,12 @@ function resolveConfiguredProviderConfig(

function applyConfiguredProviderOverrides(params: {
  provider: string;
- discoveredModel: Model<Api>;
+ discoveredModel: ProviderRuntimeModel;
  providerConfig?: InlineProviderConfig;
  modelId: string;
  cfg?: OpenClawConfig;
  runtimeHooks?: ProviderRuntimeHooks;
-}): Model<Api> {
+}): ProviderRuntimeModel {
  const { discoveredModel, providerConfig, modelId } = params;
  if (!providerConfig) {
    return {

@@ -368,6 +369,7 @@ function applyConfiguredProviderOverrides(params: {
    input: normalizedInput,
    cost: configuredModel?.cost ?? discoveredModel.cost,
    contextWindow: configuredModel?.contextWindow ?? discoveredModel.contextWindow,
+   contextTokens: configuredModel?.contextTokens ?? discoveredModel.contextTokens,
    maxTokens: configuredModel?.maxTokens ?? discoveredModel.maxTokens,
    headers: requestConfig.headers,
    compat: configuredModel?.compat ?? discoveredModel.compat,

@@ -595,6 +597,7 @@ function resolveConfiguredFallbackModel(params: {
      configuredModel?.contextWindow ??
      providerConfig?.models?.[0]?.contextWindow ??
      DEFAULT_CONTEXT_TOKENS,
+   contextTokens: configuredModel?.contextTokens ?? providerConfig?.models?.[0]?.contextTokens,
    maxTokens:
      configuredModel?.maxTokens ??
      providerConfig?.models?.[0]?.maxTokens ??

@@ -105,6 +105,7 @@ export function resolveEffectiveRuntimeModel(params: {
    cfg: params.cfg,
    provider: params.provider,
    modelId: params.modelId,
+   modelContextTokens: params.runtimeModel.contextTokens,
    modelContextWindow: params.runtimeModel.contextWindow,
    defaultTokens: DEFAULT_CONTEXT_TOKENS,
  });

@@ -20,6 +20,25 @@ describe("statusSummaryRuntime.resolveContextTokensForModel", () => {

    expect(contextTokens).toBe(123_456);
  });

  it("prefers per-model contextTokens over contextWindow", () => {
    const contextTokens = statusSummaryRuntime.resolveContextTokensForModel({
      cfg: {
        models: {
          providers: {
            "openai-codex": {
              models: [{ id: "gpt-5.4", contextWindow: 1_050_000, contextTokens: 272_000 }],
            },
          },
        },
      } as never,
      provider: "openai-codex",
      model: "gpt-5.4",
      fallbackContextTokens: 999,
    });

    expect(contextTokens).toBe(272_000);
  });
});

describe("statusSummaryRuntime.resolveSessionModelRef", () => {
@@ -99,7 +99,7 @@ function resolveConfiguredStatusModelRef(params: {
   return { provider: params.defaultProvider, model: params.defaultModel };
 }

-function resolveConfiguredProviderContextWindow(
+function resolveConfiguredProviderContextTokens(
   cfg: OpenClawConfig | undefined,
   provider: string,
   model: string,
@@ -114,13 +114,19 @@ function resolveConfiguredProviderContextWindow(
       continue;
     }
     for (const entry of providerConfig.models) {
+      const contextTokens =
+        typeof entry?.contextTokens === "number"
+          ? entry.contextTokens
+          : typeof entry?.contextWindow === "number"
+            ? entry.contextWindow
+            : undefined;
       if (
         typeof entry?.id === "string" &&
         entry.id === model &&
-        typeof entry.contextWindow === "number" &&
-        entry.contextWindow > 0
+        typeof contextTokens === "number" &&
+        contextTokens > 0
       ) {
-        return entry.contextWindow;
+        return contextTokens;
       }
     }
   }
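The per-entry cap selection in the hunk above can be sketched in isolation. This is a minimal standalone version for illustration; the helper name `effectiveCapForEntry` and the `ModelEntry` shape are assumptions, not the repo's actual exports:

```typescript
// Sketch of the new cap selection: an explicit contextTokens wins,
// otherwise contextWindow serves as the cap, and non-positive or
// non-matching entries yield undefined.
type ModelEntry = { id?: string; contextWindow?: number; contextTokens?: number };

function effectiveCapForEntry(entry: ModelEntry, model: string): number | undefined {
  const contextTokens =
    typeof entry?.contextTokens === "number"
      ? entry.contextTokens
      : typeof entry?.contextWindow === "number"
        ? entry.contextWindow
        : undefined;
  if (entry?.id === model && typeof contextTokens === "number" && contextTokens > 0) {
    return contextTokens;
  }
  return undefined;
}
```

With both fields set, `contextTokens` is preferred; with only `contextWindow`, behavior matches the pre-change code.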
@@ -180,13 +186,13 @@ function resolveContextTokensForModel(params: {
     return params.contextTokensOverride;
   }
   if (params.provider && params.model) {
-    const configuredWindow = resolveConfiguredProviderContextWindow(
+    const configuredContextTokens = resolveConfiguredProviderContextTokens(
       params.cfg,
       params.provider,
       params.model,
     );
-    if (configuredWindow !== undefined) {
-      return configuredWindow;
+    if (configuredContextTokens !== undefined) {
+      return configuredContextTokens;
     }
   }
   return params.fallbackContextTokens ?? DEFAULT_CONTEXT_TOKENS;
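The overall resolution order in `resolveContextTokensForModel` is: explicit override, then the configured per-model cap, then the caller's fallback, then the default. A simplified sketch of that precedence chain (parameter names and the flattened `configuredContextTokens` input are illustrative, not the real signature):

```typescript
// Simplified precedence chain for the effective runtime cap.
const DEFAULT_CONTEXT_TOKENS = 272_000; // assumed default for illustration

function resolveCap(params: {
  contextTokensOverride?: number;      // explicit per-session override
  configuredContextTokens?: number;    // stands in for the config lookup
  fallbackContextTokens?: number;      // caller-provided fallback
}): number {
  if (typeof params.contextTokensOverride === "number") {
    return params.contextTokensOverride;
  }
  if (params.configuredContextTokens !== undefined) {
    return params.configuredContextTokens;
  }
  return params.fallbackContextTokens ?? DEFAULT_CONTEXT_TOKENS;
}
```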
@@ -2318,6 +2318,11 @@ export const GENERATED_BASE_CONFIG_SCHEMA = {
             type: "number",
             exclusiveMinimum: 0,
           },
+          contextTokens: {
+            type: "integer",
+            exclusiveMinimum: 0,
+            maximum: 9007199254740991,
+          },
           maxTokens: {
             type: "number",
             exclusiveMinimum: 0,
@@ -22333,7 +22338,7 @@ export const GENERATED_BASE_CONFIG_SCHEMA = {
   },
   "models.mode": {
     label: "Model Catalog Mode",
-    help: 'Controls provider catalog behavior: "merge" keeps built-ins and overlays your custom providers, while "replace" uses only your configured providers. In "merge", matching provider IDs preserve non-empty agent models.json baseUrl values, while apiKey values are preserved only when the provider is not SecretRef-managed in current config/auth-profile context; SecretRef-managed providers refresh apiKey from current source markers, and matching model contextWindow/maxTokens use the higher value between explicit and implicit entries.',
+    help: 'Controls provider catalog behavior: "merge" keeps built-ins and overlays your custom providers, while "replace" uses only your configured providers. In "merge", matching provider IDs preserve non-empty agent models.json baseUrl values, while apiKey values are preserved only when the provider is not SecretRef-managed in current config/auth-profile context; SecretRef-managed providers refresh apiKey from current source markers, matching model contextWindow/maxTokens use the higher value between explicit and implicit entries, and explicit contextTokens runtime caps are preserved.',
     tags: ["models"],
   },
   "models.providers": {
@@ -60,6 +60,12 @@ export type ModelDefinitionConfig = {
     cacheWrite: number;
   };
   contextWindow: number;
+  /**
+   * Optional effective runtime cap used for compaction/session budgeting.
+   * Keeps provider/native contextWindow metadata intact while letting configs
+   * prefer a smaller practical window.
+   */
+  contextTokens?: number;
   maxTokens: number;
   headers?: Record<string, string>;
   compat?: ModelCompatConfig;
@@ -307,6 +307,7 @@ export const ModelDefinitionSchema = z
       .strict()
       .optional(),
     contextWindow: z.number().positive().optional(),
+    contextTokens: z.number().int().positive().optional(),
    maxTokens: z.number().positive().optional(),
    headers: z.record(z.string(), z.string()).optional(),
    compat: ModelCompatSchema,
@@ -313,7 +313,9 @@ export type ProviderPluginCatalog = {
  * Runtime hooks below operate on the final `pi-ai` model object after
  * discovery/override merging, just before inference runs.
  */
-export type ProviderRuntimeModel = Model<Api>;
+export type ProviderRuntimeModel = Model<Api> & {
+  contextTokens?: number;
+};

 export type ProviderRuntimeProviderConfig = {
   baseUrl?: string;
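Taken together, these changes let a per-model `contextTokens` cap ride alongside the native `contextWindow` metadata. A hypothetical config fragment exercising the new field (the provider and model IDs match this commit's test; the surrounding shape is illustrative, not a complete openclaw config):

```typescript
// Hypothetical fragment of models.providers config: contextWindow stays
// the native model metadata, while contextTokens is the smaller effective
// runtime cap used for compaction/session budgeting.
const config = {
  models: {
    providers: {
      "openai-codex": {
        models: [
          {
            id: "gpt-5.4",
            contextWindow: 1_050_000, // native metadata, left intact
            contextTokens: 272_000,   // optional effective runtime cap
          },
        ],
      },
    },
  },
};
```

Omitting `contextTokens` keeps the prior behavior, where `contextWindow` doubles as the cap.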