startGatewayRuntimeServices() previously started both the cron
scheduler AND heartbeat runner BEFORE gateway sidecars finished
initialising. Because chat.history is marked unavailable until
sidecars complete, any cron job or heartbeat tick that called
chat.history during this window received a hard UNAVAILABLE error.
Fix: create a noop heartbeat placeholder in the early
startGatewayRuntimeServices() call, then activate the real
heartbeat runner, cron scheduler, and pending delivery recovery
in a new activateGatewayScheduledServices() function that runs
AFTER startGatewayPostAttachRuntime() completes.
channelHealthMonitor and model pricing refresh remain in the
early call since they do not depend on chat.history.
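The two-phase ordering above can be sketched as a small, self-contained model. Only the three function names and chat.history come from this change. The GatewayState class, its fields, the "noop"/"history" return values and bootGateway are hypothetical. Every call is synchronous here for simplicity, which may not match the real gateway code.

```typescript
// Hypothetical model of the gateway startup order; not the real implementation.
type HeartbeatRunner = { tick(): string };

class GatewayState {
  // Stands in for chat.history being marked unavailable until sidecars finish.
  chatHistoryAvailable = false;
  heartbeat: HeartbeatRunner = { tick: () => "noop" };
  cronActive = false;
  log: string[] = [];

  chatHistory(): string {
    if (!this.chatHistoryAvailable) throw new Error("UNAVAILABLE");
    return "history";
  }
}

// Early phase: only services that never call chat.history start here,
// plus a noop heartbeat placeholder that is safe to tick at any time.
function startGatewayRuntimeServices(state: GatewayState): void {
  state.log.push("channelHealthMonitor", "modelPricingRefresh");
  state.heartbeat = { tick: () => "noop" };
}

// Sidecar initialisation finishes here, so chat.history becomes available.
function startGatewayPostAttachRuntime(state: GatewayState): void {
  state.chatHistoryAvailable = true;
  state.log.push("sidecarsReady");
}

// Late phase: real heartbeat, cron scheduler, and pending delivery recovery.
function activateGatewayScheduledServices(state: GatewayState): void {
  state.heartbeat = { tick: () => state.chatHistory() };
  state.cronActive = true;
  state.log.push("heartbeat", "cron", "pendingDeliveryRecovery");
}

function bootGateway(): GatewayState {
  const state = new GatewayState();
  startGatewayRuntimeServices(state);
  state.heartbeat.tick(); // an early-window tick hits the noop, not chat.history
  startGatewayPostAttachRuntime(state);
  activateGatewayScheduledServices(state);
  return state;
}
```

In the old ordering, the real heartbeat would have been installed in the early phase, so that same early tick would have reached the UNAVAILABLE branch of chat.history.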
Root cause analysis by luban, cross-validated by tongluo.
Reviewer feedback addressed: heartbeat runner is now also
deferred (previously only cron was deferred).
* agents: auto-activate strict-agentic for GPT-5 and emit blocked-exit liveness
Closes two hard blockers on the GPT-5.4 parity completion gate:
1) Criterion 1 (no stalls after planning) is universal, but the pre-existing
strict-agentic execution contract was opt-in only. Out-of-the-box GPT-5
openai / openai-codex users who never set
`agents.defaults.embeddedPi.executionContract` still got only one
planning-only retry and then fell through to the normal completion path
with the plan-only text; in other words, they still stalled.
Introduce `resolveEffectiveExecutionContract(...)` in
src/agents/execution-contract.ts. Behavior:
- supported provider/model (openai or openai-codex + gpt-5-family) AND
explicit "strict-agentic" or unspecified → "strict-agentic"
- supported provider/model AND explicit "default" → "default" (opt-out)
- unsupported provider/model → "default" regardless of explicit value
`isStrictAgenticExecutionContractActive` now delegates to the effective
resolver so the 2-retry + blocked-state treatment applies by default to
every GPT-5 openai/codex run. Explicit opt-out still works for users who
intentionally want the pre-parity-program behavior.
2) Criterion 4 (replay/liveness failures are explicit, not silent
disappearance) is violated by the strict-agentic blocked exit itself.
Every other terminal return path in src/agents/pi-embedded-runner/run.ts
sets `replayInvalid` + `livenessState` via `setTerminalLifecycleMeta`,
but the strict-agentic exit at run.ts:1615 falls through without them.
Add explicit `livenessState: "abandoned"` + `replayInvalid` (via the
shared `resolveReplayInvalidForAttempt` helper) to that exit, plus a
`setTerminalLifecycleMeta` call so downstream observers (lifecycle log,
ACP bridge, telemetry) see the same explicit terminal state they see on
every other exit branch.
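The resolution matrix in 1) can be sketched as below. resolveEffectiveExecutionContract and isStrictAgenticExecutionContractActive are names from this change, but their signatures here are assumptions. The ContractInput shape and the isSupportedStrictAgenticLane helper are hypothetical, and its simple `gpt-5` prefix check ignores the later commits that widen model matching.

```typescript
type ExecutionContract = "default" | "strict-agentic";

// Hypothetical input shape; `configured` mirrors
// agents.defaults.embeddedPi.executionContract (undefined = unset).
interface ContractInput {
  provider: string;
  modelId: string;
  configured?: ExecutionContract;
}

// Hypothetical lane check: openai or openai-codex plus a gpt-5-family id.
function isSupportedStrictAgenticLane(provider: string, modelId: string): boolean {
  const openaiLike = provider === "openai" || provider === "openai-codex";
  return openaiLike && /^gpt-5/i.test(modelId);
}

function resolveEffectiveExecutionContract(input: ContractInput): ExecutionContract {
  // Unsupported provider/model: collapse to "default" regardless of config.
  if (!isSupportedStrictAgenticLane(input.provider, input.modelId)) return "default";
  // Supported lane: explicit "default" is the opt-out; unset or
  // explicit "strict-agentic" both resolve to "strict-agentic".
  return input.configured === "default" ? "default" : "strict-agentic";
}

// Delegates to the effective resolver, so unconfigured GPT-5 runs get
// the strict-agentic retry + blocked-state treatment by default.
function isStrictAgenticExecutionContractActive(input: ContractInput): boolean {
  return resolveEffectiveExecutionContract(input) === "strict-agentic";
}
```

Putting the unsupported-lane check first makes the collapse to "default" the lead case, which is the same order the loop-6 JSDoc fix later adopts.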
Regressions added:
- `auto-enables update_plan for unconfigured GPT-5 openai runs`
- `respects explicit default contract opt-out on GPT-5 runs`
- `does not auto-enable update_plan for non-openai providers even when unconfigured`
- `emits explicit replayInvalid + abandoned liveness state at the strict-agentic blocked exit`
- `auto-activates strict-agentic for unconfigured GPT-5 openai runs and surfaces the blocked state`
- `respects explicit default contract opt-out on GPT-5 openai runs`
Local validation:
- pnpm test src/agents/openclaw-tools.update-plan.test.ts src/agents/pi-embedded-runner/run.incomplete-turn.test.ts src/agents/pi-embedded-runner.buildembeddedsandboxinfo.test.ts src/agents/system-prompt.test.ts src/agents/openclaw-tools.sessions.test.ts src/agents/pi-embedded-runner/run.overflow-compaction.test.ts
122/122 passing.
Refs #64227
* agents: address loop-6 review comments on strict-agentic contract
Triages all three loop-6 review comments on PR #64679:
1. Copilot: 'The strict-agentic blocked exit returns an error payload
(isError: true) but sets livenessState to "abandoned". Elsewhere in
the runner/lifecycle flow, error terminal states are treated as
"blocked".' Verified: every other hardcoded error terminal branch in
run.ts (role ordering at 1152, image size at 1206, schema error at
1244, compaction timeout at 1128, aborted-with-no-payloads at 606)
uses livenessState: "blocked". Match that convention at the
strict-agentic blocked exit at 1634. Updated the 'emits explicit
replayInvalid + abandoned liveness state' regression test to assert
the new "blocked" value and renamed the assertion commentary.
2. Copilot: 'The JSDoc for resolveEffectiveExecutionContract says
explicit "strict-agentic" in config always resolves to
"strict-agentic", but the implementation collapses to "default"
whenever the provider/mode is unsupported.' Rewrite the JSDoc to
explicitly document the unsupported-provider collapse as the lead
case (strict-agentic is a GPT-5-family openai/openai-codex-only
runtime contract) before listing the supported-lane behavior matrix.
No code change; this is a docstring-only clarification.
3. Greptile P2: 'Non-preferred Anthropic model constant. CLAUDE.md says
to prefer sonnet-4.6 for Anthropic test constants.' Swap
claude-opus-4-6 → claude-sonnet-4-6 in the two update_plan gating
fixtures that assert non-openai providers don't auto-enable the
planning tool. Behavior unchanged; model constant now matches repo
testing guidance.
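The liveness convention that item 1 settles on can be sketched as follows. setTerminalLifecycleMeta, resolveReplayInvalidForAttempt, livenessState and replayInvalid are named in this log, but every type, signature and body below is a hypothetical stand-in. In particular, the helper just echoes a placeholder flag, and observers are modelled as plain callbacks.

```typescript
// Hypothetical shapes; the real run.ts types are not shown in this log.
type LivenessState = "blocked" | "abandoned";

interface TerminalMeta {
  livenessState: LivenessState;
  replayInvalid: boolean;
}

interface Attempt {
  replayInvalid: boolean; // placeholder input for the shared helper
}

interface RunResult {
  payloads: { text: string; isError: boolean }[];
  meta: TerminalMeta;
}

// Placeholder: the real helper derives this value from the attempt.
function resolveReplayInvalidForAttempt(attempt: Attempt): boolean {
  return attempt.replayInvalid;
}

// Downstream observers (lifecycle log, ACP bridge, telemetry) as callbacks.
const terminalObservers: Array<(meta: TerminalMeta) => void> = [];

function setTerminalLifecycleMeta(meta: TerminalMeta): void {
  for (const notify of terminalObservers) notify(meta);
}

// Strict-agentic blocked exit: it returns an error payload, so it reports
// "blocked" like the other hard-error terminal branches.
function strictAgenticBlockedExit(attempt: Attempt, message: string): RunResult {
  const meta: TerminalMeta = {
    livenessState: "blocked",
    replayInvalid: resolveReplayInvalidForAttempt(attempt),
  };
  setTerminalLifecycleMeta(meta);
  return { payloads: [{ text: message, isError: true }], meta };
}
```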
Local validation:
- pnpm test src/agents/openclaw-tools.update-plan.test.ts src/agents/pi-embedded-runner/run.incomplete-turn.test.ts
29/29 passing.
Refs #64227
* test: rename strict-agentic blocked-exit liveness regression to match blocked state
Addresses loop-7 Copilot finding on PR #64679: loop 6 changed the
assertion to livenessState === 'blocked' to match the rest of the
hard-error terminal branches in run.ts, but the test title still said
'abandoned liveness state', which made failures and test output
misleading. Rename the test title to match the asserted value. No
code change beyond the it(...) title.
Validation: pnpm test src/agents/pi-embedded-runner/run.incomplete-turn.test.ts
(19/19 pass).
Refs #64227
* agents: widen strict-agentic auto-activation to handle prefixed and variant GPT-5 model ids
* Align strict-agentic retry matching
* runtime: harden strict-agentic model matching
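These last three commits carry no body. One hedged reading of "prefixed and variant GPT-5 model ids" is a matcher that strips any provider prefix and then accepts `gpt-5` followed by end-of-string, a dot, or a hyphen. normalizeModelId and isGpt5FamilyModelId are hypothetical names. The accepted forms, such as `openai/gpt-5.4` or `gpt-5-mini`, are assumptions and may not match the rules these commits actually implement.

```typescript
// Hypothetical normaliser: lowercases and keeps only the segment after the
// last "/", so "openai/gpt-5.4" and "GPT-5.4" both become "gpt-5.4".
function normalizeModelId(modelId: string): string {
  const trimmed = modelId.trim().toLowerCase();
  const slash = trimmed.lastIndexOf("/");
  return slash >= 0 ? trimmed.slice(slash + 1) : trimmed;
}

// Accepts "gpt-5", "gpt-5.4", "gpt-5-mini", ... but not "gpt-50" or "gpt-4o".
function isGpt5FamilyModelId(modelId: string): boolean {
  return /^gpt-5(?:$|[.-])/.test(normalizeModelId(modelId));
}
```

Requiring a separator or end-of-string after `gpt-5` keeps an id like `gpt-50` from matching. A plain `startsWith("gpt-5")` would accept it.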
---------
Co-authored-by: Eva <eva@100yen.org>