fix(cron): add retry policy for one-shot jobs on transient errors (#24355) (openclaw#24435) thanks @hugenshen

Verified:
- pnpm install --frozen-lockfile
- pnpm check
- pnpm test -- --run src/cron/service.issue-regressions.test.ts src/config/config-misc.test.ts

Co-authored-by: hugenshen <16300669+hugenshen@users.noreply.github.com>
Co-authored-by: Tak Hoffman <781889+Takhoffman@users.noreply.github.com>
NIO
2026-03-01 20:58:03 +08:00
committed by GitHub
parent 62a7683ce6
commit ea3955cd78
10 changed files with 406 additions and 23 deletions


@@ -353,6 +353,38 @@ Notes:
- Isolated cron run sessions in `sessions.json` are pruned by `cron.sessionRetention` (default `24h`; set `false` to disable).
- Override store path: `cron.store` in config.
## Retry policy
When a job fails, OpenClaw classifies errors as **transient** (retryable) or **permanent** (disable immediately).
### Transient errors (retried)
- Rate limit (429, too many requests, resource exhausted)
- Network errors (timeout, ECONNRESET, fetch failed, socket)
- Server errors (5xx)
- Cloudflare-related errors
### Permanent errors (no retry)
- Auth failures (invalid API key, unauthorized)
- Config or validation errors
- Other non-transient errors
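The classification above can be sketched as a small pattern matcher. This is a minimal illustration, not OpenClaw's actual implementation: the function name `classifyError` and the regular expressions are hypothetical, and the category names are borrowed from the `retryOn` values in the configuration example. Cloudflare-related errors are left out because the source does not name a category for them.

```typescript
// Hypothetical sketch: category names mirror the `retryOn` config values;
// the patterns are illustrative assumptions, not OpenClaw's real matching rules.
type TransientKind = "rate_limit" | "network" | "server_error";

const TRANSIENT_PATTERNS: Array<[TransientKind, RegExp]> = [
  ["rate_limit", /\b429\b|too many requests|resource exhausted/i],
  ["network", /timeout|ECONNRESET|fetch failed|socket/i],
  ["server_error", /\b5\d\d\b/],
];

// Returns the transient category, or null when the error is treated as permanent.
function classifyError(message: string): TransientKind | null {
  for (const [kind, pattern] of TRANSIENT_PATTERNS) {
    if (pattern.test(message)) return kind;
  }
  return null;
}
```

Anything that matches no pattern falls through to `null`, which corresponds to the "other non-transient errors" bullet: unknown failures are treated as permanent rather than retried.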
### Default behavior (no config)
**One-shot jobs (`schedule.kind: "at"`):**
- On transient error: retry up to 3 times with exponential backoff (30s → 1m → 5m).
- On permanent error: disable immediately.
- On success or skip: disable (or delete if `deleteAfterRun: true`).
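The one-shot rules above amount to a small decision function. The sketch below is an assumption-laden illustration: `nextOneShotAction`, `Outcome`, and `Action` are hypothetical names, and the choice to disable the job once the three retries are used up is an assumption, since the text does not say what happens after the last retry.

```typescript
// Default one-shot backoff schedule from the text: 30s, 1m, 5m.
const ONE_SHOT_BACKOFF_MS = [30_000, 60_000, 300_000];
const MAX_RETRIES = 3;

type Outcome =
  | { status: "ok" | "skipped" }
  | { status: "error"; transient: boolean };

type Action =
  | { kind: "retry"; delayMs: number }
  | { kind: "disable" }
  | { kind: "delete" };

// `retriesSoFar` counts retries already made for this job (0 on the first failure).
function nextOneShotAction(
  outcome: Outcome,
  retriesSoFar: number,
  deleteAfterRun = false,
): Action {
  if (outcome.status !== "error") {
    // Success or skip: the job is finished either way.
    return deleteAfterRun ? { kind: "delete" } : { kind: "disable" };
  }
  if (outcome.transient && retriesSoFar < MAX_RETRIES) {
    const index = Math.min(retriesSoFar, ONE_SHOT_BACKOFF_MS.length - 1);
    return { kind: "retry", delayMs: ONE_SHOT_BACKOFF_MS[index] };
  }
  // Permanent error, or retries exhausted (assumed): disable the job.
  return { kind: "disable" };
}
```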
**Recurring jobs (`cron` / `every`):**
- On any error: apply exponential backoff (30s → 1m → 5m → 15m → 60m) before the next scheduled run.
- Job stays enabled; backoff resets after the next successful run.
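For recurring jobs, the backoff described above can be modeled as a lookup keyed by the number of consecutive errors. This is a sketch under stated assumptions: `recurringBackoffMs` is a hypothetical helper, and holding the delay at 60m once the schedule is exhausted is inferred from the schedule's last step rather than confirmed by the source.

```typescript
// Recurring backoff schedule from the text: 30s, 1m, 5m, 15m, 60m.
const RECURRING_BACKOFF_MS = [30_000, 60_000, 300_000, 900_000, 3_600_000];

// Returns the backoff delay after `consecutiveErrors` failures in a row.
// A count of 0 (for example, right after a successful run) means no backoff.
function recurringBackoffMs(consecutiveErrors: number): number {
  if (consecutiveErrors <= 0) return 0;
  // Clamp to the last entry so the delay stays at 60m (assumed).
  const index = Math.min(consecutiveErrors - 1, RECURRING_BACKOFF_MS.length - 1);
  return RECURRING_BACKOFF_MS[index];
}
```

Resetting the counter to zero after a successful run is what makes the backoff "reset": the next failure starts again at 30s.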
Configure `cron.retry` to override these defaults (see [Configuration](/automation/cron-jobs#configuration)).
## Configuration
```json5
@@ -361,6 +393,12 @@ Notes:
enabled: true, // default true
store: "~/.openclaw/cron/jobs.json",
maxConcurrentRuns: 1, // default 1
// Optional: override retry policy for one-shot jobs
retry: {
maxAttempts: 3,
backoffMs: [60000, 120000, 300000],
retryOn: ["rate_limit", "network", "server_error"],
},
webhook: "https://example.invalid/legacy", // deprecated fallback for stored notify:true jobs
webhookToken: "replace-with-dedicated-webhook-token", // optional bearer token for webhook mode
sessionRetention: "24h", // duration string or false
@@ -617,7 +655,7 @@ openclaw system event --mode now --text "Next heartbeat: check battery."
- OpenClaw applies exponential retry backoff for recurring jobs after consecutive errors:
30s, 1m, 5m, 15m, then 60m between retries.
- Backoff resets automatically after the next successful run.
-- One-shot (`at`) jobs disable after a terminal run (`ok`, `error`, or `skipped`) and do not retry.
+- One-shot (`at`) jobs retry transient errors (rate limit, network, server_error) up to 3 times with backoff; permanent errors disable immediately. See [Retry policy](/automation/cron-jobs#retry-policy).
### Telegram delivers to the wrong place