moltbot/docs/refactor/piless.md

---
summary: "Plan for reducing OpenClaw's dependency on external PI packages while moving agent state toward SQLite, VFS scratch storage, and worker isolation"
title: "Refactoring"
read_when:
  - Planning work to internalize PI runtime pieces
  - Moving session, transcript, or agent scratch state from JSON files to SQLite
  - Designing agent filesystem boundaries or VFS-backed scratch storage
  - Evaluating Node workers for agent runtime isolation or parallelism
---

This is a planning document for issue
[openclaw/openclaw#78096](https://github.com/openclaw/openclaw/issues/78096).

The goal is not to delete PI in one large rewrite. The goal is to make OpenClaw
own the runtime boundary, state model, filesystem capabilities, and parallel
execution shape so PI can become an implementation detail and eventually be
internalized or replaced in slices.

## Current Shape

OpenClaw currently embeds PI directly. The main loop still imports
`@mariozechner/pi-coding-agent`, `@mariozechner/pi-agent-core`, `@mariozechner/pi-ai`,
and `@mariozechner/pi-tui` across agent runtime, tool, provider, transcript, and
TUI paths. See [PI integration architecture](/pi).

Before this refactor, session and runtime state was split across several
persistence mechanisms:

- Gateway session index: `sessions.json`
- Session transcripts: `*.jsonl`
- Auth profiles: `auth-profiles.json`
- Config: `openclaw.json`
- Task registry: SQLite
- Plugin state: SQLite
- Memory indexes: SQLite or QMD-owned SQLite
- Plugin-specific JSON and JSONL sidecars

That mix was workable, but it created duplicated read, write, migration,
locking, maintenance, and diagnostics code. The branch now moves canonical
runtime state into the shared SQLite database and treats old JSON files as
doctor migration inputs, not runtime compatibility stores.

## Current Implementation Status

This plan has started landing in slices:

- Shared state database exists at `~/.openclaw/state/openclaw.sqlite` with
  WAL, shared schema migration, session, transcript, VFS, and artifact tables.
  The shared `kv` table now has a small typed helper for scoped JSON-compatible
  values so low-risk JSON sidecars can move behind the same SQLite connection
  without each feature reimplementing read/write/delete glue.
- Canonical per-agent session stores use SQLite by default. The `openclaw doctor`
  fix mode imports legacy `sessions.json` indexes into SQLite and removes the
  JSON index after import, instead of keeping a startup migration or parallel
  compatibility/export store. Runtime session reads and writes normalize and
  persist only: no JSON import, row pruning, capping, archive cleanup, or
  disk-budget cleanup runs on the hot path. The old maintenance write options
  and explicit session cleanup command have been removed from the session-store
  API; doctor owns legacy import. Status and discovery now use the primary session-store loader instead of a duplicated
  read-only JSON parser, and SQLite-backed agent session directories remain
  discoverable after doctor deletes the legacy `sessions.json` file. The legacy
  JSON session-store object/serialized cache is gone; JSON fallback reads now
  parse directly while canonical SQLite stores avoid that path. The cron timer
  no longer runs a dedicated session reaper.
- Transcript events are SQLite-primary. OpenClaw-owned append paths require
  agent/session scope and write `transcript_events` directly; `*.jsonl` is no
  longer a runtime mirror for those paths. JSONL is now an explicit
  import/export/debug boundary shape only. The OpenClaw transcript session manager,
  Gateway-injected assistant messages, CLI transcript persistence, Codex
  app-server mirroring, compaction successor transcripts, manual compaction
  boundary rewrites, and reset/header creation all persist through SQLite.
  Scoped latest/tail assistant reads, delivery-mirror idempotency/latest-match
  checks, `/export-session`, `before_reset` hook payloads, silent rotation
  replay, chat/TUI history, restart/subagent recovery, managed media indexing,
  token estimation, title/preview/usage helpers, runtime transcript repair,
  bootstrap completion checks, and bounded inspection all use the scoped SQLite
  transcript. Legacy JSONL import is doctor/import/debug only:
  `openclaw doctor --fix` builds the transcript database from old files and removes the JSONL
  sources after successful import. Runtime paths do not import, prune, or repair
  JSONL files. Pre-compaction checkpoints are SQLite transcript snapshots, not
  `.checkpoint.*.jsonl` copies; branch/restore and checkpoint pruning now work
  against snapshot rows. The old PI session-manager cache/prewarm layer is gone.
- `AgentFilesystem` and `SqliteVirtualAgentFs` exist for scratch storage, with
  `disk`, `vfs-scratch`, and `vfs-only` filesystem modes at the runtime
  boundary. VFS contents can be listed and exported for support bundles. When
  child-process execution is available, VFS-only `exec` projects scratch
  contents into a temporary disk view, runs foreground commands there, and syncs
  created, edited, and deleted files back into SQLite scratch storage.
  Worker-backed PI runs now receive the mode-aware `AgentFilesystem` through
  the rehydrated run params, and the PI attempt consumes the runtime-provided
  artifact store before falling back to the legacy inline SQLite constructor.
  When that runtime filesystem has no host workspace capability, read, write,
  edit, apply_patch, and foreground exec operate on the SQLite scratch VFS when
  allowed; process stays unavailable because background sessions still require a
  real process registry and follow-up polling path.
- `tool_artifacts` has a SQLite store primitive for generated artifact staging,
  export, and per-run cleanup. Runtime trajectory capture now mirrors the
  bounded `*.trajectory.jsonl` sidecar into run-scoped SQLite artifacts while
  retaining the disk sidecar for compatibility. Tool execution now records
  media-result manifests for generated or captured tool media in the same
  run-scoped artifact store while keeping delivery files on disk.
- Managed outgoing image attachment metadata now uses the shared SQLite `kv`
  store as the primary record path. Older per-attachment JSON files are imported
  and removed by `openclaw doctor --fix`; runtime media reads only SQLite.
- Cron job definitions, runtime schedule state, and run history now use the
  shared SQLite state database. `openclaw doctor --fix` imports legacy
  `jobs.json`, `jobs-state.json`, and `cron/runs/*.jsonl` files into SQLite and
  removes those file sources after a successful import. Runtime cron paths no
  longer write job-definition, schedule-state, or run-history JSON files.
- The subagent run registry now uses the shared SQLite `kv` store as the
  primary record path. `openclaw doctor --fix` imports legacy
  `subagents/runs.json` files into SQLite and removes them after import.
  Runtime paths no longer import or delete that JSON file.
- Sandbox container and browser registries now use the shared SQLite
  `sandbox_registry_entries` table as the primary record path. Legacy
  monolithic and sharded registry JSON migrates only through
  `openclaw doctor --fix`; runtime reads and writes no longer touch registry
  JSON.
- OpenRouter model capability cache now uses the shared SQLite `kv` store as
  the primary persistent cache. The older
  `cache/openrouter-models.json` file is imported and removed by
  `openclaw doctor --fix`, not by runtime cache reads.
- Codex app-server thread bindings now use the shared SQLite `kv` store as the
  only runtime record path. The old per-session
  `.codex-app-server.json` sidecar reader/writer has been removed from runtime
  and tests now seed the binding store directly. `openclaw doctor --fix`
  imports old sidecars into SQLite and removes the JSON source.
- TUI last-session restore pointers now use the shared SQLite `kv` store as the
  primary record path. The older `tui/last-session.json` file is imported and
  removed by `openclaw doctor --fix`; runtime TUI reads only SQLite.
- Auth profile runtime routing state now uses the shared SQLite `kv` store as
  the primary record path. Older per-agent `auth-state.json` files are imported
  and removed by `openclaw doctor --fix`; `auth-profiles.json` still owns
  credentials and stays file-backed.
- Device identity, local device auth tokens, bootstrap tokens, device/node
  pairing ledgers, channel pairing requests/allowlists, inferred commitment
  records, subagent run records, TUI restore pointers, auth routing state,
  OpenRouter model cache, web push subscriptions/VAPID keys, APNs registration
  state, and update-check state now
  use the shared SQLite `kv` store. `openclaw doctor --fix` imports the legacy
  `identity/*.json`, `devices/*.json`, `nodes/*.json`,
  `credentials/*-pairing.json`, `credentials/*-allowFrom.json`,
  `commitments/commitments.json`, `subagents/runs.json`,
  `tui/last-session.json`, per-agent `auth-state.json`,
  `cache/openrouter-models.json`, `push/*.json`, and `update-check.json` files
  into SQLite and removes those files after a successful import. Runtime paths
  no longer read or write those JSON ledgers.
- `AgentRuntimeBackend`, `PreparedAgentRun`, and the Node worker runner exist
  for serializable prepared runs. `RunEventBus` owns serial parent event
  delivery for worker event streams. The worker runner enforces prepared-run
  timeouts, terminates on parent abort signals, and flushes async parent event
  handlers in worker message order before resolving the result. The worker entry
  constructs mode-aware filesystem capabilities: `disk` and `vfs-scratch` keep
  host workspace access, while `vfs-only` exposes only SQLite scratch/artifact
  storage. The harness layer can reduce a live attempt into a
  structured-cloneable `PreparedAgentRun` descriptor with prepared delivery
  policy decisions, and the same reducer now works at the higher-level
  `runEmbeddedPiAgent` params boundary before model/auth/registry setup creates
  live objects. That high-level reducer also keeps a sanitized serializable
  `runParams` snapshot so channel routing, sender metadata, images, prompt/tool
  policy, and other data-only fields can cross the worker boundary without
  cloning parent callbacks, abort refs, enqueue functions, or reply-operation
  handles. A worker-side rehydration helper turns that snapshot back into
  `runEmbeddedPiAgent` params and installs callback shims that emit worker
  events for the parent bridge. A PI worker backend module now exists as the
  runnable worker target for that rehydrated high-level path, and a parent-side
  runner can execute that backend through the generic worker runner while
  preserving the full embedded run result. Parent-owned streaming callbacks,
  reply refs, user-message persistence callbacks, and abort signals now have a
  worker event bridge so those functions can stay in the Gateway process instead
  of crossing the worker boundary. Both late harness attempts and higher-level
  `runEmbeddedPiAgent` params now build a single worker-launch request that
  bundles the prepared run, parent event sink, abort signal, and permission
  profile. `runEmbeddedPiAgent` now has a guarded high-level launch point before
  queueing: unset mode defaults to `auto`, explicit `inline` keeps production
  inline, `auto` uses the worker when the run is serializable and falls back
  inline when parent-only blockers remain, and forced `worker` mode dispatches
  through the high-level PI worker backend or fails closed. Worker dispatch runs
  under the existing parent session/global queue envelope. Parent-owned
  reply operations attach a parent backend handle while the worker runs, so
  cancellation, streaming-state checks, and steering messages stay in the
  Gateway process while the live reply-operation object itself is not sent to
  the worker. The worker entry also installs a child-owned abort signal in the
  runtime context and aborts it when parent control sends a cancel message, so
  rehydrated PI run params receive a real local signal instead of an undefined
  placeholder. The PI worker runner is covered by an actual worker-thread smoke
  that exercises the launch request, event bridge, and embedded result
  extraction together. Default production PI runs now prefer workers for
  serializable turns and keep the inline fallback for blocked turns while live
  parity coverage expands.
- Worker permission profile construction exists as a disabled-by-default
  Node-permission seatbelt helper. It grants runtime and SQLite state access,
  grants workspace access only for disk-backed filesystem modes, and does not
  allow nested workers, child processes, native addons, or WASI unless explicitly
  requested. High-level PI worker launches keep permissions off by default for
  disk-backed modes, but `OPENCLAW_AGENT_WORKER_FILESYSTEM_MODE=vfs-only`
  defaults the worker permission mode to `enforce` unless
  `OPENCLAW_AGENT_WORKER_PERMISSION_MODE=off|audit|enforce` overrides it.
- `OPENCLAW_AGENT_WORKER_MODE=inline|auto|worker` controls the worker launch
  path. The default is `auto`, which runs serializable high-level PI turns in a
  worker and falls back inline for blocked turns; explicit `inline` preserves
  the legacy path; forced worker mode fails closed until the high-level PI run
  params are serializable and all live parent-owned callbacks are either
  stripped or bridged.
- Common transcript, model registry, and agent-core types have OpenClaw-owned
  facades. `@mariozechner/pi-coding-agent` package-root imports now route
  through `src/agents/pi-coding-agent-contract.ts` outside test mocks and module
  augmentation. `@mariozechner/pi-agent-core` imports now route through
  `src/agents/agent-core-contract.ts` and the public
  `openclaw/plugin-sdk/agent-core` type facade outside module augmentation.
  The agent-core facade now also carries the small runtime values still needed
  by compatibility tests, such as `Agent` and `runAgentLoop`, so those tests no
  longer import the PI package directly. `@mariozechner/pi-ai` OpenAI response
  stream subpaths have narrow OpenClaw-owned facades for the remaining thinking
  contract coverage.
  `@mariozechner/pi-ai` package-root imports across core now route through
  `src/agents/pi-ai-contract.ts` outside test mocks; production OAuth and
  OpenAI completion conversion subpaths route through narrow OpenClaw facades.
  TUI imports route through `src/agents/pi-tui-contract.ts`, with
  `src/tui/pi-tui-contract.ts` left as a local compatibility re-export.
- Transcript header, entry, tree, parser, legacy migration, context
  builder, and session-manager structural types are now defined by OpenClaw's
  transcript contract. The parser, migration, and context builder runtime
  helpers have one OpenClaw-owned implementation under `src/agents/transcript`
  instead of duplicated facade/file-state logic. OpenClaw also owns a
  synchronous SQLite-backed transcript session manager that implements the live
  `SessionManager` shape over `TranscriptState`, including header creation,
  append persistence, tree, label, branch, session name, branch-summary,
  in-memory, create/open, list/listAll, and fork APIs. Live embedded runs,
  compaction, compatibility tests, and gateway checkpoint helpers now use that
  OpenClaw-owned manager instead of PI's concrete `SessionManager` value. CLI
  budget compaction reads transcript branches through the OpenClaw-owned
  transcript state instead of opening PI `SessionManager` for read-only
  branch extraction. The PI coding-agent facade no longer re-exports transcript
  parser, migration, context, version, entry, or `SessionManager` symbols; those
  now come from the OpenClaw transcript contract.
- Extension, session, tool-definition, and skill structural types are now
  defined by OpenClaw's agent extension contract. Context pruning, compaction
  hooks, embedded subscription, system-prompt assembly, skill formatting, and
  client/tool adapters no longer type against PI's coding-agent package for
  those shapes. The PI coding-agent facade is now limited to runtime values
  still provided by PI plus the `CreateAgentSessionOptions` compatibility type.
- Bundled provider plugin production code now imports provider AI helpers via
  OpenClaw-owned Plugin SDK facades (`openclaw/plugin-sdk/provider-ai` and
  `openclaw/plugin-sdk/provider-ai-oauth`) instead of importing PI packages
  directly.
- The core extension facade boundary test now prevents new direct PI package
  imports from production `src/**` files outside the OpenClaw-owned facade and
  module-augmentation files.
- Provider runtime contract, compaction hook, OAuth profile, BTW, CLI, gateway,
  media, trajectory, tool, token-estimation, and spawn workspace tests now mock
  or type against OpenClaw facades instead of PI packages directly. The facade
  boundary test now scans core PI package-name strings so new direct test mocks
  fail unless they live in a facade, module augmentation, package-graph test, or
  explicit PI compatibility test.

## Target Shape

Use three explicit layers:

```text
agent runtime boundary       OpenClaw-owned interface, PI as one backend
agent state database         SQLite primary store, doctor-only legacy JSON import
agent filesystem boundary    VFS scratch plus host capability filesystem
```

Workers sit around the runtime boundary:

```text
Gateway process
  owns config, channels, HTTP, routing, state DB, policy

Agent worker
  owns one turn or one runtime session lane
  receives a prepared run request
  emits lifecycle, stream, tool, usage, and final events
```

Node permission flags may be useful as defense in depth, but they are not the
security boundary. Node's permission model is process launch policy, not a
rooted filesystem capability API, and it has documented limitations around
workers, symlinks, existing file descriptors, native modules, and loadable
extensions.

## Non Goals

- Do not replace `fs-safe` or pinned filesystem helpers with Node permissions.
- Do not make VFS the only model for workspace edits.
- Do not migrate all agent execution to Platformatic, Regina, or another
  external orchestrator.
- Do not remove Python helper paths until an equally safe portable replacement
  exists.
- Do not hide config and credentials in SQLite before export, doctor, backup,
  and manual repair flows are strong.

## Workstream 0: Remove Duplicate Ownership

Treat duplicated code as a symptom of unclear ownership. The first refactor
should not move bytes between files; it should decide which layer owns each
operation.

Consolidate these repeated patterns behind shared primitives:

- JSON read, write, atomic replace, backup, import, and export helpers.
- Session index lookup, locking, cleanup, and diagnostics.
- Transcript event append, replay, compaction, and support bundle export.
- PI message, tool result, and provider adapter shapes.
- Tool scratch file creation, artifact staging, and cleanup.

Target primitives:

```text
StateStore              durable Gateway and agent state
SessionStoreBackend     session index and metadata ownership
TranscriptStore         append-only event history plus export
AgentRuntimeBackend     PI or future runtime implementation
AgentFilesystem         host capability filesystem plus VFS scratch
RunEventBus             serializable worker to parent event stream
```

Measure progress by deleting repeated helper code, not by adding wrappers. Each
phase should name the old code path it replaces and keep at most one adapter for
compatibility.

## Workstream 1: Own The PI Boundary

Start by shrinking direct PI imports, not by forking PI.

1. Add an OpenClaw-owned runtime facade above `src/agents/harness/*`.
2. Move PI imports into a small adapter package or directory.
3. Keep `agentRuntime.id: "pi"` stable and compatible.
4. Convert common OpenClaw code to use OpenClaw types instead of PI types.
5. Internalize PI functionality in this order:
   - Tool result and message types.
   - Tool adapter and tool loop contracts.
   - Session manager and transcript mutation.
   - Model registry and provider abstractions.
   - TUI pieces, only if still needed after Control UI and CLI paths settle.

Early success means most files outside the adapter no longer import
`@mariozechner/pi-*` directly.

## Workstream 2: Consolidate State In SQLite

OpenClaw already has a shared SQLite state layer. Task, Task Flow, and plugin
state runtime writes use `~/.openclaw/state/openclaw.sqlite`; legacy sidecars
are doctor-import inputs.

- `node:sqlite`
- WAL mode
- `synchronous = NORMAL`
- `busy_timeout`
- `0o700` directory mode
- `0o600` database and sidecar mode
- explicit close paths for tests and Windows cleanup

Create one shared state layer for agent and gateway state. Suggested path:
`~/.openclaw/state/openclaw.sqlite`.

Suggested tables:

```text
schema_migrations(version, applied_at)
kv(scope, key, value_json, updated_at)
agents(agent_id, config_json, created_at, updated_at)
session_entries(agent_id, session_key, entry_json, updated_at)
transcript_events(agent_id, session_id, seq, event_json, created_at)
transcript_files(agent_id, session_id, path, imported_at, exported_at)
vfs_entries(agent_id, namespace, path, kind, content_blob, metadata_json, updated_at)
tool_artifacts(agent_id, run_id, artifact_id, kind, metadata_json, blob, created_at)
```

Migration order:

1. Add shared SQLite connection and migration helpers. Done.
2. Move task registry, Task Flow, and plugin state into shared SQLite. Runtime
   writes are done; legacy sidecar import remains in doctor.
3. Move `sessions.json` behind a `SessionStoreBackend` interface. Done for
   canonical per-agent stores.
4. Make SQLite primary for session entries. Done for canonical per-agent
   stores.
5. Import old `sessions.json` only from `openclaw doctor --fix`, then remove the
   JSON index after SQLite has the rows. Done for session indexes.
6. Import old `*.jsonl` transcripts only from `openclaw doctor --fix`, then
   remove the JSONL source after SQLite has the events. Done for canonical
   transcript files.
7. Keep JSONL export as explicit debug/support output only.

Keep `openclaw.json` and `auth-profiles.json` file-backed until operator
repair, secret audit, and backup flows can handle the SQLite layout naturally.

## Workstream 3: Add VFS Scratch Storage

The filesystem model should distinguish scratch state from real host files.

```text
VirtualAgentFs
  SQLite-backed scratch filesystem
  used for temporary tool files, generated artifacts, staging, diagnostics

HostCapabilityFs
  real host filesystem access
  backed by fs-safe or pinned helpers
  used for workspace edits, media imports, archive extraction, user files
```

Agent tools should receive capability objects, not raw path strings where
possible:

```ts
type AgentFilesystem = {
  scratch: VirtualAgentFs;
  workspace?: HostCapabilityFs;
};
```

Default policy:

- `read`, `write`, `edit`, and `apply_patch` continue to operate on the real
  workspace unless the run is explicitly VFS-only.
- Scratch artifacts use VFS by default.
- Shell commands run on disk when host workspace or sandbox access is granted.
- In VFS-only mode, foreground `exec` may run against an explicit projected
  temporary disk view and sync the result back into VFS. `process` stays
  disk/sandbox-only until background sessions have a VFS-aware lifecycle.

Runtime filesystem modes:

| Mode          | Workspace writes                         | Scratch writes | Shell working directory                   | Primary use                                |
| ------------- | ---------------------------------------- | -------------- | ----------------------------------------- | ------------------------------------------ |
| `disk`        | Host capability FS                       | SQLite VFS     | Real workspace or sandbox root            | Current default with safer scratch storage |
| `vfs-scratch` | Host capability FS                       | SQLite VFS     | Real workspace or sandbox root            | Default target after VFS lands             |
| `vfs-only`    | SQLite VFS unless host grant is explicit | SQLite VFS     | Projected temporary disk view or no shell | Isolated agents, previews, replay, tests   |

The parent process chooses the mode before worker launch and records it in the
run policy. Workers should not be able to upgrade themselves from VFS-only to
host filesystem access.

Good first candidates for VFS:

- Tool temporary files.
- Model diagnostic payloads. Runtime trajectory capture now has a SQLite
  artifact mirror.
- Generated artifact staging. Tool media result manifests now land in SQLite;
  binary delivery files remain on disk until channel delivery supports
  claim-check reads from VFS/artifacts.
- Memory upload batches.
- QA and scenario summaries.
- Plugin scratch state that does not need operator editing.

Poor first candidates:

- User workspaces.
- Git repositories.
- Media files users expect to find on disk.
- Config and credentials.
- Any integration whose dependency requires real paths.

## Workstream 4: Run Agents In Workers

Workerization should improve isolation and parallelism without moving Gateway
ownership into workers.

Initial architecture:

1. Parent Gateway builds a `PreparedAgentRun`.
2. Parent records session routing and policy in SQLite.
3. Parent starts or leases an agent worker.
4. Worker runs the selected harness attempt.
5. Worker streams events back to parent.
6. Parent persists state, delivers channel replies, and enforces lifecycle.

Worker payloads must be serializable. Do not pass live DB handles, plugin API
objects, process handles, or mutable config references into workers.

Start with one worker per active agent run. Later, add a pool keyed by:

- runtime id
- agent id
- model provider
- workspace or sandbox root
- permission profile

Use worker threads first for lower overhead. Add process mode when the run needs
stronger isolation, different Node permission flags, native module separation,
or cleaner crash containment.

## Node Permissions Policy

Use Node permissions only as a seatbelt:

- grant read access to code and required runtime files
- grant read/write to the agent workspace or sandbox root when needed
- grant worker creation only in trusted parent code
- avoid exposing worker creation to model-controlled tools
- keep subprocess and native addon permissions disabled unless the runtime
  profile needs them

Do not treat Node permissions as a substitute for `HostCapabilityFs`.

## Dependency Policy

Before adding `@platformatic/vfs`, Platformatic Runtime, `@cocalc/openat2`, or
similar dependencies:

1. Prototype behind a feature flag.
2. Measure install size and native surface.
3. Check package health, license, and release cadence.
4. Keep dependency ownership local to the feature owner.
5. Avoid root dependencies unless core imports the package at runtime.

Likely choices:

- SQLite VFS can start as an OpenClaw-owned minimal implementation.
- `@platformatic/vfs` can be evaluated as an adapter, not adopted as the core
  contract immediately.
- `@cocalc/openat2` can be an optional Linux fast path inside `fs-safe`, not the
  portable baseline.

## Test Plan

Add tests before each migration step:

- Duplicate adapter deletion checks for PI imports, JSON state helpers, and
  filesystem scratch helpers.
- Session store JSON import to SQLite.
- SQLite to JSON export for support bundles.
- Scoped JSON-compatible KV helper read, list, write, and delete behavior.
- Concurrent session entry updates from multiple workers.
- WAL recovery after simulated crash.
- Transcript JSONL compatibility while PI still owns transcripts.
- VFS path normalization, read, write, rename, remove, and directory listing.
- VFS projection to temporary disk and sync-back of command-side creates,
  edits, deletes, and nested workdirs.
- Host filesystem traversal, symlink, hardlink, rename, copy, remove, and
  time-of-check to time-of-use races.
- Worker lifecycle, cancellation, stream event ordering, and crash recovery.
- Worker prepared-run timeout enforcement, abort handling, and parent event
  flush ordering.
- Worker parent callback bridge for streaming replies, tool output, generic
  agent events, aborts, and reply refs.
- High-level run-param snapshot and worker rehydration for preserving
  serializable channel/tool/prompt policy across the worker boundary.
- Parent-side PI worker runner that preserves `EmbeddedPiRunResult` instead of
  collapsing worker completion to plain text.
- Run-level worker dispatch that preserves parent queue ordering and parent
  reply-operation cancellation, streaming state, and steering messages without
  cloning the live operation into the worker.
- Worker-entry cancellation signal rehydration from parent control messages.
- Worker permission profile construction, including VFS-only path denial.
- Disk, VFS scratch, and VFS-only filesystem mode behavior.
- Plugin state and task registry coexistence with the shared state DB.
- Managed outgoing media record import from legacy JSON, legacy file removal
  after import, plus SQLite-primary serving without JSON exports.
- Subagent run registry import from legacy `subagents/runs.json` during doctor,
  legacy file removal after import, and restore from SQLite without JSON
  exports.
- Sandbox container and browser registry reads from SQLite, while legacy
  monolithic registry migration stays an explicit doctor repair operation.
- OpenRouter model capability cache reads from SQLite, with old cache JSON
  imported and removed only by doctor.
- TUI last-session restore pointers read from SQLite without JSON exports,
  import legacy JSON only through doctor, and clear stale pointers from SQLite.
- Auth profile runtime state reads from SQLite, imports legacy JSON only through
  doctor, and deletes SQLite state when runtime state is empty.

## Rollout Plan

Phase 0: inventory and contracts

- Count direct PI imports by package.
- Count duplicate JSON, transcript, and scratch helper implementations.
- Inventory JSON and JSONL state files.
- Define `AgentRuntimeBackend`, `SessionStoreBackend`, and `AgentFilesystem`.
- Document host path versus VFS-only operations.

Phase 1: SQLite session index

- Add shared state DB helper.
- Add a doctor migration that imports `sessions.json` into SQLite and removes
  the JSON index.
- Move canonical session entries to SQLite by default.
- Prove current session list, patch, reset, cleanup, and UI flows.
- Remove load-time/startup session JSON migration, write-time pruning, and
  migration-era maintenance options from the runtime store path.
- Remove the duplicate status-only session JSON reader and stop requiring a
  physical `sessions.json` file for discovered SQLite-backed agent stores.
- Remove the legacy JSON session-store cache layer.
- Remove the dedicated cron timer session reaper and `cron.sessionRetention`
  config; explicit session cleanup owns row pruning.

Phase 2: VFS scratch

- Add SQLite-backed VFS for scratch artifacts.
- Move low-risk scratch files first.
- Keep real workspace tools on host capability FS.
- Add support bundle export for VFS contents.

Phase 3: PI adapter shrink

- Centralize PI imports.
- Replace PI-exposed types across core with OpenClaw-owned types.
- Keep PI as the implementation of the default harness.

Phase 4: workerized runs

- Run one PI harness attempt inside a worker behind a feature flag.
- Stream events back through the parent.
- Keep parent-owned session and delivery writes authoritative.
- Add cancellation and crash recovery.

Phase 5: transcript ownership

- Move transcript mutation behind OpenClaw APIs.
- Store transcript events in SQLite.
- Import legacy JSONL through doctor only; export JSONL for debugging/support.
- Remove direct PI `SessionManager` usage from non-adapter code.
- Remove file-backed compaction checkpoint copies and the session-manager
  cache/prewarm layer.
- Move Codex app-server binding state from per-session JSON sidecars to the
  shared SQLite `kv` table.

Phase 6: internalize or replace PI pieces

- Internalize the pieces that still force root PI dependencies.
- Keep public runtime behavior and docs stable.
- Remove PI packages only when all runtime, TUI, provider, and transcript users
  have migrated.

## Open Questions

- Which current JSON files must remain human-editable long term?
- Should a VFS-only agent be a separate runtime profile or a per-run filesystem
  mode?
- Should shell commands ever run directly against VFS, or only against projected
  temporary disk views?
- How much transcript history should stay queryable through SQL versus exported
  support bundles?
- What is the minimum useful worker boundary: per turn, per session, or per
  agent?
- Which plugin SDK APIs should expose filesystem capabilities first?

## Done Criteria

This refactor is successful when:

- Core code no longer imports PI packages outside the runtime adapter.
- Repeated JSON, transcript, PI adapter, and scratch filesystem logic has one
  owner each.
- `sessions.json` is a doctor-migrated legacy input, not a compatibility store.
- Scratch state and tool artifacts can live in SQLite-backed VFS.
- Agents can run in disk, VFS scratch, and VFS-only filesystem modes.
- Real workspace writes still use capability-safe host filesystem operations.
- Agent turns can run in workers with preserved streaming, cancellation,
  compaction, tool hooks, and channel delivery.
- Existing users can upgrade without losing sessions, config, credentials, or
  workspaces.