
Supervision Subsystem

The Supervision Subsystem is a platform-level, cross-cutting subsystem that standardizes process orchestration for the SmrtHub desktop stack: startup ordering, health, restart/quarantine behavior, and operator control.

Primary implementation component: SmrtApps/CSApps/SmrtHubSupervisor/README.md

Overview & responsibilities

  • Start SmrtHub components deterministically based on a declarative manifest.
  • Enforce containment and clean teardown of process trees.
  • Provide self-healing: restarts with backoff, a restart-storm guard, and quarantine of repeatedly failing components.
  • Expose operator control surfaces (local control channel) and health/status surfaces.
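The self-healing behavior above can be pictured as a small per-component restart policy. This is a minimal Python sketch, not the Supervisor's implementation. The class name `RestartPolicy`, its parameters, and the default values are hypothetical, and the storm guard is modeled here as a count of failures inside a sliding time window.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Optional


@dataclass
class RestartPolicy:
    """Hypothetical per-component restart policy (illustrative only)."""

    base_delay_s: float = 1.0      # delay before the first restart
    max_delay_s: float = 60.0      # backoff ceiling
    storm_window_s: float = 300.0  # sliding window used by the storm guard
    storm_limit: int = 5           # failures inside the window that trigger quarantine
    quarantined: bool = False
    _failures: Deque[float] = field(default_factory=deque, repr=False)

    def on_exit(self, now_s: float) -> Optional[float]:
        """Record an unexpected exit at time now_s (seconds).

        Returns the delay before the next restart attempt, or None when the
        component is quarantined and must not be restarted automatically.
        """
        if self.quarantined:
            return None
        self._failures.append(now_s)
        # Storm guard: forget failures older than the sliding window.
        while now_s - self._failures[0] > self.storm_window_s:
            self._failures.popleft()
        if len(self._failures) >= self.storm_limit:
            self.quarantined = True
            return None
        # Exponential backoff: 1x, 2x, 4x, ... the base delay, capped at max_delay_s.
        attempt = len(self._failures)
        return min(self.max_delay_s, self.base_delay_s * 2 ** (attempt - 1))


policy = RestartPolicy()
delays = [policy.on_exit(t) for t in (0, 10, 20, 30, 40)]
# delays == [1.0, 2.0, 4.0, 8.0, None]; policy.quarantined is True
```

Counting failures inside a window, rather than since process start, lets a component that fails only occasionally keep restarting while a tight crash loop ends in quarantine.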

Contracts and invariants

All supervised components must follow these contracts:

  1. Manifest-driven identity
      • Components are addressed by stable IDs declared in the Supervisor manifest.
      • IDs must be unique and must not change across releases.

  2. Deterministic, policy-aligned paths
      • Executable paths are written with tokens that resolve into the canonical staging layout (Apps/<Configuration>/<RID>/...).
      • Components must not rely on ad-hoc relative working directories.

  3. Graceful shutdown
      • Components must honor a shutdown request and exit within the configured timeout.
      • Long-running operations must be interruptible so that the timeout can be met.

  4. Readiness semantics (when enabled)
      • A component that declares a readiness probe must expose the probe target reliably and respond to it quickly.
      • Dependents must not start until their prerequisites are ready.

  5. Observability is mandatory
      • Components must emit structured logs via the unified logging subsystem.
      • Start/stop, readiness, and fatal failures must be visible in logs.

  6. No restart storms
      • Components should fail fast when misconfigured and emit clear diagnostics.
      • The Supervisor may quarantine a component after repeated failures; other components must tolerate it being unavailable.
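The path contract can be illustrated with a small resolver. This is a sketch under stated assumptions: the `${Token}` placeholder syntax (via Python's `string.Template`), the token names `Configuration` and `RID`, and the `resolve_exe_path` helper are all hypothetical; the Supervisor's real token format is defined by its manifest. It uses POSIX separators for brevity, whereas a Windows build would apply `ntpath` semantics.

```python
import posixpath
from string import Template
from typing import Mapping


def resolve_exe_path(template: str, tokens: Mapping[str, str], staging_root: str) -> str:
    """Expand hypothetical ${Token} placeholders and pin the result under staging_root."""
    try:
        relative = Template(template).substitute(tokens)
    except KeyError as missing:
        raise ValueError(f"unknown path token: {missing}") from None
    if posixpath.isabs(relative):
        raise ValueError("manifest paths must be relative to the staging root")
    root = posixpath.normpath(staging_root)
    resolved = posixpath.normpath(posixpath.join(root, relative))
    # Reject templates that climb out of the canonical layout with '..'.
    if posixpath.commonpath([resolved, root]) != root:
        raise ValueError(f"path escapes staging root: {resolved}")
    return resolved


exe = resolve_exe_path(
    "Apps/${Configuration}/${RID}/ExampleComponent/ExampleComponent.exe",  # hypothetical component
    {"Configuration": "Release", "RID": "win-x64"},
    "/opt/smrthub",
)
# exe == "/opt/smrthub/Apps/Release/win-x64/ExampleComponent/ExampleComponent.exe"
```

Unknown tokens, absolute paths, and `..` segments that leave the staging root are rejected instead of being resolved against whatever the current working directory happens to be.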

Integration points

  • HubWindow / UI control surfaces: issue operator commands (status, restart, shutdown) through the approved control channel.
  • Support and diagnostics: support bundle and evidence capture workflows rely on Supervisor state (quarantine status, restart counters).
  • Health monitoring: a health/status surface exposes Supervisor and component states for tools and operators.
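Operator commands reach the Supervisor over the local control channel. The sketch below covers request validation only and assumes a hypothetical JSON wire format with `command` and `component` fields; the helper name `parse_control_request` is also hypothetical, and the real transport and message schema are owned by the Supervisor component.

```python
import json
from typing import AbstractSet, Optional, Tuple

# Command names come from this page; the JSON shape is an assumption.
ALLOWED_COMMANDS = frozenset({"status", "restart", "shutdown"})


def parse_control_request(raw: bytes, known_ids: AbstractSet[str]) -> Tuple[str, Optional[str]]:
    """Validate a request like {"command": "restart", "component": "<id>"}.

    Returns (command, component_id). A missing component is assumed here to
    mean the request targets the whole stack.
    """
    try:
        msg = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError) as exc:
        raise ValueError(f"malformed control request: {exc}") from None
    if not isinstance(msg, dict):
        raise ValueError("control request must be a JSON object")
    command = msg.get("command")
    if not isinstance(command, str) or command not in ALLOWED_COMMANDS:
        raise ValueError(f"unsupported command: {command!r}")
    component = msg.get("component")
    # Only IDs declared in the manifest may be targeted.
    if component is not None and (not isinstance(component, str) or component not in known_ids):
        raise ValueError(f"unknown component id: {component!r}")
    return command, component
```

Rejecting unknown commands and IDs at the channel boundary keeps manifest IDs the only way to address a component, in line with the manifest-driven identity contract.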

Configuration, paths, and operational data

  • Manifest and staging layout are owned by the Supervisor component and the repo build/staging scripts.
  • Logging and config/state artifacts must still follow the Operational Data Policy.

Observability & diagnostics

  • Supervisor emits unified logs (JSON and text) that should include:
      • the manifest identity/hash
      • component IDs and state transitions (starting, ready, exited, restarting, quarantined)
      • restart counters and backoff windows
      • operator commands (who issued them and what was requested, without secrets)
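A transition record carrying the fields listed above might look like the following sketch. The transition names come from this page; the JSON field names, the use of SHA-256 as the manifest hash, and the helper names `manifest_hash` and `transition_event` are assumptions for illustration, not the unified logging subsystem's actual schema.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional

# Transition names come from this page; field names are assumptions.
TRANSITIONS = frozenset({"starting", "ready", "exited", "restarting", "quarantined"})


def manifest_hash(manifest_bytes: bytes) -> str:
    """Identify the exact manifest in effect by its SHA-256 digest."""
    return hashlib.sha256(manifest_bytes).hexdigest()


def transition_event(manifest_sha256: str, component_id: str, transition: str,
                     restart_count: int = 0, backoff_s: Optional[float] = None) -> str:
    """Build one JSON log line for a component state transition."""
    if transition not in TRANSITIONS:
        raise ValueError(f"unknown transition: {transition!r}")
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": "component.transition",
        "manifestSha256": manifest_sha256,
        "componentId": component_id,
        "transition": transition,
        "restartCount": restart_count,
    }
    if backoff_s is not None:
        record["backoffSeconds"] = backoff_s  # only present while a backoff window is active
    return json.dumps(record, sort_keys=True)
```

Stamping every record with the manifest hash lets a support bundle tie each transition to the exact manifest that was in effect.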

Security/privacy notes

  • Control surfaces are local-only and must be restricted by ACL so that only the correct user context can reach them.
  • Do not log secrets from environment variables or config payloads.
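One way to keep environment secrets out of logs is an allowlist: log every variable name, but keep a value only when its name is known to be safe. The helper `redact_env` below is a hypothetical sketch, and the allowlist contents are an assumption that would come from configuration. Config payloads would need equivalent treatment, which this sketch does not cover.

```python
from typing import Dict, Iterable, Mapping

REDACTED = "***"


def redact_env(env: Mapping[str, str], safe_names: Iterable[str]) -> Dict[str, str]:
    """Return a log-safe copy of env.

    Every variable name is kept so the log still shows what was set, but a
    value survives only if its name is on the allowlist. Matching ignores
    case because Windows environment variable names are case-insensitive.
    """
    safe = {name.upper() for name in safe_names}
    return {
        name: value if name.upper() in safe else REDACTED
        for name, value in env.items()
    }
```

An allowlist fails closed: a newly introduced secret with an unexpected name is masked by default, whereas a denylist of patterns such as "TOKEN" or "PASSWORD" would leak it.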

Testing & validation expectations

  • Test manifest parsing and its validation rules (unique IDs, no dependency cycles, probe constraints).
  • Validate dependency ordering and readiness behavior.
  • Validate restart policies and quarantine thresholds.
  • Validate shutdown semantics and Windows Job Object teardown of process trees.
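The manifest checks above (unique IDs, known dependencies, no cycles) and dependency ordering can be exercised against a small reference function. This sketch assumes a hypothetical manifest shape of `{"id": ..., "dependsOn": [...]}` entries, and the helper name `start_order` is also hypothetical; it uses Python's standard `graphlib.TopologicalSorter` (3.9+). The resulting order only says which components may be launched first; the readiness contract additionally requires waiting for prerequisites to report ready before starting dependents.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Any, Dict, List, Mapping, Sequence


def start_order(components: Sequence[Mapping[str, Any]]) -> List[str]:
    """Validate a hypothetical manifest shape and return a start order.

    Each entry looks like {"id": "core", "dependsOn": ["logs"]}. Raises
    ValueError for duplicate IDs, unknown dependencies, or cycles.
    """
    graph: Dict[str, List[str]] = {}
    for comp in components:
        cid = comp["id"]
        if cid in graph:
            raise ValueError(f"duplicate component id: {cid}")
        graph[cid] = list(comp.get("dependsOn", []))
    for cid, deps in graph.items():
        unknown = [d for d in deps if d not in graph]
        if unknown:
            raise ValueError(f"{cid} depends on unknown ids: {unknown}")
    try:
        # TopologicalSorter takes node -> predecessors, so dependencies come first.
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # args[1] holds the detected cycle as a list of node IDs.
        raise ValueError(f"dependency cycle: {exc.args[1]}") from None
```

Running this kind of check at manifest load time means a misconfigured manifest fails fast with a clear diagnostic instead of surfacing later as a component that never starts.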