Structured Output Telemetry (v1.0)

This page is the in-site view of the authoritative telemetry schema in README.Files/System/Policies/SmrtHub-StructuredOutput-Telemetry-v1.0.README.md.

SmrtHub Structured Output Telemetry v1.0

Canonical event naming + minimal field schema for the ExtractStructuredText persistence pipeline.

1. Purpose

Standardize structured telemetry so operators can correlate the structured-output pipeline across:

  • TriggerManager (C# local OCR producer + HTTP persistence client)
  • python-net (Python bridge HTTP ingress)
  • python-runtime (Python queue worker + SmrtSpace persistence)

This schema is metadata-only: do not log extracted content or document bytes.

2. Event Schema (Minimum)

All structured-output telemetry events MUST include:

  • kind: structured-output
  • eventVersion: 1

Recommended correlation fields:

  • runId: correlation ID shared by every event in one extraction run (a single hotkey run or a batch)
  • requestId: correlation ID for a single persisted item
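As a rough illustration of how these two fields support cross-component correlation, the sketch below groups log records by runId and then by requestId. It assumes logs are available as JSON lines and that the event name is carried in a field called `event`; neither is specified by this schema, and `group_by_run` is a hypothetical helper.

```python
import json
from collections import defaultdict


def group_by_run(lines):
    """Group structured-output events by runId, then by requestId.

    Assumptions (not defined by the schema): one JSON object per line,
    with the event name stored under the key "event".
    """
    runs = defaultdict(lambda: defaultdict(list))
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON log lines
        if not isinstance(record, dict):
            continue
        if record.get("kind") != "structured-output":
            continue  # not part of this telemetry schema
        run_id = record.get("runId")
        if run_id is None:
            continue  # cannot correlate without a runId
        runs[run_id][record.get("requestId")].append(record.get("event"))
    # Convert to plain dicts so callers get ordinary lookups
    return {run: dict(items) for run, items in runs.items()}
```

Events without a requestId (for example, run-level summaries) end up under the `None` key of their run.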

Common optional fields (when applicable):

  • route: HTTP route (/structured-output, /structured-output/batch)
  • status: HTTP status code (e.g., 202, 400, 413, 500)
  • reason: rejection reason (rejections only)
  • outcome: normalized result classification (e.g., saved or duplicate-suppressed for structured-output.persisted)
  • outcomeKind: subtype describing the reason/class (http, exception, retryable-failure-exhausted, etc.)
  • attempt, maxAttempts, delayMs, responseLength
  • relativePath: SmrtSpace-relative path (never absolute)

Privacy requirements:

  • Never include extracted text or structured OCR JSON in structured logs.
  • Never include absolute local file paths in structured logs; prefer relativePath and/or sanitized file name metadata.
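The required fields and the path rule above can be sketched as a small event builder. This is an illustrative helper, not the code used by any of the three components: `build_event`, `is_absolute_path`, and the `event` key that carries the event name are all assumptions made for the example.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Required on every structured-output telemetry event (section 2).
REQUIRED_FIELDS = {"kind": "structured-output", "eventVersion": 1}


def is_absolute_path(path):
    """True if the path is absolute under POSIX or Windows rules."""
    return PurePosixPath(path).is_absolute() or PureWindowsPath(path).is_absolute()


def build_event(event, **fields):
    """Build a metadata-only event dict (hypothetical helper).

    The "event" key for the event name is an assumption; the schema
    does not say which field carries it.
    """
    relative_path = fields.get("relativePath")
    if relative_path is not None and is_absolute_path(relative_path):
        raise ValueError("relativePath must be SmrtSpace-relative, never absolute")
    # Drop unset optional fields so the event stays minimal
    optional = {key: value for key, value in fields.items() if value is not None}
    # Required fields are applied last so callers cannot override them
    return {"event": event, **optional, **REQUIRED_FIELDS}
```

A call such as `build_event("structured-output.persisted", runId="run-1", outcome="saved", relativePath="structured/a.json")` yields a dict with the required fields filled in, while an absolute `relativePath` raises `ValueError`. The check covers paths only; keeping extracted text out of the event remains the caller's responsibility.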

3. Canonical Event Names

3.1 python-net (Ingress)

Emitted by POST /structured-output and POST /structured-output/batch handlers.

  • structured-output.received
  • structured-output.batch-received
  • structured-output.rejected
  • structured-output.batch-rejected

3.2 python-runtime (Queue + Persistence)

  • structured-output.worker.completed
  • structured-output.persisted
      • outcome: saved | duplicate-suppressed

3.3 TriggerManager (HTTP Persistence)

  • structured-output.post.retry-scheduled
  • structured-output.post.completed
  • structured-output.post.failed
  • structured-output.post.exception
  • structured-output.batch.fallback
  • structured-output.persist.summary
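The names in this section can be held as a lookup table and used to flag drift in emitted events. The following is a minimal sketch; `check_event` is a hypothetical helper, and the component keys simply reuse the headings above.

```python
# Canonical event names, keyed by emitting component (section 3).
CANONICAL_EVENTS = {
    "python-net": frozenset({
        "structured-output.received",
        "structured-output.batch-received",
        "structured-output.rejected",
        "structured-output.batch-rejected",
    }),
    "python-runtime": frozenset({
        "structured-output.worker.completed",
        "structured-output.persisted",
    }),
    "TriggerManager": frozenset({
        "structured-output.post.retry-scheduled",
        "structured-output.post.completed",
        "structured-output.post.failed",
        "structured-output.post.exception",
        "structured-output.batch.fallback",
        "structured-output.persist.summary",
    }),
}

# Outcomes listed for structured-output.persisted.
PERSISTED_OUTCOMES = frozenset({"saved", "duplicate-suppressed"})


def check_event(component, event, outcome=None):
    """Return a list of problems; an empty list means the event looks canonical."""
    problems = []
    known = CANONICAL_EVENTS.get(component)
    if known is None:
        problems.append(f"unknown component: {component}")
    elif event not in known:
        problems.append(f"{event} is not a canonical {component} event")
    if event == "structured-output.persisted" and outcome not in PERSISTED_OUTCOMES:
        problems.append(f"unexpected outcome for persisted event: {outcome}")
    return problems
```

Returning a list of problems rather than raising lets a test report every mismatch in one pass.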

4. Versioning Rules

  • Any breaking change to required fields or event semantics increments eventVersion.
  • Prefer additive evolution (new optional fields) over renames.
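One way to apply these rules mechanically is to compare the field sets of two schema revisions. The sketch below is an assumption-laden illustration: `needs_version_bump` is a hypothetical helper, it treats any change to the required set and any removed or renamed optional field as breaking, and it cannot detect changes in event semantics, which still need human review.

```python
def needs_version_bump(old_required, new_required, old_optional, new_optional):
    """Return True if the field changes call for incrementing eventVersion.

    Assumed interpretation of the rules above:
    - any change to the required field set is breaking;
    - an optional field that disappears was removed or renamed, which
      is not additive and is treated as breaking;
    - new optional fields are additive and need no bump.
    """
    old_required, new_required = set(old_required), set(new_required)
    old_optional, new_optional = set(old_optional), set(new_optional)

    if old_required != new_required:
        return True
    vanished = old_optional - new_optional
    return bool(vanished)
```

For example, adding a new optional field leaves the result `False`, while renaming `relativePath` to something else returns `True`.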

5. Where This Is Implemented

  • TriggerManager: SmrtApps/CSApps/TriggerManager
  • python-net ingress routes: SmrtApps/PythonApp/python_core/net/routes
  • python-runtime worker/saver: SmrtApps/PythonApp/python_core/runtime

6. Machine-Readable Contract (Tests)

This repo also ships a machine-readable contract used by regression tests to prevent schema drift:

  • README.Files/System/Policies/SmrtHub-StructuredOutput-Telemetry-v1.0.schema.json

The canonical documentation remains this README; the JSON file exists solely as a stable test contract.