Component: Smrt.ExtractText.Host

Canonical source: SmrtApps/src/Smrt.ExtractText.Host/README.md (mirrored below)


Smrt.ExtractText.Host

Host-side execution implementations for Smrt.ExtractText.

Overview and responsibilities

  • Provides host-owned execution for the ordered candidate list produced by Smrt.ExtractText planning.
  • Implements local Windows OCR executors (and an optional cloud OCR executor) behind IOcrTextExecutor.

Public surface / entry points

  • OcrTextExecutorMux
  • Provider executors for local Windows OCR and optional cloud OCR

Dependencies and integrations

  • Consumes contracts/planning from Smrt.ExtractText (OcrTextOrchestrator, OcrTextExecutionContract).
  • The optional cloud OCR executor calls the OCR.Space HTTP API.

Configuration and operational data

  • No canonical config/state files are owned by this library.
  • If the optional OCR.Space executor is enabled, it expects the OCR.Space api-key to be supplied via Smrt.CloudProviders (Credential Manager) for the selected OCR.Space profile.

Observability and diagnostics

  • Never log OCR payloads or extracted text.
  • If outcomes are logged, log metadata only (provider id, elapsed ms, status).

Testing and validation

  • Build (Debug, win-x64):
      • dotnet build SmrtApps/src/Smrt.ExtractText.Host/Smrt.ExtractText.Host.csproj -c Debug -r win-x64
      • dotnet build SmrtApps/src/Smrt.ExtractText.Tests/Smrt.ExtractText.Tests.csproj -c Debug -r win-x64
  • Unit tests:
      • dotnet test SmrtApps/src/Smrt.ExtractText.Tests/Smrt.ExtractText.Tests.csproj -c Debug -r win-x64 --no-build
  • Integration tests (credential/network-gated):
      • Gate: set SMRTHUB_INTEGRATION_TESTS=1

Support Bundle

  • Not applicable directly (library); collect logs from the hosting application via Support Bundle.

Purpose

Smrt.ExtractText is vendor-agnostic and defines only planning and contracts. This project provides a host implementation of IOcrTextExecutor that executes the ordered candidate list produced by OcrTextOrchestrator.

This host executor is intended for SmrtHub's local OCR workflows (including clipboard-driven ExtractText actions and any local document extraction provider that chooses to reuse ExtractText as an OCR stage). Cloud document extraction providers typically perform OCR within the provider and do not require this executor.

What’s included

  • OcrTextExecutorMux: executes OcrTextExecutionContract.Candidates in order and falls back on execution failures.
  • Local OCR provider executors:
      • Windows AI OCR (local:windowsAi) via late-binding Microsoft.Windows.AI.Imaging.TextRecognizer
      • Legacy Windows OCR (local:windowsOcrLegacy) via late-binding Windows.Media.Ocr.OcrEngine
  • Cloud OCR provider executor:
      • OCR.Space (cloud:OcrSpace/.../OcrSpace) via https://api.ocr.space/parse/image
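
The ordered-fallback behavior described for OcrTextExecutorMux can be illustrated with a minimal sketch. Every type below (IProviderExecutorSketch, DelegateExecutorSketch, MuxResultSketch, FallbackMuxSketch) is a hypothetical stand-in invented for this example. The real IOcrTextExecutor and OcrTextExecutionContract signatures are not shown in this README and may differ, and treating a candidate with no registered executor as a failed attempt is an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a provider executor; not the real IOcrTextExecutor.
public interface IProviderExecutorSketch
{
    string ProviderId { get; }
    Task<string> ExecuteAsync(byte[] image, CancellationToken ct);
}

// Hypothetical helper that wraps a delegate, handy for demos and tests.
public sealed class DelegateExecutorSketch : IProviderExecutorSketch
{
    private readonly Func<byte[], string> _run;

    public DelegateExecutorSketch(string providerId, Func<byte[], string> run)
    {
        ProviderId = providerId;
        _run = run;
    }

    public string ProviderId { get; }

    public Task<string> ExecuteAsync(byte[] image, CancellationToken ct) =>
        Task.FromResult(_run(image));
}

// Hypothetical result shape: which provider won and which ones failed before it.
public sealed record MuxResultSketch(
    bool Succeeded,
    string? ProviderId,
    string? Text,
    IReadOnlyList<string> FailedProviderIds);

public sealed class FallbackMuxSketch
{
    private readonly Dictionary<string, IProviderExecutorSketch> _executors =
        new(StringComparer.Ordinal);

    public FallbackMuxSketch(IEnumerable<IProviderExecutorSketch> executors)
    {
        foreach (var executor in executors)
            _executors[executor.ProviderId] = executor;
    }

    // Tries candidates strictly in the given order; the first success wins.
    public async Task<MuxResultSketch> ExecuteAsync(
        IReadOnlyList<string> orderedCandidateIds,
        byte[] image,
        CancellationToken ct = default)
    {
        var failed = new List<string>();
        foreach (var id in orderedCandidateIds)
        {
            ct.ThrowIfCancellationRequested();

            // Assumption: a candidate with no registered executor counts as a failed attempt.
            if (!_executors.TryGetValue(id, out var executor))
            {
                failed.Add(id);
                continue;
            }

            try
            {
                var text = await executor.ExecuteAsync(image, ct).ConfigureAwait(false);
                return new MuxResultSketch(true, id, text, failed);
            }
            catch (OperationCanceledException) when (ct.IsCancellationRequested)
            {
                throw; // Caller-requested cancellation is not an execution failure.
            }
            catch (Exception)
            {
                // Record only the provider id; never the payload or any extracted text.
                failed.Add(id);
            }
        }

        return new MuxResultSketch(false, null, null, failed);
    }
}
```

In this sketch, registering executors for local:windowsAi and local:windowsOcrLegacy and passing the candidates in that order means the legacy engine is attempted only if the Windows AI attempt throws. Cancellation is rethrown rather than treated as a reason to fall back.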

Privacy / logging

Executors intentionally avoid logging OCR payloads or extracted text. If you log attempt results, log metadata only (provider id, elapsed ms, status).
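
One way to make the metadata-only rule hard to violate is to give the log entry a shape that has no field for payloads or text. The sketch below assumes hypothetical names (AttemptStatusSketch, AttemptLogEntrySketch) and a hypothetical log line format. It is not the logging type used by this library.

```csharp
using System;
using System.Globalization;

// Hypothetical status values; the real status set may differ.
public enum AttemptStatusSketch { Succeeded, Failed, Skipped }

// Hypothetical log entry: carries only provider id, elapsed ms, and status.
// There is deliberately no field for image bytes, extracted text, or exception messages.
public sealed record AttemptLogEntrySketch(
    string ProviderId,
    long ElapsedMs,
    AttemptStatusSketch Status)
{
    // Builds an entry from a measured duration, rounding down to whole milliseconds.
    public static AttemptLogEntrySketch From(
        string providerId, TimeSpan elapsed, AttemptStatusSketch status) =>
        new(providerId, (long)elapsed.TotalMilliseconds, status);

    // Culture-invariant, single-line rendering suitable for a text log.
    public override string ToString() =>
        string.Format(
            CultureInfo.InvariantCulture,
            "ocr-attempt provider={0} elapsedMs={1} status={2}",
            ProviderId, ElapsedMs, Status);
}
```

Because the record cannot hold text, a caller who logs only AttemptLogEntrySketch values cannot leak OCR content through this path by accident.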

Output formatting (best-effort)

When OCR engines provide line/word metadata (for example, Lines with BoundingRect), the host layer attempts to reconstruct a more readable plain text output (line breaks, light indentation, and blank-line gaps). This is best-effort and intentionally does not expose geometry as part of the public ExtractText contract.
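
As a rough illustration of this kind of reconstruction, the sketch below rebuilds plain text from line rectangles. It is not the host layer's actual algorithm. OcrLineSketch and LayoutTextSketch are hypothetical names, and the heuristics are assumptions chosen for the example: a blank line when the vertical gap exceeds 0.8 of the median line height, and indentation capped at 8 spaces.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical line shape: text plus a bounding rectangle in pixels.
public sealed record OcrLineSketch(string Text, double X, double Y, double Width, double Height);

public static class LayoutTextSketch
{
    public static string Reconstruct(IEnumerable<OcrLineSketch> lines)
    {
        // Reading order: top to bottom, then left to right.
        var ordered = lines
            .Where(l => !string.IsNullOrWhiteSpace(l.Text) && l.Height > 0)
            .OrderBy(l => l.Y).ThenBy(l => l.X)
            .ToList();
        if (ordered.Count == 0) return string.Empty;

        double leftMargin = ordered.Min(l => l.X);
        double typicalHeight = Median(ordered.Select(l => l.Height));
        // Approximate width of one character, used to convert pixel offsets to spaces.
        double charWidth = Median(ordered.Select(l => l.Width / l.Text.Length));

        var sb = new StringBuilder();
        OcrLineSketch? previous = null;
        foreach (var line in ordered)
        {
            if (previous is not null)
            {
                sb.Append('\n');
                // Assumed threshold: a gap larger than 0.8 line heights becomes a blank line.
                double gap = line.Y - (previous.Y + previous.Height);
                if (gap > typicalHeight * 0.8) sb.Append('\n');
            }

            // Light indentation: offset from the left margin in characters, capped at 8.
            int indent = charWidth > 0
                ? (int)Math.Round((line.X - leftMargin) / charWidth)
                : 0;
            sb.Append(' ', Math.Clamp(indent, 0, 8));
            sb.Append(line.Text.Trim());
            previous = line;
        }
        return sb.ToString();
    }

    private static double Median(IEnumerable<double> values)
    {
        var sorted = values.OrderBy(v => v).ToList();
        int mid = sorted.Count / 2;
        return sorted.Count % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

The sketch returns only a string, which mirrors the point above that geometry stays out of the public contract. It does not attempt to merge multi-column layouts or words that share a row.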

This formatting preference is controlled by OcrTextOptions.PreferLayoutAwareText (default on) and can be set via Extract Text Settings / Quick Settings.

Typical wiring

Create the mux with the provider executors you want to enable, then pass it to OcrTextOrchestrator.

This keeps fallback behavior scoped to Smrt.ExtractText and lets app settings control candidate ordering.