Component: Smrt.ExtractText.Host¶
Canonical source: SmrtApps/src/Smrt.ExtractText.Host/README.md (mirrored below)
Smrt.ExtractText.Host¶
Host-side execution implementations for Smrt.ExtractText.
Overview and responsibilities¶
- Provides host-owned execution for the ordered candidate list produced by Smrt.ExtractText planning.
- Implements local Windows OCR executors (and an optional cloud OCR executor) behind IOcrTextExecutor.
Public surface / entry points¶
- OcrTextExecutorMux
- Provider executors for local Windows OCR and optional cloud OCR
Dependencies and integrations¶
- Consumes contracts/planning from Smrt.ExtractText (OcrTextOrchestrator, OcrTextExecutionContract).
- Optional cloud OCR executor calls the OCR.Space HTTP API.
Configuration and operational data¶
- No canonical config/state files are owned by this library.
- If the optional OCR.Space executor is enabled, it expects the OCR.Space api-key to be supplied via Smrt.CloudProviders (Credential Manager) for the selected OCR.Space profile.
Observability and diagnostics¶
- Never log OCR payloads or extracted text.
- If outcomes are logged, log metadata only (provider id, elapsed, status).
Testing and validation¶
- Build (Debug, win-x64):
  dotnet build SmrtApps/src/Smrt.ExtractText.Host/Smrt.ExtractText.Host.csproj -c Debug -r win-x64
  dotnet build SmrtApps/src/Smrt.ExtractText.Tests/Smrt.ExtractText.Tests.csproj -c Debug -r win-x64
- Unit tests:
  dotnet test SmrtApps/src/Smrt.ExtractText.Tests/Smrt.ExtractText.Tests.csproj -c Debug -r win-x64 --no-build
- Integration tests (credential/network-gated):
  - Gate: set SMRTHUB_INTEGRATION_TESTS=1
Support Bundle¶
- Not applicable directly (library); collect logs from the hosting application via Support Bundle.
Related docs¶
- Smrt.ExtractText: SmrtApps/src/Smrt.ExtractText/README.md
- Smrt.ExtractStructuredText (structured OCR): SmrtApps/src/Smrt.ExtractStructuredText/README.md
Purpose¶
Smrt.ExtractText is vendor-agnostic and only defines planning + contracts. This project provides a host implementation of IOcrTextExecutor that can execute the ordered candidate list produced by OcrTextOrchestrator.
This host executor is intended for SmrtHub's local OCR workflows (including clipboard-driven ExtractText actions and any local document extraction provider that chooses to reuse ExtractText as an OCR stage). Cloud document extraction providers typically perform OCR within the provider and do not require this executor.
What’s included¶
- OcrTextExecutorMux: executes OcrTextExecutionContract.Candidates in order and falls back on execution failures.
- Local OCR provider executors:
  - Windows AI OCR (local:windowsAi) via late-binding Microsoft.Windows.AI.Imaging.TextRecognizer
  - Legacy Windows OCR (local:windowsOcrLegacy) via late-binding Windows.Media.Ocr.OcrEngine
- Cloud OCR provider executor:
  - OCR.Space (cloud:OcrSpace/.../OcrSpace) via https://api.ocr.space/parse/image
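As a rough illustration of the OCR.Space call, the sketch below builds a request for the endpoint listed above and reads the recognized text from the JSON response. It assumes OCR.Space's publicly documented request fields (the apikey header, base64Image, language) and response fields (ParsedResults, ParsedText, IsErroredOnProcessing). OcrSpaceSketch and its methods are hypothetical names for illustration; the real executor's internals are not shown in this README and may differ.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;

// Hypothetical helper, not the real executor.
public static class OcrSpaceSketch
{
    public const string Endpoint = "https://api.ocr.space/parse/image";

    // Builds the POST request. The caller supplies the key (assumption: it was
    // already resolved from the credential store); it is sent in the "apikey" header.
    public static HttpRequestMessage BuildRequest(string apiKey, byte[] pngBytes, string language = "eng")
    {
        var form = new MultipartFormDataContent
        {
            // OCR.Space expects a data-URI style prefix on base64 images.
            { new StringContent("data:image/png;base64," + Convert.ToBase64String(pngBytes)), "base64Image" },
            { new StringContent(language), "language" },
        };
        var request = new HttpRequestMessage(HttpMethod.Post, Endpoint) { Content = form };
        request.Headers.Add("apikey", apiKey);
        return request;
    }

    // Concatenates ParsedText from every entry in ParsedResults.
    // Throws if the service flagged a processing error, so a mux can fall back.
    public static string ParseResponse(string json)
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;

        if (root.TryGetProperty("IsErroredOnProcessing", out var errored)
            && errored.ValueKind == JsonValueKind.True)
        {
            // Deliberately no response body in the message: keep payloads out of logs.
            throw new InvalidOperationException("OCR.Space reported a processing error.");
        }

        var text = new StringBuilder();
        if (root.TryGetProperty("ParsedResults", out var results)
            && results.ValueKind == JsonValueKind.Array)
        {
            foreach (var result in results.EnumerateArray())
            {
                if (result.TryGetProperty("ParsedText", out var parsed)
                    && parsed.ValueKind == JsonValueKind.String)
                {
                    text.Append(parsed.GetString());
                }
            }
        }
        return text.ToString();
    }
}
```

A caller would send the request with an HttpClient and pass the response body to ParseResponse; an exception from either step is what a fallback mux would treat as an execution failure.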
Privacy / logging¶
Executors intentionally avoid logging OCR payloads or extracted text. If you log attempt results, log metadata only (provider id, elapsed ms, status).
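One way to make the metadata-only rule hard to break is to give the log record no field that could hold text. The sketch below is an illustration under that assumption; OcrAttemptLogEntry, OcrAttemptLogSketch, and the status strings are hypothetical and are not part of the Smrt.ExtractText contracts.

```csharp
using System;
using System.Diagnostics;

// Hypothetical record: it has no field for text or payload,
// so neither can end up in a log line by accident.
public sealed record OcrAttemptLogEntry(string ProviderId, long ElapsedMs, string Status);

public static class OcrAttemptLogSketch
{
    // Runs one OCR attempt and returns the text separately from the loggable entry.
    public static (string Text, OcrAttemptLogEntry Entry) Run(string providerId, Func<string> runOcr)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            var text = runOcr();
            return (text, new OcrAttemptLogEntry(providerId, stopwatch.ElapsedMilliseconds, "ok"));
        }
        catch (Exception ex)
        {
            // Record only the exception type: exception messages can echo payload fragments.
            var status = "failed:" + ex.GetType().Name;
            return (string.Empty, new OcrAttemptLogEntry(providerId, stopwatch.ElapsedMilliseconds, status));
        }
    }

    // Formats the entry for a log sink: provider id, elapsed ms, and status only.
    public static string Format(OcrAttemptLogEntry entry) =>
        $"ocr-attempt provider={entry.ProviderId} elapsedMs={entry.ElapsedMs} status={entry.Status}";
}
```

The extracted text travels only in the first tuple element, so anything handed to the logger is limited to what Format emits.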
Output formatting (best-effort)¶
When OCR engines provide line/word metadata (for example, Lines with BoundingRect), the host layer attempts to reconstruct a more readable plain text output (line breaks, light indentation, and blank-line gaps). This is best-effort and intentionally does not expose geometry as part of the public ExtractText contract.
This formatting preference is controlled by OcrTextOptions.PreferLayoutAwareText (default on) and can be set via Extract Text Settings / Quick Settings.
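To make the idea concrete, here is a minimal sketch of layout-aware reconstruction from line rectangles. It is not the host layer's actual algorithm: SketchOcrLine, LayoutAwareTextSketch, and the thresholds (character width as half the typical line height, a blank line for gaps above 0.8 of a line height, indentation capped at 8 spaces) are all assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical input shape; real OCR engines expose their own line types.
public sealed record SketchOcrLine(string Text, double X, double Y, double Width, double Height);

public static class LayoutAwareTextSketch
{
    public static string Reconstruct(IEnumerable<SketchOcrLine> lines)
    {
        // Read top to bottom, then left to right; ignore empty or degenerate lines.
        var ordered = lines
            .Where(l => !string.IsNullOrWhiteSpace(l.Text) && l.Height > 0)
            .OrderBy(l => l.Y).ThenBy(l => l.X)
            .ToList();
        if (ordered.Count == 0) return string.Empty;

        double leftMargin = ordered.Min(l => l.X);
        // Median line height is used as the "typical" height.
        double typicalHeight = ordered.Select(l => l.Height).OrderBy(h => h).ElementAt(ordered.Count / 2);
        // Assumed heuristic: one character is about half a line height wide.
        double charWidth = Math.Max(1.0, typicalHeight * 0.5);

        var output = new StringBuilder();
        for (int i = 0; i < ordered.Count; i++)
        {
            var line = ordered[i];
            if (i > 0)
            {
                var previous = ordered[i - 1];
                output.Append('\n');
                // A vertical gap close to a full line height becomes one blank line.
                double gap = line.Y - (previous.Y + previous.Height);
                if (gap > typicalHeight * 0.8) output.Append('\n');
            }
            // Light indentation, capped so noisy geometry cannot push text far right.
            int indent = (int)Math.Min(8, Math.Round((line.X - leftMargin) / charWidth));
            output.Append(' ', indent).Append(line.Text.Trim());
        }
        return output.ToString();
    }
}
```

The result is plain text only, which matches the intent that geometry stays out of the public contract. This sketch does not merge fragments that share a row (for example, two columns); a real implementation would need to handle that case.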
Typical wiring¶
Create the mux with the provider executors you want to enable, then pass it to OcrTextOrchestrator.
This keeps fallback behavior scoped to Smrt.ExtractText and lets app settings control candidate ordering.
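The wiring and fallback behavior described above can be sketched roughly as follows. Every type here is a hypothetical stand-in: the real IOcrTextExecutor, OcrTextExecutorMux, and OcrTextOrchestrator signatures live in the Smrt.ExtractText projects and are not reproduced in this README, so treat the names, constructor, and method shapes as assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for the real contracts.
public sealed record SketchAttempt(string ProviderId, bool Succeeded, string Text);

public interface ISketchOcrExecutor
{
    string ProviderId { get; }
    Task<string> ExecuteAsync(byte[] image, CancellationToken ct);
}

// Small adapter so an executor can be built from a delegate.
public sealed class SketchDelegateExecutor : ISketchOcrExecutor
{
    private readonly Func<byte[], CancellationToken, Task<string>> _run;
    public SketchDelegateExecutor(string providerId, Func<byte[], CancellationToken, Task<string>> run)
    {
        ProviderId = providerId;
        _run = run;
    }
    public string ProviderId { get; }
    public Task<string> ExecuteAsync(byte[] image, CancellationToken ct) => _run(image, ct);
}

public sealed class SketchExecutorMux
{
    private readonly Dictionary<string, ISketchOcrExecutor> _byId =
        new Dictionary<string, ISketchOcrExecutor>(StringComparer.OrdinalIgnoreCase);

    // Register only the executors this host wants to enable.
    public SketchExecutorMux(IEnumerable<ISketchOcrExecutor> enabled)
    {
        foreach (var executor in enabled) _byId[executor.ProviderId] = executor;
    }

    // Tries candidates in the planned order; the first success wins.
    public async Task<SketchAttempt> ExecuteAsync(
        IReadOnlyList<string> orderedCandidateIds, byte[] image, CancellationToken ct = default)
    {
        string lastTried = "(none)";
        foreach (var id in orderedCandidateIds)
        {
            if (!_byId.TryGetValue(id, out var executor)) continue; // not enabled on this host
            lastTried = id;
            try
            {
                var text = await executor.ExecuteAsync(image, ct);
                return new SketchAttempt(id, true, text);
            }
            catch (OperationCanceledException) when (ct.IsCancellationRequested)
            {
                throw; // cancellation is not an execution failure; do not fall back
            }
            catch (Exception)
            {
                // Execution failure: fall through to the next candidate.
            }
        }
        return new SketchAttempt(lastTried, false, string.Empty);
    }
}
```

In this sketch the candidate order is an input rather than something the mux decides, which mirrors the split described above: planning and settings choose the order, and the host only executes it.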