Component: SmrtDetect¶
Canonical source: SmrtApps/PythonApp/python_core/smrtdetect/README.md (mirrored below)
python_core.smrtdetect¶
Intelligent Source Detection & Resolution System
The smrtdetect package provides source detection and file resolution for SmrtHub. It identifies where clipboard content originates (web pages, local files, etc.) and resolves file paths through multiple strategies. Module docstrings were refreshed in November 2025 to expose reStructuredText metadata; this README remains the canonical deep-dive for behaviors, heuristics, and integration points.
Architecture Overview¶
Core Components¶
- localsource.py - Local file source detection and resolution
- websource.py - Web browser tab tracking and URL extraction
- file_indexer.py - High-performance SQLite-backed file index with lock-free reads
- recent_folders.py / recent_folder_tracker.py - User's recently accessed directories
- filename.py - Filename extraction and parsing utilities
- adobe_parser.py - Specialized Adobe application title parsing
- filetype_detect/ - Rule-based filetype detection engine
Resolution Flow¶
Local Source Resolution (localsource.py)¶
When a user copies content, SmrtHub attempts to identify the source file through multiple strategies:
```text
┌─────────────────────────────────────────────────────────────┐
│ User Copies Content → Active Window Detected                │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│ Stage 1: Window Title Analysis                              │
│ • Extract filename from window title                        │
│ • Parse application-specific patterns (Word, Adobe, etc.)   │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│ Stage 2: Resolution Strategy (Intent-Aware)                 │
│                                                             │
│ Priority 1: Windows Explorer                                │
│ • Direct file access via COM interface                      │
│ • Instant, authoritative resolution                         │
│                                                             │
│ Priority 2: Recent Folders (User-Focused)                   │
│ • Check user's recently accessed directories                │
│ • Fast, context-aware matching                              │
│ • Cancels if focus changes (user moved on)                  │
│                                                             │
│ Priority 3: File Index (Comprehensive)                      │
│ • Lock-free SQLite index with variant matching              │
│ • Background updates via FileSystemWatcher                  │
│ • Cancels if focus changes                                  │
│                                                             │
│ Priority 4: Full Filesystem Walk (Last Resort)              │
│ • Only when Index/Recent miss AND focus maintained          │
│ • Cancels immediately if user switches focus                │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│ Stage 3: Update Shared State                                │
│ • Store resolved path, filename, app identity               │
│ • Trigger snapshot logging (truncated for readability)      │
└─────────────────────────────────────────────────────────────┘
```
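Stage 1 pulls a filename out of the window title. The sketch below shows a generic version of that idea under stated assumptions: it assumes titles follow the common "<document> - <application>" convention, and the regex and the extract_filename_from_title name are hypothetical (not the filename.py API). Application-specific formats such as Adobe's are handled separately by adobe_parser.py.

```python
import re
from typing import Optional

# Hypothetical pattern: characters legal in Windows filenames, ending in a
# 1-5 character alphanumeric extension.
_FILENAME_RE = re.compile(r'^[^\\/:*?"<>|]+\.[A-Za-z0-9]{1,5}$')


def extract_filename_from_title(title: str) -> Optional[str]:
    """Return the first ' - '-separated title segment that looks like a filename."""
    # Many editors format titles as "<document> - <application>".
    for segment in title.split(" - "):
        # Strip whitespace and a leading "*" (a common "unsaved changes" marker).
        candidate = segment.strip().lstrip("*")
        if _FILENAME_RE.match(candidate):
            return candidate
    return None  # no segment looked like a filename
```

For example, "Budget.xlsx - Excel" yields "Budget.xlsx", while "Inbox - Outlook" yields None because neither segment carries an extension.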
Key Features:
- Focus-Aware Cancellation: If the user switches windows during resolution, abort the search (they've moved on)
- Intent-Driven Strategy: Recent folders for user-focused work, Index for comprehensive coverage
- Non-Blocking: Index operations never lock the user out (lock-free reads, async updates)
- Adobe Retry Logic: Special handling for Adobe apps with malformed titles
- COM-First Word Resolution: Prioritizes Word's COM interface for instant access
- Exclusion Framework: System apps, dev tools, and SmrtHub components are blocked from source tracking
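The priority order and focus-aware cancellation can be pictured as a simple loop. This is a hedged illustration rather than the localsource.py implementation: resolve_with_priorities, the Strategy alias, and the focus_changed callback are hypothetical names, and in practice the slow strategies (index, filesystem walk) would also check focus inside their own searches.

```python
from typing import Callable, Optional, Sequence, Tuple

# A strategy takes a filename and returns a resolved path or None.
Strategy = Callable[[str], Optional[str]]


def resolve_with_priorities(
    filename: str,
    strategies: Sequence[Tuple[str, Strategy]],
    focus_changed: Callable[[], bool],
) -> Optional[Tuple[str, str]]:
    """Try each strategy in priority order; abort as soon as focus changes.

    Returns (strategy_name, path) on success, or None if nothing matched or
    the user moved on before a match was found.
    """
    for name, strategy in strategies:
        if focus_changed():
            return None  # user switched windows: stop searching
        path = strategy(filename)
        if path:
            return name, path  # first (highest-priority) hit wins
    return None
```

Ordering the strategies as Explorer, Recent Folders, File Index, Filesystem Walk means the cheapest and most authoritative sources are consulted first, and the expensive walk only runs if the user is still on the same window.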
Exclusion Framework¶
Local source detection includes a robust exclusion system that prevents irrelevant apps from appearing as clipboard sources:
| Check | Location | Purpose |
|---|---|---|
| is_excluded_app(app_name) | Entry of track_local_source() | Block excluded foreground apps immediately |
| is_excluded_window(title) | Entry of track_local_source() | Skip windows with dev/system keywords |
| is_excluded_app(name) | _commit_source_and_track() | Defense-in-depth: block excluded names at commit |
Why Defense-in-Depth?
When a user copies text from Windows Terminal showing "PythonApp.exe", the flow is:
1. Foreground app = WindowsTerminal.exe (blocked by the entry check)
2. But if that check is ever bypassed, the extracted filename PythonApp.exe would be committed
The commit-level check catches this edge case, ensuring excluded app names never reach shared state.
See python_core.config.localsource_exclusions for the canonical exclusion lists and helpers.
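The two layers of checks can be sketched as follows. The exclusion sets and keyword list are hypothetical placeholders, and the helper bodies are simplified stand-ins for the real ones in python_core.config.localsource_exclusions; track_source is an illustrative name, not the actual entry point.

```python
from typing import Optional

# Hypothetical exclusion data; the canonical lists live in
# python_core.config.localsource_exclusions.
EXCLUDED_APPS = {"windowsterminal.exe", "pythonapp.exe", "code.exe"}
EXCLUDED_TITLE_KEYWORDS = ("powershell", "command prompt", "task manager")


def is_excluded_app(name: str) -> bool:
    """Simplified stand-in: case-insensitive match against EXCLUDED_APPS."""
    return name.strip().lower() in EXCLUDED_APPS


def is_excluded_window(title: str) -> bool:
    """Simplified stand-in: keyword match against the window title."""
    lowered = title.lower()
    return any(keyword in lowered for keyword in EXCLUDED_TITLE_KEYWORDS)


def track_source(app_name: str, title: str, extracted_name: str) -> Optional[str]:
    # Entry checks: reject excluded foreground apps and windows up front.
    if is_excluded_app(app_name) or is_excluded_window(title):
        return None
    # Commit check (defense-in-depth): never store an excluded app name,
    # even when the entry checks let the window through.
    if is_excluded_app(extracted_name):
        return None
    return extracted_name
```

With these placeholders, an Explorer window that has PythonApp.exe selected passes the entry checks but is still stopped at commit.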
File Indexer Architecture (file_indexer.py v2.5.0)¶
Design Principles¶
The file indexer was completely refactored to eliminate user lockout during refresh operations:
Before (v2.4.1):
- Long-held locks during filesystem walks (30-60 seconds)
- Users locked out during refresh operations
- No true delta refresh despite the name
- Full rebuilds on every startup
After (v2.5.0):
- Lock-free get() operations (reads from memory cache)
- Batched updates with <100ms lock holds
- Real-time FileSystemWatcher with debouncing
- Background initial build (users can work immediately)
- Async last_seen updates (non-blocking)
Lock-Free Architecture¶
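```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


class SQLiteIndex:
    def get(self, name: str) -> Optional[str]:
        """Resolve `name` via in-memory variants without blocking writes."""
        # generate_variants() is the module's variant helper (not shown here)
        variants = generate_variants(name)
        now = datetime.now(timezone.utc)
        for variant in variants:
            # Plain dict read from the in-memory cache: no lock is taken
            path = self._memory_index.get(variant)
            # Skip stale entries whose file has since been deleted or moved
            if path and Path(path).exists():
                # Record the hit without blocking the caller
                self._update_last_seen_async(variant, now)
                return path
        return None
```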
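The get() above depends on reads that never wait for a writer. One common way to get that property in CPython, shown here as an assumption rather than a description of file_indexer.py, is copy-on-write publication: writers build a new dict under a writer-only lock and publish it with a single reference assignment, so readers always see a complete snapshot. CopyOnWriteIndex and apply_batch are hypothetical names.

```python
import threading
from typing import Dict, Iterable, Optional, Tuple


class CopyOnWriteIndex:
    """Readers use the current dict snapshot; writers publish a new one."""

    def __init__(self) -> None:
        self._memory_index: Dict[str, str] = {}
        self._write_lock = threading.Lock()  # serializes writers only

    def get(self, name: str) -> Optional[str]:
        # No lock: rebinding self._memory_index is a single reference swap,
        # so a reader sees either the old snapshot or the new one.
        return self._memory_index.get(name.lower())

    def apply_batch(self, updates: Iterable[Tuple[str, Optional[str]]]) -> None:
        """Apply (name, path) pairs; a path of None deletes the entry."""
        with self._write_lock:
            new_index = dict(self._memory_index)  # readers keep the old copy
            for name, path in updates:
                if path is None:
                    new_index.pop(name.lower(), None)
                else:
                    new_index[name.lower()] = path
            self._memory_index = new_index  # publish the new snapshot
```

The trade-off is that every batch copies the whole dict, which fits the batched-update design (one copy per batch rather than per file) but grows with index size.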
Batched Updates (Non-Blocking Refresh)¶
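```python
import os
import threading

BATCH_SIZE = 1000
_index_lock = threading.Lock()


def refresh_index(specific_folders=None):
    """
    Refresh index with batched, non-blocking updates.
    Phase 1: Collect all file updates WITHOUT lock (filesystem walk)
    Phase 2: Apply updates in batches of 1000 with minimal lock time
    """
    # _tracked_folders() and _get_connection() are hypothetical stand-ins
    # for the module's config and connection helpers
    folders = specific_folders or _tracked_folders()
    # Phase 1: Collect files (NO LOCK)
    files_to_add = []
    for folder in folders:
        for root, _dirs, files in os.walk(folder):
            for file in files:
                full_path = os.path.join(root, file)
                try:
                    stat = os.stat(full_path)
                except OSError:
                    continue  # vanished or inaccessible; skip it
                files_to_add.append((file.lower(), full_path, int(stat.st_mtime)))
    # Phase 2: Batch apply with lock (1000 files at a time)
    conn = _get_connection()
    cursor = conn.cursor()
    for i in range(0, len(files_to_add), BATCH_SIZE):
        batch = files_to_add[i:i + BATCH_SIZE]
        with _index_lock:  # Lock held for <100ms
            cursor.executemany("INSERT OR REPLACE ...", batch)
            conn.commit()
        # Lock is released here, between batches
```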
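Phase 2 can be exercised on its own with Python's built-in sqlite3 module. This sketch assumes the file_index schema from the Database Schema section below; apply_in_batches is a hypothetical name, and the full INSERT statement is written out here only because the sketch needs a concrete table.

```python
import sqlite3
import threading
from typing import List, Tuple

BATCH_SIZE = 1000
_index_lock = threading.Lock()

SCHEMA = """
CREATE TABLE IF NOT EXISTS file_index (
    filename  TEXT NOT NULL,
    full_path TEXT NOT NULL,
    last_seen INTEGER NOT NULL,
    PRIMARY KEY (filename, full_path)
)
"""


def apply_in_batches(conn: sqlite3.Connection,
                     rows: List[Tuple[str, str, int]],
                     batch_size: int = BATCH_SIZE) -> int:
    """Write rows batch by batch, holding the lock only per batch; return batch count."""
    batches = 0
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        with _index_lock:  # short critical section: one batch, one commit
            conn.executemany(
                "INSERT OR REPLACE INTO file_index (filename, full_path, last_seen) "
                "VALUES (?, ?, ?)",
                batch,
            )
            conn.commit()
        batches += 1  # lock released here; other threads can proceed
    return batches
```

Because the primary key is (filename, full_path), re-running a refresh over the same files replaces rows instead of duplicating them.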
Real-Time FileSystemWatcher¶
```python
from watchdog.events import FileSystemEventHandler


class IndexUpdateHandler(FileSystemEventHandler):
    """
    Handles filesystem events with debouncing and batch processing.
    - Debouncing: 2-second window for bulk operations
    - Batch processing: Group rapid changes together
    - Graceful degradation: Logs errors but never crashes
    """

    def on_created(self, event):
        """File created - add to index after debounce"""

    def on_deleted(self, event):
        """File deleted - remove from index after debounce"""

    def on_modified(self, event):
        """File modified - update timestamp after debounce"""

    def on_moved(self, event):
        """File moved - update path after debounce"""
```
Configuration:
- enable_filesystem_watcher: true - Enable real-time monitoring (default)
- WATCHER_DEBOUNCE_SECONDS: 2.0 - Wait 2 seconds before processing events
- BATCH_SIZE: 1000 - Process 1000 files per batch
- MAX_LOCK_HOLD_MS: 100 - Target <100ms lock holds
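The debounce window can be implemented with a restartable timer: every new event resets the countdown, and the collected events are flushed once activity has been quiet for the whole window. The Debouncer class below is a hypothetical sketch of that technique, not the handler's actual internals; each watchdog on_* method could forward event.event_type and event.src_path to add(), with WATCHER_DEBOUNCE_SECONDS as the delay.

```python
import logging
import threading
from typing import Callable, Dict, List, Optional, Tuple


class Debouncer:
    """Collect filesystem events and flush them once activity goes quiet."""

    def __init__(self, delay: float,
                 flush: Callable[[List[Tuple[str, str]]], None]) -> None:
        self._delay = delay
        self._flush = flush
        self._pending: Dict[str, str] = {}  # path -> latest event type
        self._lock = threading.Lock()
        self._timer: Optional[threading.Timer] = None

    def add(self, event_type: str, path: str) -> None:
        with self._lock:
            self._pending[path] = event_type  # later events for a path win
            if self._timer is not None:
                self._timer.cancel()  # restart the quiet-period countdown
            self._timer = threading.Timer(self._delay, self._fire)
            self._timer.daemon = True
            self._timer.start()

    def _fire(self) -> None:
        with self._lock:
            batch = [(kind, path) for path, kind in self._pending.items()]
            self._pending.clear()
        if batch:
            try:
                self._flush(batch)
            except Exception:  # degrade gracefully: log, never crash the watcher
                logging.getLogger(__name__).exception("debounced flush failed")
```

Collapsing events per path means a file that is created and then modified several times during one burst is indexed once.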
Web Source Detection (websource.py)¶
Tracks active browser tabs and extracts page metadata:
Flow¶
- Browser Extension sends tab metadata to Flask endpoint
- websource.py extracts title, URL, favicon
- App Identity enriched with browser process info
- Shared State updated with web source type
Key Features¶
- Smart Title Extraction: Handles empty titles, strips " - Google Search", cleans formatting
- Favicon Handling: Base64 encoding for inline display
- Browser Process Detection: Identifies Chrome, Edge, Firefox via foreground window
- Silent Success: Only logs failures (success is evident from shared state update)
Recent Change (2025-10-21): Removed redundant success logging. A shared-state update is itself the confirmation, so the extra "[websource] Updated..." success messages that cluttered the terminal were dropped.
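Title cleanup and favicon encoding are small, self-contained transformations. The sketch below is a hedged approximation: the suffix list, the URL fallback for empty titles, the default MIME type, and the function names clean_title and favicon_data_uri are all assumptions, and the actual websource.py rules may differ.

```python
import base64
import re
from typing import Optional

# Hypothetical list of trailing suffixes to strip from tab titles.
_TITLE_SUFFIXES = (" - Google Search",)


def clean_title(title: Optional[str], url: str) -> str:
    """Return a display title, falling back to the URL when the title is empty."""
    cleaned = re.sub(r"\s+", " ", title or "").strip()  # collapse whitespace
    for suffix in _TITLE_SUFFIXES:
        if cleaned.endswith(suffix):
            cleaned = cleaned[: -len(suffix)].rstrip()
    return cleaned or url


def favicon_data_uri(icon_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw favicon bytes as a base64 data URI for inline display."""
    encoded = base64.b64encode(icon_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```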
Recent Folders Strategy¶
Why Recent Folders?¶
The file indexer is comprehensive but can be slow for large directory trees. Recent folders provide a user-focused fast path:
- User's active work directories (high probability)
- ~10-50 folders vs thousands in the index
- Cancels if focus changes (respects user intent)
- Falls back to the index for comprehensive coverage
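A recent-folders lookup is cheap because it only probes a short list of likely locations. The sketch below assumes one stat per folder at the top level only, most recent folder first; find_in_recent_folders and the focus_changed callback are hypothetical names, and the real recent_folders.py may search differently.

```python
import os
from typing import Callable, Iterable, Optional


def find_in_recent_folders(
    filename: str,
    recent_folders: Iterable[str],
    focus_changed: Callable[[], bool],
) -> Optional[str]:
    """Check each recently used folder for `filename`, most recent first."""
    for folder in recent_folders:
        if focus_changed():
            return None  # user moved on; the caller abandons resolution
        candidate = os.path.join(folder, filename)
        if os.path.isfile(candidate):  # one stat call per folder
            return candidate
    return None  # miss: the caller falls back to the file index
```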
Integration with Indexer¶
Resolution Priority:
1. Explorer (instant, authoritative)
2. Recent Folders (fast, user-focused)
3. File Index (comprehensive, lock-free)
4. Filesystem Walk (last resort, cancelable)
Configuration¶
File Indexer Config¶
Located in python-core-file-index.defaults.json:
```json
{
  "index": {
    "enabled": true,
    "database_path": "%LOCALAPPDATA%/SmrtHub/SmrtDB/localsource_index.db",
    "enable_filesystem_watcher": true,
    "tracked_folders": [
      "%USERPROFILE%/Documents",
      "%USERPROFILE%/Desktop",
      "%USERPROFILE%/Downloads"
    ]
  }
}
```
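The %LOCALAPPDATA% and %USERPROFILE% placeholders must be expanded before the paths are used. A minimal sketch, assuming the config is read as JSON text; load_index_config is a hypothetical name, since the real loader is not documented here. It uses the standard library's ntpath.expandvars, which understands Windows-style %VAR% syntax on every platform and leaves unknown variables untouched.

```python
import json
import ntpath
from typing import Any, Dict


def load_index_config(raw_json: str) -> Dict[str, Any]:
    """Parse the "index" section and expand %VAR% placeholders in its paths."""
    index = json.loads(raw_json)["index"]
    index["database_path"] = ntpath.expandvars(index["database_path"])
    index["tracked_folders"] = [
        ntpath.expandvars(folder) for folder in index.get("tracked_folders", [])
    ]
    return index
```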
Performance Tuning¶
Constants in file_indexer.py:
- BATCH_SIZE = 1000 - Files per batch during refresh
- WATCHER_DEBOUNCE_SECONDS = 2.0 - Event debounce window
- MAX_LOCK_HOLD_MS = 100 - Target lock hold time
- MAX_VARIANTS_PER_FILE = 100 - Prevent pollution attacks
Database Schema¶
SQLite Index (localsource_index.db)¶
```sql
CREATE TABLE file_index (
    filename TEXT NOT NULL,     -- Lowercase filename for case-insensitive matching
    full_path TEXT NOT NULL,    -- Absolute path to file
    last_seen INTEGER NOT NULL, -- Unix timestamp (mtime)
    PRIMARY KEY (filename, full_path)
);
CREATE INDEX idx_filename ON file_index(filename);
CREATE INDEX idx_last_seen ON file_index(last_seen DESC);
```
Features:
- WAL mode for concurrent reads during writes
- Composite primary key prevents duplicates
- Indexes optimized for lookup and cleanup queries
- Variant tracking: multiple paths per filename
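Opening the database with these features might look like the following sketch using Python's built-in sqlite3 module. open_index and lookup are hypothetical helper names; the schema is the one shown above, and ordering by last_seen DESC is an assumption about how multiple paths for one filename would be ranked.

```python
import sqlite3
from typing import List


def open_index(db_path: str) -> sqlite3.Connection:
    """Open the index database in WAL mode and ensure the schema exists."""
    conn = sqlite3.connect(db_path)
    # WAL lets readers proceed while a writer commits a batch.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS file_index (
            filename  TEXT NOT NULL,
            full_path TEXT NOT NULL,
            last_seen INTEGER NOT NULL,
            PRIMARY KEY (filename, full_path)
        );
        CREATE INDEX IF NOT EXISTS idx_filename ON file_index(filename);
        CREATE INDEX IF NOT EXISTS idx_last_seen ON file_index(last_seen DESC);
        """
    )
    return conn


def lookup(conn: sqlite3.Connection, filename: str) -> List[str]:
    """Return every indexed path for `filename`, most recently seen first."""
    rows = conn.execute(
        "SELECT full_path FROM file_index WHERE filename = ? ORDER BY last_seen DESC",
        (filename.lower(),),
    )
    return [path for (path,) in rows]
```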
Debugging¶
Terminal Logging¶
Enable console output in python-core-logging.defaults.json.
Watch shared state updates in real-time:
```text
======================================================================
[SHARED STATE UPDATED]
Source Type: web
Source Name: GitHub - Home
Source Location: https://github.com/
Clipboard: Empty
======================================================================
```
Truncation (2025-10-21):
- Source names: 100 character limit
- URLs: 150 character limit
- Prevents terminal spam from long URLs/titles
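The truncation rule amounts to a small helper applied per field. This is a sketch under assumptions: the "..." marker, the constant names, and the truncate function are hypothetical; only the 100 and 150 character limits come from the rule above.

```python
SOURCE_NAME_LIMIT = 100  # characters
URL_LIMIT = 150          # characters


def truncate(text: str, limit: int, marker: str = "...") -> str:
    """Shorten `text` to at most `limit` characters, marking the cut."""
    if len(text) <= limit:
        return text
    if limit <= len(marker):
        return text[:limit]  # no room for a marker; hard cut
    return text[: limit - len(marker)] + marker
```

The marker counts toward the limit, so a truncated URL is exactly 150 characters long.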
Common Issues¶
Index not finding files:
- Check tracked_folders config includes the directory
- Verify filesystem watcher is enabled
- Check logs for permission errors during indexing
Slow resolution:
- Recent folders should resolve in <100ms
- Index lookups are lock-free and instant
- If falling back to the filesystem walk, consider adding the folder to tracked_folders
Focus-aware cancellation:
- If the user switches windows during resolution, the search aborts
- This is intentional: the user has moved on, so there is no point continuing
- Check logs for [LocalSource] "Focus changed..." messages
Mouse Hook Named Pipe (localsource.py)¶
The local source tracker listens for low-latency mouse events from the C# hook over a Windows named pipe:
- Pipe name: \\.\pipe\smrthub_mousehook
- Server: localsource._named_pipe_listener() (Python)
- Client: MouseHookPipe (C#)
Implementation notes:
- Security attributes: the pipe is created with CreateNamedPipe(..., sa=None), which maps to a NULL SECURITY_ATTRIBUTES. This is the correct and most compatible choice (especially under PyInstaller). Passing a bare pywintypes.SECURITY_ATTRIBUTES() instead results in runtime errors like "The object is not a PySECURITY_ATTRIBUTES object".
- Disconnects are expected: when the C# client connects, writes a small message, and closes, ReadFile may raise 109 (ERROR_BROKEN_PIPE) or 232 (ERROR_NO_DATA). These are handled as normal disconnects; the listener logs, disconnects, and waits for the next connection.
- Debugger tip: if your IDE is set to break on all "raised" (first-chance) exceptions, it may pause on these expected 109/232 cases even though they're handled. Configure exception settings to break only on uncaught exceptions to avoid noise.
Stability:
- Errors outside 109/232 are logged and the listener restarts after a short delay.
- The worker that triggers track_local_source runs on a background thread with a tiny delay and respects the global cancel event.
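The pipe behavior described above can be sketched with pywin32. This is a hedged illustration, not the localsource._named_pipe_listener() implementation: serve_one_client, run_listener, the byte-mode pipe flags, the buffer sizes, and the retry delay are assumptions. The pywin32 imports are deferred so the module still loads on non-Windows machines.

```python
import logging
import threading
from typing import Callable

PIPE_NAME = r"\\.\pipe\smrthub_mousehook"
ERROR_BROKEN_PIPE = 109
ERROR_NO_DATA = 232
EXPECTED_DISCONNECT_CODES = frozenset({ERROR_BROKEN_PIPE, ERROR_NO_DATA})
log = logging.getLogger("smrtdetect.pipe_sketch")


def is_expected_disconnect(winerror: int) -> bool:
    """109/232 mean the client wrote its message and closed: not a failure."""
    return winerror in EXPECTED_DISCONNECT_CODES


def serve_one_client(handle_message: Callable[[bytes], None]) -> None:
    """Accept one client and read until it disconnects (Windows only)."""
    import pywintypes  # pywin32, imported lazily
    import win32file
    import win32pipe

    pipe = win32pipe.CreateNamedPipe(
        PIPE_NAME,
        win32pipe.PIPE_ACCESS_INBOUND,
        win32pipe.PIPE_TYPE_BYTE | win32pipe.PIPE_READMODE_BYTE | win32pipe.PIPE_WAIT,
        1,      # max instances
        65536,  # out buffer size
        65536,  # in buffer size
        0,      # default timeout
        None,   # NULL SECURITY_ATTRIBUTES
    )
    try:
        win32pipe.ConnectNamedPipe(pipe, None)  # blocks until a client connects
        while True:
            try:
                _hr, data = win32file.ReadFile(pipe, 4096)
            except pywintypes.error as exc:
                if is_expected_disconnect(exc.winerror):
                    log.debug("client disconnected (%s)", exc.winerror)
                    return
                raise  # anything else goes to run_listener
            handle_message(data)
    finally:
        win32pipe.DisconnectNamedPipe(pipe)
        win32file.CloseHandle(pipe)


def run_listener(handle_message: Callable[[bytes], None],
                 stop: threading.Event, retry_delay: float = 1.0) -> None:
    """Serve clients until `stop` is set; restart after unexpected errors."""
    while not stop.is_set():
        try:
            serve_one_client(handle_message)
        except Exception:
            log.exception("pipe listener error; restarting")
            stop.wait(retry_delay)  # short delay, interruptible by stop
```

Note that ConnectNamedPipe blocks, so setting stop only takes effect after the next client connects or an error occurs; running run_listener on a daemon thread avoids blocking shutdown.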
Performance Metrics¶
Lock-Free Operations (v2.5.0)¶
- get() lookup: 0ms lock time (reads from memory)
- Batch refresh: <100ms per 1000 files (lock held)
- Initial build: Non-blocking (background thread)
- Filesystem watcher: 2s debounce (batched updates)
Resolution Speed¶
- Explorer: ~10ms (COM interface)
- Recent Folders: ~50-100ms (focused search)
- File Index: ~5ms (lock-free SQLite)
- Filesystem Walk: Variable (cancels on focus change)
Version History¶
v2.5.3 (2025-11-08) - Docstring Metadata Refresh¶
- Added reStructuredText :param:/:returns: metadata across file indexer helpers.
- Confirmed the README remains authoritative for architectural details referenced by docstrings.
- No behavioral changes; updates are documentation-only.
v2.5.2 (2025-11-08) - Documentation Cleanup¶
- Trimmed smrtdetect module docstrings to concise summaries that point here for full details.
- Refreshed README examples to match current lock-free indexer implementation.
- Normalized README headings to ASCII for policy compliance.
v2.5.1 (2025-11-03) - Named Pipe Robustness¶
- Fix: Create the named pipe with SECURITY_ATTRIBUTES=None to support the packaged (PyInstaller) runtime and prevent PySECURITY_ATTRIBUTES errors.
- Behavior: Treat ERROR_BROKEN_PIPE (109) and ERROR_NO_DATA (232) as expected client disconnects; restart the accept loop.
- Docs: Added debugger guidance for first-chance exceptions.
v2.5.0 (2025-10-21) - Lock-Free Architecture¶
- Breaking Change: Refactored file_indexer to lock-free reads
- Added FileSystemWatcher with debouncing
- Batched updates with <100ms lock holds
- Background initial build (non-blocking startup)
- Async last_seen updates
v2.4.1 - Focus-Aware Cancellation¶
- Added intent-driven resolution strategy
- Focus change detection during slow operations
- Prioritized Recent Folders over Index
v2.4.0 - Adobe Retry Logic¶
- Special handling for malformed Adobe titles
- COM-first Word resolution
- Enhanced filename extraction
Related Documentation¶
- Logging System: ../utils/README.md → ../../../README.Files/Reference-Guides/SmrtHub.Logging.README.md
- Shared State: ../runtime/shared_state.py (centralized state store)
- Configuration: README.Files/Python_Core.Config.README.md
- Architecture: README.Files/System/Policies/SmrtHub-Architecture_Platform_Policy.README.md
Maintained by: Christopher C. Flaten
Last Updated: 2025-10-21