
Component: SmrtDetect

Canonical source: SmrtApps/PythonApp/python_core/smrtdetect/README.md (mirrored below)


python_core.smrtdetect

Intelligent Source Detection & Resolution System

The smrtdetect package provides source detection and file resolution for SmrtHub. It identifies where clipboard content originated (web pages, local files, etc.) and resolves file paths through several prioritized strategies. Module docstrings were refreshed in November 2025 to expose reStructuredText metadata, while this README remains the canonical deep-dive for behaviors, heuristics, and integration points.


Architecture Overview

Core Components

  1. localsource.py - Local file source detection and resolution
  2. websource.py - Web browser tab tracking and URL extraction
  3. file_indexer.py - High-performance SQLite-backed file index with lock-free reads
  4. recent_folders.py / recent_folder_tracker.py - User's recently accessed directories
  5. filename.py - Filename extraction and parsing utilities
  6. adobe_parser.py - Specialized Adobe application title parsing
  7. filetype_detect/ - Rule-based filetype detection engine

Resolution Flow

Local Source Resolution (localsource.py)

When a user copies content, SmrtHub attempts to identify the source file through multiple strategies:

┌────────────────────────────────────────────────────────────┐
│  User Copies Content → Active Window Detected              │
└────────────────────────────────────────────────────────────┘
                              ↓
┌────────────────────────────────────────────────────────────┐
│  Stage 1: Window Title Analysis                            │
│  • Extract filename from window title                      │
│  • Parse application-specific patterns (Word, Adobe, etc.) │
└────────────────────────────────────────────────────────────┘
                              ↓
┌────────────────────────────────────────────────────────────┐
│  Stage 2: Resolution Strategy (Intent-Aware)               │
│                                                            │
│  Priority 1: Windows Explorer                              │
│  • Direct file access via COM interface                    │
│  • Instant, authoritative resolution                       │
│                                                            │
│  Priority 2: Recent Folders (User-Focused)                 │
│  • Check user's recently accessed directories              │
│  • Fast, context-aware matching                            │
│  • Cancels if focus changes (user moved on)                │
│                                                            │
│  Priority 3: File Index (Comprehensive)                    │
│  • Lock-free SQLite index with variant matching            │
│  • Background updates via FileSystemWatcher                │
│  • Cancels if focus changes                                │
│                                                            │
│  Priority 4: Full Filesystem Walk (Last Resort)            │
│  • Only when Index/Recent miss AND focus maintained        │
│  • Cancels immediately if user switches focus              │
└────────────────────────────────────────────────────────────┘
                              ↓
┌────────────────────────────────────────────────────────────┐
│  Stage 3: Update Shared State                              │
│  • Store resolved path, filename, app identity             │
│  • Trigger snapshot logging (truncated for readability)    │
└────────────────────────────────────────────────────────────┘

Key Features:

  • Focus-Aware Cancellation: If the user switches windows during resolution, the search is aborted (they have moved on)
  • Intent-Driven Strategy: Recent folders for user-focused work, Index for comprehensive coverage
  • Non-Blocking: Index operations never lock the user out (lock-free reads, async updates)
  • Adobe Retry Logic: Special handling for Adobe apps with malformed titles
  • COM-First Word Resolution: Prioritizes Word's COM interface for instant access
  • Exclusion Framework: System apps, dev tools, and SmrtHub components are blocked from source tracking
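The priority ordering and focus-aware cancellation described above can be sketched roughly as follows. This is a minimal illustration, not the code in localsource.py: `resolve_with_priority`, the `focus_unchanged` callback, and the stand-in strategies are all hypothetical names introduced for the example.

```python
from typing import Callable, Iterable, Optional, Tuple

# (label, resolver) pairs; each resolver returns a path or None on a miss.
Strategy = Tuple[str, Callable[[str], Optional[str]]]


def resolve_with_priority(
    filename: str,
    strategies: Iterable[Strategy],
    focus_unchanged: Callable[[], bool],
) -> Optional[Tuple[str, str]]:
    """Try each strategy in priority order; stop if the user switched windows."""
    for label, strategy in strategies:
        if not focus_unchanged():
            return None  # user moved on, so abandon the search
        path = strategy(filename)
        if path is not None:
            return label, path
    return None


# Stand-in strategies (hypothetical): Explorer misses, Recent Folders hits.
strategies = [
    ("explorer", lambda name: None),
    ("recent_folders", lambda name: f"C:/Work/{name}"),
    ("file_index", lambda name: f"C:/Archive/{name}"),
]

result = resolve_with_priority("report.docx", strategies, lambda: True)
cancelled = resolve_with_priority("report.docx", strategies, lambda: False)
```

Checking focus before every stage keeps the cheap strategies fast while ensuring the expensive ones (index, filesystem walk) never start once the user has moved on.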

Exclusion Framework

Local source detection includes a robust exclusion system that prevents irrelevant apps from appearing as clipboard sources:

| Check | Location | Purpose |
|-------|----------|---------|
| is_excluded_app(app_name) | Entry of track_local_source() | Block excluded foreground apps immediately |
| is_excluded_window(title) | Entry of track_local_source() | Skip windows with dev/system keywords |
| is_excluded_app(name) | _commit_source_and_track() | Defense-in-depth: block excluded names at commit |

Why Defense-in-Depth?

When a user copies text from a Windows Terminal window that shows "PythonApp.exe", the flow is:

  1. The foreground app is WindowsTerminal.exe, which is excluded at entry.
  2. If that entry check were to miss, the extracted filename (PythonApp.exe) would otherwise be committed as the source.

The commit-level check catches this edge case, ensuring excluded app names never reach shared state.

See python_core.config.localsource_exclusions for the canonical exclusion lists and helpers.
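A minimal sketch of the two-layer check, assuming simple set and keyword matching. The function names `is_excluded_app` and `is_excluded_window` mirror the table above, but their bodies, the exclusion lists, and the `track` / `commit` helpers are illustrative assumptions rather than the real contents of python_core.config.localsource_exclusions.

```python
from typing import Optional

# Illustrative lists only; the canonical ones live in the config module.
EXCLUDED_APPS = {"windowsterminal.exe", "pythonapp.exe"}
EXCLUDED_TITLE_KEYWORDS = ("powershell", "command prompt")


def is_excluded_app(name: str) -> bool:
    """Case-insensitive match against the excluded app names."""
    return name.strip().lower() in EXCLUDED_APPS


def is_excluded_window(title: str) -> bool:
    """True when the window title contains a dev/system keyword."""
    lowered = title.lower()
    return any(keyword in lowered for keyword in EXCLUDED_TITLE_KEYWORDS)


def commit(name: str) -> Optional[str]:
    """Commit-level check (hypothetical helper): last line of defense."""
    if is_excluded_app(name):
        return None
    return name


def track(app_name: str, window_title: str, extracted_name: str) -> Optional[str]:
    """Entry-level checks (hypothetical helper), then hand off to commit()."""
    if is_excluded_app(app_name) or is_excluded_window(window_title):
        return None
    return commit(extracted_name)
```

Even when the foreground app passes the entry checks, an extracted name such as PythonApp.exe is still rejected by the commit-level check.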


File Indexer Architecture (file_indexer.py v2.5.0)

Design Principles

The file indexer was completely refactored to eliminate user lockout during refresh operations:

Before (v2.4.1):

  • ❌ Long-held locks during filesystem walks (30-60 seconds)
  • ❌ Users locked out during refresh operations
  • ❌ No true delta refresh despite the name
  • ❌ Full rebuilds on every startup

After (v2.5.0):

  • ✅ Lock-free get() operations (reads from memory cache)
  • ✅ Batched updates with <100ms lock holds
  • ✅ Real-time FileSystemWatcher with debouncing
  • ✅ Background initial build (users can work immediately)
  • ✅ Async last_seen updates (non-blocking)

Lock-Free Architecture

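from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

class SQLiteIndex:
    def get(self, name: str) -> Optional[str]:
        """Resolve `name` via in-memory variants without blocking writes."""
        variants = generate_variants(name)
        now = datetime.now(timezone.utc)

        for variant in variants:
            # Read from the in-memory index only; no lock is taken here
            path = self._memory_index.get(variant)
            # Skip stale entries whose file no longer exists on disk
            if path and Path(path).exists():
                # Timestamp update is queued asynchronously and never blocks the caller
                self._update_last_seen_async(variant, now)
                return path

        return None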

Batched Updates (Non-Blocking Refresh)

import os

def refresh_index(specific_folders=None):
    """
    Refresh index with batched, non-blocking updates.

    Phase 1: Collect all file updates WITHOUT lock (filesystem walk)
    Phase 2: Apply updates in batches of 1000 with minimal lock time
    """
    # Assumption: TRACKED_FOLDERS is a module-level list loaded from config
    folders = specific_folders or TRACKED_FOLDERS

    # Phase 1: Collect files (NO LOCK)
    files_to_add = []
    for folder in folders:
        for root, _dirs, files in os.walk(folder):
            for file in files:
                full_path = os.path.join(root, file)
                try:
                    mtime = int(os.stat(full_path).st_mtime)
                except OSError:
                    continue  # File vanished or is unreadable; skip it
                files_to_add.append((file.lower(), full_path, mtime))

    # Phase 2: Batch apply with lock (1000 files at a time)
    for i in range(0, len(files_to_add), BATCH_SIZE):
        batch = files_to_add[i:i + BATCH_SIZE]
        with _index_lock:  # Lock held for <100ms
            cursor.executemany("INSERT OR REPLACE ...", batch)
            conn.commit()
        # Lock is released between batches so other work can interleave

Real-Time FileSystemWatcher

class IndexUpdateHandler(FileSystemEventHandler):
    """
    Handles filesystem events with debouncing and batch processing.

    - Debouncing: 2-second window for bulk operations
    - Batch processing: Group rapid changes together
    - Graceful degradation: Logs errors but never crashes
    """

    def on_created(self, event):
        """File created - add to index after debounce"""

    def on_deleted(self, event):
        """File deleted - remove from index after debounce"""

    def on_modified(self, event):
        """File modified - update timestamp after debounce"""

    def on_moved(self, event):
        """File moved - update path after debounce"""

Configuration:

  • enable_filesystem_watcher: true - Enable real-time monitoring (default)
  • WATCHER_DEBOUNCE_SECONDS: 2.0 - Wait 2 seconds before processing events
  • BATCH_SIZE: 1000 - Process 1000 files per batch
  • MAX_LOCK_HOLD_MS: 100 - Target <100ms lock holds
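To make the debounce window concrete, here is a small sketch of the idea: events are coalesced per path and released only after the configured quiet period. `EventDebouncer` and its methods are hypothetical names for illustration, and the injectable clock is an assumption made so the behavior can be shown without real timers. The actual handler may batch events differently.

```python
import time
from typing import Callable, Dict, List, Tuple


class EventDebouncer:
    """Coalesce filesystem events and release them after a quiet period."""

    def __init__(
        self,
        window_seconds: float = 2.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._window = window_seconds
        self._clock = clock
        self._pending: Dict[str, str] = {}  # path -> most recent event type
        self._last_event_at = 0.0

    def record(self, event_type: str, path: str) -> None:
        """Remember the latest event for a path and restart the quiet timer."""
        self._pending[path] = event_type
        self._last_event_at = self._clock()

    def flush_if_quiet(self) -> List[Tuple[str, str]]:
        """Return the pending batch only if the window has elapsed."""
        if not self._pending:
            return []
        if self._clock() - self._last_event_at < self._window:
            return []
        batch = [(event_type, path) for path, event_type in self._pending.items()]
        self._pending.clear()
        return batch


# Drive the debouncer with a fake clock so the example is deterministic.
now = [0.0]
deb = EventDebouncer(2.0, clock=lambda: now[0])
deb.record("created", "a.txt")
now[0] = 1.0
deb.record("modified", "a.txt")  # overwrites the earlier event for a.txt
early = deb.flush_if_quiet()     # still inside the 2-second window
now[0] = 3.5
batch = deb.flush_if_quiet()     # window elapsed, batch is released
```

Coalescing per path means a bulk copy that touches the same file several times results in a single index update.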


Web Source Detection (websource.py)

Tracks active browser tabs and extracts page metadata:

Flow

  1. Browser Extension sends tab metadata to Flask endpoint
  2. websource.py extracts title, URL, favicon
  3. App Identity enriched with browser process info
  4. Shared State updated with web source type

Key Features

  • Smart Title Extraction: Handles empty titles, strips " - Google Search", cleans formatting
  • Favicon Handling: Base64 encoding for inline display
  • Browser Process Detection: Identifies Chrome, Edge, Firefox via foreground window
  • Silent Success: Only logs failures (success is evident from shared state update)

Recent Change (2025-10-21): Removed redundant success logging. A shared state update is itself the confirmation, so the extra "[websource] ✅ Updated..." messages no longer clutter the terminal.
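The title cleanup and favicon encoding mentioned under Key Features could look something like the sketch below. The helper names, the whitespace-collapsing step, and the fallback to the URL for empty titles are assumptions for illustration; websource.py may apply different rules.

```python
import base64

SEARCH_SUFFIX = " - Google Search"


def clean_tab_title(title: str, url: str) -> str:
    """Collapse whitespace, strip the search suffix, fall back to the URL."""
    cleaned = " ".join((title or "").split())
    if cleaned.endswith(SEARCH_SUFFIX):
        cleaned = cleaned[: -len(SEARCH_SUFFIX)]
    # Assumption: an empty title is replaced by the URL.
    return cleaned or url


def favicon_data_uri(icon_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw favicon bytes as a Base64 data URI for inline display."""
    encoded = base64.b64encode(icon_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

For example, `clean_tab_title("lock-free sqlite - Google Search", url)` yields `"lock-free sqlite"`, and an empty title yields the URL itself.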


Recent Folders Strategy

Why Recent Folders?

The file indexer is comprehensive but can be slow for large directory trees. Recent folders provide a user-focused fast path:

  • ✅ User's active work directories (high probability)
  • ✅ ~10-50 folders vs thousands in index
  • ✅ Cancels if focus changes (respects user intent)
  • ✅ Falls back to index for comprehensive coverage

Integration with Indexer

Resolution Priority:
1. Explorer (instant, authoritative)
2. Recent Folders (fast, user-focused)
3. File Index (comprehensive, lock-free)
4. Filesystem Walk (last resort, cancelable)

Configuration

File Indexer Config

Located in python-core-file-index.defaults.json:

{
  "index": {
    "enabled": true,
    "database_path": "%LOCALAPPDATA%/SmrtHub/SmrtDB/localsource_index.db",
    "enable_filesystem_watcher": true,
    "tracked_folders": [
      "%USERPROFILE%/Documents",
      "%USERPROFILE%/Desktop",
      "%USERPROFILE%/Downloads"
    ]
  }
}
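The `%USERPROFILE%`-style placeholders in this config need to be expanded before the paths are usable. The sketch below shows one way to do that; `expand_percent_vars` and `load_tracked_folders` are hypothetical helpers, not functions from file_indexer.py. A small regex is used instead of `os.path.expandvars` because the latter only expands `%VAR%` syntax on Windows.

```python
import json
import os
import re
from typing import List, Mapping, Optional

_PERCENT_VAR = re.compile(r"%([^%]+)%")


def expand_percent_vars(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace %NAME% placeholders; unknown names are left untouched."""
    lookup = os.environ if env is None else env
    return _PERCENT_VAR.sub(lambda m: lookup.get(m.group(1), m.group(0)), value)


def load_tracked_folders(
    config_text: str, env: Optional[Mapping[str, str]] = None
) -> List[str]:
    """Return expanded tracked folders, or an empty list if indexing is disabled."""
    index_cfg = json.loads(config_text)["index"]
    if not index_cfg.get("enabled", True):
        return []
    return [expand_percent_vars(p, env) for p in index_cfg.get("tracked_folders", [])]


sample = """
{"index": {"enabled": true,
           "tracked_folders": ["%USERPROFILE%/Documents", "%UNSET_VAR%/Scratch"]}}
"""
folders = load_tracked_folders(sample, env={"USERPROFILE": "C:/Users/demo"})
```

Leaving unknown placeholders untouched makes a misconfigured variable visible in logs instead of silently producing a wrong path.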

Performance Tuning

Constants in file_indexer.py:

  • BATCH_SIZE = 1000 - Files per batch during refresh
  • WATCHER_DEBOUNCE_SECONDS = 2.0 - Event debounce window
  • MAX_LOCK_HOLD_MS = 100 - Target lock hold time
  • MAX_VARIANTS_PER_FILE = 100 - Prevent pollution attacks


Database Schema

SQLite Index (localsource_index.db)

CREATE TABLE file_index (
    filename TEXT NOT NULL,           -- Lowercase filename for case-insensitive matching
    full_path TEXT NOT NULL,          -- Absolute path to file
    last_seen INTEGER NOT NULL,       -- Unix timestamp (mtime)
    PRIMARY KEY (filename, full_path)
);

CREATE INDEX idx_filename ON file_index(filename);
CREATE INDEX idx_last_seen ON file_index(last_seen DESC);

Features:

  • WAL mode for concurrent reads during writes
  • Composite primary key prevents duplicates
  • Indexes optimized for lookup and cleanup queries
  • Variant tracking: multiple paths per filename
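As a rough illustration of how this schema behaves, the sketch below creates the table in a temporary database with WAL enabled, inserts the same rows twice, and looks up a filename case-insensitively. `open_index` and `lookup` are hypothetical helpers, and ordering results by last_seen is an assumption, not documented behavior of the indexer.

```python
import os
import sqlite3
import tempfile
from typing import List

SCHEMA = """
CREATE TABLE IF NOT EXISTS file_index (
    filename TEXT NOT NULL,
    full_path TEXT NOT NULL,
    last_seen INTEGER NOT NULL,
    PRIMARY KEY (filename, full_path)
);
CREATE INDEX IF NOT EXISTS idx_filename ON file_index(filename);
CREATE INDEX IF NOT EXISTS idx_last_seen ON file_index(last_seen DESC);
"""


def open_index(db_path: str) -> sqlite3.Connection:
    """Open the database, switch to WAL mode, and ensure the schema exists."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.executescript(SCHEMA)
    return conn


def lookup(conn: sqlite3.Connection, name: str) -> List[str]:
    """Return every known path for a filename, most recently seen first."""
    rows = conn.execute(
        "SELECT full_path FROM file_index WHERE filename = ? ORDER BY last_seen DESC",
        (name.lower(),),
    )
    return [row[0] for row in rows]


with tempfile.TemporaryDirectory() as tmp:
    conn = open_index(os.path.join(tmp, "demo_index.db"))
    rows = [
        ("report.docx", "C:/old/report.docx", 100),
        ("report.docx", "C:/new/report.docx", 200),
    ]
    # Inserting twice shows the composite key preventing duplicate rows.
    conn.executemany("INSERT OR REPLACE INTO file_index VALUES (?, ?, ?)", rows)
    conn.executemany("INSERT OR REPLACE INTO file_index VALUES (?, ?, ?)", rows)
    conn.commit()
    paths = lookup(conn, "Report.DOCX")
    total = conn.execute("SELECT COUNT(*) FROM file_index").fetchone()[0]
    conn.close()
```

Because filenames are stored lowercase, the lookup lowercases its argument rather than relying on a case-insensitive collation.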


πŸ› Debugging

Terminal Logging

Enable console output in python-core-logging.defaults.json:

{
  "console": true,
  "level": "INFO"
}

Watch shared state updates in real-time:

======================================================================
📥 [SHARED STATE UPDATED]
🔒 Source Type: web
📛 Source Name: GitHub - Home
📍 Source Location: https://github.com/
📋 Clipboard: Empty
======================================================================

Truncation (2025-10-21):

  • Source names: 100 character limit
  • URLs: 150 character limit
  • Prevents terminal spam from long URLs/titles
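A truncation helper along these lines would enforce the limits above. The function name, the constant names, and the choice to append "..." within the limit are assumptions; the source only states the 100 and 150 character limits.

```python
SOURCE_NAME_LIMIT = 100  # from the limits listed above
URL_LIMIT = 150          # from the limits listed above


def truncate_for_log(text: str, limit: int, marker: str = "...") -> str:
    """Shorten text to at most `limit` characters, ending with a marker."""
    if len(text) <= limit:
        return text
    return text[: limit - len(marker)] + marker


long_url = "https://example.test/" + "a" * 300
shown_url = truncate_for_log(long_url, URL_LIMIT)
shown_name = truncate_for_log("GitHub - Home", SOURCE_NAME_LIMIT)
```

Short values pass through unchanged, so only unusually long titles and URLs are affected.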

Common Issues

Index not finding files:

  • Check that the tracked_folders config includes the directory
  • Verify the filesystem watcher is enabled
  • Check logs for permission errors during indexing

Slow resolution:

  • Recent folders should resolve in <100ms
  • Index lookups are lock-free and typically take ~5ms
  • If resolution is falling back to the filesystem walk, consider adding the folder to tracked_folders

Focus-aware cancellation:

  • If the user switches windows during resolution, the search aborts
  • This is intentional: the user has moved on, so there is no point continuing
  • Check logs for "[LocalSource] 🚫 Focus changed..." messages


Mouse Hook Named Pipe (localsource.py)

The local source tracker listens for low‑latency mouse events from the C# hook over a Windows named pipe:

  • Pipe name: \\.\pipe\smrthub_mousehook
  • Server: localsource._named_pipe_listener() (Python)
  • Client: MouseHookPipe (C#)

Implementation notes:

  • Security attributes: the pipe is created with CreateNamedPipe(..., sa=None), which maps to a NULL SECURITY_ATTRIBUTES. This is the correct and most compatible choice (especially under PyInstaller). Attempting to pass a bare pywintypes.SECURITY_ATTRIBUTES() results in runtime errors like "The object is not a PySECURITY_ATTRIBUTES object".
  • Disconnects are expected: when the C# client connects, writes a small message, and closes, ReadFile may raise 109 (ERROR_BROKEN_PIPE) or 232 (ERROR_NO_DATA). These are handled as normal disconnects; the listener logs, disconnects, and waits for the next connection.
  • Debugger tip: if your IDE is set to break on all "raised" (first-chance) exceptions, it may pause on these expected 109/232 cases even though they are handled. Configure exception settings to break only on uncaught exceptions to avoid the noise.

Stability:

  • Errors outside 109/232 are logged and the listener restarts after a short delay.
  • The worker that triggers track_local_source runs on a background thread with a tiny delay and respects the global cancel event.
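The error-handling policy for the listener can be summarized in a small, platform-independent sketch. It does not call pywin32; `next_action` and `simulate_listener` are hypothetical names, and the simulated outcomes stand in for real ReadFile results. Only the two error codes and their treatment come from the notes above.

```python
from typing import List, Sequence, Union

ERROR_BROKEN_PIPE = 109  # client closed its end of the pipe
ERROR_NO_DATA = 232      # pipe is being closed
EXPECTED_DISCONNECTS = frozenset({ERROR_BROKEN_PIPE, ERROR_NO_DATA})


def next_action(winerror: int) -> str:
    """Decide how the listener reacts to a Windows error code from a read."""
    if winerror in EXPECTED_DISCONNECTS:
        return "await_next_client"    # normal disconnect: log and wait again
    return "restart_after_delay"      # unexpected: log, pause, restart listener


def simulate_listener(outcomes: Sequence[Union[bytes, int]]) -> List[str]:
    """Replay read outcomes: bytes are messages, ints are Windows error codes."""
    log = []
    for outcome in outcomes:
        if isinstance(outcome, bytes):
            log.append("message:" + outcome.decode("utf-8"))
        else:
            log.append(next_action(outcome))
    return log


# 5 is ERROR_ACCESS_DENIED, used here as an example of an unexpected failure.
log = simulate_listener([b"click", 109, b"click", 232, 5])
```

In a real listener the integer would typically come from the `winerror` attribute of the exception raised by the pipe read.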

Performance Metrics

Lock-Free Operations (v2.5.0)

  • get() lookup: 0ms lock time (reads from memory)
  • Batch refresh: <100ms per 1000 files (lock held)
  • Initial build: Non-blocking (background thread)
  • Filesystem watcher: 2s debounce (batched updates)

Resolution Speed

  • Explorer: ~10ms (COM interface)
  • Recent Folders: ~50-100ms (focused search)
  • File Index: ~5ms (lock-free SQLite)
  • Filesystem Walk: Variable (cancels on focus change)

Version History

v2.5.3 (2025-11-08) - Docstring Metadata Refresh

  • Added reStructuredText :param:/:returns: metadata across file indexer helpers.
  • Confirmed README remains authoritative for architectural details referenced by docstrings.
  • No behavioral changes; updates are documentation-only.

v2.5.2 (2025-11-08) - Documentation Cleanup

  • Trimmed smrtdetect module docstrings to concise summaries that point here for full details.
  • Refreshed README examples to match current lock-free indexer implementation.
  • Normalized README headings to ASCII for policy compliance.

v2.5.1 (2025-11-03) - Named Pipe Robustness

  • Fix: Create the named pipe with SECURITY_ATTRIBUTES=None to support packaged (PyInstaller) runtime and prevent PySECURITY_ATTRIBUTES errors.
  • Behavior: Treat ERROR_BROKEN_PIPE (109) and ERROR_NO_DATA (232) as expected client disconnects; restart accept loop.
  • Docs: Added debugger guidance for first‑chance exceptions.

v2.5.0 (2025-10-21) - Lock-Free Architecture

  • Breaking Change: Refactored file_indexer to lock-free reads
  • Added FileSystemWatcher with debouncing
  • Batched updates with <100ms lock holds
  • Background initial build (non-blocking startup)
  • Async last_seen updates

v2.4.1 - Focus-Aware Cancellation

  • Added intent-driven resolution strategy
  • Focus change detection during slow operations
  • Prioritized Recent Folders over Index

v2.4.0 - Adobe Retry Logic

  • Special handling for malformed Adobe titles
  • COM-first Word resolution
  • Enhanced filename extraction

Related Documentation

  • Logging System: ../utils/README.md → ../../../README.Files/Reference-Guides/SmrtHub.Logging.README.md
  • Shared State: ../runtime/shared_state.py (centralized state store)
  • Configuration: README.Files/Python_Core.Config.README.md
  • Architecture: README.Files/System/Policies/SmrtHub-Architecture_Platform_Policy.README.md

Maintained by: Christopher C. Flaten
Last Updated: 2025-11-08