Heartbeat Monitoring System¶

The SmrtHub heartbeat monitoring system tracks, in real time, the health of the critical services and background processes in the Python core engine.

Overview¶

The heartbeat system continuously monitors the health of key components and gives early warning of potential issues. Each monitored service reports its status at a fixed interval, which enables proactive maintenance and troubleshooting.

Architecture¶

Heartbeat Components¶

graph TB
    subgraph "Python Core Services"
        SW[Save Worker<br/>30s interval]
        FB[Flask Bridge<br/>15s interval]
        WL[Web Listener<br/>20s interval]
        FI[File Indexer<br/>60s interval]
        RT[Router Tasks<br/>45s interval]
    end

    subgraph "Monitoring Infrastructure"
        HB[Heartbeat Manager]
        LOG[Health Logger]
        ALT[Alert System]
    end

    subgraph "External Monitors"
        CS[C# Components]
        UI[Status UI]
        API[Health API]
    end

    SW --> HB
    FB --> HB
    WL --> HB
    FI --> HB
    RT --> HB

    HB --> LOG
    HB --> ALT

    HB --> CS
    HB --> UI
    HB --> API
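
To make the flow in the diagram concrete, here is a minimal sketch of how a manager might record pulses from services and report which ones have gone quiet. `HeartbeatRegistry` and its methods are hypothetical names chosen for this example; they are not the `python_core` API, and the real Heartbeat Manager may work differently.

```python
import threading
import time


class HeartbeatRegistry:
    """Hypothetical stand-in for the Heartbeat Manager box in the diagram."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock        # injectable so the sketch can be tested
        self._lock = threading.Lock()
        self._intervals = {}       # tag -> expected interval in seconds
        self._last_pulse = {}      # tag -> clock reading of the last pulse

    def register(self, tag, interval):
        """Record a service and treat registration as its first pulse."""
        with self._lock:
            self._intervals[tag] = interval
            self._last_pulse[tag] = self._clock()

    def pulse(self, tag):
        """Called by a service to signal that it is still alive."""
        with self._lock:
            if tag not in self._intervals:
                raise KeyError(f"unknown service: {tag}")
            self._last_pulse[tag] = self._clock()

    def seconds_since_pulse(self, tag):
        """Return how long the service has been silent."""
        with self._lock:
            return self._clock() - self._last_pulse[tag]

    def overdue(self):
        """Return the tags whose silence has reached their own interval."""
        with self._lock:
            now = self._clock()
            return sorted(
                tag for tag, interval in self._intervals.items()
                if now - self._last_pulse[tag] >= interval
            )
```

The lock matters because each service runs in its own thread or process loop and may pulse while a monitor is reading. The injectable clock is only there so the behavior can be checked without waiting for real time to pass.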

Health Status Levels¶

Level	Description	Action Required
🟢 Healthy	Service responding normally	None
🟡 Warning	Service delayed but functional	Monitor closely
🟠 Degraded	Service experiencing issues	Investigation recommended
🔴 Critical	Service unresponsive	Immediate attention required
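
One way to represent these four levels in code is an ordered enumeration, so that levels can be compared and the worst one picked out. The sketch below assumes that an overall status is simply the worst status among the services; the document does not state the actual aggregation rule, and `HealthStatus` and `overall_status` are illustrative names.

```python
from enum import IntEnum


class HealthStatus(IntEnum):
    """The four levels from the table, ordered from best to worst."""
    HEALTHY = 0
    WARNING = 1
    DEGRADED = 2
    CRITICAL = 3


def overall_status(statuses):
    """Return the worst level present (assumed rule); healthy if none given."""
    return max(statuses, default=HealthStatus.HEALTHY)


# Example: one degraded service drags the overall status down.
current = [HealthStatus.HEALTHY, HealthStatus.DEGRADED, HealthStatus.WARNING]
print(overall_status(current).name.lower())
```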

Implementation Status¶

✅ Critical Services (Completed)¶

1. Save Worker - `runtime/save_worker.py`¶

Location: save_and_route_worker() function
Interval: 30 seconds
Tag: "save_worker"
Purpose: Monitors the background save queue processing
Status: ✅ Fully Implemented

import python_core

# Initialize heartbeat monitoring for save worker health tracking
heartbeat = python_core.Heartbeat(interval=30, tag="save_worker")
heartbeat.start()

# Worker loop with heartbeat updates
while True:
    try:
        # Process save queue
        process_save_queue()
        heartbeat.pulse()  # Signal healthy operation
    except Exception as e:
        heartbeat.error(str(e))  # Report error condition

2. Flask Bridge Server - `main.py`¶

Location: start_flask_bridge() function
Interval: 15 seconds
Tag: "flask_bridge"
Purpose: Monitors the C#/Python communication bridge (port 5001)
Status: ✅ Fully Implemented

import python_core

# Initialize heartbeat monitoring for Flask bridge health tracking
heartbeat = python_core.Heartbeat(interval=15, tag="flask_bridge")
heartbeat.start()

# Flask application with health monitoring
@app.before_request
def before_request():
    heartbeat.pulse()  # Signal request processing

3. Web Listener - `main.py`¶

Location: start_web_listener() function
Interval: 20 seconds
Tag: "web_listener"
Purpose: Monitors browser extension communication (port 5000)
Status: ✅ Fully Implemented

import python_core

# Initialize heartbeat monitoring for web listener health tracking
heartbeat = python_core.Heartbeat(interval=20, tag="web_listener")
heartbeat.start()

✅ Secondary Services (Completed)¶

4. File Indexer - `smrtdetect/file_indexer.py`¶

Location: start_background_indexing() function
Interval: 60 seconds
Tag: "file_indexer"
Purpose: Monitors file system indexing operations
Status: ✅ Implemented

5. Router Background Tasks - `runtime/router.py`¶

Location: start_background_tasks() function
Interval: 45 seconds
Tag: "router_tasks"
Purpose: Monitors routing and organization tasks
Status: ✅ Implemented

API Integration¶

Health Check Endpoints¶

The heartbeat system exposes REST API endpoints for external monitoring:

# Health status endpoint
GET /api/health/status
{
    "services": {
        "save_worker": {
            "status": "healthy",
            "last_heartbeat": "2025-07-14T10:30:45Z",
            "interval": 30,
            "consecutive_failures": 0
        },
        "flask_bridge": {
            "status": "healthy", 
            "last_heartbeat": "2025-07-14T10:30:50Z",
            "interval": 15,
            "consecutive_failures": 0
        }
    },
    "overall_status": "healthy"
}

# Individual service status
GET /api/health/service/{service_name}
{
    "service": "save_worker",
    "status": "healthy",
    "last_heartbeat": "2025-07-14T10:30:45Z",
    "uptime": "2d 14h 23m",
    "total_pulses": 4320,
    "error_count": 0
}
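
A consumer of the status endpoint usually only cares about the services that are not healthy. The sketch below parses a payload of the shape shown above and returns those services. It assumes that non-healthy statuses are reported as lowercase strings such as "warning", which the example responses do not confirm; `unhealthy_services` is a hypothetical helper.

```python
import json


def unhealthy_services(payload_text):
    """Return {service_name: status} for every service not reported healthy."""
    payload = json.loads(payload_text)
    services = payload.get("services", {})
    return {
        name: info.get("status", "unknown")
        for name, info in services.items()
        if info.get("status") != "healthy"
    }


# Sample payload in the documented shape; the "warning" value is an assumption.
sample = """
{
    "services": {
        "save_worker": {"status": "healthy", "interval": 30},
        "flask_bridge": {"status": "warning", "interval": 15}
    },
    "overall_status": "warning"
}
"""
print(unhealthy_services(sample))
```

A service entry with no `status` field is reported as "unknown" rather than silently treated as healthy.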

C# Integration¶

C# components can monitor Python service health:

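// Health monitoring client in C# components
using System.Net.Http;
using System.Net.Http.Json;  // Provides ReadFromJsonAsync
using System.Threading.Tasks;

public class HealthMonitor
{
    private readonly HttpClient _client;

    public HealthMonitor(HttpClient client)
    {
        _client = client;
    }

    public async Task<HealthStatus> CheckPythonCoreHealth()
    {
        var response = await _client.GetAsync("http://localhost:5001/api/health/status");
        response.EnsureSuccessStatusCode();  // Fail loudly on non-2xx responses
        return await response.Content.ReadFromJsonAsync<HealthStatus>();
    }

    public async Task<bool> IsServiceHealthy(string serviceName)
    {
        var status = await CheckPythonCoreHealth();
        // Treat an unknown service as unhealthy instead of throwing
        // (assumes Services is a dictionary keyed by service name)
        return status.Services.TryGetValue(serviceName, out var service)
            && service.Status == "healthy";
    }
}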

Monitoring and Alerting¶

Log Integration¶

Heartbeat events are integrated with the SmrtHub logging system:

# Logging configuration
import logging
logger = logging.getLogger('smrthub.heartbeat')

# Health events are logged with context
logger.info(f"Service {service_name} heartbeat: {status}")
logger.warning(f"Service {service_name} missed heartbeat (attempt {attempt}/3)")
logger.error(f"Service {service_name} is unresponsive - marking as critical")
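
Since the Performance Impact section says logging happens only on status changes, a small wrapper that remembers the last status per service is one way to get that behavior. This is a sketch under assumptions: `TransitionLogger` and the status-to-level mapping are illustrative and not taken from the SmrtHub code.

```python
import logging

logger = logging.getLogger("smrthub.heartbeat")

# Assumed mapping from status name to log level.
LEVEL_FOR_STATUS = {
    "healthy": logging.INFO,
    "warning": logging.WARNING,
    "degraded": logging.WARNING,
    "critical": logging.ERROR,
}


class TransitionLogger:
    """Logs a service's status only when it differs from the last one seen."""

    def __init__(self, log=logger):
        self._log = log
        self._last = {}  # service name -> last status logged

    def record(self, service_name, status):
        """Return True if a line was logged, False if the status was unchanged."""
        if self._last.get(service_name) == status:
            return False
        previous = self._last.get(service_name, "unknown")
        self._last[service_name] = status
        level = LEVEL_FOR_STATUS.get(status, logging.WARNING)
        # %-style arguments defer string formatting until the record is emitted.
        self._log.log(level, "Service %s status: %s -> %s",
                      service_name, previous, status)
        return True
```

Calling `record("save_worker", "healthy")` on every pulse then produces one log line when the service first reports and another only when its status changes.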

Alert Thresholds¶

Condition	Threshold	Alert Level
Missed heartbeat	1 interval	Warning
Consecutive misses	3 intervals	Degraded
Service timeout	5 intervals	Critical
Error reported	Any exception	Warning
Critical error	System exception	Critical
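
The interval-based rows of the table can be expressed as a single function of the time since the last pulse. The sketch below assumes that "missed intervals" means whole intervals elapsed since the last pulse, and that a miss counts as soon as one full interval has passed; the exact boundary behavior is not specified in this document. `classify` is a hypothetical name, and error-based alerts are not covered.

```python
def classify(seconds_since_pulse, interval):
    """Map silence duration to a status using the 1/3/5 interval thresholds."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    missed = int(seconds_since_pulse // interval)  # whole intervals elapsed
    if missed >= 5:
        return "critical"
    if missed >= 3:
        return "degraded"
    if missed >= 1:
        return "warning"
    return "healthy"


# Example for a service with a 30-second interval.
for silence in (10, 45, 100, 200):
    print(silence, classify(silence, 30))
```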

Visual Indicators¶

The SmrtHub status UI displays real-time health status using the following indicators:

🟢 Green Pulse - All services healthy
🟡 Yellow Flash - Service warnings detected
🔴 Red Alert - Critical service issues
⚫ No Signal - Heartbeat system offline

Configuration¶

Heartbeat Settings¶

# Example configuration shape
HEARTBEAT_CONFIG = {
    "enabled": True,
    "default_interval": 30,
    "timeout_multiplier": 3,
    "alert_on_miss": True,
    "log_level": "INFO",
    "services": {
        "save_worker": {"interval": 30, "critical": True},
        "flask_bridge": {"interval": 15, "critical": True}, 
        "web_listener": {"interval": 20, "critical": False},
        "file_indexer": {"interval": 60, "critical": False},
        "router_tasks": {"interval": 45, "critical": False}
    }
}
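
To show how the global and per-service settings might combine, the sketch below resolves the effective settings for one service. It assumes that a service missing from `services` falls back to `default_interval`, and that the timeout is the interval multiplied by `timeout_multiplier`; neither rule is confirmed by the configuration above. `service_settings` and `EXAMPLE_CONFIG` are illustrative names.

```python
# Trimmed copy of the configuration shape, for a self-contained example.
EXAMPLE_CONFIG = {
    "default_interval": 30,
    "timeout_multiplier": 3,
    "services": {
        "save_worker": {"interval": 30, "critical": True},
        "flask_bridge": {"interval": 15, "critical": True},
    },
}


def service_settings(config, tag):
    """Resolve interval, criticality, and timeout for one service tag."""
    overrides = config.get("services", {}).get(tag, {})
    interval = overrides.get("interval", config["default_interval"])
    return {
        "interval": interval,
        "critical": overrides.get("critical", False),
        # Assumed meaning of timeout_multiplier: seconds of silence tolerated.
        "timeout": interval * config["timeout_multiplier"],
    }


print(service_settings(EXAMPLE_CONFIG, "flask_bridge"))
print(service_settings(EXAMPLE_CONFIG, "unlisted_service"))
```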

Custom Service Registration¶

Add heartbeat monitoring to new services:

import time

import python_core

def my_background_service():
    """Custom service with heartbeat monitoring."""
    # Register heartbeat with custom interval
    heartbeat = python_core.Heartbeat(
        interval=25, 
        tag="my_service",
        critical=True
    )
    heartbeat.start()

    while True:
        try:
            # Perform service work
            do_service_work()
            heartbeat.pulse()  # Signal successful operation

        except Exception as e:
            heartbeat.error(str(e))  # Report error condition
            time.sleep(heartbeat.interval)
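
A `while True` loop like the one above cannot be stopped cleanly. One common alternative is to drive the loop from a `threading.Event`, which also gives an interruptible sleep. The sketch below is independent of `python_core`: `run_service` is a hypothetical helper, and `on_pulse` and `on_error` stand in for `heartbeat.pulse` and `heartbeat.error`.

```python
import threading


def run_service(work, on_pulse, on_error, stop_event, retry_delay):
    """Run work() repeatedly until stop_event is set, reporting each outcome."""
    while not stop_event.is_set():
        try:
            work()
            on_pulse()  # signal successful operation
        except Exception as exc:
            on_error(str(exc))  # report error condition
            # Event.wait returns early if a stop is requested during the delay.
            stop_event.wait(retry_delay)


# Example: a worker that fails once and then asks the loop to stop.
stop = threading.Event()
attempts = []


def flaky_work():
    attempts.append(1)
    if len(attempts) == 2:
        raise ValueError("temporary failure")
    if len(attempts) >= 3:
        stop.set()


run_service(flaky_work, lambda: None, print, stop, retry_delay=0.01)
```

With a real heartbeat object, the call would pass `heartbeat.pulse`, `heartbeat.error`, and `heartbeat.interval` in place of the placeholders, assuming those members behave as the example above suggests.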

Troubleshooting¶

Common Issues¶

Heartbeat Not Starting

# Check if heartbeat is enabled in configuration
import python_core
config = python_core.get_config()
print(f"Heartbeat enabled: {config.heartbeat.enabled}")

# Verify service registration
heartbeat = python_core.Heartbeat(interval=30, tag="test")
print(f"Heartbeat created: {heartbeat.is_active}")

Service Marked as Unhealthy

# Enable debug output from the heartbeat logger to surface errors
import logging
logging.getLogger('smrthub.heartbeat').setLevel(logging.DEBUG)

# Verify service is calling pulse() regularly
heartbeat.pulse()  # Should be called within service loop

# Check for exceptions in service code
try:
    service_operation()
    heartbeat.pulse()
except Exception as e:
    heartbeat.error(str(e))  # Properly report errors

High Memory Usage

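# Measure resident memory of the current process, which includes the heartbeat system
# (psutil is a third-party package: pip install psutil)
import psutil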
process = psutil.Process()
print(f"Memory usage: {process.memory_info().rss / 1024 / 1024:.2f} MB")

# Cleanup old heartbeat data
heartbeat.cleanup_old_data(days=7)
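
The document does not describe what `cleanup_old_data` does internally. As a rough idea of age-based pruning, the sketch below drops history records older than a cutoff. The record shape (a dict with a timezone-aware `timestamp`) and the name `prune_records` are assumptions made for this example only.

```python
from datetime import datetime, timedelta, timezone


def prune_records(records, days, now=None):
    """Return only the records whose timestamp falls within the last `days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return [record for record in records if record["timestamp"] >= cutoff]


# Example with a fixed reference time so the result does not depend on today.
reference = datetime(2025, 7, 14, tzinfo=timezone.utc)
history = [
    {"tag": "save_worker", "timestamp": reference - timedelta(days=10)},
    {"tag": "save_worker", "timestamp": reference - timedelta(days=1)},
]
print(len(prune_records(history, days=7, now=reference)))
```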

Performance Impact¶

The heartbeat system is designed for minimal performance impact:

CPU Usage: < 0.1% on average
Memory Usage: ~2 MB for all services
Network Traffic: ~100 bytes per heartbeat pulse
Disk I/O: Minimal logging only on status changes

Future Enhancements¶

Planned Features¶

Predictive Health Analysis - ML-based health trend prediction
Distributed Monitoring - Multi-machine heartbeat coordination
Custom Metrics - Service-specific performance metrics
Integration Monitoring - Health checks for external dependencies
Mobile Alerts - Push notifications for critical issues

API Improvements¶

WebSocket Health Stream - Real-time health event streaming
Historical Data - Long-term health trend analysis
Custom Dashboards - Configurable health monitoring interfaces
Export Capabilities - Health data export for external tools

The heartbeat monitoring system ensures SmrtHub operates reliably and provides early warning of potential issues. For detailed API documentation, see the Python API Reference.