
Heartbeat Monitoring System

The SmrtHub heartbeat monitoring system provides comprehensive real-time health tracking for all critical services and background processes across the Python core engine.

Overview

The heartbeat system ensures reliable operation by continuously monitoring the health of key components and providing early warning of potential issues. Each critical service reports its status at regular intervals, enabling proactive maintenance and troubleshooting.

Architecture

Heartbeat Components

graph TB
    subgraph "Python Core Services"
        SW[Save Worker<br/>30s interval]
        FB[Flask Bridge<br/>15s interval]
        WL[Web Listener<br/>20s interval]
        FI[File Indexer<br/>60s interval]
        RT[Router Tasks<br/>45s interval]
    end

    subgraph "Monitoring Infrastructure"
        HB[Heartbeat Manager]
        LOG[Health Logger]
        ALT[Alert System]
    end

    subgraph "External Monitors"
        CS[C# Components]
        UI[Status UI]
        API[Health API]
    end

    SW --> HB
    FB --> HB
    WL --> HB
    FI --> HB
    RT --> HB

    HB --> LOG
    HB --> ALT

    HB --> CS
    HB --> UI
    HB --> API
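
The reporting flow in the diagram can be sketched in a few lines of Python. This is a minimal illustration, not the SmrtHub implementation: the `HeartbeatManager` and `ServiceRecord` classes and their methods are hypothetical names introduced here, and the injectable `clock` exists only to make the behaviour easy to exercise.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ServiceRecord:
    """Last known state of one monitored service (hypothetical)."""
    interval: float    # expected seconds between pulses
    last_pulse: float  # clock reading at the most recent pulse


@dataclass
class HeartbeatManager:
    """Hypothetical stand-in for the Heartbeat Manager box in the diagram."""
    clock: Callable[[], float] = time.monotonic
    services: Dict[str, ServiceRecord] = field(default_factory=dict)

    def register(self, tag: str, interval: float) -> None:
        # A newly registered service counts as having just pulsed
        self.services[tag] = ServiceRecord(interval, self.clock())

    def pulse(self, tag: str) -> None:
        # Record that the service reported in
        self.services[tag].last_pulse = self.clock()

    def overdue(self) -> List[str]:
        # Services whose last pulse is older than their own interval
        now = self.clock()
        return sorted(
            tag for tag, rec in self.services.items()
            if now - rec.last_pulse > rec.interval
        )
```

Each service would call `pulse()` from its own loop, while the logger, alert system and external monitors would read from `overdue()` or a similar query.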

Health Status Levels

| Level | Description | Action Required |
|-------|-------------|-----------------|
| 🟢 Healthy | Service responding normally | None |
| 🟡 Warning | Service delayed but functional | Monitor closely |
| 🟠 Degraded | Service experiencing issues | Investigation recommended |
| 🔴 Critical | Service unresponsive | Immediate attention required |
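
One way to model these levels in code is an ordered enumeration, so that levels can be compared and combined. The sketch below is illustrative only: `HealthLevel`, `ACTION_REQUIRED` and `overall_level` are hypothetical names, and taking the worst individual level as the overall status is an assumption, not documented SmrtHub behaviour.

```python
from enum import IntEnum
from typing import Iterable


class HealthLevel(IntEnum):
    """Status levels from the table, ordered from best to worst."""
    HEALTHY = 0
    WARNING = 1
    DEGRADED = 2
    CRITICAL = 3


# Recommended action for each level, copied from the table
ACTION_REQUIRED = {
    HealthLevel.HEALTHY: "None",
    HealthLevel.WARNING: "Monitor closely",
    HealthLevel.DEGRADED: "Investigation recommended",
    HealthLevel.CRITICAL: "Immediate attention required",
}


def overall_level(levels: Iterable[HealthLevel]) -> HealthLevel:
    """Assumed aggregation rule: the overall status is the worst level seen."""
    return max(levels, default=HealthLevel.HEALTHY)
```

Because `IntEnum` members compare as integers, `max()` picks the most severe level without any extra lookup table.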

Implementation Status

✅ Critical Services (Completed)

1. Save Worker - runtime/save_worker.py

  • Location: save_and_route_worker() function
  • Interval: 30 seconds
  • Tag: "save_worker"
  • Purpose: Monitors the background save queue processing
  • Status: ✅ Fully Implemented

import python_core

# Initialize heartbeat monitoring for save worker health tracking
heartbeat = python_core.Heartbeat(interval=30, tag="save_worker")
heartbeat.start()

# Worker loop with heartbeat updates
while True:
    try:
        # Process save queue
        process_save_queue()
        heartbeat.pulse()  # Signal healthy operation
    except Exception as e:
        heartbeat.error(str(e))  # Report error condition

2. Flask Bridge Server - __main__.py

  • Location: start_flask_bridge() function
  • Interval: 15 seconds
  • Tag: "flask_bridge"
  • Purpose: Monitors C#/Python communication bridge (port 5001)
  • Status: ✅ Fully Implemented

import python_core

# Initialize heartbeat monitoring for Flask bridge health tracking
heartbeat = python_core.Heartbeat(interval=15, tag="flask_bridge")
heartbeat.start()

# Flask application with health monitoring
# (`app` is the bridge's existing Flask application instance)
@app.before_request
def before_request():
    heartbeat.pulse()  # Signal request processing

3. Web Listener - __main__.py

  • Location: start_web_listener() function
  • Interval: 20 seconds
  • Tag: "web_listener"
  • Purpose: Monitors browser extension communication (port 5000)
  • Status: ✅ Fully Implemented

import python_core

# Initialize heartbeat monitoring for web listener health tracking
heartbeat = python_core.Heartbeat(interval=20, tag="web_listener")
heartbeat.start()

✅ Secondary Services (Completed)

4. File Indexer - smrtdetect/file_indexer.py

  • Location: start_background_indexing() function
  • Interval: 60 seconds
  • Tag: "file_indexer"
  • Purpose: Monitors file system indexing operations
  • Status: ✅ Implemented

5. Router Background Tasks - runtime/router.py

  • Location: start_background_tasks() function
  • Interval: 45 seconds
  • Tag: "router_tasks"
  • Purpose: Monitors routing and organization tasks
  • Status: ✅ Implemented

API Integration

Health Check Endpoints

The heartbeat system exposes REST API endpoints for external monitoring:

# Health status endpoint
GET /api/health/status
{
    "services": {
        "save_worker": {
            "status": "healthy",
            "last_heartbeat": "2025-07-14T10:30:45Z",
            "interval": 30,
            "consecutive_failures": 0
        },
        "flask_bridge": {
            "status": "healthy", 
            "last_heartbeat": "2025-07-14T10:30:50Z",
            "interval": 15,
            "consecutive_failures": 0
        }
    },
    "overall_status": "healthy"
}

# Individual service status
GET /api/health/service/{service_name}
{
    "service": "save_worker",
    "status": "healthy",
    "last_heartbeat": "2025-07-14T10:30:45Z",
    "uptime": "2d 14h 23m",
    "total_pulses": 4320,
    "error_count": 0
}
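
A consumer of the status endpoint typically needs to know which services need attention. The following sketch parses a response of the shape shown above and lists every service that is not reporting `"healthy"`. The helper name `unhealthy_services` is hypothetical, and the `"warning"` status string in the sample is an assumption, since the documented examples only show `"healthy"`.

```python
import json
from typing import List


def unhealthy_services(payload: str) -> List[str]:
    """Return the names of services whose status is not "healthy"."""
    data = json.loads(payload)
    services = data.get("services", {})
    return sorted(
        name for name, info in services.items()
        if info.get("status") != "healthy"
    )


# Sample payload following the response shape documented above
sample = json.dumps({
    "services": {
        "save_worker": {"status": "healthy", "consecutive_failures": 0},
        "flask_bridge": {"status": "warning", "consecutive_failures": 1},
    },
    "overall_status": "warning",
})
print(unhealthy_services(sample))  # ['flask_bridge']
```

In practice the payload would come from an HTTP GET against `/api/health/status`; the parsing step is kept separate here so it can be tested without a running server.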

C# Integration

C# components can monitor Python service health:

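// Health monitoring client in C# components
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

public class HealthMonitor
{
    private readonly HttpClient _client;

    // Inject the HttpClient so a single instance can be shared and reused
    public HealthMonitor(HttpClient client) => _client = client;

    public async Task<HealthStatus> CheckPythonCoreHealth()
    {
        var response = await _client.GetAsync("http://localhost:5001/api/health/status");
        response.EnsureSuccessStatusCode();  // Fail fast on non-2xx responses
        return await response.Content.ReadFromJsonAsync<HealthStatus>();
    }

    public async Task<bool> IsServiceHealthy(string serviceName)
    {
        var status = await CheckPythonCoreHealth();
        // Treat an unknown service as unhealthy instead of throwing
        return status.Services.TryGetValue(serviceName, out var service)
            && service.Status == "healthy";
    }
}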

Monitoring and Alerting

Log Integration

Heartbeat events are integrated with the SmrtHub logging system:

# Logging configuration
import logging
logger = logging.getLogger('smrthub.heartbeat')

# Health events are logged with context (arguments are formatted lazily)
logger.info("Service %s heartbeat: %s", service_name, status)
logger.warning("Service %s missed heartbeat (attempt %d/3)", service_name, attempt)
logger.error("Service %s is unresponsive - marking as critical", service_name)

Alert Thresholds

| Condition | Threshold | Alert Level |
|-----------|-----------|-------------|
| Missed heartbeat | 1 interval | Warning |
| Consecutive misses | 3 intervals | Degraded |
| Service timeout | 5 intervals | Critical |
| Error reported | Any exception | Warning |
| Critical error | System exception | Critical |
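
The interval-based rows of this table can be expressed as a small classification function. This is a sketch under one stated assumption: the number of missed heartbeats is taken to be the number of whole intervals elapsed since the last pulse. The function name `classify` is hypothetical, and the exception-based rows are not covered.

```python
def classify(seconds_since_last: float, interval: float) -> str:
    """Map the time since the last heartbeat to an alert level.

    Assumption: missed heartbeats = whole intervals elapsed since the last pulse.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    missed = int(seconds_since_last // interval)
    if missed >= 5:
        return "critical"   # Service timeout: 5 intervals
    if missed >= 3:
        return "degraded"   # Consecutive misses: 3 intervals
    if missed >= 1:
        return "warning"    # Missed heartbeat: 1 interval
    return "healthy"


# Save worker (30 s interval) last seen 95 s ago: three intervals missed
print(classify(95, 30))  # degraded
```

Checking the most severe threshold first keeps the function a simple cascade and makes it easy to adjust the thresholds in one place.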

Visual Indicators

The SmrtHub component displays real-time health status:

  • 🟢 Green Pulse - All services healthy
  • 🟡 Yellow Flash - Service warnings detected
  • 🔴 Red Alert - Critical service issues
  • No Signal - Heartbeat system offline

Configuration

Heartbeat Settings

# Example configuration shape
HEARTBEAT_CONFIG = {
    "enabled": True,
    "default_interval": 30,
    "timeout_multiplier": 3,
    "alert_on_miss": True,
    "log_level": "INFO",
    "services": {
        "save_worker": {"interval": 30, "critical": True},
        "flask_bridge": {"interval": 15, "critical": True}, 
        "web_listener": {"interval": 20, "critical": False},
        "file_indexer": {"interval": 60, "critical": False},
        "router_tasks": {"interval": 45, "critical": False}
    }
}
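
To show how per-service entries and global defaults in this configuration shape might combine, the sketch below resolves the effective settings for a single service. It is illustrative only: `resolve_service` is a hypothetical helper, and deriving a timeout as `interval * timeout_multiplier` is an assumed reading of that key, not confirmed behaviour.

```python
from typing import Any, Dict

# Trimmed copy of the configuration shape shown above
HEARTBEAT_CONFIG = {
    "default_interval": 30,
    "timeout_multiplier": 3,
    "services": {
        "flask_bridge": {"interval": 15, "critical": True},
        "file_indexer": {"interval": 60, "critical": False},
    },
}


def resolve_service(config: Dict[str, Any], tag: str) -> Dict[str, Any]:
    """Merge per-service overrides with the global defaults (hypothetical helper)."""
    overrides = config.get("services", {}).get(tag, {})
    interval = overrides.get("interval", config["default_interval"])
    return {
        "interval": interval,
        "critical": overrides.get("critical", False),
        # Assumption: a service times out after interval * timeout_multiplier seconds
        "timeout": interval * config["timeout_multiplier"],
    }


print(resolve_service(HEARTBEAT_CONFIG, "flask_bridge"))
# {'interval': 15, 'critical': True, 'timeout': 45}
```

A service that has no entry under `"services"` falls back to `default_interval` and is treated as non-critical in this sketch.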

Custom Service Registration

Add heartbeat monitoring to new services:

import time

import python_core

def my_background_service():
    """Custom service with heartbeat monitoring."""
    # Register heartbeat with custom interval
    heartbeat = python_core.Heartbeat(
        interval=25, 
        tag="my_service",
        critical=True
    )
    heartbeat.start()

    while True:
        try:
            # Perform service work
            do_service_work()
            heartbeat.pulse()  # Signal successful operation

        except Exception as e:
            heartbeat.error(str(e))  # Report error condition
            time.sleep(heartbeat.interval)

Troubleshooting

Common Issues

Heartbeat Not Starting

# Check if heartbeat is enabled in configuration
import python_core
config = python_core.get_config()
print(f"Heartbeat enabled: {config.heartbeat.enabled}")

# Verify service registration
heartbeat = python_core.Heartbeat(interval=30, tag="test")
heartbeat.start()  # Start the heartbeat before checking whether it is active
print(f"Heartbeat active: {heartbeat.is_active}")

Service Marked as Unhealthy

# Enable debug logging to surface heartbeat errors in the service logs
import logging
logging.getLogger('smrthub.heartbeat').setLevel(logging.DEBUG)

# Verify service is calling pulse() regularly
heartbeat.pulse()  # Should be called within service loop

# Check for exceptions in service code
try:
    service_operation()
    heartbeat.pulse()
except Exception as e:
    heartbeat.error(str(e))  # Properly report errors

High Memory Usage

# Monitor heartbeat resource usage
import psutil
process = psutil.Process()
print(f"Memory usage: {process.memory_info().rss / 1024 / 1024:.2f} MB")

# Cleanup old heartbeat data
heartbeat.cleanup_old_data(days=7)

Performance Impact

The heartbeat system is designed for minimal performance impact:

  • CPU Usage: < 0.1% on average
  • Memory Usage: ~2MB for all services
  • Network Traffic: ~100 bytes per heartbeat pulse
  • Disk I/O: Minimal logging only on status changes
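
As a rough check on the network figure, the daily pulse volume can be estimated from the documented intervals. The calculation assumes exactly one pulse per interval for each service and the approximate 100 bytes per pulse quoted above. Actual volume depends on how often each service calls `pulse()`.

```python
# Documented heartbeat intervals, in seconds
INTERVALS = {
    "save_worker": 30,
    "flask_bridge": 15,
    "web_listener": 20,
    "file_indexer": 60,
    "router_tasks": 45,
}
BYTES_PER_PULSE = 100     # approximate figure from the list above
SECONDS_PER_DAY = 86_400

# Assumption: each service emits exactly one pulse per interval
pulses_per_day = {tag: SECONDS_PER_DAY // s for tag, s in INTERVALS.items()}
total_pulses = sum(pulses_per_day.values())
total_bytes = total_pulses * BYTES_PER_PULSE

print(total_pulses)                         # 16320
print(f"{total_bytes / 1_000_000:.2f} MB")  # 1.63 MB
```

Under these assumptions all five services together generate about 1.6 MB of heartbeat traffic per day.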

Future Enhancements

Planned Features

  • Predictive Health Analysis - ML-based health trend prediction
  • Distributed Monitoring - Multi-machine heartbeat coordination
  • Custom Metrics - Service-specific performance metrics
  • Integration Monitoring - Health checks for external dependencies
  • Mobile Alerts - Push notifications for critical issues

API Improvements

  • WebSocket Health Stream - Real-time health event streaming
  • Historical Data - Long-term health trend analysis
  • Custom Dashboards - Configurable health monitoring interfaces
  • Export Capabilities - Health data export for external tools

The heartbeat monitoring system ensures SmrtHub operates reliably and provides early warning of potential issues. For detailed API documentation, see the Python API Reference.