Heartbeat Monitoring System
The SmrtHub heartbeat monitoring system tracks the health of critical services and background processes in the Python core engine in real time.
Overview
The heartbeat system keeps SmrtHub reliable by continuously monitoring key components and warning early when one of them starts to fail. Each monitored service reports its status at a fixed interval, so a stalled or failing service is noticed early enough for proactive maintenance and troubleshooting.
Architecture
Heartbeat Components
```mermaid
graph TB
    subgraph "Python Core Services"
        SW[Save Worker<br/>30s interval]
        FB[Flask Bridge<br/>15s interval]
        WL[Web Listener<br/>20s interval]
        FI[File Indexer<br/>60s interval]
        RT[Router Tasks<br/>45s interval]
    end
    subgraph "Monitoring Infrastructure"
        HB[Heartbeat Manager]
        LOG[Health Logger]
        ALT[Alert System]
    end
    subgraph "External Monitors"
        CS[C# Components]
        UI[Status UI]
        API[Health API]
    end
    SW --> HB
    FB --> HB
    WL --> HB
    FI --> HB
    RT --> HB
    HB --> LOG
    HB --> ALT
    HB --> CS
    HB --> UI
    HB --> API
```
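The diagram has every service feeding the Heartbeat Manager, which passes each event on to the logger, the alert system, and the external monitors. The sketch below models that fan-out as a simple observer pattern. `HeartbeatManager`, `subscribe()`, `report()`, and `last_seen()` are hypothetical names chosen for illustration, not the python_core API:

```python
import threading
import time
from typing import Callable, Dict, List, Optional

# Subscriber signature: (service tag, event name, monotonic timestamp)
Subscriber = Callable[[str, str, float], None]


class HeartbeatManager:
    """Hypothetical hub: records each service's latest event and fans it out to subscribers."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last_seen: Dict[str, float] = {}
        self._subscribers: List[Subscriber] = []

    def subscribe(self, callback: Subscriber) -> None:
        # e.g. the health logger, the alert system, or an adapter feeding the Status UI
        with self._lock:
            self._subscribers.append(callback)

    def report(self, tag: str, event: str = "pulse") -> None:
        """Called on behalf of a service for every pulse or error it reports."""
        now = time.monotonic()
        with self._lock:
            self._last_seen[tag] = now
            subscribers = list(self._subscribers)
        # Notify outside the lock so a slow subscriber cannot block reporting services
        for callback in subscribers:
            callback(tag, event, now)

    def last_seen(self, tag: str) -> Optional[float]:
        """Monotonic time of the service's latest event, or None if it never reported."""
        with self._lock:
            return self._last_seen.get(tag)
```

Each consumer on the right of the diagram would register one subscriber. Timestamps use `time.monotonic()` so that a wall-clock adjustment cannot make a healthy service look stale.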
Health Status Levels
| Level | Description | Action Required |
|---|---|---|
| 🟢 Healthy | Service responding normally | None |
| 🟡 Warning | Service delayed but functional | Monitor closely |
| 🟠 Degraded | Service experiencing issues | Investigation recommended |
| 🔴 Critical | Service unresponsive | Immediate attention required |
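The four levels are strictly ordered by severity, which makes an overall status easy to derive: take the worst level across all services. A minimal sketch of that rule, assuming a hypothetical `HealthLevel` enum whose lowercase names match the API's `status` strings (the examples on this page only show `"healthy"`, so the other three strings are assumptions):

```python
from enum import IntEnum
from typing import Iterable


class HealthLevel(IntEnum):
    """Hypothetical severity ordering for the four documented levels."""
    HEALTHY = 0
    WARNING = 1
    DEGRADED = 2
    CRITICAL = 3

    @property
    def label(self) -> str:
        # Lowercase name, e.g. "healthy" as seen in API responses
        return self.name.lower()


def overall_level(levels: Iterable[HealthLevel]) -> HealthLevel:
    """The system is only as healthy as its least healthy service."""
    # An empty service list is reported as healthy
    return max(levels, default=HealthLevel.HEALTHY)
```

For example, one service at `WARNING` with the rest `HEALTHY` yields `WARNING` overall. Whether SmrtHub computes `overall_status` exactly this way is an assumption.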
Implementation Status
✅ Critical Services (Completed)
1. Save Worker - runtime/save_worker.py
- Location: `save_and_route_worker()` function
- Interval: 30 seconds
- Tag: `"save_worker"`
- Purpose: Monitors background save-queue processing
- Status: ✅ Fully Implemented

```python
import python_core

# Initialize heartbeat monitoring for save worker health tracking
heartbeat = python_core.Heartbeat(interval=30, tag="save_worker")
heartbeat.start()

# Worker loop with heartbeat updates
while True:
    try:
        process_save_queue()  # Process save queue
        heartbeat.pulse()  # Signal healthy operation
    except Exception as e:
        heartbeat.error(str(e))  # Report error condition
```
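If `process_save_queue()` blocks while the queue is empty, the loop stops pulsing during idle periods, and an idle worker then looks the same as a hung one. One way to avoid this is to wait with a timeout shorter than the heartbeat interval and pulse on every wake-up. The sketch uses a standard-library `queue.Queue`; `run_worker` is hypothetical, and the `pulse` and `report_error` callables stand in for `heartbeat.pulse()` and `heartbeat.error()`:

```python
import queue
import threading
from typing import Callable


def run_worker(
    jobs: "queue.Queue[str]",
    handle: Callable[[str], None],
    pulse: Callable[[], None],
    report_error: Callable[[str], None],
    stop: threading.Event,
    interval: float = 30.0,
) -> None:
    """Process queued jobs, pulsing on every success and on every idle wake-up."""
    while not stop.is_set():
        try:
            # Wait at most half an interval so an empty queue still yields pulses
            job = jobs.get(timeout=interval / 2)
        except queue.Empty:
            pulse()  # idle but alive
            continue
        try:
            handle(job)
            pulse()  # job processed successfully
        except Exception as exc:
            report_error(str(exc))  # stand-in for heartbeat.error()
        finally:
            jobs.task_done()
```

Pulsing at half the interval leaves headroom, so one slow wake-up does not immediately count as a missed heartbeat.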
2. Flask Bridge Server - __main__.py
- Location: `start_flask_bridge()` function
- Interval: 15 seconds
- Tag: `"flask_bridge"`
- Purpose: Monitors the C#/Python communication bridge (port 5001)
- Status: ✅ Fully Implemented

```python
import python_core
from flask import Flask

app = Flask(__name__)  # the bridge's Flask application

# Initialize heartbeat monitoring for Flask bridge health tracking
heartbeat = python_core.Heartbeat(interval=15, tag="flask_bridge")
heartbeat.start()

# Flask application with health monitoring
@app.before_request
def before_request():
    heartbeat.pulse()  # Signal request processing (runs once per incoming request)
```
3. Web Listener - __main__.py
- Location: `start_web_listener()` function
- Interval: 20 seconds
- Tag: `"web_listener"`
- Purpose: Monitors browser extension communication (port 5000)
- Status: ✅ Fully Implemented

```python
import python_core

# Initialize heartbeat monitoring for web listener health tracking
heartbeat = python_core.Heartbeat(interval=20, tag="web_listener")
heartbeat.start()
```
✅ Secondary Services (Completed)
4. File Indexer - smrtdetect/file_indexer.py
- Location: `start_background_indexing()` function
- Interval: 60 seconds
- Tag: `"file_indexer"`
- Purpose: Monitors file system indexing operations
- Status: ✅ Implemented

5. Router Background Tasks - runtime/router.py
- Location: `start_background_tasks()` function
- Interval: 45 seconds
- Tag: `"router_tasks"`
- Purpose: Monitors routing and organization tasks
- Status: ✅ Implemented
API Integration
Health Check Endpoints
The heartbeat system exposes REST API endpoints for external monitoring:

Health status endpoint: `GET /api/health/status`
```json
{
  "services": {
    "save_worker": {
      "status": "healthy",
      "last_heartbeat": "2025-07-14T10:30:45Z",
      "interval": 30,
      "consecutive_failures": 0
    },
    "flask_bridge": {
      "status": "healthy",
      "last_heartbeat": "2025-07-14T10:30:50Z",
      "interval": 15,
      "consecutive_failures": 0
    }
  },
  "overall_status": "healthy"
}
```

Individual service status: `GET /api/health/service/{service_name}`
```json
{
  "service": "save_worker",
  "status": "healthy",
  "last_heartbeat": "2025-07-14T10:30:45Z",
  "uptime": "2d 14h 23m",
  "total_pulses": 4320,
  "error_count": 0
}
```
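A monitor that consumes `GET /api/health/status` mostly needs to know which services are not healthy. A minimal sketch that works on the decoded JSON body; fetching it over HTTP is left out so the example stays self-contained. `unhealthy_services` is a hypothetical helper, and the `"warning"` status in the sample payload is illustrative (the documented examples only show `"healthy"`):

```python
import json
from typing import Any, Dict, List


def unhealthy_services(payload: Dict[str, Any]) -> List[str]:
    """Return the sorted names of services whose status is anything other than "healthy"."""
    services = payload.get("services", {})
    return sorted(
        name for name, info in services.items() if info.get("status") != "healthy"
    )


example = json.loads("""
{
  "services": {
    "save_worker":  {"status": "healthy", "interval": 30, "consecutive_failures": 0},
    "flask_bridge": {"status": "warning", "interval": 15, "consecutive_failures": 1}
  },
  "overall_status": "warning"
}
""")
print(unhealthy_services(example))  # ['flask_bridge']
```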
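C# Integration
C# components can monitor Python service health:
```csharp
// Health monitoring client in C# components
using System.Net.Http;
using System.Net.Http.Json;
using System.Threading.Tasks;

public class HealthMonitor
{
    private readonly HttpClient _client;

    // The HttpClient is injected so a single instance can be shared and reused
    public HealthMonitor(HttpClient client) => _client = client;

    public async Task<HealthStatus?> CheckPythonCoreHealth()
    {
        var response = await _client.GetAsync("http://localhost:5001/api/health/status");
        response.EnsureSuccessStatusCode();
        // HealthStatus is the DTO that mirrors the /api/health/status JSON payload
        return await response.Content.ReadFromJsonAsync<HealthStatus>();
    }

    public async Task<bool> IsServiceHealthy(string serviceName)
    {
        var status = await CheckPythonCoreHealth();
        // Treat a missing payload or an unknown service as unhealthy instead of throwing
        return status?.Services != null
            && status.Services.TryGetValue(serviceName, out var service)
            && service?.Status == "healthy";
    }
}
```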
Monitoring and Alerting
Log Integration
Heartbeat events are integrated with the SmrtHub logging system:
```python
import logging

# Logging configuration
logger = logging.getLogger("smrthub.heartbeat")

# Example context values; at runtime these come from the heartbeat system
service_name, status, attempt = "save_worker", "healthy", 1

# Health events are logged with context (lazy %-formatting is the logging idiom)
logger.info("Service %s heartbeat: %s", service_name, status)
logger.warning("Service %s missed heartbeat (attempt %d/3)", service_name, attempt)
logger.error("Service %s is unresponsive - marking as critical", service_name)
```
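The Performance Impact section states that disk I/O is limited to logging on status changes. One way to get that behavior is to remember each service's last status and write a log record only on a transition. A minimal sketch, where `StatusChangeLogger` is a hypothetical helper and the mapping of the non-healthy status strings to log levels is an assumption:

```python
import logging
from typing import Dict

logger = logging.getLogger("smrthub.heartbeat")

# Assumed mapping from status strings to log levels
LEVELS = {
    "healthy": logging.INFO,
    "warning": logging.WARNING,
    "degraded": logging.WARNING,
    "critical": logging.ERROR,
}


class StatusChangeLogger:
    """Hypothetical helper: log a service's status only when it changes."""

    def __init__(self) -> None:
        self._last: Dict[str, str] = {}

    def update(self, service_name: str, status: str) -> bool:
        """Record the status; return True if a log record was written."""
        previous = self._last.get(service_name)
        if previous == status:
            return False  # unchanged, so nothing is written
        self._last[service_name] = status
        logger.log(
            LEVELS.get(status, logging.WARNING),
            "Service %s status changed: %s -> %s",
            service_name, previous or "unknown", status,
        )
        return True
```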
Alert Thresholds
| Condition | Threshold | Alert Level |
|---|---|---|
| Missed heartbeat | 1 interval | Warning |
| Consecutive misses | 3 intervals | Degraded |
| Service timeout | 5 intervals | Critical |
| Error reported | Any exception | Warning |
| Critical error | System exception | Critical |
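Read together, the thresholds turn the time since a service's last pulse into a status level. A minimal sketch, assuming the counts in the table are whole heartbeat intervals elapsed since the last pulse and that a reported error can raise the level but never lower it; `classify` and its parameters are hypothetical:

```python
def classify(seconds_since_pulse: float, interval: float,
             error_reported: bool = False, critical_error: bool = False) -> str:
    """Map elapsed time and reported errors to a health level using the table's thresholds."""
    missed = int(seconds_since_pulse // interval)  # whole intervals without a pulse
    if critical_error or missed >= 5:
        return "critical"  # service timeout or system exception
    if missed >= 3:
        return "degraded"  # consecutive misses
    if missed >= 1 or error_reported:
        return "warning"  # missed heartbeat or error reported
    return "healthy"
```

A production check would normally add a small grace period before counting a miss, so ordinary scheduling jitter around each pulse does not raise a warning.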
Visual Indicators
The SmrtHub component displays real-time health status:
- 🟢 Green Pulse - All services healthy
- 🟡 Yellow Flash - Service warnings detected
- 🔴 Red Alert - Critical service issues
- ⚫ No Signal - Heartbeat system offline
Configuration
Heartbeat Settings
```python
# Example configuration shape
HEARTBEAT_CONFIG = {
    "enabled": True,
    "default_interval": 30,
    "timeout_multiplier": 3,
    "alert_on_miss": True,
    "log_level": "INFO",
    "services": {
        "save_worker": {"interval": 30, "critical": True},
        "flask_bridge": {"interval": 15, "critical": True},
        "web_listener": {"interval": 20, "critical": False},
        "file_indexer": {"interval": 60, "critical": False},
        "router_tasks": {"interval": 45, "critical": False},
    },
}
```
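The configuration implies a per-service lookup: a service listed under `services` uses its own interval, anything else presumably falls back to `default_interval`, and `timeout_multiplier` turns an interval into a timeout. A minimal sketch of that lookup, assuming the timeout is simply `interval * timeout_multiplier` and that unlisted services are non-critical (both assumptions; `effective_settings` is hypothetical). It takes the config dict as a parameter, so it can be called with `HEARTBEAT_CONFIG`:

```python
from typing import Any, Dict


def effective_settings(config: Dict[str, Any], tag: str) -> Dict[str, Any]:
    """Resolve interval, criticality, and timeout for one service tag."""
    service = config.get("services", {}).get(tag, {})
    interval = service.get("interval", config["default_interval"])
    return {
        "interval": interval,
        "critical": service.get("critical", False),
        "timeout": interval * config["timeout_multiplier"],
    }


example = {
    "default_interval": 30,
    "timeout_multiplier": 3,
    "services": {"flask_bridge": {"interval": 15, "critical": True}},
}
print(effective_settings(example, "flask_bridge"))  # {'interval': 15, 'critical': True, 'timeout': 45}
print(effective_settings(example, "my_service"))    # {'interval': 30, 'critical': False, 'timeout': 90}
```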
Custom Service Registration
Add heartbeat monitoring to new services:
```python
import time

import python_core


def my_background_service():
    """Custom service with heartbeat monitoring."""
    # Register heartbeat with custom interval
    heartbeat = python_core.Heartbeat(
        interval=25,
        tag="my_service",
        critical=True,
    )
    heartbeat.start()

    while True:
        try:
            do_service_work()  # Perform service work (your service's own function)
            heartbeat.pulse()  # Signal successful operation
        except Exception as e:
            heartbeat.error(str(e))  # Report error condition
        time.sleep(heartbeat.interval)
```
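The `while True` loop in the registration example only stops when the process exits. A variant that shuts down cleanly replaces `time.sleep()` with `threading.Event.wait()`, which sleeps for the interval but returns as soon as the event is set. In this sketch `run_service` is hypothetical, and the `work`, `pulse`, and `report_error` callables stand in for `do_service_work()`, `heartbeat.pulse()`, and `heartbeat.error()`:

```python
import threading
from typing import Callable


def run_service(work: Callable[[], None], pulse: Callable[[], None],
                report_error: Callable[[str], None],
                stop: threading.Event, interval: float) -> None:
    """Run work/pulse cycles until stop is set."""
    while not stop.is_set():
        try:
            work()
            pulse()  # signal successful operation
        except Exception as exc:
            report_error(str(exc))  # report error condition
        # Interruptible sleep: returns immediately once stop.set() is called
        stop.wait(interval)
```

In a service, run `run_service` in a `threading.Thread` and call `stop.set()` on shutdown; `join()` then returns without waiting out the rest of the interval.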
Troubleshooting
Common Issues
Heartbeat Not Starting
```python
import python_core

# Check if heartbeat is enabled in configuration
config = python_core.get_config()
print(f"Heartbeat enabled: {config.heartbeat.enabled}")

# Verify service registration: start() must run before the heartbeat can be active
heartbeat = python_core.Heartbeat(interval=30, tag="test")
heartbeat.start()
print(f"Heartbeat active: {heartbeat.is_active}")
```
Service Marked as Unhealthy
```python
import logging

# Check service logs for errors
# (a handler that accepts DEBUG must also be attached for these records to appear)
logging.getLogger("smrthub.heartbeat").setLevel(logging.DEBUG)

# Verify service is calling pulse() regularly
heartbeat.pulse()  # Should be called within service loop

# Check for exceptions in service code (service_operation is your own work function)
try:
    service_operation()
    heartbeat.pulse()
except Exception as e:
    heartbeat.error(str(e))  # Properly report errors
```
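When a service is flagged, it helps to know how far behind its last pulse is. A minimal sketch that compares the `last_heartbeat` timestamp and `interval` reported by the health API against the current time; `seconds_overdue` is a hypothetical helper:

```python
from datetime import datetime, timezone
from typing import Optional


def seconds_overdue(last_heartbeat: str, interval: float,
                    now: Optional[datetime] = None) -> float:
    """Seconds past the expected next pulse (0.0 if the service is on schedule)."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11, so normalize it
    last = datetime.fromisoformat(last_heartbeat.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    elapsed = (now - last).total_seconds()
    return max(0.0, elapsed - interval)
```

For example, with `last_heartbeat` of `"2025-07-14T10:30:45Z"`, an interval of 30 seconds, and a current time of 10:32:00 UTC, the service is 45 seconds overdue.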
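High Memory Usage
```python
import psutil  # third-party package

# Check the Python core process's resident memory (covers all services, not only heartbeats)
process = psutil.Process()
print(f"Memory usage: {process.memory_info().rss / 1024 / 1024:.2f} MB")

# Cleanup old heartbeat data (heartbeat is the service's python_core.Heartbeat instance)
heartbeat.cleanup_old_data(days=7)
```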
Performance Impact
The heartbeat system is designed for minimal performance impact:
- CPU Usage: < 0.1% on average
- Memory Usage: ~2MB for all services
- Network Traffic: ~100 bytes per heartbeat pulse
- Disk I/O: Minimal logging only on status changes
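The per-pulse figure can be turned into a daily total using the five configured intervals. A quick worked check, assuming each service sends exactly one ~100-byte pulse per interval (the intervals and the 100-byte size come from this page; the one-pulse-per-interval rate is an assumption):

```python
# Intervals in seconds for the five monitored services
INTERVALS = {"save_worker": 30, "flask_bridge": 15, "web_listener": 20,
             "file_indexer": 60, "router_tasks": 45}
BYTES_PER_PULSE = 100
SECONDS_PER_DAY = 24 * 60 * 60

pulses_per_day = {tag: SECONDS_PER_DAY // s for tag, s in INTERVALS.items()}
total_pulses = sum(pulses_per_day.values())
total_bytes = total_pulses * BYTES_PER_PULSE

print(pulses_per_day)
print(total_pulses, "pulses/day")         # 16320 pulses/day
print(total_bytes / 1_000_000, "MB/day")  # 1.632 MB/day
```

That is roughly 1.6 MB of heartbeat traffic per day across all five services, or about 19 bytes per second.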
Future Enhancements
Planned Features
- Predictive Health Analysis - ML-based health trend prediction
- Distributed Monitoring - Multi-machine heartbeat coordination
- Custom Metrics - Service-specific performance metrics
- Integration Monitoring - Health checks for external dependencies
- Mobile Alerts - Push notifications for critical issues
API Improvements
- WebSocket Health Stream - Real-time health event streaming
- Historical Data - Long-term health trend analysis
- Custom Dashboards - Configurable health monitoring interfaces
- Export Capabilities - Health data export for external tools
The heartbeat monitoring system ensures SmrtHub operates reliably and provides early warning of potential issues. For detailed API documentation, see the Python API Reference.