ros2_medkit_fault_manager package from the ros2_medkit repo
Packages in the repo: ros2_medkit_diagnostic_bridge, ros2_medkit_fault_manager, ros2_medkit_fault_reporter, ros2_medkit_gateway, ros2_medkit_integration_tests, ros2_medkit_msgs, ros2_medkit_serialization
Package Summary
| Version | 0.3.0 |
| License | Apache-2.0 |
| Build type | AMENT_CMAKE |
| Use | RECOMMENDED |
Repository Summary
| Checkout URI | https://github.com/selfpatch/ros2_medkit.git |
| VCS Type | git |
| VCS Version | main |
| Last Updated | 2026-03-05 |
| Dev Status | DEVELOPED |
| Released | UNRELEASED |
Package Description
Maintainers
- bburda
Authors
ros2_medkit_fault_manager
Central fault manager node for the ros2_medkit fault management system.
Overview
The FaultManager node provides a central point for fault aggregation and lifecycle management.
It receives fault reports from multiple sources, aggregates them by fault_code, and provides
query and clearing interfaces.
Quick Start
By default, faults are confirmed immediately when reported - no additional configuration needed.
# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py
# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
"{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"
# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
"{statuses: ['CONFIRMED']}"
# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
"{fault_code: 'MOTOR_OVERHEAT'}"
Services
| Service | Type | Description |
|---|---|---|
| ~/report_fault | ros2_medkit_msgs/srv/ReportFault | Report a fault occurrence |
| ~/list_faults | ros2_medkit_msgs/srv/ListFaults | Query faults with filtering |
| ~/clear_fault | ros2_medkit_msgs/srv/ClearFault | Clear/acknowledge a fault |
| ~/get_snapshots | ros2_medkit_msgs/srv/GetSnapshots | Get topic snapshots for a fault |
Features
- Multi-source aggregation: Same fault_code from different sources creates a single fault
- Occurrence tracking: Counts total reports and tracks all reporting sources
- Severity escalation: Fault severity is updated if a higher severity is reported
- Persistent storage: SQLite backend ensures faults survive node restarts
- Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation
- Snapshot capture: Captures topic data when faults are confirmed for debugging (snapshots are deleted when fault is cleared)
- Fault correlation (optional): Root cause analysis with symptom muting and auto-clear
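The first three features above (aggregation by fault_code, occurrence tracking, severity escalation) can be illustrated with a small Python sketch. This is not the node's implementation: the `Fault` and `FaultStore` names are hypothetical, and it assumes that a numerically larger severity value means a more severe fault.

```python
from dataclasses import dataclass, field


@dataclass
class Fault:
    """Aggregated state for one fault_code (hypothetical structure)."""
    fault_code: str
    severity: int
    occurrence_count: int = 0
    sources: set = field(default_factory=set)


class FaultStore:
    """In-memory stand-in for the aggregation step (illustration only)."""

    def __init__(self):
        self._faults = {}

    def report(self, fault_code, severity, source_id):
        # Reports with the same fault_code share a single Fault entry.
        fault = self._faults.get(fault_code)
        if fault is None:
            fault = Fault(fault_code=fault_code, severity=severity)
            self._faults[fault_code] = fault
        # Count every report and remember who sent it.
        fault.occurrence_count += 1
        fault.sources.add(source_id)
        # Severity only moves upward (assumes larger value = more severe).
        fault.severity = max(fault.severity, severity)
        return fault


store = FaultStore()
store.report("MOTOR_OVERHEAT", severity=1, source_id="/motor_node")
fault = store.report("MOTOR_OVERHEAT", severity=2, source_id="/thermal_monitor")
print(fault.occurrence_count, sorted(fault.sources), fault.severity)
```

Two reports from different sources end up in one entry with an occurrence count of 2, both sources recorded, and the higher of the two severities.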
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| storage_type | string | "sqlite" | Storage backend: "sqlite" or "memory" |
| database_path | string | "/var/lib/ros2_medkit/faults.db" | Path to SQLite database file |
| confirmation_threshold | int | -1 | Counter value at which faults are confirmed |
| healing_enabled | bool | false | Enable automatic healing via PASSED events |
| healing_threshold | int | 3 | Counter value at which faults are healed |
| auto_confirm_after_sec | double | 0.0 | Auto-confirm PREFAILED faults after timeout (0 = disabled) |
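To make the counter parameters concrete, here is a minimal Python sketch of counter-based debounce. It relies on the changelog's statement that FAILED events decrement the counter and PASSED events increment it. Everything else is an assumption for illustration: the `DebounceCounter` class, the counter starting at 0, the absence of clamping or resets, and the `HEALED` status label. The CRITICAL bypass and `auto_confirm_after_sec` timeout are left out.

```python
class DebounceCounter:
    """Illustrative debounce counter; not the node's actual implementation."""

    def __init__(self, confirmation_threshold=-1, healing_enabled=False,
                 healing_threshold=3):
        self.confirmation_threshold = confirmation_threshold
        self.healing_enabled = healing_enabled
        self.healing_threshold = healing_threshold
        self.counter = 0      # assumed starting value
        self.status = None    # no fault reported yet

    def on_event(self, event):
        if event == "FAILED":
            self.counter -= 1
            if self.counter <= self.confirmation_threshold:
                self.status = "CONFIRMED"
            elif self.status is None:
                # Reported but not yet confirmed.
                self.status = "PREFAILED"
        elif event == "PASSED":
            self.counter += 1
            if self.healing_enabled and self.counter >= self.healing_threshold:
                self.status = "HEALED"  # hypothetical status label
        else:
            raise ValueError(f"unknown event: {event!r}")
        return self.status


# Default threshold of -1: the first FAILED report confirms the fault.
print(DebounceCounter().on_event("FAILED"))

# Threshold of -3: three FAILED reports are needed before confirmation.
debounced = DebounceCounter(confirmation_threshold=-3)
print([debounced.on_event("FAILED") for _ in range(3)])
```

Under these assumptions the default `confirmation_threshold` of -1 reproduces the immediate confirmation described in the Quick Start, while a lower threshold keeps the fault in PREFAILED until enough FAILED events accumulate.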
Snapshot Parameters
Snapshots capture topic data when faults are confirmed for post-mortem debugging.
| Parameter | Type | Default | Description |
|---|---|---|---|
| snapshots.enabled | bool | true | Enable/disable snapshot capture |
| snapshots.background_capture | bool | false | Use background subscriptions (caches latest message) vs on-demand capture |
| snapshots.timeout_sec | double | 1.0 | Timeout waiting for topic message (on-demand mode) |
| snapshots.max_message_size | int | 65536 | Maximum message size in bytes (larger messages skipped) |
| snapshots.default_topics | string[] | [] | Topics to capture for all faults |
| snapshots.config_file | string | "" | Path to YAML config for fault_specific and patterns |
Topic Resolution Priority:
- fault_specific - Exact match for fault code (configured via YAML config file)
- patterns - Regex pattern match (configured via YAML config file)
- default_topics - Fallback for all faults
Example YAML config file (snapshots.yaml):
fault_specific:
  MOTOR_OVERHEAT:
    - /joint_states
    - /motor/temperature
patterns:
  "MOTOR_.*":
    - /joint_states
    - /cmd_vel
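The resolution order can be sketched in Python against the same configuration. The `resolve_topics` function is hypothetical, and several details are assumptions rather than documented behavior: patterns must match the whole fault code (`re.fullmatch`), the first matching pattern wins, and topics from different levels are not merged. The `/diagnostics` default topic is only an example value.

```python
import re


def resolve_topics(fault_code, fault_specific, patterns, default_topics):
    """Pick snapshot topics: fault_specific, then patterns, then defaults."""
    # 1. Exact match on the fault code.
    if fault_code in fault_specific:
        return list(fault_specific[fault_code])
    # 2. First regex pattern matching the whole code (assumed semantics).
    for pattern, topics in patterns.items():
        if re.fullmatch(pattern, fault_code):
            return list(topics)
    # 3. Fallback for all other faults.
    return list(default_topics)


# Same content as the snapshots.yaml example, written as Python dicts.
fault_specific = {"MOTOR_OVERHEAT": ["/joint_states", "/motor/temperature"]}
patterns = {"MOTOR_.*": ["/joint_states", "/cmd_vel"]}
default_topics = ["/diagnostics"]  # hypothetical snapshots.default_topics value

for code in ("MOTOR_OVERHEAT", "MOTOR_STALL", "BATTERY_LOW"):
    print(code, resolve_topics(code, fault_specific, patterns, default_topics))
```

Here `MOTOR_OVERHEAT` takes the exact-match list even though it also matches the `MOTOR_.*` pattern, `MOTOR_STALL` falls through to the pattern, and `BATTERY_LOW` gets the default topics.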
Storage Backends
SQLite (default): Faults are persisted to disk and survive node restarts. Uses SQLite's write-ahead logging (WAL) mode, which lets readers continue while a write is in progress.
Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.
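As a rough illustration of what the SQLite backend provides, the following Python sketch enables WAL mode and shows a row surviving a close and reopen of the database. The `open_fault_db` helper and the two-column `faults` table are hypothetical and much simpler than whatever schema the package actually uses; a temporary directory stands in for `database_path`.

```python
import os
import sqlite3
import tempfile


def open_fault_db(path):
    """Open (or create) a fault database with WAL journaling enabled."""
    conn = sqlite3.connect(path)
    # WAL mode is persistent: it is stored in the database file itself.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS faults ("
        "fault_code TEXT PRIMARY KEY, severity INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "faults.db")

    # First "run": store a fault, then shut down.
    conn = open_fault_db(db_path)
    mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    conn.execute("INSERT INTO faults VALUES (?, ?)", ("MOTOR_OVERHEAT", 2))
    conn.commit()
    conn.close()

    # Second "run": the fault is still there after reopening the file.
    conn = open_fault_db(db_path)
    rows = conn.execute("SELECT fault_code, severity FROM faults").fetchall()
    conn.close()

print(mode, rows)
```

With `storage_type` set to "memory" there is no file to reopen, so the equivalent of the second run would start with no faults.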
Usage
Launch
File truncated at 100 lines; see the full file.
Changelog for package ros2_medkit_fault_manager
0.3.0 (2026-02-27)
- Accurate HIGHEST_SEVERITY reassignment and stale fault_to_cluster_ cleanup (#221)
- Clean up pending_clusters_ when fault cleared before min_count (#211)
- Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
- Contributors: @bburda, @eclipse0922
0.2.0 (2026-02-07)
- Initial rosdistro release
- Central fault management node with ROS 2 services:
  - ReportFault - report FAILED/PASSED events with debounce filtering
  - GetFaults - query faults with filtering by severity, status, correlation
  - ClearFault - clear/acknowledge faults
- Debounce filtering with configurable thresholds:
  - FAILED events decrement counter, PASSED events increment
  - Configurable confirmation_threshold (default: -1, immediate)
  - Optional healing support (healing_enabled, healing_threshold)
  - Time-based auto-confirmation (auto_confirm_after_sec)
  - CRITICAL severity bypasses debounce
- Dual storage backends:
  - SQLite persistent storage with WAL mode (default)
  - In-memory storage for testing/lightweight deployments
- Snapshot capture on fault confirmation:
  - Topic data captured as JSON with configurable topic resolution
  - Priority: fault_specific > patterns > default_topics
  - Stored in SQLite with indexed fault_code lookup
  - Auto-cleanup on fault clear
- Rosbag capture with ring buffer:
  - Configurable duration, post-fault recording, topic selection
  - Lazy start mode (start on PREFAILED) or immediate
  - Auto-cleanup of bag files, storage limits (max_bag_size_mb)
  - GetRosbag service for bag file metadata
- Fault correlation engine:
  - Hierarchical mode: root cause to symptom relationships
  - Auto-cluster mode: group similar faults within time window
  - YAML-based configuration with pattern wildcards
  - Muted faults tracking, auto-clear on root cause resolution
- FaultEvent publishing on ~/events topic for SSE streaming
- Wall clock timestamps (compatible with use_sim_time)
- Contributors: Bartosz Burda, Michal Faferek
|
ros2_medkit_fault_manager package from ros2_medkit reporos2_medkit_diagnostic_bridge ros2_medkit_fault_manager ros2_medkit_fault_reporter ros2_medkit_gateway ros2_medkit_integration_tests ros2_medkit_msgs ros2_medkit_serialization |
ROS Distro
|
Package Summary
| Version | 0.3.0 |
| License | Apache-2.0 |
| Build type | AMENT_CMAKE |
| Use | RECOMMENDED |
Repository Summary
| Checkout URI | https://github.com/selfpatch/ros2_medkit.git |
| VCS Type | git |
| VCS Version | main |
| Last Updated | 2026-03-05 |
| Dev Status | DEVELOPED |
| Released | UNRELEASED |
| Contributing |
Help Wanted (-)
Good First Issues (-) Pull Requests to Review (-) |
Package Description
Maintainers
- bburda
Authors
ros2_medkit_fault_manager
Central fault manager node for the ros2_medkit fault management system.
Overview
The FaultManager node provides a central point for fault aggregation and lifecycle management.
It receives fault reports from multiple sources, aggregates them by fault_code, and provides
query and clearing interfaces.
Quick Start
By default, faults are confirmed immediately when reported - no additional configuration needed.
# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py
# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
"{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"
# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
"{statuses: ['CONFIRMED']}"
# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
"{fault_code: 'MOTOR_OVERHEAT'}"
Services
| Service | Type | Description |
|---|---|---|
~/report_fault |
ros2_medkit_msgs/srv/ReportFault |
Report a fault occurrence |
~/list_faults |
ros2_medkit_msgs/srv/ListFaults |
Query faults with filtering |
~/clear_fault |
ros2_medkit_msgs/srv/ClearFault |
Clear/acknowledge a fault |
~/get_snapshots |
ros2_medkit_msgs/srv/GetSnapshots |
Get topic snapshots for a fault |
Features
-
Multi-source aggregation: Same
fault_codefrom different sources creates a single fault - Occurrence tracking: Counts total reports and tracks all reporting sources
- Severity escalation: Fault severity is updated if a higher severity is reported
- Persistent storage: SQLite backend ensures faults survive node restarts
- Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation
- Snapshot capture: Captures topic data when faults are confirmed for debugging (snapshots are deleted when fault is cleared)
- Fault correlation (optional): Root cause analysis with symptom muting and auto-clear
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
storage_type |
string | "sqlite" |
Storage backend: "sqlite" or "memory"
|
database_path |
string | "/var/lib/ros2_medkit/faults.db" |
Path to SQLite database file |
confirmation_threshold |
int | -1 |
Counter value at which faults are confirmed |
healing_enabled |
bool | false |
Enable automatic healing via PASSED events |
healing_threshold |
int | 3 |
Counter value at which faults are healed |
auto_confirm_after_sec |
double | 0.0 |
Auto-confirm PREFAILED faults after timeout (0 = disabled) |
Snapshot Parameters
Snapshots capture topic data when faults are confirmed for post-mortem debugging.
| Parameter | Type | Default | Description |
|---|---|---|---|
snapshots.enabled |
bool | true |
Enable/disable snapshot capture |
snapshots.background_capture |
bool | false |
Use background subscriptions (caches latest message) vs on-demand capture |
snapshots.timeout_sec |
double | 1.0 |
Timeout waiting for topic message (on-demand mode) |
snapshots.max_message_size |
int | 65536 |
Maximum message size in bytes (larger messages skipped) |
snapshots.default_topics |
string[] | [] |
Topics to capture for all faults |
snapshots.config_file |
string | "" |
Path to YAML config for fault_specific and patterns
|
Topic Resolution Priority:
-
fault_specific- Exact match for fault code (configured via YAML config file) -
patterns- Regex pattern match (configured via YAML config file) -
default_topics- Fallback for all faults
Example YAML config file (snapshots.yaml):
fault_specific:
MOTOR_OVERHEAT:
- /joint_states
- /motor/temperature
patterns:
"MOTOR_.*":
- /joint_states
- /cmd_vel
Storage Backends
SQLite (default): Faults are persisted to disk and survive node restarts. Uses WAL mode for optimal performance.
Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.
Usage
Launch
File truncated at 100 lines see the full file
Changelog for package ros2_medkit_fault_manager
0.3.0 (2026-02-27)
- Accurate HIGHEST_SEVERITY reassignment and stale
fault_to_cluster_cleanup (#221) - Clean up
pending_clusters_when fault cleared beforemin_count(#211) - Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
- Contributors: \@bburda, \@eclipse0922
0.2.0 (2026-02-07)
- Initial rosdistro release
- Central fault management node with ROS 2 services:
- ReportFault - report FAILED/PASSED events with debounce filtering
- GetFaults - query faults with filtering by severity, status, correlation
- ClearFault - clear/acknowledge faults
- Debounce filtering with configurable thresholds:
- FAILED events decrement counter, PASSED events increment
- Configurable confirmation_threshold (default: -1, immediate)
- Optional healing support (healing_enabled, healing_threshold)
- Time-based auto-confirmation (auto_confirm_after_sec)
- CRITICAL severity bypasses debounce
- Dual storage backends:
- SQLite persistent storage with WAL mode (default)
- In-memory storage for testing/lightweight deployments
- Snapshot capture on fault confirmation:
- Topic data captured as JSON with configurable topic resolution
- Priority: fault_specific > patterns > default_topics
- Stored in SQLite with indexed fault_code lookup
- Auto-cleanup on fault clear
- Rosbag capture with ring buffer:
- Configurable duration, post-fault recording, topic selection
- Lazy start mode (start on PREFAILED) or immediate
- Auto-cleanup of bag files, storage limits (max_bag_size_mb)
- GetRosbag service for bag file metadata
- Fault correlation engine:
- Hierarchical mode: root cause to symptom relationships
- Auto-cluster mode: group similar faults within time window
- YAML-based configuration with pattern wildcards
- Muted faults tracking, auto-clear on root cause resolution
- FaultEvent publishing on ~/events topic for SSE streaming
- Wall clock timestamps (compatible with use_sim_time)
- Contributors: Bartosz Burda, Michal Faferek
Package Dependencies
System Dependencies
Dependant Packages
Launch files
Messages
Services
Plugins
Recent questions tagged ros2_medkit_fault_manager at Robotics Stack Exchange
|
ros2_medkit_fault_manager package from ros2_medkit reporos2_medkit_diagnostic_bridge ros2_medkit_fault_manager ros2_medkit_fault_reporter ros2_medkit_gateway ros2_medkit_integration_tests ros2_medkit_msgs ros2_medkit_serialization |
ROS Distro
|
Package Summary
| Version | 0.3.0 |
| License | Apache-2.0 |
| Build type | AMENT_CMAKE |
| Use | RECOMMENDED |
Repository Summary
| Checkout URI | https://github.com/selfpatch/ros2_medkit.git |
| VCS Type | git |
| VCS Version | main |
| Last Updated | 2026-03-05 |
| Dev Status | DEVELOPED |
| Released | UNRELEASED |
| Contributing |
Help Wanted (-)
Good First Issues (-) Pull Requests to Review (-) |
Package Description
Maintainers
- bburda
Authors
ros2_medkit_fault_manager
Central fault manager node for the ros2_medkit fault management system.
Overview
The FaultManager node provides a central point for fault aggregation and lifecycle management.
It receives fault reports from multiple sources, aggregates them by fault_code, and provides
query and clearing interfaces.
Quick Start
By default, faults are confirmed immediately when reported - no additional configuration needed.
# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py
# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
"{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"
# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
"{statuses: ['CONFIRMED']}"
# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
"{fault_code: 'MOTOR_OVERHEAT'}"
Services
| Service | Type | Description |
|---|---|---|
~/report_fault |
ros2_medkit_msgs/srv/ReportFault |
Report a fault occurrence |
~/list_faults |
ros2_medkit_msgs/srv/ListFaults |
Query faults with filtering |
~/clear_fault |
ros2_medkit_msgs/srv/ClearFault |
Clear/acknowledge a fault |
~/get_snapshots |
ros2_medkit_msgs/srv/GetSnapshots |
Get topic snapshots for a fault |
Features
-
Multi-source aggregation: Same
fault_codefrom different sources creates a single fault - Occurrence tracking: Counts total reports and tracks all reporting sources
- Severity escalation: Fault severity is updated if a higher severity is reported
- Persistent storage: SQLite backend ensures faults survive node restarts
- Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation
- Snapshot capture: Captures topic data when faults are confirmed for debugging (snapshots are deleted when fault is cleared)
- Fault correlation (optional): Root cause analysis with symptom muting and auto-clear
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
storage_type |
string | "sqlite" |
Storage backend: "sqlite" or "memory"
|
database_path |
string | "/var/lib/ros2_medkit/faults.db" |
Path to SQLite database file |
confirmation_threshold |
int | -1 |
Counter value at which faults are confirmed |
healing_enabled |
bool | false |
Enable automatic healing via PASSED events |
healing_threshold |
int | 3 |
Counter value at which faults are healed |
auto_confirm_after_sec |
double | 0.0 |
Auto-confirm PREFAILED faults after timeout (0 = disabled) |
Snapshot Parameters
Snapshots capture topic data when faults are confirmed for post-mortem debugging.
| Parameter | Type | Default | Description |
|---|---|---|---|
snapshots.enabled |
bool | true |
Enable/disable snapshot capture |
snapshots.background_capture |
bool | false |
Use background subscriptions (caches latest message) vs on-demand capture |
snapshots.timeout_sec |
double | 1.0 |
Timeout waiting for topic message (on-demand mode) |
snapshots.max_message_size |
int | 65536 |
Maximum message size in bytes (larger messages skipped) |
snapshots.default_topics |
string[] | [] |
Topics to capture for all faults |
snapshots.config_file |
string | "" |
Path to YAML config for fault_specific and patterns
|
Topic Resolution Priority:
-
fault_specific- Exact match for fault code (configured via YAML config file) -
patterns- Regex pattern match (configured via YAML config file) -
default_topics- Fallback for all faults
Example YAML config file (snapshots.yaml):
fault_specific:
MOTOR_OVERHEAT:
- /joint_states
- /motor/temperature
patterns:
"MOTOR_.*":
- /joint_states
- /cmd_vel
Storage Backends
SQLite (default): Faults are persisted to disk and survive node restarts. Uses WAL mode for optimal performance.
Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.
Usage
Launch
File truncated at 100 lines see the full file
Changelog for package ros2_medkit_fault_manager
0.3.0 (2026-02-27)
- Accurate HIGHEST_SEVERITY reassignment and stale
fault_to_cluster_cleanup (#221) - Clean up
pending_clusters_when fault cleared beforemin_count(#211) - Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
- Contributors: \@bburda, \@eclipse0922
0.2.0 (2026-02-07)
- Initial rosdistro release
- Central fault management node with ROS 2 services:
- ReportFault - report FAILED/PASSED events with debounce filtering
- GetFaults - query faults with filtering by severity, status, correlation
- ClearFault - clear/acknowledge faults
- Debounce filtering with configurable thresholds:
- FAILED events decrement counter, PASSED events increment
- Configurable confirmation_threshold (default: -1, immediate)
- Optional healing support (healing_enabled, healing_threshold)
- Time-based auto-confirmation (auto_confirm_after_sec)
- CRITICAL severity bypasses debounce
- Dual storage backends:
- SQLite persistent storage with WAL mode (default)
- In-memory storage for testing/lightweight deployments
- Snapshot capture on fault confirmation:
- Topic data captured as JSON with configurable topic resolution
- Priority: fault_specific > patterns > default_topics
- Stored in SQLite with indexed fault_code lookup
- Auto-cleanup on fault clear
- Rosbag capture with ring buffer:
- Configurable duration, post-fault recording, topic selection
- Lazy start mode (start on PREFAILED) or immediate
- Auto-cleanup of bag files, storage limits (max_bag_size_mb)
- GetRosbag service for bag file metadata
- Fault correlation engine:
- Hierarchical mode: root cause to symptom relationships
- Auto-cluster mode: group similar faults within time window
- YAML-based configuration with pattern wildcards
- Muted faults tracking, auto-clear on root cause resolution
- FaultEvent publishing on ~/events topic for SSE streaming
- Wall clock timestamps (compatible with use_sim_time)
- Contributors: Bartosz Burda, Michal Faferek
Package Dependencies
System Dependencies
Dependant Packages
Launch files
Messages
Services
Plugins
Recent questions tagged ros2_medkit_fault_manager at Robotics Stack Exchange
|
ros2_medkit_fault_manager package from ros2_medkit reporos2_medkit_diagnostic_bridge ros2_medkit_fault_manager ros2_medkit_fault_reporter ros2_medkit_gateway ros2_medkit_integration_tests ros2_medkit_msgs ros2_medkit_serialization |
ROS Distro
|
Package Summary
| Version | 0.3.0 |
| License | Apache-2.0 |
| Build type | AMENT_CMAKE |
| Use | RECOMMENDED |
Repository Summary
| Checkout URI | https://github.com/selfpatch/ros2_medkit.git |
| VCS Type | git |
| VCS Version | main |
| Last Updated | 2026-03-05 |
| Dev Status | DEVELOPED |
| Released | UNRELEASED |
| Contributing |
Help Wanted (-)
Good First Issues (-) Pull Requests to Review (-) |
Package Description
Maintainers
- bburda
Authors
ros2_medkit_fault_manager
Central fault manager node for the ros2_medkit fault management system.
Overview
The FaultManager node provides a central point for fault aggregation and lifecycle management.
It receives fault reports from multiple sources, aggregates them by fault_code, and provides
query and clearing interfaces.
Quick Start
By default, faults are confirmed immediately when reported - no additional configuration needed.
# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py
# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
"{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"
# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
"{statuses: ['CONFIRMED']}"
# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
"{fault_code: 'MOTOR_OVERHEAT'}"
Services
| Service | Type | Description |
|---|---|---|
~/report_fault |
ros2_medkit_msgs/srv/ReportFault |
Report a fault occurrence |
~/list_faults |
ros2_medkit_msgs/srv/ListFaults |
Query faults with filtering |
~/clear_fault |
ros2_medkit_msgs/srv/ClearFault |
Clear/acknowledge a fault |
~/get_snapshots |
ros2_medkit_msgs/srv/GetSnapshots |
Get topic snapshots for a fault |
Features
-
Multi-source aggregation: Same
fault_codefrom different sources creates a single fault - Occurrence tracking: Counts total reports and tracks all reporting sources
- Severity escalation: Fault severity is updated if a higher severity is reported
- Persistent storage: SQLite backend ensures faults survive node restarts
- Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation
- Snapshot capture: Captures topic data when faults are confirmed for debugging (snapshots are deleted when fault is cleared)
- Fault correlation (optional): Root cause analysis with symptom muting and auto-clear
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
storage_type |
string | "sqlite" |
Storage backend: "sqlite" or "memory"
|
database_path |
string | "/var/lib/ros2_medkit/faults.db" |
Path to SQLite database file |
confirmation_threshold |
int | -1 |
Counter value at which faults are confirmed |
healing_enabled |
bool | false |
Enable automatic healing via PASSED events |
healing_threshold |
int | 3 |
Counter value at which faults are healed |
auto_confirm_after_sec |
double | 0.0 |
Auto-confirm PREFAILED faults after timeout (0 = disabled) |
Snapshot Parameters
Snapshots capture topic data when faults are confirmed for post-mortem debugging.
| Parameter | Type | Default | Description |
|---|---|---|---|
snapshots.enabled |
bool | true |
Enable/disable snapshot capture |
snapshots.background_capture |
bool | false |
Use background subscriptions (caches latest message) vs on-demand capture |
snapshots.timeout_sec |
double | 1.0 |
Timeout waiting for topic message (on-demand mode) |
snapshots.max_message_size |
int | 65536 |
Maximum message size in bytes (larger messages skipped) |
snapshots.default_topics |
string[] | [] |
Topics to capture for all faults |
snapshots.config_file |
string | "" |
Path to YAML config for fault_specific and patterns
|
Topic Resolution Priority:
-
fault_specific- Exact match for fault code (configured via YAML config file) -
patterns- Regex pattern match (configured via YAML config file) -
default_topics- Fallback for all faults
Example YAML config file (snapshots.yaml):
fault_specific:
MOTOR_OVERHEAT:
- /joint_states
- /motor/temperature
patterns:
"MOTOR_.*":
- /joint_states
- /cmd_vel
Storage Backends
SQLite (default): Faults are persisted to disk and survive node restarts. Uses WAL mode for optimal performance.
Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.
Usage
Launch
File truncated at 100 lines see the full file
Changelog for package ros2_medkit_fault_manager
0.3.0 (2026-02-27)
- Accurate HIGHEST_SEVERITY reassignment and stale
fault_to_cluster_cleanup (#221) - Clean up
pending_clusters_when fault cleared beforemin_count(#211) - Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
- Contributors: \@bburda, \@eclipse0922
0.2.0 (2026-02-07)
- Initial rosdistro release
- Central fault management node with ROS 2 services:
- ReportFault - report FAILED/PASSED events with debounce filtering
- GetFaults - query faults with filtering by severity, status, correlation
- ClearFault - clear/acknowledge faults
- Debounce filtering with configurable thresholds:
- FAILED events decrement counter, PASSED events increment
- Configurable confirmation_threshold (default: -1, immediate)
- Optional healing support (healing_enabled, healing_threshold)
- Time-based auto-confirmation (auto_confirm_after_sec)
- CRITICAL severity bypasses debounce
- Dual storage backends:
- SQLite persistent storage with WAL mode (default)
- In-memory storage for testing/lightweight deployments
- Snapshot capture on fault confirmation:
- Topic data captured as JSON with configurable topic resolution
- Priority: fault_specific > patterns > default_topics
- Stored in SQLite with indexed fault_code lookup
- Auto-cleanup on fault clear
- Rosbag capture with ring buffer:
- Configurable duration, post-fault recording, topic selection
- Lazy start mode (start on PREFAILED) or immediate
- Auto-cleanup of bag files, storage limits (max_bag_size_mb)
- GetRosbag service for bag file metadata
- Fault correlation engine:
- Hierarchical mode: root cause to symptom relationships
- Auto-cluster mode: group similar faults within time window
- YAML-based configuration with pattern wildcards
- Muted faults tracking, auto-clear on root cause resolution
- FaultEvent publishing on ~/events topic for SSE streaming
- Wall clock timestamps (compatible with use_sim_time)
- Contributors: Bartosz Burda, Michal Faferek
Package Dependencies
System Dependencies
Dependant Packages
Launch files
Messages
Services
Plugins
Recent questions tagged ros2_medkit_fault_manager at Robotics Stack Exchange
|
ros2_medkit_fault_manager package from ros2_medkit reporos2_medkit_diagnostic_bridge ros2_medkit_fault_manager ros2_medkit_fault_reporter ros2_medkit_gateway ros2_medkit_integration_tests ros2_medkit_msgs ros2_medkit_serialization |
ROS Distro
|
Package Summary
| Version | 0.3.0 |
| License | Apache-2.0 |
| Build type | AMENT_CMAKE |
| Use | RECOMMENDED |
Repository Summary
| Checkout URI | https://github.com/selfpatch/ros2_medkit.git |
| VCS Type | git |
| VCS Version | main |
| Last Updated | 2026-03-05 |
| Dev Status | DEVELOPED |
| Released | UNRELEASED |
| Contributing |
Help Wanted (-)
Good First Issues (-) Pull Requests to Review (-) |
Package Description
Maintainers
- bburda
Authors
ros2_medkit_fault_manager
Central fault manager node for the ros2_medkit fault management system.
Overview
The FaultManager node provides a central point for fault aggregation and lifecycle management.
It receives fault reports from multiple sources, aggregates them by fault_code, and provides
query and clearing interfaces.
Quick Start
By default, faults are confirmed immediately when reported - no additional configuration needed.
# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py
# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
"{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"
# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
"{statuses: ['CONFIRMED']}"
# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
"{fault_code: 'MOTOR_OVERHEAT'}"
Services
| Service | Type | Description |
|---|---|---|
~/report_fault |
ros2_medkit_msgs/srv/ReportFault |
Report a fault occurrence |
~/list_faults |
ros2_medkit_msgs/srv/ListFaults |
Query faults with filtering |
~/clear_fault |
ros2_medkit_msgs/srv/ClearFault |
Clear/acknowledge a fault |
~/get_snapshots |
ros2_medkit_msgs/srv/GetSnapshots |
Get topic snapshots for a fault |
Features
-
Multi-source aggregation: Same
fault_codefrom different sources creates a single fault - Occurrence tracking: Counts total reports and tracks all reporting sources
- Severity escalation: Fault severity is updated if a higher severity is reported
- Persistent storage: SQLite backend ensures faults survive node restarts
- Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation
- Snapshot capture: Captures topic data when faults are confirmed for debugging (snapshots are deleted when fault is cleared)
- Fault correlation (optional): Root cause analysis with symptom muting and auto-clear
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
storage_type |
string | "sqlite" |
Storage backend: "sqlite" or "memory"
|
database_path |
string | "/var/lib/ros2_medkit/faults.db" |
Path to SQLite database file |
confirmation_threshold |
int | -1 |
Counter value at which faults are confirmed |
healing_enabled |
bool | false |
Enable automatic healing via PASSED events |
healing_threshold |
int | 3 |
Counter value at which faults are healed |
auto_confirm_after_sec |
double | 0.0 |
Auto-confirm PREFAILED faults after timeout (0 = disabled) |
Snapshot Parameters
Snapshots capture topic data when faults are confirmed for post-mortem debugging.
| Parameter | Type | Default | Description |
|---|---|---|---|
snapshots.enabled |
bool | true |
Enable/disable snapshot capture |
snapshots.background_capture |
bool | false |
Use background subscriptions (caches latest message) vs on-demand capture |
snapshots.timeout_sec |
double | 1.0 |
Timeout waiting for topic message (on-demand mode) |
snapshots.max_message_size |
int | 65536 |
Maximum message size in bytes (larger messages skipped) |
snapshots.default_topics |
string[] | [] |
Topics to capture for all faults |
snapshots.config_file |
string | "" |
Path to YAML config for fault_specific and patterns
|
Topic Resolution Priority:
-
fault_specific- Exact match for fault code (configured via YAML config file) -
patterns- Regex pattern match (configured via YAML config file) -
default_topics- Fallback for all faults
Example YAML config file (snapshots.yaml):
fault_specific:
MOTOR_OVERHEAT:
- /joint_states
- /motor/temperature
patterns:
"MOTOR_.*":
- /joint_states
- /cmd_vel
Storage Backends
SQLite (default): Faults are persisted to disk and survive node restarts. Uses WAL mode for optimal performance.
Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.
Changelog for package ros2_medkit_fault_manager
0.3.0 (2026-02-27)
- Accurate HIGHEST_SEVERITY reassignment and stale `fault_to_cluster_` cleanup (#221)
- Clean up `pending_clusters_` when fault cleared before `min_count` (#211)
- Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
- Contributors: @bburda, @eclipse0922
0.2.0 (2026-02-07)
- Initial rosdistro release
- Central fault management node with ROS 2 services:
  - ReportFault - report FAILED/PASSED events with debounce filtering
  - GetFaults - query faults with filtering by severity, status, correlation
  - ClearFault - clear/acknowledge faults
- Debounce filtering with configurable thresholds:
  - FAILED events decrement counter, PASSED events increment
  - Configurable confirmation_threshold (default: -1, immediate)
  - Optional healing support (healing_enabled, healing_threshold)
  - Time-based auto-confirmation (auto_confirm_after_sec)
  - CRITICAL severity bypasses debounce
- Dual storage backends:
  - SQLite persistent storage with WAL mode (default)
  - In-memory storage for testing/lightweight deployments
- Snapshot capture on fault confirmation:
  - Topic data captured as JSON with configurable topic resolution
  - Priority: fault_specific > patterns > default_topics
  - Stored in SQLite with indexed fault_code lookup
  - Auto-cleanup on fault clear
- Rosbag capture with ring buffer:
  - Configurable duration, post-fault recording, topic selection
  - Lazy start mode (start on PREFAILED) or immediate
  - Auto-cleanup of bag files, storage limits (max_bag_size_mb)
  - GetRosbag service for bag file metadata
- Fault correlation engine:
  - Hierarchical mode: root cause to symptom relationships
  - Auto-cluster mode: group similar faults within time window
  - YAML-based configuration with pattern wildcards
  - Muted faults tracking, auto-clear on root cause resolution
- FaultEvent publishing on ~/events topic for SSE streaming
- Wall clock timestamps (compatible with use_sim_time)
- Contributors: Bartosz Burda, Michal Faferek
|
ros2_medkit_fault_manager package from ros2_medkit reporos2_medkit_diagnostic_bridge ros2_medkit_fault_manager ros2_medkit_fault_reporter ros2_medkit_gateway ros2_medkit_integration_tests ros2_medkit_msgs ros2_medkit_serialization |
ROS Distro
|
Package Summary
| Version | 0.3.0 |
| License | Apache-2.0 |
| Build type | AMENT_CMAKE |
| Use | RECOMMENDED |
Repository Summary
| Checkout URI | https://github.com/selfpatch/ros2_medkit.git |
| VCS Type | git |
| VCS Version | main |
| Last Updated | 2026-03-05 |
| Dev Status | DEVELOPED |
| Released | UNRELEASED |
| Contributing |
Help Wanted (-)
Good First Issues (-) Pull Requests to Review (-) |
Package Description
Maintainers
- bburda
Authors
ros2_medkit_fault_manager
Central fault manager node for the ros2_medkit fault management system.
Overview
The FaultManager node provides a central point for fault aggregation and lifecycle management.
It receives fault reports from multiple sources, aggregates them by fault_code, and provides
query and clearing interfaces.
Quick Start
By default, faults are confirmed immediately when reported - no additional configuration needed.
# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py
# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
"{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"
# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
"{statuses: ['CONFIRMED']}"
# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
"{fault_code: 'MOTOR_OVERHEAT'}"
Services
| Service | Type | Description |
|---|---|---|
~/report_fault |
ros2_medkit_msgs/srv/ReportFault |
Report a fault occurrence |
~/list_faults |
ros2_medkit_msgs/srv/ListFaults |
Query faults with filtering |
~/clear_fault |
ros2_medkit_msgs/srv/ClearFault |
Clear/acknowledge a fault |
~/get_snapshots |
ros2_medkit_msgs/srv/GetSnapshots |
Get topic snapshots for a fault |
Features
-
Multi-source aggregation: Same
fault_codefrom different sources creates a single fault - Occurrence tracking: Counts total reports and tracks all reporting sources
- Severity escalation: Fault severity is updated if a higher severity is reported
- Persistent storage: SQLite backend ensures faults survive node restarts
- Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation
- Snapshot capture: Captures topic data when faults are confirmed for debugging (snapshots are deleted when fault is cleared)
- Fault correlation (optional): Root cause analysis with symptom muting and auto-clear
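The aggregation, occurrence-tracking, and severity-escalation features above can be illustrated with a small Python model. This is a minimal sketch and not the node's actual implementation: the class and field names (`FaultRecord`, `FaultAggregator`, `occurrence_count`, `sources`) are hypothetical, and it assumes a larger severity number means a more severe fault.

```python
from dataclasses import dataclass, field


@dataclass
class FaultRecord:
    # Hypothetical record layout; the real message fields may differ.
    fault_code: str
    severity: int
    occurrence_count: int = 0
    sources: set = field(default_factory=set)


class FaultAggregator:
    """Keeps one record per fault_code, however many sources report it."""

    def __init__(self):
        self.faults = {}

    def report(self, fault_code, severity, source_id):
        record = self.faults.get(fault_code)
        if record is None:
            record = FaultRecord(fault_code=fault_code, severity=severity)
            self.faults[fault_code] = record
        # Every report counts as one occurrence and remembers its source.
        record.occurrence_count += 1
        record.sources.add(source_id)
        # Assumption: higher number = more severe. Severity only escalates.
        record.severity = max(record.severity, severity)
        return record


aggregator = FaultAggregator()
aggregator.report("MOTOR_OVERHEAT", severity=1, source_id="/motor_node")
record = aggregator.report("MOTOR_OVERHEAT", severity=2, source_id="/thermal_monitor")
print(record.occurrence_count, record.severity, sorted(record.sources))
```

Two reports with the same `fault_code` end up in one record with two sources, and a later lower-severity report would leave the stored severity unchanged.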
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| storage_type | string | "sqlite" | Storage backend: "sqlite" or "memory" |
| database_path | string | "/var/lib/ros2_medkit/faults.db" | Path to SQLite database file |
| confirmation_threshold | int | -1 | Counter value at which faults are confirmed |
| healing_enabled | bool | false | Enable automatic healing via PASSED events |
| healing_threshold | int | 3 | Counter value at which faults are healed |
| auto_confirm_after_sec | double | 0.0 | Auto-confirm PREFAILED faults after timeout (0 = disabled) |
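How the threshold parameters interact is easier to see in code. The following Python sketch models a counter-based debounce under stated assumptions: FAILED events decrement the counter and PASSED events increment it (as the changelog describes), a fault is confirmed once the counter reaches `confirmation_threshold`, and it is healed once the counter reaches `healing_threshold` with `healing_enabled` set. The class name `DebounceCounter` and the `HEALED` status label are hypothetical, and the sketch does not model `auto_confirm_after_sec` or any severity-based bypass.

```python
class DebounceCounter:
    """Toy model of counter-based confirmation and healing for one fault."""

    def __init__(self, confirmation_threshold=-1, healing_enabled=False,
                 healing_threshold=3):
        self.confirmation_threshold = confirmation_threshold
        self.healing_enabled = healing_enabled
        self.healing_threshold = healing_threshold
        self.counter = 0
        self.status = None  # no report received yet

    def report(self, event):
        if event == "FAILED":
            self.counter -= 1
            if self.counter <= self.confirmation_threshold:
                self.status = "CONFIRMED"
            elif self.status != "CONFIRMED":
                # Seen, but not yet often enough to confirm.
                self.status = "PREFAILED"
        elif event == "PASSED":
            self.counter += 1
            if self.healing_enabled and self.counter >= self.healing_threshold:
                self.status = "HEALED"  # hypothetical status label
        else:
            raise ValueError(f"unknown event type: {event}")
        return self.status


# Default threshold of -1: the first FAILED report confirms the fault.
print(DebounceCounter().report("FAILED"))

# A threshold of -3 requires three FAILED reports.
debounced = DebounceCounter(confirmation_threshold=-3)
print([debounced.report("FAILED") for _ in range(3)])
```

With the default `confirmation_threshold` of -1 the first report already satisfies the condition, which matches the immediate confirmation described in Quick Start.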
Snapshot Parameters
Snapshots capture topic data when faults are confirmed for post-mortem debugging.
| Parameter | Type | Default | Description |
|---|---|---|---|
| snapshots.enabled | bool | true | Enable/disable snapshot capture |
| snapshots.background_capture | bool | false | Use background subscriptions (caches latest message) vs on-demand capture |
| snapshots.timeout_sec | double | 1.0 | Timeout waiting for topic message (on-demand mode) |
| snapshots.max_message_size | int | 65536 | Maximum message size in bytes (larger messages skipped) |
| snapshots.default_topics | string[] | [] | Topics to capture for all faults |
| snapshots.config_file | string | "" | Path to YAML config for fault_specific and patterns |
Topic Resolution Priority (highest first):
- fault_specific - Exact match for fault code (configured via YAML config file)
- patterns - Regex pattern match (configured via YAML config file)
- default_topics - Fallback for all faults
Example YAML config file (snapshots.yaml):
fault_specific:
  MOTOR_OVERHEAT:
    - /joint_states
    - /motor/temperature
patterns:
  "MOTOR_.*":
    - /joint_states
    - /cmd_vel
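The resolution order can be sketched in Python as follows. This is an illustration under assumptions, not the package's code: the function name `resolve_snapshot_topics` is hypothetical, the config is passed as an already-parsed dictionary mirroring the YAML above, patterns are assumed to match the whole fault code, and the first matching pattern is assumed to win.

```python
import re


def resolve_snapshot_topics(fault_code, config, default_topics):
    """Return the topics to capture for fault_code, by priority."""
    # 1. Exact fault code match.
    fault_specific = config.get("fault_specific", {})
    if fault_code in fault_specific:
        return list(fault_specific[fault_code])
    # 2. First regex pattern that matches the whole fault code (assumption).
    for pattern, topics in config.get("patterns", {}).items():
        if re.fullmatch(pattern, fault_code):
            return list(topics)
    # 3. Fallback for every other fault.
    return list(default_topics)


config = {
    "fault_specific": {"MOTOR_OVERHEAT": ["/joint_states", "/motor/temperature"]},
    "patterns": {"MOTOR_.*": ["/joint_states", "/cmd_vel"]},
}
defaults = ["/diagnostics"]  # illustrative value for snapshots.default_topics

print(resolve_snapshot_topics("MOTOR_OVERHEAT", config, defaults))
print(resolve_snapshot_topics("MOTOR_STALL", config, defaults))
print(resolve_snapshot_topics("BATTERY_LOW", config, defaults))
```

`MOTOR_OVERHEAT` matches both the exact entry and the pattern, and the exact entry takes precedence.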
Storage Backends
SQLite (default): Faults are persisted to disk and survive node restarts. The database runs in SQLite's write-ahead logging (WAL) mode, which lets readers proceed while a write is in progress.
Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.
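For readers unfamiliar with WAL mode, the sketch below shows how a SQLite file is switched to it and how data survives a close and reopen, using Python's standard `sqlite3` module. The table schema and the helper name `open_fault_db` are hypothetical and chosen only for illustration; the package's real schema is not shown in this document.

```python
import os
import sqlite3
import tempfile


def open_fault_db(path):
    """Open (or create) a fault database file and enable WAL journaling."""
    conn = sqlite3.connect(path)
    # The pragma returns the journal mode that is actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    # Hypothetical minimal schema, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS faults ("
        "fault_code TEXT PRIMARY KEY, severity INTEGER, status TEXT)"
    )
    conn.commit()
    return conn, mode


db_path = os.path.join(tempfile.mkdtemp(), "faults.db")

conn, mode = open_fault_db(db_path)
conn.execute(
    "INSERT OR REPLACE INTO faults VALUES (?, ?, ?)",
    ("MOTOR_OVERHEAT", 2, "CONFIRMED"),
)
conn.commit()
conn.close()

# Reopening the file stands in for a node restart: the row is still there.
conn, mode = open_fault_db(db_path)
rows = conn.execute("SELECT fault_code, status FROM faults").fetchall()
print(mode, rows)
conn.close()
```

WAL mode applies to file-backed databases; an in-memory SQLite database keeps its own journal mode, which is one reason the memory backend is a separate option.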
Usage
Launch
File truncated at 100 lines see the full file
Changelog for package ros2_medkit_fault_manager
0.3.0 (2026-02-27)
- Accurate HIGHEST_SEVERITY reassignment and stale fault_to_cluster_ cleanup (#221)
- Clean up pending_clusters_ when fault cleared before min_count (#211)
- Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
- Contributors: @bburda, @eclipse0922
0.2.0 (2026-02-07)
- Initial rosdistro release
- Central fault management node with ROS 2 services:
  - ReportFault - report FAILED/PASSED events with debounce filtering
  - GetFaults - query faults with filtering by severity, status, correlation
  - ClearFault - clear/acknowledge faults
- Debounce filtering with configurable thresholds:
  - FAILED events decrement counter, PASSED events increment
  - Configurable confirmation_threshold (default: -1, immediate)
  - Optional healing support (healing_enabled, healing_threshold)
  - Time-based auto-confirmation (auto_confirm_after_sec)
  - CRITICAL severity bypasses debounce
- Dual storage backends:
  - SQLite persistent storage with WAL mode (default)
  - In-memory storage for testing/lightweight deployments
- Snapshot capture on fault confirmation:
  - Topic data captured as JSON with configurable topic resolution
  - Priority: fault_specific > patterns > default_topics
  - Stored in SQLite with indexed fault_code lookup
  - Auto-cleanup on fault clear
- Rosbag capture with ring buffer:
  - Configurable duration, post-fault recording, topic selection
  - Lazy start mode (start on PREFAILED) or immediate
  - Auto-cleanup of bag files, storage limits (max_bag_size_mb)
  - GetRosbag service for bag file metadata
- Fault correlation engine:
  - Hierarchical mode: root cause to symptom relationships
  - Auto-cluster mode: group similar faults within time window
  - YAML-based configuration with pattern wildcards
  - Muted faults tracking, auto-clear on root cause resolution
- FaultEvent publishing on ~/events topic for SSE streaming
- Wall clock timestamps (compatible with use_sim_time)
- Contributors: Bartosz Burda, Michal Faferek
|
ros2_medkit_fault_manager package from ros2_medkit reporos2_medkit_diagnostic_bridge ros2_medkit_fault_manager ros2_medkit_fault_reporter ros2_medkit_gateway ros2_medkit_integration_tests ros2_medkit_msgs ros2_medkit_serialization |
ROS Distro
|
Package Summary
| Version | 0.3.0 |
| License | Apache-2.0 |
| Build type | AMENT_CMAKE |
| Use | RECOMMENDED |
Repository Summary
| Checkout URI | https://github.com/selfpatch/ros2_medkit.git |
| VCS Type | git |
| VCS Version | main |
| Last Updated | 2026-03-05 |
| Dev Status | DEVELOPED |
| Released | UNRELEASED |
Package Description
Maintainers
- bburda
Authors
ros2_medkit_fault_manager
Central fault manager node for the ros2_medkit fault management system.
Overview
The FaultManager node provides a central point for fault aggregation and lifecycle management.
It receives fault reports from multiple sources, aggregates them by fault_code, and provides
query and clearing interfaces.
Quick Start
By default, faults are confirmed immediately when reported - no additional configuration needed.
# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py
# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
"{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"
# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
"{statuses: ['CONFIRMED']}"
# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
"{fault_code: 'MOTOR_OVERHEAT'}"
Services
| Service | Type | Description |
|---|---|---|
| ~/report_fault | ros2_medkit_msgs/srv/ReportFault | Report a fault occurrence |
| ~/list_faults | ros2_medkit_msgs/srv/ListFaults | Query faults with filtering |
| ~/clear_fault | ros2_medkit_msgs/srv/ClearFault | Clear/acknowledge a fault |
| ~/get_snapshots | ros2_medkit_msgs/srv/GetSnapshots | Get topic snapshots for a fault |
Features
- Multi-source aggregation: Same fault_code from different sources creates a single fault
- Occurrence tracking: Counts total reports and tracks all reporting sources
- Severity escalation: Fault severity is updated if a higher severity is reported
- Persistent storage: SQLite backend ensures faults survive node restarts
- Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation
- Snapshot capture: Captures topic data when faults are confirmed for debugging (snapshots are deleted when fault is cleared)
- Fault correlation (optional): Root cause analysis with symptom muting and auto-clear
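The first three features can be illustrated with a small, self-contained Python sketch. It is not the node's implementation: the `FaultRecord` class, its field names, and the `report` helper are hypothetical, and severity is assumed to be an integer where a larger value means more severe and a lower-severity report leaves the stored value unchanged.

```python
from dataclasses import dataclass, field


@dataclass
class FaultRecord:
    """Hypothetical aggregated view of one fault_code (not the real message type)."""
    fault_code: str
    severity: int
    occurrence_count: int = 0
    sources: set = field(default_factory=set)


def report(store: dict, fault_code: str, severity: int, source_id: str) -> FaultRecord:
    """Merge one report into the store, keyed by fault_code."""
    record = store.setdefault(fault_code, FaultRecord(fault_code, severity))
    record.occurrence_count += 1                      # occurrence tracking
    record.sources.add(source_id)                     # remember every reporting source
    record.severity = max(record.severity, severity)  # escalate only (assumption)
    return record


store = {}
report(store, "MOTOR_OVERHEAT", 1, "/motor_node")
report(store, "MOTOR_OVERHEAT", 2, "/thermal_monitor")  # hypothetical second source
report(store, "MOTOR_OVERHEAT", 1, "/motor_node")

fault = store["MOTOR_OVERHEAT"]
print(len(store), fault.occurrence_count, fault.severity, sorted(fault.sources))
```

Three reports from two sources end up as a single record with an occurrence count of 3 and the highest severity seen so far.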
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| storage_type | string | "sqlite" | Storage backend: "sqlite" or "memory" |
| database_path | string | "/var/lib/ros2_medkit/faults.db" | Path to SQLite database file |
| confirmation_threshold | int | -1 | Counter value at which faults are confirmed |
| healing_enabled | bool | false | Enable automatic healing via PASSED events |
| healing_threshold | int | 3 | Counter value at which faults are healed |
| auto_confirm_after_sec | double | 0.0 | Auto-confirm PREFAILED faults after timeout (0 = disabled) |
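To make the counter parameters concrete, here is a minimal Python sketch of counter-based debouncing. It follows the behaviour described in the changelog (FAILED decrements the counter, PASSED increments it, CRITICAL bypasses debounce) and uses the default values from the table. The class itself, the starting counter of 0, the `<=` and `>=` comparisons, and the "NONE" and "HEALED" labels are assumptions made for illustration, not the node's actual code.

```python
from dataclasses import dataclass

FAILED, PASSED = "FAILED", "PASSED"


@dataclass
class DebounceCounter:
    """Counter-based confirmation; class name and state labels are illustrative."""
    confirmation_threshold: int = -1   # documented default (immediate confirmation)
    healing_enabled: bool = False      # documented default
    healing_threshold: int = 3         # documented default
    counter: int = 0                   # assumed starting value
    status: str = "NONE"               # hypothetical label for "nothing reported yet"

    def update(self, event: str, critical: bool = False) -> str:
        if event == FAILED:
            self.counter -= 1
            if critical or self.counter <= self.confirmation_threshold:
                self.status = "CONFIRMED"
            elif self.status != "CONFIRMED":
                self.status = "PREFAILED"
        elif event == PASSED:
            self.counter += 1
            if self.healing_enabled and self.counter >= self.healing_threshold:
                self.status = "HEALED"  # hypothetical label
        return self.status


immediate = DebounceCounter()
first = immediate.update(FAILED)  # counter -1 reaches the default threshold

debounced = DebounceCounter(confirmation_threshold=-3)
history = []
for _ in range(3):
    history.append(debounced.update(FAILED))

print(first)    # CONFIRMED
print(history)  # ['PREFAILED', 'PREFAILED', 'CONFIRMED']
```

With the default threshold of -1 a single FAILED event confirms the fault, which matches the Quick Start behaviour. A threshold of -3 requires three FAILED events before confirmation.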
Snapshot Parameters
Snapshots capture topic data when faults are confirmed for post-mortem debugging.
| Parameter | Type | Default | Description |
|---|---|---|---|
| snapshots.enabled | bool | true | Enable/disable snapshot capture |
| snapshots.background_capture | bool | false | Use background subscriptions (caches latest message) vs on-demand capture |
| snapshots.timeout_sec | double | 1.0 | Timeout waiting for topic message (on-demand mode) |
| snapshots.max_message_size | int | 65536 | Maximum message size in bytes (larger messages skipped) |
| snapshots.default_topics | string[] | [] | Topics to capture for all faults |
| snapshots.config_file | string | "" | Path to YAML config for fault_specific and patterns |
Topic Resolution Priority:
- fault_specific - Exact match for fault code (configured via YAML config file)
- patterns - Regex pattern match (configured via YAML config file)
- default_topics - Fallback for all faults
Example YAML config file (snapshots.yaml):
fault_specific:
  MOTOR_OVERHEAT:
    - /joint_states
    - /motor/temperature
patterns:
  "MOTOR_.*":
    - /joint_states
    - /cmd_vel
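The resolution order can be sketched in a few lines of Python using the same data as the example config. This is an illustration under stated assumptions, not the package's code: the `resolve_topics` function is hypothetical, the first matching level is assumed to win outright (no merging across levels), patterns are assumed to match the whole fault code, and the `/diagnostics` default topic is made up for the demo.

```python
import re


def resolve_topics(fault_code, fault_specific, patterns, default_topics):
    """Pick topics for a fault code: exact match, then first matching pattern, then defaults."""
    if fault_code in fault_specific:
        return list(fault_specific[fault_code])
    for pattern, topics in patterns.items():
        if re.fullmatch(pattern, fault_code):  # whole-string match (assumption)
            return list(topics)
    return list(default_topics)


# Same content as the example snapshots.yaml, written as Python dicts.
fault_specific = {"MOTOR_OVERHEAT": ["/joint_states", "/motor/temperature"]}
patterns = {"MOTOR_.*": ["/joint_states", "/cmd_vel"]}
default_topics = ["/diagnostics"]  # hypothetical value for snapshots.default_topics

exact = resolve_topics("MOTOR_OVERHEAT", fault_specific, patterns, default_topics)
by_pattern = resolve_topics("MOTOR_STALL", fault_specific, patterns, default_topics)
fallback = resolve_topics("LIDAR_TIMEOUT", fault_specific, patterns, default_topics)

print(exact)       # ['/joint_states', '/motor/temperature']
print(by_pattern)  # ['/joint_states', '/cmd_vel']
print(fallback)    # ['/diagnostics']
```

MOTOR_OVERHEAT also matches the MOTOR_.* pattern, but the exact entry takes precedence, so /motor/temperature is captured instead of /cmd_vel.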
Storage Backends
SQLite (default): Faults are persisted to disk and survive node restarts. Uses SQLite's write-ahead logging (WAL) mode, which allows reads to proceed while a write is in progress.
Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.
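As a rough illustration of what the SQLite backend provides, the following Python sketch opens a database in WAL mode, stores one fault, and reads it back through a fresh connection to mimic a node restart. The `faults` table schema and the `open_fault_db` helper are hypothetical, and a temporary directory stands in for the configured database_path.

```python
import os
import sqlite3
import tempfile


def open_fault_db(path):
    """Open (or create) a fault database with write-ahead logging enabled."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS faults ("
        "fault_code TEXT PRIMARY KEY, severity INTEGER NOT NULL)"  # hypothetical schema
    )
    return conn


db_path = os.path.join(tempfile.mkdtemp(), "faults.db")

# First "run": store a fault and shut down.
conn = open_fault_db(db_path)
journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
conn.execute("INSERT INTO faults VALUES (?, ?)", ("MOTOR_OVERHEAT", 2))
conn.commit()
conn.close()

# Second "run": the fault is still there after reopening the file.
conn = open_fault_db(db_path)
rows = conn.execute("SELECT fault_code, severity FROM faults").fetchall()
conn.close()

print(journal_mode, rows)
```

With storage_type set to "memory" there is no file to reopen, so the second step would start from an empty store.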
Usage
Launch
File truncated at 100 lines; see the full file.
Changelog for package ros2_medkit_fault_manager
0.3.0 (2026-02-27)
- Accurate HIGHEST_SEVERITY reassignment and stale fault_to_cluster_ cleanup (#221)
- Clean up pending_clusters_ when fault cleared before min_count (#211)
- Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
- Contributors: @bburda, @eclipse0922
0.2.0 (2026-02-07)
- Initial rosdistro release
- Central fault management node with ROS 2 services:
  - ReportFault - report FAILED/PASSED events with debounce filtering
  - GetFaults - query faults with filtering by severity, status, correlation
  - ClearFault - clear/acknowledge faults
- Debounce filtering with configurable thresholds:
  - FAILED events decrement counter, PASSED events increment
  - Configurable confirmation_threshold (default: -1, immediate)
  - Optional healing support (healing_enabled, healing_threshold)
  - Time-based auto-confirmation (auto_confirm_after_sec)
  - CRITICAL severity bypasses debounce
- Dual storage backends:
  - SQLite persistent storage with WAL mode (default)
  - In-memory storage for testing/lightweight deployments
- Snapshot capture on fault confirmation:
  - Topic data captured as JSON with configurable topic resolution
  - Priority: fault_specific > patterns > default_topics
  - Stored in SQLite with indexed fault_code lookup
  - Auto-cleanup on fault clear
- Rosbag capture with ring buffer:
  - Configurable duration, post-fault recording, topic selection
  - Lazy start mode (start on PREFAILED) or immediate
  - Auto-cleanup of bag files, storage limits (max_bag_size_mb)
  - GetRosbag service for bag file metadata
- Fault correlation engine:
  - Hierarchical mode: root cause to symptom relationships
  - Auto-cluster mode: group similar faults within time window
  - YAML-based configuration with pattern wildcards
  - Muted faults tracking, auto-clear on root cause resolution
- FaultEvent publishing on ~/events topic for SSE streaming
- Wall clock timestamps (compatible with use_sim_time)
- Contributors: Bartosz Burda, Michal Faferek