ros2_medkit_fault_manager

Package Summary

Version	0.4.0
License	Apache-2.0
Build type	AMENT_CMAKE
Use	RECOMMENDED

Repository Summary

Checkout URI	https://github.com/selfpatch/ros2_medkit.git
VCS Type	git
VCS Version	main
Last Updated	2026-03-22
Dev Status	DEVELOPED
Released	RELEASED

Package Description

Central fault manager node for the ros2_medkit fault management system

Maintainers

bburda

Authors

No additional authors.

ros2_medkit_fault_manager

Central fault manager node for the ros2_medkit fault management system.

Overview

The FaultManager node provides a central point for fault aggregation and lifecycle management. It receives fault reports from multiple sources, aggregates them by fault_code, and provides query and clearing interfaces.

Quick Start

By default, faults are confirmed immediately when reported (`confirmation_threshold` defaults to `-1`), so no additional configuration is needed.

# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py

# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
  "{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"

# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
  "{statuses: ['CONFIRMED']}"

# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
  "{fault_code: 'MOTOR_OVERHEAT'}"

Services

Service	Type	Description
`~/report_fault`	`ros2_medkit_msgs/srv/ReportFault`	Report a fault occurrence
`~/list_faults`	`ros2_medkit_msgs/srv/ListFaults`	Query faults with filtering
`~/clear_fault`	`ros2_medkit_msgs/srv/ClearFault`	Clear/acknowledge a fault
`~/get_snapshots`	`ros2_medkit_msgs/srv/GetSnapshots`	Get topic snapshots for a fault

Features

- Multi-source aggregation: Reports with the same fault_code from different sources are merged into a single fault
- Occurrence tracking: Counts total reports and tracks all reporting sources
- Severity escalation: Fault severity is raised when a report arrives with a higher severity
- Persistent storage: The SQLite backend ensures faults survive node restarts
- Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation with per-entity threshold overrides
- Snapshot capture: Captures topic data when a fault is confirmed, for debugging (snapshots are deleted when the fault is cleared)
- Fault correlation (optional): Root cause analysis with symptom muting and auto-clear
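The first three features can be illustrated with a minimal Python sketch. It is not the package's implementation (the node itself is built on rclcpp); `AggregatedFault` and `FaultTable` are hypothetical names, and the sketch assumes that a larger severity integer means a more severe fault.

```python
from dataclasses import dataclass, field


@dataclass
class AggregatedFault:
    """One record per fault_code, merged across all reporting sources."""
    fault_code: str
    severity: int
    occurrence_count: int = 0
    reporting_sources: set = field(default_factory=set)


class FaultTable:
    """Hypothetical in-memory table keyed by fault_code (not the package's API)."""

    def __init__(self):
        self._faults = {}

    def report(self, fault_code, severity, source_id):
        fault = self._faults.get(fault_code)
        if fault is None:
            # First report for this code creates the single aggregated record.
            fault = AggregatedFault(fault_code=fault_code, severity=severity)
            self._faults[fault_code] = fault
        fault.occurrence_count += 1
        fault.reporting_sources.add(source_id)
        # Assumption: higher integer = more severe. Severity only escalates.
        fault.severity = max(fault.severity, severity)
        return fault


table = FaultTable()
table.report("MOTOR_OVERHEAT", severity=1, source_id="/motor_node")
fault = table.report("MOTOR_OVERHEAT", severity=2, source_id="/thermal_monitor")
print(fault.occurrence_count, sorted(fault.reporting_sources), fault.severity)
```

Two reports from different sources end up in one record with an occurrence count of 2, both sources listed, and the higher of the two severities.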

Parameters

Parameter	Type	Default	Description
`storage_type`	string	`"sqlite"`	Storage backend: `"sqlite"` or `"memory"`
`database_path`	string	`"/var/lib/ros2_medkit/faults.db"`	Path to SQLite database file
`confirmation_threshold`	int	`-1`	Counter value at which faults are confirmed; FAILED events decrement the counter, so the default of `-1` confirms on the first FAILED event
`healing_enabled`	bool	`false`	Enable automatic healing via PASSED events
`healing_threshold`	int	`3`	Counter value at which faults are healed
`auto_confirm_after_sec`	double	`0.0`	Auto-confirm PREFAILED faults after timeout (0 = disabled)
`entity_thresholds.config_file`	string	`""`	Path to YAML file with per-entity debounce threshold overrides
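How the debounce parameters interact can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the node's actual logic: the class name `DebounceCounter`, the event labels, and the `HEALED` status name are hypothetical, the counter is not clamped, and `auto_confirm_after_sec` and the CRITICAL-severity bypass are left out. The counter directions (FAILED decrements, PASSED increments) and the defaults come from the table and changelog.

```python
from dataclasses import dataclass
from typing import Optional

FAILED, PASSED = "FAILED", "PASSED"  # placeholder event labels


@dataclass
class DebounceCounter:
    confirmation_threshold: int = -1  # documented default: immediate confirmation
    healing_enabled: bool = False     # documented default
    healing_threshold: int = 3        # documented default
    counter: int = 0
    status: Optional[str] = None      # no report seen yet

    def on_event(self, event):
        # FAILED moves the counter down, PASSED moves it up.
        self.counter += -1 if event == FAILED else 1
        if self.counter <= self.confirmation_threshold:
            self.status = "CONFIRMED"
        elif self.healing_enabled and self.counter >= self.healing_threshold:
            self.status = "HEALED"  # assumed status name
        elif self.status is None and event == FAILED:
            self.status = "PREFAILED"
        return self.status


# With a stricter threshold, three FAILED events are needed to confirm.
strict = DebounceCounter(confirmation_threshold=-3)
print([strict.on_event(FAILED) for _ in range(3)])
# ['PREFAILED', 'PREFAILED', 'CONFIRMED']
```

With the default threshold of `-1`, the very first FAILED event already satisfies `counter <= confirmation_threshold`, which matches the "confirmed immediately" behaviour described in the Quick Start.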

Snapshot Parameters

Snapshots capture topic data when faults are confirmed for post-mortem debugging.

Parameter	Type	Default	Description
`snapshots.enabled`	bool	`true`	Enable/disable snapshot capture
`snapshots.background_capture`	bool	`false`	Use background subscriptions (caches latest message) vs on-demand capture
`snapshots.timeout_sec`	double	`1.0`	Timeout waiting for topic message (on-demand mode)
`snapshots.max_message_size`	int	`65536`	Maximum message size in bytes (larger messages skipped)
`snapshots.default_topics`	string[]	`[]`	Topics to capture for all faults
`snapshots.config_file`	string	`""`	Path to YAML config for `fault_specific` and `patterns`

Topic Resolution Priority (highest first):

1. fault_specific - Exact match for fault code (configured via YAML config file)
2. patterns - Regex pattern match (configured via YAML config file)
3. default_topics - Fallback for all faults

Example YAML config file (snapshots.yaml):

fault_specific:
  MOTOR_OVERHEAT:
    - /joint_states
    - /motor/temperature
patterns:
  "MOTOR_.*":
    - /joint_states
    - /cmd_vel
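The resolution order can be sketched in Python using the example config above. The function name `resolve_topics` and the `/diagnostics` default topic are hypothetical. The sketch also assumes that a pattern must match the whole fault code and that the first matching pattern wins; the real node may differ on both points and uses its own regex dialect.

```python
import re


def resolve_topics(fault_code, fault_specific, patterns, default_topics):
    """Return the topics to snapshot for a fault code, highest priority first."""
    # 1. Exact match on the fault code.
    if fault_code in fault_specific:
        return fault_specific[fault_code]
    # 2. First regex pattern matching the whole code (assumed semantics).
    for pattern, topics in patterns.items():
        if re.fullmatch(pattern, fault_code):
            return topics
    # 3. Fallback for every other fault.
    return default_topics


# Mirrors the snapshots.yaml example as plain Python dicts.
fault_specific = {"MOTOR_OVERHEAT": ["/joint_states", "/motor/temperature"]}
patterns = {"MOTOR_.*": ["/joint_states", "/cmd_vel"]}
default_topics = ["/diagnostics"]  # hypothetical value for snapshots.default_topics

for code in ("MOTOR_OVERHEAT", "MOTOR_STALL", "BATTERY_LOW"):
    print(code, resolve_topics(code, fault_specific, patterns, default_topics))
```

`MOTOR_OVERHEAT` resolves through `fault_specific` even though it also matches the `MOTOR_.*` pattern, `MOTOR_STALL` falls through to the pattern, and `BATTERY_LOW` gets the defaults.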

Storage Backends

SQLite (default): Faults are persisted to disk and survive node restarts. Uses SQLite's write-ahead logging (WAL) mode, which lets readers proceed while a write is in progress.

Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.
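What the SQLite backend provides can be shown with Python's standard `sqlite3` module. This is only a sketch: the table layout is hypothetical and is not the package's actual schema, and a temporary directory stands in for the default `database_path`.

```python
import os
import sqlite3
import tempfile

# Temporary location standing in for /var/lib/ros2_medkit/faults.db.
db_path = os.path.join(tempfile.mkdtemp(), "faults.db")

conn = sqlite3.connect(db_path)
# Switch the database file to write-ahead logging; SQLite reports the active mode.
journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
# Hypothetical schema, for illustration only.
conn.execute(
    "CREATE TABLE IF NOT EXISTS faults ("
    "fault_code TEXT PRIMARY KEY, severity INTEGER, status TEXT)"
)
conn.execute(
    "INSERT OR REPLACE INTO faults VALUES (?, ?, ?)",
    ("MOTOR_OVERHEAT", 2, "CONFIRMED"),
)
conn.commit()
conn.close()

# Reopening the file plays the role of a node restart: the row is still there.
conn = sqlite3.connect(db_path)
rows = conn.execute("SELECT fault_code, severity, status FROM faults").fetchall()
conn.close()
print(journal_mode, rows)
```

Pointing `sqlite3.connect` at `":memory:"` instead is a rough analogue of `storage_type: "memory"`: nothing is written to disk, so the data is gone once the connection closes.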

Usage

File truncated at 100 lines; see the full file.

CHANGELOG

Changelog for package ros2_medkit_fault_manager

0.4.0 (2026-03-20)

Per-entity confirmation and healing thresholds via manifest configuration (#269)
Default rosbag storage format changed from sqlite3 to mcap
Support for namespaced fault manager nodes - gateway resolves service/topic names when the fault manager runs in a custom namespace
Build: use shared cmake modules from ros2_medkit_cmake package
Build: centralized clang-tidy configuration
Contributors: @bburda

0.3.0 (2026-02-27)

Accurate HIGHEST_SEVERITY reassignment and stale fault_to_cluster_ cleanup (#221)
Clean up pending_clusters_ when fault cleared before min_count (#211)
Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
Contributors: @bburda, @eclipse0922

0.2.0 (2026-02-07)

Initial rosdistro release
Central fault management node with ROS 2 services:
- ReportFault - report FAILED/PASSED events with debounce filtering
- GetFaults - query faults with filtering by severity, status, correlation
- ClearFault - clear/acknowledge faults
Debounce filtering with configurable thresholds:
- FAILED events decrement counter, PASSED events increment
- Configurable confirmation_threshold (default: -1, immediate)
- Optional healing support (healing_enabled, healing_threshold)
- Time-based auto-confirmation (auto_confirm_after_sec)
- CRITICAL severity bypasses debounce
Dual storage backends:
- SQLite persistent storage with WAL mode (default)
- In-memory storage for testing/lightweight deployments
Snapshot capture on fault confirmation:
- Topic data captured as JSON with configurable topic resolution
- Priority: fault_specific > patterns > default_topics
- Stored in SQLite with indexed fault_code lookup
- Auto-cleanup on fault clear
Rosbag capture with ring buffer:
- Configurable duration, post-fault recording, topic selection
- Lazy start mode (start on PREFAILED) or immediate
- Auto-cleanup of bag files, storage limits (max_bag_size_mb)
- GetRosbag service for bag file metadata
Fault correlation engine:
- Hierarchical mode: root cause to symptom relationships
- Auto-cluster mode: group similar faults within time window
- YAML-based configuration with pattern wildcards
- Muted faults tracking, auto-clear on root cause resolution
FaultEvent publishing on ~/events topic for SSE streaming
Wall clock timestamps (compatible with use_sim_time)
Contributors: Bartosz Burda, Michal Faferek

Package Dependencies

Deps	Name
	ament_cmake
	ros2_medkit_cmake
	ament_lint_auto
	ament_lint_common
	ament_cmake_clang_format
	ament_cmake_clang_tidy
	ament_cmake_gtest
	launch_testing_ament_cmake
	launch_testing_ros
	sensor_msgs
	std_msgs
	rclcpp
	ros2_medkit_msgs
	ros2_medkit_serialization
	rosbag2_cpp
	rosbag2_storage
	rosbag2_storage_mcap

System Dependencies

Name
libsqlite3-dev
nlohmann-json-dev

Dependant Packages

Name	Deps
ros2_medkit_diagnostic_bridge
ros2_medkit_fault_reporter
ros2_medkit_integration_tests

Launch files

No launch files found

Messages

No message files found.

Services

No service files found

Plugins

No plugins found.

ROS Distro
jazzy



Package Summary

Version	0.4.0
License	Apache-2.0
Build type	AMENT_CMAKE
Use	RECOMMENDED

Repository Summary

Checkout URI	https://github.com/selfpatch/ros2_medkit.git
VCS Type	git
VCS Version	main
Last Updated	2026-03-22
Dev Status	DEVELOPED
Released	RELEASED
Contributing	Help Wanted (-) Good First Issues (-) Pull Requests to Review (-)

Package Description

Central fault manager node for ros2_medkit fault management system

Maintainers

bburda

Authors

No additional authors.

ros2_medkit_fault_manager

Central fault manager node for the ros2_medkit fault management system.

Overview

The FaultManager node provides a central point for fault aggregation and lifecycle management. It receives fault reports from multiple sources, aggregates them by fault_code, and provides query and clearing interfaces.

Quick Start

By default, faults are confirmed immediately when reported - no additional configuration needed.

# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py

# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
  "{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"

# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
  "{statuses: ['CONFIRMED']}"

# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
  "{fault_code: 'MOTOR_OVERHEAT'}"

Services

Service	Type	Description
`~/report_fault`	`ros2_medkit_msgs/srv/ReportFault`	Report a fault occurrence
`~/list_faults`	`ros2_medkit_msgs/srv/ListFaults`	Query faults with filtering
`~/clear_fault`	`ros2_medkit_msgs/srv/ClearFault`	Clear/acknowledge a fault
`~/get_snapshots`	`ros2_medkit_msgs/srv/GetSnapshots`	Get topic snapshots for a fault

Features

Multi-source aggregation: Same fault_code from different sources creates a single fault
Occurrence tracking: Counts total reports and tracks all reporting sources
Severity escalation: Fault severity is updated if a higher severity is reported
Persistent storage: SQLite backend ensures faults survive node restarts
Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation with per-entity threshold overrides
Snapshot capture: Captures topic data when faults are confirmed for debugging (snapshots are deleted when fault is cleared)
Fault correlation (optional): Root cause analysis with symptom muting and auto-clear

Parameters

Parameter	Type	Default	Description
`storage_type`	string	`"sqlite"`	Storage backend: `"sqlite"` or `"memory"`
`database_path`	string	`"/var/lib/ros2_medkit/faults.db"`	Path to SQLite database file
`confirmation_threshold`	int	`-1`	Counter value at which faults are confirmed
`healing_enabled`	bool	`false`	Enable automatic healing via PASSED events
`healing_threshold`	int	`3`	Counter value at which faults are healed
`auto_confirm_after_sec`	double	`0.0`	Auto-confirm PREFAILED faults after timeout (0 = disabled)
`entity_thresholds.config_file`	string	`""`	Path to YAML file with per-entity debounce threshold overrides

Snapshot Parameters

Snapshots capture topic data when faults are confirmed for post-mortem debugging.

Parameter	Type	Default	Description
`snapshots.enabled`	bool	`true`	Enable/disable snapshot capture
`snapshots.background_capture`	bool	`false`	Use background subscriptions (caches latest message) vs on-demand capture
`snapshots.timeout_sec`	double	`1.0`	Timeout waiting for topic message (on-demand mode)
`snapshots.max_message_size`	int	`65536`	Maximum message size in bytes (larger messages skipped)
`snapshots.default_topics`	string[]	`[]`	Topics to capture for all faults
`snapshots.config_file`	string	`""`	Path to YAML config for `fault_specific` and `patterns`

Topic Resolution Priority:

fault_specific - Exact match for fault code (configured via YAML config file)
patterns - Regex pattern match (configured via YAML config file)
default_topics - Fallback for all faults

Example YAML config file (snapshots.yaml):

fault_specific:
  MOTOR_OVERHEAT:
    - /joint_states
    - /motor/temperature
patterns:
  "MOTOR_.*":
    - /joint_states
    - /cmd_vel

Storage Backends

SQLite (default): Faults are persisted to disk and survive node restarts. Uses WAL mode for optimal performance.

Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.

Usage

File truncated at 100 lines see the full file

CHANGELOG

Changelog for package ros2_medkit_fault_manager

0.4.0 (2026-03-20)

Per-entity confirmation and healing thresholds via manifest configuration (#269)
Default rosbag storage format changed from sqlite3 to mcap
Support for namespaced fault manager nodes - gateway resolves service/topic names when the fault manager runs in a custom namespace
Build: use shared cmake modules from ros2_medkit_cmake package
Build: centralized clang-tidy configuration
Contributors: \@bburda

0.3.0 (2026-02-27)

Accurate HIGHEST_SEVERITY reassignment and stale fault_to_cluster_ cleanup (#221)
Clean up pending_clusters_ when fault cleared before min_count (#211)
Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
Contributors: \@bburda, \@eclipse0922

0.2.0 (2026-02-07)

Initial rosdistro release
Central fault management node with ROS 2 services:
- ReportFault - report FAILED/PASSED events with debounce filtering
- GetFaults - query faults with filtering by severity, status, correlation
- ClearFault - clear/acknowledge faults
Debounce filtering with configurable thresholds:
- FAILED events decrement counter, PASSED events increment
- Configurable confirmation_threshold (default: -1, immediate)
- Optional healing support (healing_enabled, healing_threshold)
- Time-based auto-confirmation (auto_confirm_after_sec)
- CRITICAL severity bypasses debounce
Dual storage backends:
- SQLite persistent storage with WAL mode (default)
- In-memory storage for testing/lightweight deployments
Snapshot capture on fault confirmation:
- Topic data captured as JSON with configurable topic resolution
- Priority: fault_specific > patterns > default_topics
- Stored in SQLite with indexed fault_code lookup
- Auto-cleanup on fault clear
Rosbag capture with ring buffer:
- Configurable duration, post-fault recording, topic selection
- Lazy start mode (start on PREFAILED) or immediate
- Auto-cleanup of bag files, storage limits (max_bag_size_mb)
- GetRosbag service for bag file metadata
Fault correlation engine:
- Hierarchical mode: root cause to symptom relationships
- Auto-cluster mode: group similar faults within time window
- YAML-based configuration with pattern wildcards
- Muted faults tracking, auto-clear on root cause resolution
FaultEvent publishing on ~/events topic for SSE streaming
Wall clock timestamps (compatible with use_sim_time)
Contributors: Bartosz Burda, Michal Faferek

Package Dependencies

Deps	Name
	ament_cmake
	ros2_medkit_cmake
	ament_lint_auto
	ament_lint_common
	ament_cmake_clang_format
	ament_cmake_clang_tidy
	ament_cmake_gtest
	launch_testing_ament_cmake
	launch_testing_ros
	sensor_msgs
	std_msgs
	rclcpp
	ros2_medkit_msgs
	ros2_medkit_serialization
	rosbag2_cpp
	rosbag2_storage
	rosbag2_storage_mcap

System Dependencies

Name
libsqlite3-dev
nlohmann-json-dev

Dependant Packages

Name	Deps
ros2_medkit_diagnostic_bridge
ros2_medkit_fault_reporter
ros2_medkit_integration_tests

Launch files

No launch files found

Messages

No message files found.

Services

No service files found

Plugins

No plugins found.

Recent questions tagged `ros2_medkit_fault_manager` at Robotics Stack Exchange

No version for distro eloquent showing jazzy. Known supported distros are highlighted in the buttons above.

ros2_medkit_fault_manager package from ros2_medkit repo

ros2_medkit_cmake ros2_medkit_diagnostic_bridge ros2_medkit_beacon_common ros2_medkit_linux_introspection ros2_medkit_param_beacon ros2_medkit_topic_beacon ros2_medkit_fault_manager ros2_medkit_fault_reporter ros2_medkit_gateway ros2_medkit_integration_tests ros2_medkit_msgs ros2_medkit_graph_provider ros2_medkit_serialization

ROS Distro
jazzy

Package Summary

Version	0.4.0
License	Apache-2.0
Build type	AMENT_CMAKE
Use	RECOMMENDED

Repository Summary

Checkout URI	https://github.com/selfpatch/ros2_medkit.git
VCS Type	git
VCS Version	main
Last Updated	2026-03-22
Dev Status	DEVELOPED
Released	RELEASED
Contributing	Help Wanted (-) Good First Issues (-) Pull Requests to Review (-)

Package Description

Central fault manager node for ros2_medkit fault management system

Maintainers

bburda

Authors

No additional authors.

ros2_medkit_fault_manager

Central fault manager node for the ros2_medkit fault management system.

Overview

The FaultManager node provides a central point for fault aggregation and lifecycle management. It receives fault reports from multiple sources, aggregates them by fault_code, and provides query and clearing interfaces.

Quick Start

By default, faults are confirmed immediately when reported - no additional configuration needed.

# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py

# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
  "{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"

# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
  "{statuses: ['CONFIRMED']}"

# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
  "{fault_code: 'MOTOR_OVERHEAT'}"

Services

Service	Type	Description
`~/report_fault`	`ros2_medkit_msgs/srv/ReportFault`	Report a fault occurrence
`~/list_faults`	`ros2_medkit_msgs/srv/ListFaults`	Query faults with filtering
`~/clear_fault`	`ros2_medkit_msgs/srv/ClearFault`	Clear/acknowledge a fault
`~/get_snapshots`	`ros2_medkit_msgs/srv/GetSnapshots`	Get topic snapshots for a fault

Features

Multi-source aggregation: Same fault_code from different sources creates a single fault
Occurrence tracking: Counts total reports and tracks all reporting sources
Severity escalation: Fault severity is updated if a higher severity is reported
Persistent storage: SQLite backend ensures faults survive node restarts
Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation with per-entity threshold overrides
Snapshot capture: Captures topic data when faults are confirmed for debugging (snapshots are deleted when fault is cleared)
Fault correlation (optional): Root cause analysis with symptom muting and auto-clear

Parameters

Parameter	Type	Default	Description
`storage_type`	string	`"sqlite"`	Storage backend: `"sqlite"` or `"memory"`
`database_path`	string	`"/var/lib/ros2_medkit/faults.db"`	Path to SQLite database file
`confirmation_threshold`	int	`-1`	Counter value at which faults are confirmed
`healing_enabled`	bool	`false`	Enable automatic healing via PASSED events
`healing_threshold`	int	`3`	Counter value at which faults are healed
`auto_confirm_after_sec`	double	`0.0`	Auto-confirm PREFAILED faults after timeout (0 = disabled)
`entity_thresholds.config_file`	string	`""`	Path to YAML file with per-entity debounce threshold overrides

Snapshot Parameters

Snapshots capture topic data when faults are confirmed for post-mortem debugging.

Parameter	Type	Default	Description
`snapshots.enabled`	bool	`true`	Enable/disable snapshot capture
`snapshots.background_capture`	bool	`false`	Use background subscriptions (caches latest message) vs on-demand capture
`snapshots.timeout_sec`	double	`1.0`	Timeout waiting for topic message (on-demand mode)
`snapshots.max_message_size`	int	`65536`	Maximum message size in bytes (larger messages skipped)
`snapshots.default_topics`	string[]	`[]`	Topics to capture for all faults
`snapshots.config_file`	string	`""`	Path to YAML config for `fault_specific` and `patterns`

Topic Resolution Priority:

fault_specific - Exact match for fault code (configured via YAML config file)
patterns - Regex pattern match (configured via YAML config file)
default_topics - Fallback for all faults

Example YAML config file (snapshots.yaml):

fault_specific:
  MOTOR_OVERHEAT:
    - /joint_states
    - /motor/temperature
patterns:
  "MOTOR_.*":
    - /joint_states
    - /cmd_vel

Storage Backends

SQLite (default): Faults are persisted to disk and survive node restarts. Uses WAL mode for optimal performance.

Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.

Usage

File truncated at 100 lines see the full file

CHANGELOG

Changelog for package ros2_medkit_fault_manager

0.4.0 (2026-03-20)

Per-entity confirmation and healing thresholds via manifest configuration (#269)
Default rosbag storage format changed from sqlite3 to mcap
Support for namespaced fault manager nodes - gateway resolves service/topic names when the fault manager runs in a custom namespace
Build: use shared cmake modules from ros2_medkit_cmake package
Build: centralized clang-tidy configuration
Contributors: \@bburda

0.3.0 (2026-02-27)

Accurate HIGHEST_SEVERITY reassignment and stale fault_to_cluster_ cleanup (#221)
Clean up pending_clusters_ when fault cleared before min_count (#211)
Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
Contributors: \@bburda, \@eclipse0922

0.2.0 (2026-02-07)

Initial rosdistro release
Central fault management node with ROS 2 services:
- ReportFault - report FAILED/PASSED events with debounce filtering
- GetFaults - query faults with filtering by severity, status, correlation
- ClearFault - clear/acknowledge faults
Debounce filtering with configurable thresholds:
- FAILED events decrement counter, PASSED events increment
- Configurable confirmation_threshold (default: -1, immediate)
- Optional healing support (healing_enabled, healing_threshold)
- Time-based auto-confirmation (auto_confirm_after_sec)
- CRITICAL severity bypasses debounce
Dual storage backends:
- SQLite persistent storage with WAL mode (default)
- In-memory storage for testing/lightweight deployments
Snapshot capture on fault confirmation:
- Topic data captured as JSON with configurable topic resolution
- Priority: fault_specific > patterns > default_topics
- Stored in SQLite with indexed fault_code lookup
- Auto-cleanup on fault clear
Rosbag capture with ring buffer:
- Configurable duration, post-fault recording, topic selection
- Lazy start mode (start on PREFAILED) or immediate
- Auto-cleanup of bag files, storage limits (max_bag_size_mb)
- GetRosbag service for bag file metadata
Fault correlation engine:
- Hierarchical mode: root cause to symptom relationships
- Auto-cluster mode: group similar faults within time window
- YAML-based configuration with pattern wildcards
- Muted faults tracking, auto-clear on root cause resolution
FaultEvent publishing on ~/events topic for SSE streaming
Wall clock timestamps (compatible with use_sim_time)
Contributors: Bartosz Burda, Michal Faferek
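
The debounce rules listed in the 0.2.0 entry can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the package's implementation: `DebounceState`, `apply_event`, the `SEVERITY_CRITICAL` value, and the `NONE` and `HEALED` status names are hypothetical. Only the counter directions, the two thresholds (with the documented defaults of -1 and 3), the PREFAILED and CONFIRMED statuses, and the CRITICAL bypass come from the text above. The sketch does not clamp or reset the counter, which the real node may do.

```python
from dataclasses import dataclass

# Hypothetical value for this sketch; the real severity constants are not shown here.
SEVERITY_CRITICAL = 3


@dataclass
class DebounceState:
    """Per-fault debounce counter (illustrative, not the package's data model)."""
    counter: int = 0
    status: str = "NONE"  # assumed placeholder for "never reported"


def apply_event(state, failed, severity=0, confirmation_threshold=-1,
                healing_enabled=False, healing_threshold=3):
    """Apply one FAILED (failed=True) or PASSED event and return the new status."""
    if failed:
        state.counter -= 1  # FAILED events decrement the counter
        if severity >= SEVERITY_CRITICAL or state.counter <= confirmation_threshold:
            state.status = "CONFIRMED"  # CRITICAL severity bypasses debounce
        elif state.status != "CONFIRMED":
            state.status = "PREFAILED"  # reported, but not yet confirmed
    else:
        state.counter += 1  # PASSED events increment the counter
        if healing_enabled and state.counter >= healing_threshold:
            state.status = "HEALED"  # assumed status name
    return state.status


if __name__ == "__main__":
    state = DebounceState()
    for _ in range(3):
        # Prints PREFAILED, PREFAILED, CONFIRMED with a threshold of -3.
        print(apply_event(state, failed=True, confirmation_threshold=-3))
```

With the default `confirmation_threshold` of -1, the first FAILED event already brings the counter to -1, which is why faults are confirmed immediately unless debounce is configured.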

Package Dependencies

Name
ament_cmake
ros2_medkit_cmake
ament_lint_auto
ament_lint_common
ament_cmake_clang_format
ament_cmake_clang_tidy
ament_cmake_gtest
launch_testing_ament_cmake
launch_testing_ros
sensor_msgs
std_msgs
rclcpp
ros2_medkit_msgs
ros2_medkit_serialization
rosbag2_cpp
rosbag2_storage
rosbag2_storage_mcap

System Dependencies

Name
libsqlite3-dev
nlohmann-json-dev

Dependent Packages

Name
ros2_medkit_diagnostic_bridge
ros2_medkit_fault_reporter
ros2_medkit_integration_tests

Launch files

No launch files found

Messages

No message files found.

Services

No service files found

Plugins

No plugins found.


Recent questions tagged `ros2_medkit_fault_manager` at Robotics Stack Exchange

No version for distro iron showing jazzy. Known supported distros are highlighted in the buttons above.

ros2_medkit_fault_manager package from ros2_medkit repo

ros2_medkit_cmake ros2_medkit_diagnostic_bridge ros2_medkit_beacon_common ros2_medkit_linux_introspection ros2_medkit_param_beacon ros2_medkit_topic_beacon ros2_medkit_fault_manager ros2_medkit_fault_reporter ros2_medkit_gateway ros2_medkit_integration_tests ros2_medkit_msgs ros2_medkit_graph_provider ros2_medkit_serialization

ROS Distro
jazzy

Package Summary

Version	0.4.0
License	Apache-2.0
Build type	AMENT_CMAKE
Use	RECOMMENDED

Repository Summary

Checkout URI	https://github.com/selfpatch/ros2_medkit.git
VCS Type	git
VCS Version	main
Last Updated	2026-03-22
Dev Status	DEVELOPED
Released	RELEASED
Contributing	Help Wanted (-) Good First Issues (-) Pull Requests to Review (-)

Package Description

Central fault manager node for ros2_medkit fault management system

Maintainers

bburda

Authors

No additional authors.

ros2_medkit_fault_manager

Central fault manager node for the ros2_medkit fault management system.

Overview

The FaultManager node provides a central point for fault aggregation and lifecycle management. It receives fault reports from multiple sources, aggregates them by fault_code, and provides query and clearing interfaces.

Quick Start

By default, faults are confirmed immediately when reported - no additional configuration needed.

# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py

# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
  "{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"

# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
  "{statuses: ['CONFIRMED']}"

# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
  "{fault_code: 'MOTOR_OVERHEAT'}"

Services

Service	Type	Description
`~/report_fault`	`ros2_medkit_msgs/srv/ReportFault`	Report a fault occurrence
`~/list_faults`	`ros2_medkit_msgs/srv/ListFaults`	Query faults with filtering
`~/clear_fault`	`ros2_medkit_msgs/srv/ClearFault`	Clear/acknowledge a fault
`~/get_snapshots`	`ros2_medkit_msgs/srv/GetSnapshots`	Get topic snapshots for a fault

Features

Multi-source aggregation: Same fault_code from different sources creates a single fault
Occurrence tracking: Counts total reports and tracks all reporting sources
Severity escalation: Fault severity is updated if a higher severity is reported
Persistent storage: SQLite backend ensures faults survive node restarts
Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation with per-entity threshold overrides
Snapshot capture: Captures topic data when faults are confirmed for debugging (snapshots are deleted when fault is cleared)
Fault correlation (optional): Root cause analysis with symptom muting and auto-clear

Parameters

Parameter	Type	Default	Description
`storage_type`	string	`"sqlite"`	Storage backend: `"sqlite"` or `"memory"`
`database_path`	string	`"/var/lib/ros2_medkit/faults.db"`	Path to SQLite database file
`confirmation_threshold`	int	`-1`	Counter value at which faults are confirmed
`healing_enabled`	bool	`false`	Enable automatic healing via PASSED events
`healing_threshold`	int	`3`	Counter value at which faults are healed
`auto_confirm_after_sec`	double	`0.0`	Auto-confirm PREFAILED faults after timeout (0 = disabled)
`entity_thresholds.config_file`	string	`""`	Path to YAML file with per-entity debounce threshold overrides

Snapshot Parameters

Snapshots capture topic data when faults are confirmed for post-mortem debugging.

Parameter	Type	Default	Description
`snapshots.enabled`	bool	`true`	Enable/disable snapshot capture
`snapshots.background_capture`	bool	`false`	Use background subscriptions (caches latest message) vs on-demand capture
`snapshots.timeout_sec`	double	`1.0`	Timeout waiting for topic message (on-demand mode)
`snapshots.max_message_size`	int	`65536`	Maximum message size in bytes (larger messages skipped)
`snapshots.default_topics`	string[]	`[]`	Topics to capture for all faults
`snapshots.config_file`	string	`""`	Path to YAML config for `fault_specific` and `patterns`

Topic Resolution Priority:

fault_specific - Exact match for fault code (configured via YAML config file)
patterns - Regex pattern match (configured via YAML config file)
default_topics - Fallback for all faults

Example YAML config file (snapshots.yaml):

fault_specific:
  MOTOR_OVERHEAT:
    - /joint_states
    - /motor/temperature
patterns:
  "MOTOR_.*":
    - /joint_states
    - /cmd_vel

Storage Backends

SQLite (default): Faults are persisted to disk and survive node restarts. Uses WAL mode for optimal performance.

Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.

Usage

File truncated at 100 lines see the full file

CHANGELOG

Changelog for package ros2_medkit_fault_manager

0.4.0 (2026-03-20)

Per-entity confirmation and healing thresholds via manifest configuration (#269)
Default rosbag storage format changed from sqlite3 to mcap
Support for namespaced fault manager nodes - gateway resolves service/topic names when the fault manager runs in a custom namespace
Build: use shared cmake modules from ros2_medkit_cmake package
Build: centralized clang-tidy configuration
Contributors: \@bburda

0.3.0 (2026-02-27)

Accurate HIGHEST_SEVERITY reassignment and stale fault_to_cluster_ cleanup (#221)
Clean up pending_clusters_ when fault cleared before min_count (#211)
Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
Contributors: \@bburda, \@eclipse0922

0.2.0 (2026-02-07)

Initial rosdistro release
Central fault management node with ROS 2 services:
- ReportFault - report FAILED/PASSED events with debounce filtering
- GetFaults - query faults with filtering by severity, status, correlation
- ClearFault - clear/acknowledge faults
Debounce filtering with configurable thresholds:
- FAILED events decrement counter, PASSED events increment
- Configurable confirmation_threshold (default: -1, immediate)
- Optional healing support (healing_enabled, healing_threshold)
- Time-based auto-confirmation (auto_confirm_after_sec)
- CRITICAL severity bypasses debounce
Dual storage backends:
- SQLite persistent storage with WAL mode (default)
- In-memory storage for testing/lightweight deployments
Snapshot capture on fault confirmation:
- Topic data captured as JSON with configurable topic resolution
- Priority: fault_specific > patterns > default_topics
- Stored in SQLite with indexed fault_code lookup
- Auto-cleanup on fault clear
Rosbag capture with ring buffer:
- Configurable duration, post-fault recording, topic selection
- Lazy start mode (start on PREFAILED) or immediate
- Auto-cleanup of bag files, storage limits (max_bag_size_mb)
- GetRosbag service for bag file metadata
Fault correlation engine:
- Hierarchical mode: root cause to symptom relationships
- Auto-cluster mode: group similar faults within time window
- YAML-based configuration with pattern wildcards
- Muted faults tracking, auto-clear on root cause resolution
FaultEvent publishing on ~/events topic for SSE streaming
Wall clock timestamps (compatible with use_sim_time)
Contributors: Bartosz Burda, Michal Faferek
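
The debounce rules listed in the 0.2.0 entry (FAILED decrements, PASSED increments, threshold-based confirmation, optional healing, CRITICAL bypass) can be modelled roughly as below. This is a sketch under stated assumptions, not the package's code: the class name, the `critical` flag, the `NONE` and `HEALED` status names, the counter starting at 0, and the absence of clamping are all guesses made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DebounceCounter:
    """Hypothetical model of counter-based fault confirmation."""

    confirmation_threshold: int = -1  # documented default: confirm immediately
    healing_enabled: bool = False
    healing_threshold: int = 3
    counter: int = 0                  # assumption: counter starts at 0
    status: str = "NONE"              # assumption: placeholder initial status

    def report(self, event, critical=False):
        if event == "FAILED":
            self.counter -= 1
            # CRITICAL severity bypasses debounce; otherwise confirm once the
            # counter has dropped to the confirmation threshold.
            if critical or self.counter <= self.confirmation_threshold:
                self.status = "CONFIRMED"
            elif self.status != "CONFIRMED":
                self.status = "PREFAILED"
        elif event == "PASSED":
            self.counter += 1
            # Healing is optional and only applies once the counter has
            # climbed to the healing threshold.
            if self.healing_enabled and self.counter >= self.healing_threshold:
                self.status = "HEALED"  # assumption: status name
        else:
            raise ValueError(f"unknown event type: {event!r}")
        return self.status


# Default threshold (-1): the first FAILED report confirms the fault.
immediate = DebounceCounter().report("FAILED")

# Threshold -3: two reports leave the fault PREFAILED, the third confirms it.
slow = DebounceCounter(confirmation_threshold=-3)
history = [slow.report("FAILED") for _ in range(3)]

# A critical report confirms at once, whatever the threshold.
bypass = DebounceCounter(confirmation_threshold=-3).report("FAILED", critical=True)

# Healing: from counter -1, four PASSED reports reach healing_threshold 3.
healer = DebounceCounter(healing_enabled=True)
healer.report("FAILED")
healed = [healer.report("PASSED") for _ in range(4)]
print(immediate, history, bypass, healed)
```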

Package Dependencies

Deps	Name
	ament_cmake
	ros2_medkit_cmake
	ament_lint_auto
	ament_lint_common
	ament_cmake_clang_format
	ament_cmake_clang_tidy
	ament_cmake_gtest
	launch_testing_ament_cmake
	launch_testing_ros
	sensor_msgs
	std_msgs
	rclcpp
	ros2_medkit_msgs
	ros2_medkit_serialization
	rosbag2_cpp
	rosbag2_storage
	rosbag2_storage_mcap

System Dependencies

Name
libsqlite3-dev
nlohmann-json-dev

Dependent Packages

Name	Deps
ros2_medkit_diagnostic_bridge
ros2_medkit_fault_reporter
ros2_medkit_integration_tests

Launch files

No launch files found

Messages

No message files found.

Services

No service files found

Plugins

No plugins found.

ros2_medkit_fault_manager package from ros2_medkit repo

ros2_medkit_cmake ros2_medkit_diagnostic_bridge ros2_medkit_beacon_common ros2_medkit_linux_introspection ros2_medkit_param_beacon ros2_medkit_topic_beacon ros2_medkit_fault_manager ros2_medkit_fault_reporter ros2_medkit_gateway ros2_medkit_integration_tests ros2_medkit_msgs ros2_medkit_graph_provider ros2_medkit_serialization

ROS Distro
jazzy

Package Summary

Version	0.4.0
License	Apache-2.0
Build type	AMENT_CMAKE
Use	RECOMMENDED

Repository Summary

Checkout URI	https://github.com/selfpatch/ros2_medkit.git
VCS Type	git
VCS Version	main
Last Updated	2026-03-22
Dev Status	DEVELOPED
Released	RELEASED
Contributing	Help Wanted (-) Good First Issues (-) Pull Requests to Review (-)

Package Description

Central fault manager node for ros2_medkit fault management system

Maintainers

bburda

Authors

No additional authors.

ros2_medkit_fault_manager

Central fault manager node for the ros2_medkit fault management system.

Overview

The FaultManager node provides a central point for fault aggregation and lifecycle management. It receives fault reports from multiple sources, aggregates them by fault_code, and provides query and clearing interfaces.

Quick Start

By default, faults are confirmed immediately when reported - no additional configuration needed.

# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py

# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
  "{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"

# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
  "{statuses: ['CONFIRMED']}"

# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
  "{fault_code: 'MOTOR_OVERHEAT'}"

Services

Service	Type	Description
`~/report_fault`	`ros2_medkit_msgs/srv/ReportFault`	Report a fault occurrence
`~/list_faults`	`ros2_medkit_msgs/srv/ListFaults`	Query faults with filtering
`~/clear_fault`	`ros2_medkit_msgs/srv/ClearFault`	Clear/acknowledge a fault
`~/get_snapshots`	`ros2_medkit_msgs/srv/GetSnapshots`	Get topic snapshots for a fault

Features

Multi-source aggregation: Same fault_code from different sources creates a single fault
Occurrence tracking: Counts total reports and tracks all reporting sources
Severity escalation: Fault severity is updated if a higher severity is reported
Persistent storage: SQLite backend ensures faults survive node restarts
Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation with per-entity threshold overrides
Snapshot capture: Captures topic data when faults are confirmed for debugging (snapshots are deleted when fault is cleared)
Fault correlation (optional): Root cause analysis with symptom muting and auto-clear

Parameters

Parameter	Type	Default	Description
`storage_type`	string	`"sqlite"`	Storage backend: `"sqlite"` or `"memory"`
`database_path`	string	`"/var/lib/ros2_medkit/faults.db"`	Path to SQLite database file
`confirmation_threshold`	int	`-1`	Counter value at which faults are confirmed
`healing_enabled`	bool	`false`	Enable automatic healing via PASSED events
`healing_threshold`	int	`3`	Counter value at which faults are healed
`auto_confirm_after_sec`	double	`0.0`	Auto-confirm PREFAILED faults after timeout (0 = disabled)
`entity_thresholds.config_file`	string	`""`	Path to YAML file with per-entity debounce threshold overrides

Snapshot Parameters

Snapshots capture topic data when faults are confirmed for post-mortem debugging.

Parameter	Type	Default	Description
`snapshots.enabled`	bool	`true`	Enable/disable snapshot capture
`snapshots.background_capture`	bool	`false`	Use background subscriptions (caches latest message) vs on-demand capture
`snapshots.timeout_sec`	double	`1.0`	Timeout waiting for topic message (on-demand mode)
`snapshots.max_message_size`	int	`65536`	Maximum message size in bytes (larger messages skipped)
`snapshots.default_topics`	string[]	`[]`	Topics to capture for all faults
`snapshots.config_file`	string	`""`	Path to YAML config for `fault_specific` and `patterns`

Topic Resolution Priority:

fault_specific - Exact match for fault code (configured via YAML config file)
patterns - Regex pattern match (configured via YAML config file)
default_topics - Fallback for all faults

Example YAML config file (snapshots.yaml):

fault_specific:
  MOTOR_OVERHEAT:
    - /joint_states
    - /motor/temperature
patterns:
  "MOTOR_.*":
    - /joint_states
    - /cmd_vel

Storage Backends

SQLite (default): Faults are persisted to disk and survive node restarts. Uses WAL mode for optimal performance.

Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.

Usage

File truncated at 100 lines see the full file

CHANGELOG

Changelog for package ros2_medkit_fault_manager

0.4.0 (2026-03-20)

Per-entity confirmation and healing thresholds via manifest configuration (#269)
Default rosbag storage format changed from sqlite3 to mcap
Support for namespaced fault manager nodes - gateway resolves service/topic names when the fault manager runs in a custom namespace
Build: use shared cmake modules from ros2_medkit_cmake package
Build: centralized clang-tidy configuration
Contributors: \@bburda

0.3.0 (2026-02-27)

Accurate HIGHEST_SEVERITY reassignment and stale fault_to_cluster_ cleanup (#221)
Clean up pending_clusters_ when fault cleared before min_count (#211)
Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
Contributors: \@bburda, \@eclipse0922

0.2.0 (2026-02-07)

Initial rosdistro release
Central fault management node with ROS 2 services:
- ReportFault - report FAILED/PASSED events with debounce filtering
- GetFaults - query faults with filtering by severity, status, correlation
- ClearFault - clear/acknowledge faults
Debounce filtering with configurable thresholds:
- FAILED events decrement counter, PASSED events increment
- Configurable confirmation_threshold (default: -1, immediate)
- Optional healing support (healing_enabled, healing_threshold)
- Time-based auto-confirmation (auto_confirm_after_sec)
- CRITICAL severity bypasses debounce
Dual storage backends:
- SQLite persistent storage with WAL mode (default)
- In-memory storage for testing/lightweight deployments
Snapshot capture on fault confirmation:
- Topic data captured as JSON with configurable topic resolution
- Priority: fault_specific > patterns > default_topics
- Stored in SQLite with indexed fault_code lookup
- Auto-cleanup on fault clear
Rosbag capture with ring buffer:
- Configurable duration, post-fault recording, topic selection
- Lazy start mode (start on PREFAILED) or immediate
- Auto-cleanup of bag files, storage limits (max_bag_size_mb)
- GetRosbag service for bag file metadata
Fault correlation engine:
- Hierarchical mode: root cause to symptom relationships
- Auto-cluster mode: group similar faults within time window
- YAML-based configuration with pattern wildcards
- Muted faults tracking, auto-clear on root cause resolution
FaultEvent publishing on ~/events topic for SSE streaming
Wall clock timestamps (compatible with use_sim_time)
Contributors: Bartosz Burda, Michal Faferek
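
The debounce rules listed for 0.2.0 can be illustrated with a small counter model. This is a minimal sketch under stated assumptions, not the node's code: the counter is assumed to start at 0, confirmation is assumed to trigger when the counter is at or below `confirmation_threshold`, healing when it is at or above `healing_threshold`, and the `HEALED` status name, the `critical` flag, and the `DebounceState` and `apply_event` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DebounceState:
    counter: int = 0           # assumption: counter starts at zero
    status: str = "PREFAILED"  # status before the threshold is reached


def apply_event(state, event, *, critical=False, confirmation_threshold=-1,
                healing_enabled=False, healing_threshold=3):
    """Apply one FAILED or PASSED event and return the resulting status."""
    if event == "FAILED":
        state.counter -= 1
        # CRITICAL reports bypass the counter check.
        if critical or state.counter <= confirmation_threshold:
            state.status = "CONFIRMED"
    elif event == "PASSED":
        state.counter += 1
        if healing_enabled and state.counter >= healing_threshold:
            state.status = "HEALED"  # hypothetical status name
    else:
        raise ValueError(f"unknown event: {event}")
    return state.status


# Default threshold of -1: the first FAILED event confirms the fault.
immediate = apply_event(DebounceState(), "FAILED")

# Threshold of -3: three FAILED events are needed.
slow = DebounceState()
slow_statuses = [apply_event(slow, "FAILED", confirmation_threshold=-3) for _ in range(3)]

# A critical report confirms at once even with a stricter threshold.
critical_status = apply_event(DebounceState(), "FAILED", critical=True,
                              confirmation_threshold=-3)

# Healing: after confirmation at -1, four PASSED events bring the counter to 3.
healing = DebounceState()
apply_event(healing, "FAILED")
heal_statuses = [apply_event(healing, "PASSED", healing_enabled=True) for _ in range(4)]
```

Under these assumptions the default `confirmation_threshold` of -1 reproduces the "confirmed immediately" behavior, while a more negative threshold requires repeated FAILED reports before confirmation.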

Package Dependencies

Deps	Name
	ament_cmake
	ros2_medkit_cmake
	ament_lint_auto
	ament_lint_common
	ament_cmake_clang_format
	ament_cmake_clang_tidy
	ament_cmake_gtest
	launch_testing_ament_cmake
	launch_testing_ros
	sensor_msgs
	std_msgs
	rclcpp
	ros2_medkit_msgs
	ros2_medkit_serialization
	rosbag2_cpp
	rosbag2_storage
	rosbag2_storage_mcap

System Dependencies

Name
libsqlite3-dev
nlohmann-json-dev

Dependent Packages

Name	Deps
ros2_medkit_diagnostic_bridge
ros2_medkit_fault_reporter
ros2_medkit_integration_tests



