
Package Summary

Version 0.3.0
License Apache-2.0
Build type AMENT_CMAKE
Use RECOMMENDED

Repository Summary

Checkout URI https://github.com/selfpatch/ros2_medkit.git
VCS Type git
VCS Version main
Last Updated 2026-03-05
Dev Status DEVELOPED
Released UNRELEASED

Package Description

Central fault manager node for ros2_medkit fault management system

Maintainers

  • bburda

Authors

No additional authors.

ros2_medkit_fault_manager

Central fault manager node for the ros2_medkit fault management system.

Overview

The FaultManager node provides a central point for fault aggregation and lifecycle management. It receives fault reports from multiple sources, aggregates them by fault_code, and provides query and clearing interfaces.

Quick Start

By default (confirmation_threshold = -1), a fault is confirmed as soon as it is first reported; no additional configuration is needed.

# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py

# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
  "{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"

# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
  "{statuses: ['CONFIRMED']}"

# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
  "{fault_code: 'MOTOR_OVERHEAT'}"

Services

| Service | Type | Description |
|---|---|---|
| ~/report_fault | ros2_medkit_msgs/srv/ReportFault | Report a fault occurrence |
| ~/list_faults | ros2_medkit_msgs/srv/ListFaults | Query faults with filtering |
| ~/clear_fault | ros2_medkit_msgs/srv/ClearFault | Clear/acknowledge a fault |
| ~/get_snapshots | ros2_medkit_msgs/srv/GetSnapshots | Get topic snapshots for a fault |

Features

  • Multi-source aggregation: Reports with the same fault_code from different sources are merged into a single fault
  • Occurrence tracking: Counts total reports and tracks all reporting sources
  • Severity escalation: Fault severity is raised when a report carries a higher severity than the stored one
  • Persistent storage: SQLite backend ensures faults survive node restarts
  • Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation
  • Snapshot capture: Captures topic data at the moment a fault is confirmed, for later debugging (snapshots are deleted when the fault is cleared)
  • Fault correlation (optional): Root cause analysis with symptom muting and auto-clear
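The aggregation, occurrence-tracking, and severity-escalation behaviour listed above can be modelled in a few lines of Python. This is only an illustrative sketch, not the node's implementation: the AggregatedFault and FaultAggregator names, the field names, and the /thermal_monitor source are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class AggregatedFault:
    """Hypothetical in-memory record for one fault_code (not the package's schema)."""
    fault_code: str
    severity: int
    occurrence_count: int = 0
    reporting_sources: set = field(default_factory=set)


class FaultAggregator:
    """Merges reports that share a fault_code into a single record."""

    def __init__(self):
        self._faults = {}

    def report(self, fault_code, severity, source_id):
        fault = self._faults.get(fault_code)
        if fault is None:
            fault = AggregatedFault(fault_code, severity)
            self._faults[fault_code] = fault
        # Occurrence tracking: count every report and remember who sent it.
        fault.occurrence_count += 1
        fault.reporting_sources.add(source_id)
        # Severity escalation: the stored severity is only ever raised.
        fault.severity = max(fault.severity, severity)
        return fault


aggregator = FaultAggregator()
aggregator.report("MOTOR_OVERHEAT", severity=1, source_id="/motor_node")
fault = aggregator.report("MOTOR_OVERHEAT", severity=2, source_id="/thermal_monitor")
print(fault.occurrence_count, fault.severity, sorted(fault.reporting_sources))
```

Both reports end up in one record keyed by fault_code, with two occurrences, two sources, and the higher of the two severities.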

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| storage_type | string | "sqlite" | Storage backend: "sqlite" or "memory" |
| database_path | string | "/var/lib/ros2_medkit/faults.db" | Path to SQLite database file |
| confirmation_threshold | int | -1 | Counter value at which faults are confirmed |
| healing_enabled | bool | false | Enable automatic healing via PASSED events |
| healing_threshold | int | 3 | Counter value at which faults are healed |
| auto_confirm_after_sec | double | 0.0 | Auto-confirm PREFAILED faults after timeout (0 = disabled) |
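To make the threshold parameters concrete, here is a rough Python model of a counter-based debounce, following the changelog's description that FAILED events decrement the counter, PASSED events increment it, and CRITICAL severity bypasses debounce. It is a sketch under assumptions, not the node's actual state machine: the DebounceCounter class, the "HEALED" status name, the numeric value of SEVERITY_CRITICAL, and the absence of counter clamping are all invented for illustration, and auto_confirm_after_sec is not modelled.

```python
SEVERITY_CRITICAL = 3  # assumed numeric value, for illustration only


class DebounceCounter:
    """Hypothetical per-fault debounce counter."""

    def __init__(self, confirmation_threshold=-1, healing_enabled=False,
                 healing_threshold=3):
        self.confirmation_threshold = confirmation_threshold
        self.healing_enabled = healing_enabled
        self.healing_threshold = healing_threshold
        self.counter = 0
        self.status = None  # no report seen yet

    def on_event(self, event, severity=0):
        if event == "FAILED":
            self.counter -= 1
            # Confirm on reaching the threshold, or at once for CRITICAL faults.
            if (severity >= SEVERITY_CRITICAL
                    or self.counter <= self.confirmation_threshold):
                self.status = "CONFIRMED"
            elif self.status != "CONFIRMED":
                self.status = "PREFAILED"
        elif event == "PASSED":
            self.counter += 1
            if self.healing_enabled and self.counter >= self.healing_threshold:
                self.status = "HEALED"  # assumed status name
        else:
            raise ValueError(f"unknown event: {event}")
        return self.status


# Default threshold of -1: the first FAILED report confirms the fault.
immediate = DebounceCounter()
first_status = immediate.on_event("FAILED")

# Threshold of -3 with healing: three FAILED reports confirm, PASSED reports heal.
debounced = DebounceCounter(confirmation_threshold=-3, healing_enabled=True)
failed_statuses = [debounced.on_event("FAILED") for _ in range(3)]
passed_statuses = [debounced.on_event("PASSED") for _ in range(6)]

# CRITICAL severity skips the debounce entirely.
critical = DebounceCounter(confirmation_threshold=-3)
critical_status = critical.on_event("FAILED", severity=SEVERITY_CRITICAL)

print(first_status, failed_statuses, passed_statuses[-1], critical_status)
```

In this model the default confirmation_threshold of -1 is what makes confirmation immediate: a single FAILED event moves the counter from 0 to -1.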

Snapshot Parameters

Snapshots capture topic data when faults are confirmed for post-mortem debugging.

| Parameter | Type | Default | Description |
|---|---|---|---|
| snapshots.enabled | bool | true | Enable/disable snapshot capture |
| snapshots.background_capture | bool | false | Use background subscriptions (caches latest message) vs on-demand capture |
| snapshots.timeout_sec | double | 1.0 | Timeout waiting for topic message (on-demand mode) |
| snapshots.max_message_size | int | 65536 | Maximum message size in bytes (larger messages skipped) |
| snapshots.default_topics | string[] | [] | Topics to capture for all faults |
| snapshots.config_file | string | "" | Path to YAML config for fault_specific and patterns |
snapshots.config_file string "" Path to YAML config for fault_specific and patterns

Topic Resolution Priority:

  1. fault_specific - Exact match for fault code (configured via YAML config file)
  2. patterns - Regex pattern match (configured via YAML config file)
  3. default_topics - Fallback for all faults

Example YAML config file (snapshots.yaml):

fault_specific:
  MOTOR_OVERHEAT:
    - /joint_states
    - /motor/temperature
patterns:
  "MOTOR_.*":
    - /joint_states
    - /cmd_vel
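The resolution order (fault_specific, then patterns, then default_topics) can be sketched in Python against the example config above. This is a hedged illustration, not the node's code: the resolve_topics function, the /diagnostics default topic, the use of a full-string regex match, and the rule that the first matching pattern wins are all assumptions.

```python
import re

# The example YAML above, written out as a Python dict.
CONFIG = {
    "fault_specific": {
        "MOTOR_OVERHEAT": ["/joint_states", "/motor/temperature"],
    },
    "patterns": {
        "MOTOR_.*": ["/joint_states", "/cmd_vel"],
    },
}
DEFAULT_TOPICS = ["/diagnostics"]  # hypothetical snapshots.default_topics value


def resolve_topics(fault_code, config, default_topics):
    """Return the topics to snapshot for fault_code, in priority order."""
    # 1. fault_specific: exact match on the fault code.
    fault_specific = config.get("fault_specific", {})
    if fault_code in fault_specific:
        return fault_specific[fault_code]
    # 2. patterns: first regex that matches the whole fault code (assumed).
    for pattern, topics in config.get("patterns", {}).items():
        if re.fullmatch(pattern, fault_code):
            return topics
    # 3. default_topics: fallback for every other fault.
    return default_topics


for code in ("MOTOR_OVERHEAT", "MOTOR_STALL", "BATTERY_LOW"):
    print(code, resolve_topics(code, CONFIG, DEFAULT_TOPICS))
```

MOTOR_OVERHEAT matches both the exact entry and the MOTOR_.* pattern, so it shows why the priority order matters: the exact entry is used and the pattern is never consulted.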

Storage Backends

SQLite (default): Faults are persisted to disk and survive node restarts. Uses SQLite's write-ahead logging (WAL) mode, which lets readers proceed while a write is in progress.

Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.
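As a small illustration of what the SQLite backend provides, the following Python snippet enables WAL mode on a file-backed database and shows that a row written through one connection is still there after reopening the file. The table layout and the temporary path are assumptions made for the example; the package's real schema is not shown in this document.

```python
import os
import sqlite3
import tempfile

# Temporary file standing in for the configured database_path.
db_path = os.path.join(tempfile.mkdtemp(), "faults.db")

conn = sqlite3.connect(db_path)
# Switch the database to write-ahead logging; SQLite reports the active mode.
journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
# Hypothetical minimal schema, for illustration only.
conn.execute(
    "CREATE TABLE IF NOT EXISTS faults ("
    "fault_code TEXT PRIMARY KEY, status TEXT NOT NULL)"
)
conn.execute(
    "INSERT OR REPLACE INTO faults VALUES (?, ?)",
    ("MOTOR_OVERHEAT", "CONFIRMED"),
)
conn.commit()
conn.close()

# Simulate a node restart: reopen the file and read the fault back.
conn = sqlite3.connect(db_path)
rows = conn.execute("SELECT fault_code, status FROM faults").fetchall()
conn.close()

print(journal_mode, rows)
```

With the memory backend there is no file to reopen, so the equivalent state would be lost when the process exits.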

Usage

Launch

File truncated at 100 lines see the full file

CHANGELOG

Changelog for package ros2_medkit_fault_manager

0.3.0 (2026-02-27)

  • Accurate HIGHEST_SEVERITY reassignment and stale fault_to_cluster_ cleanup (#221)
  • Clean up pending_clusters_ when fault cleared before min_count (#211)
  • Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
  • Contributors: @bburda, @eclipse0922

0.2.0 (2026-02-07)

  • Initial rosdistro release
  • Central fault management node with ROS 2 services:
    • ReportFault - report FAILED/PASSED events with debounce filtering
    • GetFaults - query faults with filtering by severity, status, correlation
    • ClearFault - clear/acknowledge faults
  • Debounce filtering with configurable thresholds:
    • FAILED events decrement counter, PASSED events increment
    • Configurable confirmation_threshold (default: -1, immediate)
    • Optional healing support (healing_enabled, healing_threshold)
    • Time-based auto-confirmation (auto_confirm_after_sec)
    • CRITICAL severity bypasses debounce
  • Dual storage backends:
    • SQLite persistent storage with WAL mode (default)
    • In-memory storage for testing/lightweight deployments
  • Snapshot capture on fault confirmation:
    • Topic data captured as JSON with configurable topic resolution
    • Priority: fault_specific > patterns > default_topics
    • Stored in SQLite with indexed fault_code lookup
    • Auto-cleanup on fault clear
  • Rosbag capture with ring buffer:
    • Configurable duration, post-fault recording, topic selection
    • Lazy start mode (start on PREFAILED) or immediate
    • Auto-cleanup of bag files, storage limits (max_bag_size_mb)
    • GetRosbag service for bag file metadata
  • Fault correlation engine:
    • Hierarchical mode: root cause to symptom relationships
    • Auto-cluster mode: group similar faults within time window
    • YAML-based configuration with pattern wildcards
    • Muted faults tracking, auto-clear on root cause resolution
  • FaultEvent publishing on ~/events topic for SSE streaming
  • Wall clock timestamps (compatible with use_sim_time)
  • Contributors: Bartosz Burda, Michal Faferek

Launch files

No launch files found

Messages

No message files found.

Services

No service files found

Plugins

No plugins found.


No version for distro ardent showing jazzy. Known supported distros are highlighted in the buttons above.

Package Summary

Version 0.3.0
License Apache-2.0
Build type AMENT_CMAKE
Use RECOMMENDED

Repository Summary

Checkout URI https://github.com/selfpatch/ros2_medkit.git
VCS Type git
VCS Version main
Last Updated 2026-03-05
Dev Status DEVELOPED
Released UNRELEASED
Contributing Help Wanted (-)
Good First Issues (-)
Pull Requests to Review (-)

Package Description

Central fault manager node for ros2_medkit fault management system

Maintainers

  • bburda

Authors

No additional authors.

ros2_medkit_fault_manager

Central fault manager node for the ros2_medkit fault management system.

Overview

The FaultManager node provides a central point for fault aggregation and lifecycle management. It receives fault reports from multiple sources, aggregates them by fault_code, and provides query and clearing interfaces.

Quick Start

By default, faults are confirmed immediately when reported - no additional configuration needed.

# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py

# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
  "{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"

# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
  "{statuses: ['CONFIRMED']}"

# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
  "{fault_code: 'MOTOR_OVERHEAT'}"

Services

Service Type Description
~/report_fault ros2_medkit_msgs/srv/ReportFault Report a fault occurrence
~/list_faults ros2_medkit_msgs/srv/ListFaults Query faults with filtering
~/clear_fault ros2_medkit_msgs/srv/ClearFault Clear/acknowledge a fault
~/get_snapshots ros2_medkit_msgs/srv/GetSnapshots Get topic snapshots for a fault

Features

  • Multi-source aggregation: Same fault_code from different sources creates a single fault
  • Occurrence tracking: Counts total reports and tracks all reporting sources
  • Severity escalation: Fault severity is updated if a higher severity is reported
  • Persistent storage: SQLite backend ensures faults survive node restarts
  • Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation
  • Snapshot capture: Captures topic data when faults are confirmed for debugging (snapshots are deleted when fault is cleared)
  • Fault correlation (optional): Root cause analysis with symptom muting and auto-clear

Parameters

Parameter Type Default Description
storage_type string "sqlite" Storage backend: "sqlite" or "memory"
database_path string "/var/lib/ros2_medkit/faults.db" Path to SQLite database file
confirmation_threshold int -1 Counter value at which faults are confirmed
healing_enabled bool false Enable automatic healing via PASSED events
healing_threshold int 3 Counter value at which faults are healed
auto_confirm_after_sec double 0.0 Auto-confirm PREFAILED faults after timeout (0 = disabled)

Snapshot Parameters

Snapshots capture topic data when faults are confirmed for post-mortem debugging.

Parameter Type Default Description
snapshots.enabled bool true Enable/disable snapshot capture
snapshots.background_capture bool false Use background subscriptions (caches latest message) vs on-demand capture
snapshots.timeout_sec double 1.0 Timeout waiting for topic message (on-demand mode)
snapshots.max_message_size int 65536 Maximum message size in bytes (larger messages skipped)
snapshots.default_topics string[] [] Topics to capture for all faults
snapshots.config_file string "" Path to YAML config for fault_specific and patterns

Topic Resolution Priority:

  1. fault_specific - Exact match for fault code (configured via YAML config file)
  2. patterns - Regex pattern match (configured via YAML config file)
  3. default_topics - Fallback for all faults

Example YAML config file (snapshots.yaml):

fault_specific:
  MOTOR_OVERHEAT:
    - /joint_states
    - /motor/temperature
patterns:
  "MOTOR_.*":
    - /joint_states
    - /cmd_vel

Storage Backends

SQLite (default): Faults are persisted to disk and survive node restarts. Uses WAL mode for optimal performance.

Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.

Usage

Launch

File truncated at 100 lines see the full file

CHANGELOG

Changelog for package ros2_medkit_fault_manager

0.3.0 (2026-02-27)

  • Accurate HIGHEST_SEVERITY reassignment and stale fault_to_cluster_ cleanup (#221)
  • Clean up pending_clusters_ when fault cleared before min_count (#211)
  • Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
  • Contributors: \@bburda, \@eclipse0922

0.2.0 (2026-02-07)

  • Initial rosdistro release
  • Central fault management node with ROS 2 services:
    • ReportFault - report FAILED/PASSED events with debounce filtering
    • GetFaults - query faults with filtering by severity, status, correlation
    • ClearFault - clear/acknowledge faults
  • Debounce filtering with configurable thresholds:
    • FAILED events decrement counter, PASSED events increment
    • Configurable confirmation_threshold (default: -1, immediate)
    • Optional healing support (healing_enabled, healing_threshold)
    • Time-based auto-confirmation (auto_confirm_after_sec)
    • CRITICAL severity bypasses debounce
  • Dual storage backends:
    • SQLite persistent storage with WAL mode (default)
    • In-memory storage for testing/lightweight deployments
  • Snapshot capture on fault confirmation:
    • Topic data captured as JSON with configurable topic resolution
    • Priority: fault_specific > patterns > default_topics
    • Stored in SQLite with indexed fault_code lookup
    • Auto-cleanup on fault clear
  • Rosbag capture with ring buffer:
    • Configurable duration, post-fault recording, topic selection
    • Lazy start mode (start on PREFAILED) or immediate
    • Auto-cleanup of bag files, storage limits (max_bag_size_mb)
    • GetRosbag service for bag file metadata
  • Fault correlation engine:
    • Hierarchical mode: root cause to symptom relationships
    • Auto-cluster mode: group similar faults within time window
    • YAML-based configuration with pattern wildcards
    • Muted faults tracking, auto-clear on root cause resolution
  • FaultEvent publishing on ~/events topic for SSE streaming
  • Wall clock timestamps (compatible with use_sim_time)
  • Contributors: Bartosz Burda, Michal Faferek

No version for distro galactic showing jazzy. Known supported distros are highlighted in the buttons above.

Package Summary

Version 0.3.0
License Apache-2.0
Build type AMENT_CMAKE
Use RECOMMENDED

Repository Summary

Checkout URI https://github.com/selfpatch/ros2_medkit.git
VCS Type git
VCS Version main
Last Updated 2026-03-05
Dev Status DEVELOPED
Released UNRELEASED
Contributing Help Wanted (-)
Good First Issues (-)
Pull Requests to Review (-)

Package Description

Central fault manager node for ros2_medkit fault management system

Maintainers

  • bburda

Authors

No additional authors.

ros2_medkit_fault_manager

Central fault manager node for the ros2_medkit fault management system.

Overview

The FaultManager node provides a central point for fault aggregation and lifecycle management. It receives fault reports from multiple sources, aggregates them by fault_code, and provides query and clearing interfaces.

Quick Start

By default, faults are confirmed immediately when reported - no additional configuration needed.

# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py

# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
  "{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"

# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
  "{statuses: ['CONFIRMED']}"

# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
  "{fault_code: 'MOTOR_OVERHEAT'}"

Services

Service Type Description
~/report_fault ros2_medkit_msgs/srv/ReportFault Report a fault occurrence
~/list_faults ros2_medkit_msgs/srv/ListFaults Query faults with filtering
~/clear_fault ros2_medkit_msgs/srv/ClearFault Clear/acknowledge a fault
~/get_snapshots ros2_medkit_msgs/srv/GetSnapshots Get topic snapshots for a fault

Features

  • Multi-source aggregation: Same fault_code from different sources creates a single fault
  • Occurrence tracking: Counts total reports and tracks all reporting sources
  • Severity escalation: Fault severity is updated if a higher severity is reported
  • Persistent storage: SQLite backend ensures faults survive node restarts
  • Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation
  • Snapshot capture: Captures topic data when faults are confirmed for debugging (snapshots are deleted when fault is cleared)
  • Fault correlation (optional): Root cause analysis with symptom muting and auto-clear

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| storage_type | string | "sqlite" | Storage backend: "sqlite" or "memory" |
| database_path | string | "/var/lib/ros2_medkit/faults.db" | Path to SQLite database file |
| confirmation_threshold | int | -1 | Counter value at which faults are confirmed (-1 = confirm on the first report) |
| healing_enabled | bool | false | Enable automatic healing via PASSED events |
| healing_threshold | int | 3 | Counter value at which faults are healed |
| auto_confirm_after_sec | double | 0.0 | Auto-confirm PREFAILED faults after timeout (0 = disabled) |
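
The interplay of confirmation_threshold, healing_enabled and healing_threshold can be sketched as a signed counter. The model below follows the rules stated in the changelog (FAILED events decrement, PASSED events increment, CRITICAL severity bypasses debounce). The class name, the HEALED status name, and the absence of any counter clamping or reset are assumptions made for illustration, and auto_confirm_after_sec is left out.

```python
class DebounceCounter:
    """Toy counter-based debounce; a model of the documented rules only."""

    def __init__(self, confirmation_threshold=-1, healing_enabled=False,
                 healing_threshold=3):
        self.confirmation_threshold = confirmation_threshold
        self.healing_enabled = healing_enabled
        self.healing_threshold = healing_threshold
        self.counter = 0
        self.status = None  # no report seen yet

    def on_event(self, passed, critical=False):
        # FAILED events decrement the counter, PASSED events increment it.
        self.counter += 1 if passed else -1
        if not passed:
            if critical or self.counter <= self.confirmation_threshold:
                # Threshold reached, or a CRITICAL report bypassing debounce.
                self.status = "CONFIRMED"
            elif self.status != "CONFIRMED":
                self.status = "PREFAILED"
        elif self.healing_enabled and self.counter >= self.healing_threshold:
            self.status = "HEALED"  # assumed status name
        return self.status


# Default threshold (-1): the first FAILED report confirms the fault.
default = DebounceCounter()
print(default.on_event(passed=False))

# Stricter threshold (-3): three FAILED reports are needed.
strict = DebounceCounter(confirmation_threshold=-3)
print([strict.on_event(passed=False) for _ in range(3)])
```

With the default of -1 the counter reaches the threshold on the first FAILED event, which matches the immediate confirmation described in the Quick Start.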

Snapshot Parameters

Snapshots capture topic data at the moment a fault is confirmed, for post-mortem debugging.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| snapshots.enabled | bool | true | Enable/disable snapshot capture |
| snapshots.background_capture | bool | false | Use background subscriptions (caches latest message) vs on-demand capture |
| snapshots.timeout_sec | double | 1.0 | Timeout waiting for topic message (on-demand mode) |
| snapshots.max_message_size | int | 65536 | Maximum message size in bytes (larger messages skipped) |
| snapshots.default_topics | string[] | [] | Topics to capture for all faults |
| snapshots.config_file | string | "" | Path to YAML config for fault_specific and patterns |

Topic Resolution Priority:

  1. fault_specific - Exact match for fault code (configured via YAML config file)
  2. patterns - Regex pattern match (configured via YAML config file)
  3. default_topics - Fallback for all faults

Example YAML config file (snapshots.yaml):

fault_specific:
  MOTOR_OVERHEAT:
    - /joint_states
    - /motor/temperature
patterns:
  "MOTOR_.*":
    - /joint_states
    - /cmd_vel
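
The resolution order can be sketched in a few lines of Python, using the example config above written as a dict. The resolve_topics function and the /diagnostics default are hypothetical. The sketch also assumes that a pattern must match the whole fault code and that the first matching pattern in file order wins, neither of which the README specifies.

```python
import re

# Mirrors the example snapshots.yaml above, written as a Python dict.
CONFIG = {
    "fault_specific": {"MOTOR_OVERHEAT": ["/joint_states", "/motor/temperature"]},
    "patterns": {"MOTOR_.*": ["/joint_states", "/cmd_vel"]},
}
DEFAULT_TOPICS = ["/diagnostics"]  # hypothetical snapshots.default_topics value


def resolve_topics(fault_code, config, default_topics):
    # 1. An exact fault_code entry wins.
    specific = config.get("fault_specific", {})
    if fault_code in specific:
        return specific[fault_code]
    # 2. Otherwise use the first regex pattern that matches the whole code.
    for pattern, topics in config.get("patterns", {}).items():
        if re.fullmatch(pattern, fault_code):
            return topics
    # 3. Fall back to the default topic list.
    return default_topics


for code in ("MOTOR_OVERHEAT", "MOTOR_STALL", "BATTERY_LOW"):
    print(code, resolve_topics(code, CONFIG, DEFAULT_TOPICS))
```

MOTOR_OVERHEAT resolves through fault_specific, MOTOR_STALL through the MOTOR_.* pattern, and BATTERY_LOW falls back to the default list.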

Storage Backends

SQLite (default): Faults are persisted to disk and survive node restarts. The database uses SQLite's write-ahead logging (WAL) mode, which lets readers proceed while a write is in progress.

Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.
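
What the SQLite backend provides can be seen with Python's built-in sqlite3 module. This is only an illustration of WAL mode and on-disk persistence: the table layout is hypothetical and is not the schema the node uses, and the temporary file stands in for database_path.

```python
import os
import sqlite3
import tempfile

# Throwaway database file standing in for database_path.
db_dir = tempfile.mkdtemp()
db_path = os.path.join(db_dir, "faults.db")

conn = sqlite3.connect(db_path)
# The pragma returns the journal mode that is now in effect.
journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]

# Hypothetical single-table schema, for illustration only.
conn.execute("CREATE TABLE faults (fault_code TEXT PRIMARY KEY, severity INTEGER)")
conn.execute("INSERT INTO faults VALUES (?, ?)", ("MOTOR_OVERHEAT", 2))
conn.commit()
conn.close()

# Reopening the file models a node restart: the row is still there.
conn = sqlite3.connect(db_path)
rows = conn.execute("SELECT fault_code, severity FROM faults").fetchall()
conn.close()
print(journal_mode, rows)
```

An in-memory SQLite database cannot use WAL, which is one reason the sketch writes to a file; the memory backend trades this persistence for simplicity.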

CHANGELOG

Changelog for package ros2_medkit_fault_manager

0.3.0 (2026-02-27)

  • Accurate HIGHEST_SEVERITY reassignment and stale fault_to_cluster_ cleanup (#221)
  • Clean up pending_clusters_ when fault cleared before min_count (#211)
  • Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
  • Contributors: @bburda, @eclipse0922

0.2.0 (2026-02-07)

  • Initial rosdistro release
  • Central fault management node with ROS 2 services:
    • ReportFault - report FAILED/PASSED events with debounce filtering
    • GetFaults - query faults with filtering by severity, status, correlation
    • ClearFault - clear/acknowledge faults
  • Debounce filtering with configurable thresholds:
    • FAILED events decrement counter, PASSED events increment
    • Configurable confirmation_threshold (default: -1, immediate)
    • Optional healing support (healing_enabled, healing_threshold)
    • Time-based auto-confirmation (auto_confirm_after_sec)
    • CRITICAL severity bypasses debounce
  • Dual storage backends:
    • SQLite persistent storage with WAL mode (default)
    • In-memory storage for testing/lightweight deployments
  • Snapshot capture on fault confirmation:
    • Topic data captured as JSON with configurable topic resolution
    • Priority: fault_specific > patterns > default_topics
    • Stored in SQLite with indexed fault_code lookup
    • Auto-cleanup on fault clear
  • Rosbag capture with ring buffer:
    • Configurable duration, post-fault recording, topic selection
    • Lazy start mode (start on PREFAILED) or immediate
    • Auto-cleanup of bag files, storage limits (max_bag_size_mb)
    • GetRosbag service for bag file metadata
  • Fault correlation engine:
    • Hierarchical mode: root cause to symptom relationships
    • Auto-cluster mode: group similar faults within time window
    • YAML-based configuration with pattern wildcards
    • Muted faults tracking, auto-clear on root cause resolution
  • FaultEvent publishing on ~/events topic for SSE streaming
  • Wall clock timestamps (compatible with use_sim_time)
  • Contributors: Bartosz Burda, Michal Faferek

No version for distro indigo showing jazzy. Known supported distros are highlighted in the buttons above.

Package Summary

Version 0.3.0
License Apache-2.0
Build type AMENT_CMAKE
Use RECOMMENDED

Repository Summary

Checkout URI https://github.com/selfpatch/ros2_medkit.git
VCS Type git
VCS Version main
Last Updated 2026-03-05
Dev Status DEVELOPED
Released UNRELEASED
Contributing Help Wanted (-)
Good First Issues (-)
Pull Requests to Review (-)

Package Description

Central fault manager node for ros2_medkit fault management system

Maintainers

  • bburda

Authors

No additional authors.

ros2_medkit_fault_manager

Central fault manager node for the ros2_medkit fault management system.

Overview

The FaultManager node provides a central point for fault aggregation and lifecycle management. It receives fault reports from multiple sources, aggregates them by fault_code, and provides query and clearing interfaces.

Quick Start

By default, faults are confirmed immediately when reported - no additional configuration needed.

# Start the fault manager
ros2 launch ros2_medkit_fault_manager fault_manager.launch.py

# Report a fault - it's immediately CONFIRMED
ros2 service call /fault_manager/report_fault ros2_medkit_msgs/srv/ReportFault \
  "{fault_code: 'MOTOR_OVERHEAT', event_type: 0, severity: 2, description: 'Motor temp exceeded', source_id: '/motor_node'}"

# Query faults
ros2 service call /fault_manager/list_faults ros2_medkit_msgs/srv/ListFaults \
  "{statuses: ['CONFIRMED']}"

# Clear a fault
ros2 service call /fault_manager/clear_fault ros2_medkit_msgs/srv/ClearFault \
  "{fault_code: 'MOTOR_OVERHEAT'}"

Services

Service          Type                               Description
~/report_fault   ros2_medkit_msgs/srv/ReportFault   Report a fault occurrence
~/list_faults    ros2_medkit_msgs/srv/ListFaults    Query faults with filtering
~/clear_fault    ros2_medkit_msgs/srv/ClearFault    Clear/acknowledge a fault
~/get_snapshots  ros2_medkit_msgs/srv/GetSnapshots  Get topic snapshots for a fault

Features

  • Multi-source aggregation: Reports with the same fault_code from different sources are merged into a single fault
  • Occurrence tracking: Counts total reports and tracks all reporting sources
  • Severity escalation: Fault severity is updated if a higher severity is reported
  • Persistent storage: SQLite backend ensures faults survive node restarts
  • Debounce filtering (optional): AUTOSAR DEM-style counter-based fault confirmation
  • Snapshot capture: Captures topic data at the moment a fault is confirmed, for later debugging (snapshots are deleted when the fault is cleared)
  • Fault correlation (optional): Root cause analysis with symptom muting and auto-clear
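
The first three features above can be pictured with a small in-memory model. This is only a sketch, not the node's implementation: the FaultRecord and FaultAggregator names are hypothetical, and it assumes that a larger severity number means a more severe fault and that a lower-severity report never downgrades an existing fault.

```python
from dataclasses import dataclass, field


@dataclass
class FaultRecord:
    """Hypothetical in-memory record for one fault_code."""
    fault_code: str
    severity: int
    occurrence_count: int = 0
    reporting_sources: set = field(default_factory=set)


class FaultAggregator:
    """Toy aggregator keyed by fault_code (not the node's real class)."""

    def __init__(self):
        self.faults = {}

    def report(self, fault_code, severity, source_id):
        # Reports with the same fault_code share one record.
        record = self.faults.get(fault_code)
        if record is None:
            record = FaultRecord(fault_code=fault_code, severity=severity)
            self.faults[fault_code] = record
        # Occurrence tracking: count every report and remember who sent it.
        record.occurrence_count += 1
        record.reporting_sources.add(source_id)
        # Severity escalation: assumed to move upward only.
        record.severity = max(record.severity, severity)
        return record


aggregator = FaultAggregator()
aggregator.report("MOTOR_OVERHEAT", severity=1, source_id="/motor_node")
fault = aggregator.report("MOTOR_OVERHEAT", severity=2, source_id="/thermal_monitor")
aggregator.report("MOTOR_OVERHEAT", severity=1, source_id="/motor_node")
print(fault.severity, fault.occurrence_count, sorted(fault.reporting_sources))
```

Three reports from two sources leave a single record with severity 2, an occurrence count of 3, and both source IDs recorded.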

Parameters

Parameter               Type    Default                           Description
storage_type            string  "sqlite"                          Storage backend: "sqlite" or "memory"
database_path           string  "/var/lib/ros2_medkit/faults.db"  Path to SQLite database file
confirmation_threshold  int     -1                                Counter value at which faults are confirmed
healing_enabled         bool    false                             Enable automatic healing via PASSED events
healing_threshold       int     3                                 Counter value at which faults are healed
auto_confirm_after_sec  double  0.0                               Auto-confirm PREFAILED faults after timeout (0 = disabled)
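
To see how the threshold parameters interact, here is a simplified model of the counter, based on the changelog's description (FAILED events decrement the counter, PASSED events increment it, CRITICAL severity bypasses debounce). The DebounceCounter class, the SEVERITY_CRITICAL value, and the "HEALED" status name are assumptions made for illustration; the real node may handle edge cases differently, and auto_confirm_after_sec is not modeled.

```python
SEVERITY_CRITICAL = 3  # hypothetical value; check the real message constants


class DebounceCounter:
    """Simplified counter-based debounce for a single fault (illustrative only)."""

    def __init__(self, confirmation_threshold=-1, healing_enabled=False,
                 healing_threshold=3):
        self.confirmation_threshold = confirmation_threshold
        self.healing_enabled = healing_enabled
        self.healing_threshold = healing_threshold
        self.counter = 0
        self.status = None  # no report received yet

    def on_event(self, event_type, severity=0):
        if event_type == "FAILED":
            self.counter -= 1
            # Confirm when the counter reaches the threshold, or at once
            # for a CRITICAL report.
            if (severity >= SEVERITY_CRITICAL
                    or self.counter <= self.confirmation_threshold):
                self.status = "CONFIRMED"
            elif self.status != "CONFIRMED":
                self.status = "PREFAILED"
        elif event_type == "PASSED":
            self.counter += 1
            if self.healing_enabled and self.counter >= self.healing_threshold:
                self.status = "HEALED"  # hypothetical status name
        else:
            raise ValueError(f"unknown event type: {event_type}")
        return self.status


# Default threshold of -1: the first FAILED report confirms the fault.
default = DebounceCounter()
first_report = default.on_event("FAILED")

# Threshold of -3 with healing: three FAILED reports confirm, then the
# counter must climb from -3 to +3 before the fault heals.
debounced = DebounceCounter(confirmation_threshold=-3, healing_enabled=True)
history = [debounced.on_event("FAILED") for _ in range(3)]
for _ in range(6):
    healed = debounced.on_event("PASSED")

# A CRITICAL report skips the debounce entirely.
critical = DebounceCounter(confirmation_threshold=-3)
bypass = critical.on_event("FAILED", severity=SEVERITY_CRITICAL)
print(first_report, history, healed, bypass)
```

With the default confirmation_threshold of -1 the first decrement already reaches the threshold, which matches the Quick Start statement that faults are confirmed immediately.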

Snapshot Parameters

Snapshots capture topic data at the moment a fault is confirmed, for post-mortem debugging.

Parameter                     Type      Default  Description
snapshots.enabled             bool      true     Enable/disable snapshot capture
snapshots.background_capture  bool      false    Use background subscriptions (caches latest message) vs on-demand capture
snapshots.timeout_sec         double    1.0      Timeout waiting for topic message (on-demand mode)
snapshots.max_message_size    int       65536    Maximum message size in bytes (larger messages skipped)
snapshots.default_topics      string[]  []       Topics to capture for all faults
snapshots.config_file         string    ""       Path to YAML config for fault_specific and patterns

Topic Resolution Priority:

  1. fault_specific - Exact match for fault code (configured via YAML config file)
  2. patterns - Regex pattern match (configured via YAML config file)
  3. default_topics - Fallback for all faults

Example YAML config file (snapshots.yaml):

fault_specific:
  MOTOR_OVERHEAT:
    - /joint_states
    - /motor/temperature
patterns:
  "MOTOR_.*":
    - /joint_states
    - /cmd_vel
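
The resolution order above can be sketched in a few lines of Python. The configuration mirrors the example YAML as a plain dict; the resolve_topics function and the /diagnostics default topic are hypothetical. The sketch also assumes that a pattern must match the whole fault code and that the first matching pattern wins, neither of which is stated here.

```python
import re

# Same content as the example snapshots.yaml, written as a dict.
SNAPSHOT_CONFIG = {
    "fault_specific": {
        "MOTOR_OVERHEAT": ["/joint_states", "/motor/temperature"],
    },
    "patterns": {
        "MOTOR_.*": ["/joint_states", "/cmd_vel"],
    },
}
DEFAULT_TOPICS = ["/diagnostics"]  # hypothetical value for snapshots.default_topics


def resolve_topics(fault_code, config, default_topics):
    """Return the topics to snapshot for fault_code, highest priority first."""
    # 1. fault_specific: exact match on the fault code.
    specific = config.get("fault_specific", {})
    if fault_code in specific:
        return list(specific[fault_code])
    # 2. patterns: first regex that matches the whole code (assumed semantics).
    for pattern, topics in config.get("patterns", {}).items():
        if re.fullmatch(pattern, fault_code):
            return list(topics)
    # 3. default_topics: fallback for every other fault.
    return list(default_topics)


exact = resolve_topics("MOTOR_OVERHEAT", SNAPSHOT_CONFIG, DEFAULT_TOPICS)
by_pattern = resolve_topics("MOTOR_STALL", SNAPSHOT_CONFIG, DEFAULT_TOPICS)
fallback = resolve_topics("BATTERY_LOW", SNAPSHOT_CONFIG, DEFAULT_TOPICS)
print(exact, by_pattern, fallback)
```

MOTOR_OVERHEAT also matches the MOTOR_.* pattern, but the exact entry takes precedence, so /cmd_vel is not captured for it.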

Storage Backends

SQLite (default): Faults are persisted to disk and survive node restarts. Uses SQLite's write-ahead logging (WAL) mode, which lets readers proceed while a write is in progress.

Memory: Faults are stored in memory only. Useful for testing or when persistence is not required.
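
As a rough illustration of what the SQLite backend provides, the sketch below uses Python's standard sqlite3 module to enable WAL mode, store one fault, and read it back through a fresh connection, standing in for a node restart. The table layout and the temporary database path are invented for the example and are not the package's actual schema.

```python
import os
import sqlite3
import tempfile

# Temporary location for the demo; the node's default is database_path.
db_path = os.path.join(tempfile.mkdtemp(), "faults.db")

conn = sqlite3.connect(db_path)
# Switch to write-ahead logging; SQLite reports the resulting mode.
journal_mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
# Hypothetical schema: one row per fault_code.
conn.execute(
    "CREATE TABLE IF NOT EXISTS faults ("
    "fault_code TEXT PRIMARY KEY, "
    "severity INTEGER NOT NULL, "
    "occurrence_count INTEGER NOT NULL)"
)
conn.execute("INSERT INTO faults VALUES (?, ?, ?)", ("MOTOR_OVERHEAT", 2, 1))
conn.commit()
conn.close()

# A new connection plays the role of the restarted node.
conn = sqlite3.connect(db_path)
rows = conn.execute(
    "SELECT fault_code, severity, occurrence_count FROM faults"
).fetchall()
conn.close()
print(journal_mode, rows)
```

WAL mode applies only to file-backed databases, which is one reason the in-memory backend is a separate option rather than an SQLite ":memory:" database in this sketch.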

Usage

Launch

File truncated at 100 lines; see the full file.

CHANGELOG

Changelog for package ros2_medkit_fault_manager

0.3.0 (2026-02-27)

  • Accurate HIGHEST_SEVERITY reassignment and stale fault_to_cluster_ cleanup (#221)
  • Clean up pending_clusters_ when fault cleared before min_count (#211)
  • Multi-distro CI support for ROS 2 Humble, Jazzy, and Rolling (#219, #242)
  • Contributors: @bburda, @eclipse0922

0.2.0 (2026-02-07)

  • Initial rosdistro release
  • Central fault management node with ROS 2 services:
    • ReportFault - report FAILED/PASSED events with debounce filtering
    • GetFaults - query faults with filtering by severity, status, correlation
    • ClearFault - clear/acknowledge faults
  • Debounce filtering with configurable thresholds:
    • FAILED events decrement counter, PASSED events increment
    • Configurable confirmation_threshold (default: -1, immediate)
    • Optional healing support (healing_enabled, healing_threshold)
    • Time-based auto-confirmation (auto_confirm_after_sec)
    • CRITICAL severity bypasses debounce
  • Dual storage backends:
    • SQLite persistent storage with WAL mode (default)
    • In-memory storage for testing/lightweight deployments
  • Snapshot capture on fault confirmation:
    • Topic data captured as JSON with configurable topic resolution
    • Priority: fault_specific > patterns > default_topics
    • Stored in SQLite with indexed fault_code lookup
    • Auto-cleanup on fault clear
  • Rosbag capture with ring buffer:
    • Configurable duration, post-fault recording, topic selection
    • Lazy start mode (start on PREFAILED) or immediate
    • Auto-cleanup of bag files, storage limits (max_bag_size_mb)
    • GetRosbag service for bag file metadata
  • Fault correlation engine:
    • Hierarchical mode: root cause to symptom relationships
    • Auto-cluster mode: group similar faults within time window
    • YAML-based configuration with pattern wildcards
    • Muted faults tracking, auto-clear on root cause resolution
  • FaultEvent publishing on ~/events topic for SSE streaming
  • Wall clock timestamps (compatible with use_sim_time)
  • Contributors: Bartosz Burda, Michal Faferek

Launch files

No launch files found

Messages

No message files found.

Services

No service files found

Plugins

No plugins found.

