Package Summary
Tags | No category tags. |
Version | 2.1.28 |
License | BSD |
Build type | CATKIN |
Use | RECOMMENDED |
Repository Summary
Checkout URI | https://github.com/jsk-ros-pkg/jsk_3rdparty.git |
VCS Type | git |
VCS Version | master |
Last Updated | 2024-07-10 |
Dev Status | DEVELOPED |
CI status | No Continuous Integration |
Released | RELEASED |
Maintainers
- Yuki Furuta
Authors
- Yuki Furuta
ros_speech_recognition
A ROS package for speech-to-text services.
This package uses the Python package SpeechRecognition as a backend.
Tutorials
Normal tutorial
- Install this package and SpeechRecognition
  sudo apt install ros-${ROS_DISTRO}-ros-speech-recognition
- Launch the speech recognition node
  roslaunch ros_speech_recognition speech_recognition.launch
- Echo /speech_to_text
  rostopic echo /speech_to_text
  # you can get the recognition result
Parrotry tutorial
Parrotry means オウム返し in Japanese: repeating back what was just heard, like a parrot.
# english
roslaunch ros_speech_recognition parrotry.launch
# japanese
roslaunch ros_speech_recognition parrotry.launch language:=ja-JP
speech_recognition_node.py
Interface
Publishing Topics
- ~voice_topic (speech_recognition_msgs/SpeechRecognitionCandidates)
  Speech recognition candidates. The topic name is set by the parameter ~voice_topic, and the default value is speech_to_text.
- sound_play (sound_play/SoundRequestAction)
  Action client to play sound on events. If the action server is not available or ~enable_sound_effect is False, no sound is played.
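A subscriber to the voice topic usually wants a single string rather than the whole candidate list. The sketch below shows one way to pick it. It assumes the message carries parallel `transcript` and `confidence` lists; the `Candidates` dataclass is a hypothetical stand-in for the real message type so the example runs without ROS, and `best_transcript` is an illustrative helper, not part of this package.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Candidates:
    """Hypothetical stand-in for speech_recognition_msgs/SpeechRecognitionCandidates.

    Assumption: the real message exposes parallel `transcript` and
    `confidence` lists with the same meaning as the fields below.
    """
    transcript: List[str] = field(default_factory=list)
    confidence: List[float] = field(default_factory=list)


def best_transcript(msg: Candidates, min_confidence: float = 0.0) -> Optional[str]:
    """Return the most confident transcript, or None if nothing qualifies."""
    if not msg.transcript:
        return None
    # If confidences are missing or mismatched, fall back to the first
    # candidate; the threshold cannot be applied in that case.
    if len(msg.confidence) != len(msg.transcript):
        return msg.transcript[0]
    index = max(range(len(msg.transcript)), key=lambda i: msg.confidence[i])
    if msg.confidence[index] < min_confidence:
        return None
    return msg.transcript[index]


if __name__ == "__main__":
    msg = Candidates(transcript=["hello world", "hollow world"],
                     confidence=[0.92, 0.41])
    print(best_transcript(msg, min_confidence=0.8))
```

In a real node the same function could be called from the topic callback, since it only reads the two list attributes.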
Subscribing Topics
- ~audio_topic (audio_common_msgs/AudioData)
  Audio stream data to be recognized. The topic name is set by the parameter ~audio_topic, and the default value is audio.
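To make the audio stream concrete, here is a minimal sketch of decoding one message payload. It assumes the payload is raw, interleaved, signed 16-bit little-endian PCM, which matches a depth of 16 but depends on how the audio source is configured. `extract_channel` is a hypothetical helper for illustration and is not taken from this package.

```python
import array
import sys
from typing import List


def extract_channel(data: bytes, n_channel: int = 1, channel: int = 0) -> List[int]:
    """Decode interleaved 16-bit PCM bytes and return one channel's samples.

    Assumption: `data` is signed 16-bit little-endian PCM with `n_channel`
    interleaved channels (sample 0 of every channel, then sample 1, ...).
    """
    if not 0 <= channel < n_channel:
        raise ValueError("channel index out of range")
    frame_bytes = 2 * n_channel
    # Drop a trailing partial frame, if any, so decoding never fails.
    usable = len(data) - (len(data) % frame_bytes)
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(data[:usable])
    if sys.byteorder == "big":
        samples.byteswap()  # payload is little-endian by assumption
    return list(samples[channel::n_channel])


if __name__ == "__main__":
    stereo = b"\x01\x00\xff\xff\x02\x00\xfe\xff"  # L=1, R=-1, L=2, R=-2
    print(extract_channel(stereo, n_channel=2, channel=0))
    print(extract_channel(stereo, n_channel=2, channel=1))
```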
Advertising Services
- speech_recognition (speech_recognition_msgs/SpeechRecognition)
  Service for speech recognition
- speech_recognition/start (std_srvs/Empty)
  Start service for speech recognition. This service is available when the parameter ~continuous is True.
- speech_recognition/stop (std_srvs/Empty)
  Stop service for speech recognition. This service is available when the parameter ~continuous is True.
Parameters
- ~voice_topic (String, default: speech_to_text)
  Publishing voice topic name
- ~audio_topic (String, default: audio)
  Subscribing audio topic name
- ~enable_sound_effect (Bool, default: True)
  Flag to enable or disable playing sound on recognition.
- ~language (String, default: en-US)
  Language to be recognized
- ~engine (Enum[String], default: Google)
  Speech-to-text engine (to see the full list of options, use dynamic_reconfigure)
- ~energy_threshold (Double, default: 300)
  Threshold for voice activity detection (VAD)
- ~dynamic_energy_threshold (Bool, default: True)
  Adaptive estimation for energy_threshold
- ~dynamic_energy_adjustment_damping (Double, default: 0.15)
  Damping threshold for dynamic VAD
- ~dynamic_energy_ratio (Double, default: 1.5)
  Energy ratio for dynamic VAD
- ~pause_threshold (Double, default: 0.8)
  Seconds of non-speaking audio before a phrase is considered complete
- ~operation_timeout (Double, default: 0.0)
  Seconds after an internal operation (e.g., an API request) starts before it times out
- ~listen_timeout (Double, default: 0.0)
  The maximum number of seconds that the node will wait for a phrase to start before giving up
- ~phrase_time_limit (Double, default: 10.0)
  The maximum number of seconds that a phrase is allowed to continue before recognition stops and returns the part of the phrase processed before the time limit was reached
- ~phrase_threshold (Double, default: 0.3)
  Minimum seconds of speaking audio before the speaking audio is considered a phrase
- ~non_speaking_duration (Double, default: 0.5)
  Seconds of non-speaking audio to keep on both sides of the recording
- ~duration (Double, default: 10.0)
  Seconds of waiting for speech
- ~depth (Int, default: 16)
  Bit depth of audio signal
- ~n_channel (Int, default: 1)
  Total number of channels in audio data (e.g. 1: mono, 2: stereo)
- ~sample_rate (Int, default: 16000)
  Sample rate of audio signal
- ~buffer_size (Int, default: 10240)
  Maximum buffer size to store audio data for speech recognition
- ~start_signal (String, default: /usr/share/sounds/freedesktop/stereo/bell.ogg)
  Path to sound file for the bell at the start of audio capture
- ~recognized_signal (String, default: /usr/share/sounds/freedesktop/stereo/message.ogg)
  Path to sound file for the bell at the end of audio capture
- ~success_signal (String, default: /usr/share/sounds/freedesktop/stereo/message-new-instant.ogg)
  Path to sound file for the bell on getting a successful recognition result
- ~timeout_signal (String, default: /usr/share/sounds/freedesktop/stereo/network-connectivity-lost.ogg)
  Path to sound file for the bell on recognition timeout
- ~continuous (Bool, default: False)
  Selects whether the topic or the service is used. By default, the service is used.
- ~auto_start (Bool, default: True)
  Whether to start speech recognition at launch.
- ~self_cancellation (Bool, default: True)
  Whether to ignore sound heard while the actions in ~tts_action_names are running. This option keeps the robot's own voice out of recognition.
- ~tts_action_names (List[String], default: ['sound_play'])
  Text-to-speech action names for self cancellation. The node ignores voice heard while any of these text-to-speech actions is running.
- ~tts_tolerance (Float, default: 1.0)
  Tolerance in seconds for self cancellation. The node keeps ignoring voice for this many seconds after the actions in ~tts_action_names finish running.
- ~google_key (String, default: None)
  Auth key for Google API. If None, the public key is used (with no guarantee that it will not be blocked). This is valid only if ~engine is Google.
- ~google_cloud_credentials_json (String, default: None)
  Path to credential JSON file. JSK users can download it from the Google Drive link. This is valid only if ~engine is GoogleCloud.
- ~google_cloud_preferred_phrases ([String], default: None)
  Preferred phrases parameters. This is valid only if ~engine is GoogleCloud.
- ~bing_key (String, default: None)
  Auth key for Bing API. This is valid only if ~engine is Bing.
- ~vosk_model_path (String, default: None)
  Path to trained model for Vosk API. This is valid only if ~engine is Vosk. If en-US or ja is selected as ~language, you do not need to specify the path. To load other models, please download them from the Model list.
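The energy parameters above (~energy_threshold, ~dynamic_energy_threshold, ~dynamic_energy_adjustment_damping, ~dynamic_energy_ratio) interact in a way that is easier to see in code. The sketch below is modeled on the threshold update rule of the SpeechRecognition backend as it is commonly described; treat it as an assumption about the behavior rather than this node's actual implementation. The function names are hypothetical.

```python
import math
from typing import Sequence


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square energy of one buffer of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def update_energy_threshold(
    energy_threshold: float,
    energy: float,
    seconds_per_buffer: float,
    damping: float = 0.15,  # corresponds to ~dynamic_energy_adjustment_damping
    ratio: float = 1.5,     # corresponds to ~dynamic_energy_ratio
) -> float:
    """Move the threshold toward ratio * energy of a non-speech buffer.

    Assumed rule: an exponential blend whose weight depends on how much
    audio time the buffer covers, so the adaptation speed is independent
    of the buffer length.
    """
    decay = damping ** seconds_per_buffer
    target = energy * ratio
    return energy_threshold * decay + target * (1.0 - decay)


def is_speech(energy: float, energy_threshold: float) -> bool:
    """A buffer counts as speech when its energy exceeds the threshold."""
    return energy > energy_threshold


if __name__ == "__main__":
    threshold = 300.0  # default ~energy_threshold
    ambient = rms([100, -100, 100, -100])
    threshold = update_energy_threshold(threshold, ambient, seconds_per_buffer=1.0)
    print(threshold, is_speech(500.0, threshold))
```

With the defaults, a quiet room pulls the threshold down toward 1.5 times the ambient energy, and a lower damping value makes it follow the ambient level faster.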
Author
Yuki Furuta «furushchev@jsk.imi.i.u-tokyo.ac.jp»
Changelog for package ros_speech_recognition
2.1.28 (2023-07-24)
2.1.27 (2023-06-24)
- fix package.xml/CMakeLists.txt to supress catkin_lint errors (#479)
- Contributors: Kei Okada
2.1.26 (2023-06-14)
- add LICENSE files (#476)
- Contributors: Kei Okada
2.1.25 (2023-06-08)
- [ros_speech_recognition] Add vosk engine (#474)
- Pr/use sound themes freedesktop (#472)
- add test to check if ros node is loadable (#463)
- add self.conf_thresh in __init_ function (#457)
- [ros_speech_recognition] add ubuntu-sounds dependency (#453)
- [ros_speech_recognition] Return if result is empty (#443)
- [ros_speece_recognition] Set confidence value of google (#434)
- [ros_speech_recognition] add parrotry.launch (#414)
- [ros_ speech_recognition] update default arg for speech_recognition.launch (#412)
- [ros_speech_recogniton, respeaker_ros] add confidence field (#411)
- [ros_speech_recognition] add self cancellation for speech recogntion (#413)
- [#405 and #410] Fix CI (#415)
- add ROS interface for https://cloud.google.com/natural-language (#304)
- GithubAction: add test for aarch64(melodic) / indigo (arm64) (#365)
- pgm_learner/respeaker_ros/ros_speech_recognition/rosping: increase time-limit/wait-time
- Explicit python interpreter in catkin_virtualenv (#367)
- .github/workflow: integrate all yaml to one (#338)
- [ros_speech_recognition] Fixed the behavior of launch file (#336)
- [ros_speech_recognition] add auto_start in speech_recognition_node.py (#301)
- [ros_speech_recognition] add SpeechRecognitionCandidatesToString node (#303)
- Enable sound play flag (#315)
- Contributors: Aiko Ichikura, Aoi Nakane, Kei Okada, Koki Shinjo, Naoto Tsukamoto, Naoya Yamaguchi, Shingo Kitagawa, Yoshiki Obinata, Iory Yanokura
2.1.24 (2021-07-26)
2.1.23 (2021-07-21)
2.1.22 (2021-06-10)
- enable to change topic name from speech_recognition.launch (#254)
- support SpeakerDiarization, see https://cloud.google.com/speech-to-text/docs/reference/rest/v1/speech/recognize#SpeechRecognitionAlternative (#244)
- [ros_speech_recognition] Add doc to speech_recognition.launch add doc to args, and we need to use rosparm for device, not param. because 'device: ' causes load_parameters: unable to set parameters (last param was [/speech_recognition/depth=16]): cannot marshal None unless allow_none is enabled error
- more exception message for self.recognize
- Use PYTHON_INTERPRETER python3 in ros_speech_recognition (#225)
- Contributors: Kei Okada, Naoya Yamaguchi, Shingo Kitagawa
2.1.21 (2020-08-19)
- add missing packages, closes https://github.com/ros/rosdistro/pull/26216 (#211)
- Contributors: Kei Okada
2.1.20 (2020-08-07)
2.1.19 (2020-07-21)
- Fixed issue #201 as requested, see https://github.com/jsk-ros-pkg/jsk_3rdparty/pull/202
- Contributors: MrMarshy
2.1.18 (2020-07-20)
- Fix for noetic (#200)
- fix 2to3, with print, raise, exception
- [ros_speech_recognition] Enable multi channel audio recognition (#198)
- adjust type code to the CPU platform
- replace rosparam name: channels -> n_channel
- add rosparam description to README
- enable multi channel audio recognition
- Add args to ros_speech_recognition (#197)
- Add flac as run_depend for SpeechRecognition pip package
- Use catkin_virtualenv to use SpeechRecognition pip package
- Add arguments and params to pass rostest
- Add test for ros_speech_recognition
- add args to launch
- add pip install to tutorials
- add param description to README
- Contributors: Kei Okada, Naoya Yamaguchi
2.1.17 (2020-04-16)
2.1.16 (2020-04-16)
2.1.15 (2019-12-12)
2.1.14 (2019-11-21)
- set SoundRequest.volume for kinetic (#173)
- Contributors: Kei Okada
2.1.13 (2019-07-10)
2.1.12 (2019-05-25)
- fixes GoogleCloud auth (#158)
- Contributors: jonasius
2.1.11 (2018-08-29)
2.1.10 (2018-04-25)
2.1.9 (2018-04-24)
2.1.8 (2018-04-17)
2.1.7 (2018-04-09)
2.1.6 (2017-11-21)
2.1.5 (2017-11-20)
- ros_speech_recognition: add continuous mode (#127)
- ros_speech_recognition: add README (#123)
- add ros_speech_recognition package (#121)
- Contributors: Yuki Furuta
2.1.4 (2017-07-16)
2.1.3 (2017-07-07)
2.1.2 (2017-07-06)
2.1.1 (2017-07-05)
2.1.0 (2017-07-02)
2.0.20 (2017-05-09)
2.0.19 (2017-02-22)
2.0.18 (2016-10-28)
2.0.17 (2016-10-22)
2.0.16 (2016-10-17)
2.0.15 (2016-10-16)
2.0.14 (2016-03-20)
2.0.13 (2015-12-15)
2.0.12 (2015-11-26)
2.0.11 (2015-10-07 14:16)
2.0.10 (2015-10-07 12:47)
2.0.9 (2015-09-26)
2.0.8 (2015-09-15)
2.0.7 (2015-09-14)
2.0.6 (2015-09-08)
2.0.5 (2015-08-23)
2.0.4 (2015-08-18)
2.0.3 (2015-08-01)
2.0.2 (2015-06-29)
2.0.1 (2015-06-19 21:21)
2.0.0 (2015-06-19 10:41)
1.0.71 (2015-05-17)
1.0.70 (2015-05-08)
1.0.69 (2015-05-05 12:28)
1.0.68 (2015-05-05 09:49)
1.0.67 (2015-05-03)
1.0.66 (2015-04-03)
1.0.65 (2015-04-02)
1.0.64 (2015-03-29)
1.0.63 (2015-02-19)
1.0.62 (2015-02-17)
1.0.61 (2015-02-11)
1.0.60 (2015-02-03 10:12)
1.0.59 (2015-02-03 04:05)
1.0.58 (2015-01-07)
1.0.57 (2014-12-23)
1.0.56 (2014-12-17)
1.0.55 (2014-12-09)
1.0.54 (2014-11-15)
1.0.53 (2014-11-01)
1.0.52 (2014-10-23)
1.0.51 (2014-10-20 16:01)
1.0.50 (2014-10-20 01:50)
1.0.49 (2014-10-13)
1.0.48 (2014-10-12)
1.0.47 (2014-10-08)
1.0.46 (2014-10-03)
1.0.45 (2014-09-29)
1.0.44 (2014-09-26 09:17)
1.0.43 (2014-09-26 01:08)
1.0.42 (2014-09-25)
1.0.41 (2014-09-23)
1.0.40 (2014-09-19)
1.0.39 (2014-09-17)
1.0.38 (2014-09-13)
1.0.37 (2014-09-08)
1.0.36 (2014-09-01)
1.0.35 (2014-08-16)
1.0.34 (2014-08-14)
1.0.33 (2014-07-28)
1.0.32 (2014-07-26)
1.0.31 (2014-07-23)
1.0.30 (2014-07-15)
1.0.29 (2014-07-02)
1.0.28 (2014-06-24)
1.0.27 (2014-06-10)
1.0.26 (2014-05-30)
1.0.25 (2014-05-26)
1.0.24 (2014-05-24)
1.0.23 (2014-05-23)
1.0.22 (2014-05-22)
1.0.21 (2014-05-20)
1.0.20 (2014-05-09)
1.0.19 (2014-05-06)
1.0.18 (2014-05-04)
1.0.17 (2014-04-20)
1.0.16 (2014-04-19 23:29)
1.0.15 (2014-04-19 20:19)
1.0.14 (2014-04-19 12:52)
1.0.13 (2014-04-19 11:06)
1.0.12 (2014-04-18 16:58)
1.0.11 (2014-04-18 08:18)
1.0.10 (2014-04-17)
1.0.9 (2014-04-12)
1.0.8 (2014-04-11)
1.0.7 (2014-04-10)
1.0.6 (2014-04-07)
1.0.5 (2014-03-31)
1.0.4 (2014-03-29)
1.0.3 (2014-03-19)
1.0.2 (2014-03-12)
1.0.1 (2014-03-07)
1.0.0 (2014-03-05)
Package Dependencies
Name | Deps |
---|---|
catkin_virtualenv | |
dynamic_reconfigure | |
jsk_data | |
speech_recognition_msgs | |
catkin | |
audio_capture | |
audio_common_msgs | |
sound_play | |
rostest | |
roslaunch | |
Dependant Packages
Name | Deps |
---|---|
jsk_3rdparty |
Launch files
- launch/speech_recognition.launch
  - launch_sound_play [default: true]: Launch the sound_play node to speak
  - launch_audio_capture [default: true]: Launch the audio_capture node to publish the audio topic from the microphone
  - audio_topic [default: /audio]: Name of the audio topic captured from the microphone
  - voice_topic [default: /speech_to_text]: Name of the text topic of recognized speech
  - n_channel [default: 1]: Number of channels of the audio topic and microphone. Run '$ pactl list short sinks' to check your hardware
  - depth [default: 16]: Bit depth of the audio topic and microphone. Run '$ pactl list short sinks' to check your hardware
  - sample_rate [default: 16000]: Frame rate of the audio topic and microphone. Run '$ pactl list short sinks' to check your hardware
  - device [default: ]: Card and device number of the microphone (e.g. hw:0,0). Check the card number and device number with '$ arecord -l', then use hw:[card number],[device number]
  - engine [default: Google]: Speech-to-text engine. One of Google, GoogleCloud, Sphinx, Wit, Bing, Houndify, IBM
  - language [default: en-US]: Speech-to-text language. For Japanese, set ja-JP.
  - continuous [default: true]: If false, the /speech_recognition service is advertised. If true, the /speech_to_text topic is published.
  - auto_start [default: true]: Whether speech recognition starts automatically. This argument takes effect when continuous is true
  - self_cancellation [default: true]: Whether to ignore audio heard while the robot is speaking
  - tts_tolerance [default: 1.0]: Tolerance in seconds when deciding whether the robot is still speaking
  - tts_action_names [default: ['sound_play']]: TTS action names. Output from these action servers is ignored by speech recognition
- launch/parrotry.launch
  - use_google [default: true]
  - language [default: en-US]
  - confidence_threshold [default: 0.8]
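The self_cancellation, tts_tolerance and tts_action_names arguments of speech_recognition.launch describe a simple time-window rule: audio is ignored while a text-to-speech action is running and for a short tolerance afterwards. The sketch below illustrates that rule in isolation. `SelfCancellationGate` and its methods are hypothetical names, and the node's real bookkeeping (for example, how it tracks several action servers) may differ.

```python
from typing import Optional


class SelfCancellationGate:
    """Hypothetical helper that decides whether heard audio should be ignored."""

    def __init__(self, tts_tolerance: float = 1.0) -> None:
        self.tts_tolerance = tts_tolerance  # seconds, as in the launch argument
        self.speaking = False
        self.last_finished: Optional[float] = None

    def tts_started(self) -> None:
        """Call when a text-to-speech action becomes active."""
        self.speaking = True

    def tts_finished(self, now: float) -> None:
        """Call when the text-to-speech action finishes, with the current time."""
        self.speaking = False
        self.last_finished = now

    def should_ignore(self, now: float) -> bool:
        """True while speaking, or within tts_tolerance seconds after speaking."""
        if self.speaking:
            return True
        if self.last_finished is None:
            return False
        return now - self.last_finished < self.tts_tolerance


if __name__ == "__main__":
    gate = SelfCancellationGate(tts_tolerance=1.0)
    gate.tts_started()
    gate.tts_finished(now=10.0)
    print(gate.should_ignore(10.5), gate.should_ignore(11.5))
```

Passing the time in explicitly keeps the rule easy to test; a node would typically supply its own clock reading instead.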
|
Package Summary
Tags | No category tags. |
Version | 2.1.28 |
License | BSD |
Build type | CATKIN |
Use | RECOMMENDED |
Repository Summary
Checkout URI | https://github.com/jsk-ros-pkg/jsk_3rdparty.git |
VCS Type | git |
VCS Version | master |
Last Updated | 2024-07-10 |
Dev Status | DEVELOPED |
CI status | Continuous Integration |
Released | RELEASED |
Tags | No category tags. |
Contributing |
Help Wanted (0)
Good First Issues (0) Pull Requests to Review (0) |
Package Description
Additional Links
Maintainers
- Yuki Furuta
Authors
- Yuki Furuta
ros_speech_recognition
A ROS package for speech-to-text services.
This package uses Python package SpeechRecognition as a backend.
Tutorials
Normal tutorial
- Install this package and SpeechReconition
sudo apt install ros-${ROS_DISTRO}-ros-speech-recognition
- Launch speech recognition node
roslaunch ros_speech_recognition speech_recognition.launch
- Echo
/speech_to_text
rostopic echo /speech_to_text
# you can get the recognition result
Parrotry tutorial
Parrotry mean オウム返し in Japanese
# english
roslaunch ros_speech_recognition parrotry.launch
# japanese
roslaunch ros_speech_recognition parrotry.launch language:=ja-JP
speech_recognition_node.py
Interface
Publishing Topics
-
~voice_topic
(speech_recognition_msgs/SpeechRecognitionCandidates
)Speech recognition candidates topic name.
Topic name is set by parameter
~voice_topic
, and default value isspeech_to_text
. -
sound_play
(sound_play/SoundRequestAction
)Action client to play sound on events. If the action server is not available or
~enable_sound_effect
isFalse
, no sound is played.
Subscribing Topics
-
~audio_topic
(audio_common_msgs/AudioData
)Audio stream data to be recognized.
Topis name is set by parameter
~audio_topic
and default value isaudio
.
Advertising Services
-
speech_recognition
(speech_recognition_msgs/SpeechRecognition
)Service for speech recognition
-
speech_recognition/start
(std_srvs/Empty
)Start service for speech recognition
This service is available when parameter
~contiunous
isTrue
. -
speech_recognition/start
(std_srvs/Empty
)Stop service for speech recognition
This service is available when parameter
~contiunous
isTrue
.
Parameters
- ~voice_topic (String, default: speech_to_text)
  Name of the topic on which recognition results are published.
- ~audio_topic (String, default: audio)
  Name of the audio topic to subscribe to.
- ~enable_sound_effect (Bool, default: True)
  Whether to play sound effects on recognition events.
- ~language (String, default: en-US)
  Language to be recognized.
- ~engine (Enum[String], default: Google)
  Speech-to-text engine. (To see the full list of options, use dynamic_reconfigure.)
- ~energy_threshold (Double, default: 300)
  Energy threshold for voice activity detection (VAD).
- ~dynamic_energy_threshold (Bool, default: True)
  Enables adaptive estimation of energy_threshold.
- ~dynamic_energy_adjustment_damping (Double, default: 0.15)
  Damping threshold for dynamic VAD.
- ~dynamic_energy_ratio (Double, default: 1.5)
  Energy ratio for dynamic VAD.
- ~pause_threshold (Double, default: 0.8)
  Seconds of non-speaking audio before a phrase is considered complete.
- ~operation_timeout (Double, default: 0.0)
  Seconds after an internal operation (e.g., an API request) starts before it times out.
- ~listen_timeout (Double, default: 0.0)
  Maximum number of seconds the node waits for a phrase to start before giving up.
- ~phrase_time_limit (Double, default: 10.0)
  Maximum number of seconds a phrase may continue before the node stops listening and returns the part of the phrase processed before the limit was reached.
- ~phrase_threshold (Double, default: 0.3)
  Minimum seconds of speaking audio before the audio is considered a phrase.
- ~non_speaking_duration (Double, default: 0.5)
  Seconds of non-speaking audio to keep on both sides of the recording.
- ~duration (Double, default: 10.0)
  Seconds to wait for speech.
- ~depth (Int, default: 16)
  Bit depth of the audio signal.
- ~n_channel (Int, default: 1)
  Total number of channels in the audio data (e.g. 1: mono, 2: stereo).
- ~sample_rate (Int, default: 16000)
  Sample rate of the audio signal.
- ~buffer_size (Int, default: 10240)
  Maximum buffer size for storing audio data for speech recognition.
- ~start_signal (String, default: /usr/share/sounds/freedesktop/stereo/bell.ogg)
  Path to the sound file played when audio capture starts.
- ~recognized_signal (String, default: /usr/share/sounds/freedesktop/stereo/message.ogg)
  Path to the sound file played when audio capture ends.
- ~success_signal (String, default: /usr/share/sounds/freedesktop/stereo/message-new-instant.ogg)
  Path to the sound file played when a recognition result is obtained successfully.
- ~timeout_signal (String, default: /usr/share/sounds/freedesktop/stereo/network-connectivity-lost.ogg)
  Path to the sound file played when recognition times out.
- ~continuous (Bool, default: False)
  Selects whether the topic (True) or the service (False) is used. By default, the service is used.
- ~auto_start (Bool, default: True)
  Whether to start speech recognition when the node is launched.
- ~self_cancellation (Bool, default: True)
  Whether to ignore sound heard while any action in ~tts_action_names is running.
  This option keeps the robot's own voice out of recognition.
- ~tts_action_names (List[String], default: ['sound_play'])
  Text-to-speech action names used for self cancellation.
  The node ignores any voice heard while one of these text-to-speech actions is running.
- ~tts_tolerance (Float, default: 1.0)
  Tolerance in seconds for self cancellation.
  The node keeps ignoring voice for this many seconds after the actions in ~tts_action_names finish running.
- ~google_key (String, default: None)
  Auth key for the Google API. If None, the public key is used (with no guarantee that it will not be blocked).
  This is valid only if ~engine is Google.
- ~google_cloud_credentials_json (String, default: None)
  Path to the credentials JSON file. JSK users can download it from the Google Drive link.
  This is valid only if ~engine is GoogleCloud.
- ~google_cloud_preferred_phrases ([String], default: None)
  Preferred phrases parameters.
  This is valid only if ~engine is GoogleCloud.
- ~bing_key (String, default: None)
  Auth key for the Bing API.
  This is valid only if ~engine is Bing.
- ~vosk_model_path (String, default: None)
  Path to a trained model for the Vosk API.
  This is valid only if ~engine is Vosk.
  If en-US or ja is selected as ~language, you do not need to specify the path. To load other models, please download them from Model list.
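To make the interaction of ~energy_threshold, ~dynamic_energy_adjustment_damping and ~dynamic_energy_ratio more concrete, the sketch below applies an exponential-smoothing update modeled on how the SpeechRecognition backend adapts its threshold. Treat the exact formula as an assumption rather than a specification of this node; update_energy_threshold and seconds_per_buffer are illustrative names.

```python
def update_energy_threshold(threshold, energy, seconds_per_buffer,
                            damping=0.15, ratio=1.5):
    """Move the threshold toward ratio * energy for one audio buffer.

    Assumed update rule: the damping factor is scaled by the buffer
    duration so the adaptation speed does not depend on buffer size.
    """
    d = damping ** seconds_per_buffer
    target = energy * ratio
    return threshold * d + target * (1.0 - d)


if __name__ == "__main__":
    threshold = 300.0
    # Feed ten one-second buffers of quiet background noise (energy 100)
    for _ in range(10):
        threshold = update_energy_threshold(threshold, 100.0, 1.0)
    print(round(threshold, 3))
```

Under this rule, a constant ambient energy of 100 with a ratio of 1.5 pulls the threshold toward 150, and a smaller damping value makes it get there faster.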
Author
Yuki Furuta «furushchev@jsk.imi.i.u-tokyo.ac.jp»
Changelog for package ros_speech_recognition
2.1.28 (2023-07-24)
2.1.27 (2023-06-24)
- fix package.xml/CMakeLists.txt to supress catkin_lint errors (#479)
- Contributors: Kei Okada
2.1.26 (2023-06-14)
- add LICENSE files (#476)
- Contributors: Kei Okada
2.1.25 (2023-06-08)
- [ros_speech_recognition] Add vosk engine (#474)
- Pr/use sound themes freedesktop (#472)
- add test to check if ros node is loadable (#463)
- add self.conf_thresh in __init_ function (#457)
- [ros_speech_recognition] add ubuntu-sounds dependency (#453)
- [ros_speech_recognition] Return if result is empty (#443)
- [ros_speece_recognition] Set confidence value of google (#434)
- [ros_speech_recognition] add parrotry.launch (#414)
- [ros_ speech_recognition] update default arg for speech_recognition.launch (#412)
- [ros_speech_recogniton, respeaker_ros] add confidence field (#411)
- [ros_speech_recognition] add self cancellation for speech recogntion (#413)
- [#405 and #410] Fix CI (#415)
- add ROS interface for https://cloud.google.com/natural-language (#304)
- GithubAction: add test for aarch64(melodic) / indigo (arm64) (#365)
- pgm_learner/respeaker_ros/ros_speech_recognition/rosping: increase time-limit/wait-time
- Explicit python interpreter in catkin_virtualenv (#367)
- .github/workflow: integrate all yaml to one (#338)
- [ros_speech_recognition] Fixed the behavior of launch file (#336)
- [ros_speech_recognition] add auto_start in speech_recognition_node.py (#301)
- [ros_speech_recognition] add SpeechRecognitionCandidatesToString node (#303)
- Enable sound play flag (#315)
- Contributors: Aiko Ichikura, Aoi Nakane, Kei Okada, Koki Shinjo, Naoto Tsukamoto, Naoya Yamaguchi, Shingo Kitagawa, Yoshiki Obinata, Iory Yanokura
2.1.24 (2021-07-26)
2.1.23 (2021-07-21)
2.1.22 (2021-06-10)
- enable to change topic name from speech_recognition.launch (#254)
- support SpeakerDiarization, see https://cloud.google.com/speech-to-text/docs/reference/rest/v1/speech/recognize#SpeechRecognitionAlternative (#244)
- [ros_speech_recognition] Add doc to speech_recognition.launch add doc to args, and we need to use rosparm for device, not param. because 'device: ' causes load_parameters: unable to set parameters (last param was [/speech_recognition/depth=16]): cannot marshal None unless allow_none is enabled error
- more exception message for self.recognize
- Use PYTHON_INTERPRETER python3 in ros_speech_recognition (#225)
- Contributors: Kei Okada, Naoya Yamaguchi, Shingo Kitagawa
2.1.21 (2020-08-19)
- add missing packages, closes https://github.com/ros/rosdistro/pull/26216 (#211)
- Contributors: Kei Okada
2.1.20 (2020-08-07)
2.1.19 (2020-07-21)
- Fixed issue #201 as requested, see https://github.com/jsk-ros-pkg/jsk_3rdparty/pull/202
- Contributors: MrMarshy
2.1.18 (2020-07-20)
- Fix for noetic (#200)
  - fix 2to3, with print, raise, exception
- [ros_speech_recognition] Enable multi channel audio recognition (#198)
  - adjust type code to the CPU platform
  - replace rosparam name: channels -> n_channel
  - add rosparam description to README
  - enable multi channel audio recognition
- Add args to ros_speech_recognition (#197)
  - Add flac as run_depend for SpeechRecognition pip package
  - Use catkin_virtualenv to use SpeechRecognition pip package
  - Add arguments and params to pass rostest
  - Add test for ros_speech_recognition
  - add args to launch
  - add pip install to tutorials
  - add param description to README
- Contributors: Kei Okada, Naoya Yamaguchi
2.1.17 (2020-04-16)
2.1.16 (2020-04-16)
2.1.15 (2019-12-12)
2.1.14 (2019-11-21)
- set SoundRequest.volume for kinetic (#173)
- Contributors: Kei Okada
2.1.13 (2019-07-10)
2.1.12 (2019-05-25)
- fixes GoogleCloud auth (#158)
- Contributors: jonasius
2.1.11 (2018-08-29)
2.1.10 (2018-04-25)
2.1.9 (2018-04-24)
2.1.8 (2018-04-17)
2.1.7 (2018-04-09)
2.1.6 (2017-11-21)
2.1.5 (2017-11-20)
- ros_speech_recognition: add continuous mode (#127)
- ros_speech_recognition: add README (#123)
- add ros_speech_recognition package (#121)
- Contributors: Yuki Furuta
2.1.4 (2017-07-16)
2.1.3 (2017-07-07)
2.1.2 (2017-07-06)
2.1.1 (2017-07-05)
2.1.0 (2017-07-02)
2.0.20 (2017-05-09)
2.0.19 (2017-02-22)
2.0.18 (2016-10-28)
2.0.17 (2016-10-22)
2.0.16 (2016-10-17)
2.0.15 (2016-10-16)
2.0.14 (2016-03-20)
2.0.13 (2015-12-15)
2.0.12 (2015-11-26)
2.0.11 (2015-10-07 14:16)
2.0.10 (2015-10-07 12:47)
2.0.9 (2015-09-26)
2.0.8 (2015-09-15)
2.0.7 (2015-09-14)
2.0.6 (2015-09-08)
2.0.5 (2015-08-23)
2.0.4 (2015-08-18)
2.0.3 (2015-08-01)
2.0.2 (2015-06-29)
2.0.1 (2015-06-19 21:21)
2.0.0 (2015-06-19 10:41)
1.0.71 (2015-05-17)
1.0.70 (2015-05-08)
1.0.69 (2015-05-05 12:28)
1.0.68 (2015-05-05 09:49)
1.0.67 (2015-05-03)
1.0.66 (2015-04-03)
1.0.65 (2015-04-02)
1.0.64 (2015-03-29)
1.0.63 (2015-02-19)
1.0.62 (2015-02-17)
1.0.61 (2015-02-11)
1.0.60 (2015-02-03 10:12)
1.0.59 (2015-02-03 04:05)
1.0.58 (2015-01-07)
1.0.57 (2014-12-23)
1.0.56 (2014-12-17)
1.0.55 (2014-12-09)
1.0.54 (2014-11-15)
1.0.53 (2014-11-01)
1.0.52 (2014-10-23)
1.0.51 (2014-10-20 16:01)
1.0.50 (2014-10-20 01:50)
1.0.49 (2014-10-13)
1.0.48 (2014-10-12)
1.0.47 (2014-10-08)
1.0.46 (2014-10-03)
1.0.45 (2014-09-29)
1.0.44 (2014-09-26 09:17)
1.0.43 (2014-09-26 01:08)
1.0.42 (2014-09-25)
1.0.41 (2014-09-23)
1.0.40 (2014-09-19)
1.0.39 (2014-09-17)
1.0.38 (2014-09-13)
1.0.37 (2014-09-08)
1.0.36 (2014-09-01)
1.0.35 (2014-08-16)
1.0.34 (2014-08-14)
1.0.33 (2014-07-28)
1.0.32 (2014-07-26)
1.0.31 (2014-07-23)
1.0.30 (2014-07-15)
1.0.29 (2014-07-02)
1.0.28 (2014-06-24)
1.0.27 (2014-06-10)
1.0.26 (2014-05-30)
1.0.25 (2014-05-26)
1.0.24 (2014-05-24)
1.0.23 (2014-05-23)
1.0.22 (2014-05-22)
1.0.21 (2014-05-20)
1.0.20 (2014-05-09)
1.0.19 (2014-05-06)
1.0.18 (2014-05-04)
1.0.17 (2014-04-20)
1.0.16 (2014-04-19 23:29)
1.0.15 (2014-04-19 20:19)
1.0.14 (2014-04-19 12:52)
1.0.13 (2014-04-19 11:06)
1.0.12 (2014-04-18 16:58)
1.0.11 (2014-04-18 08:18)
1.0.10 (2014-04-17)
1.0.9 (2014-04-12)
1.0.8 (2014-04-11)
1.0.7 (2014-04-10)
1.0.6 (2014-04-07)
1.0.5 (2014-03-31)
1.0.4 (2014-03-29)
1.0.3 (2014-03-19)
1.0.2 (2014-03-12)
1.0.1 (2014-03-07)
1.0.0 (2014-03-05)
Wiki Tutorials
Package Dependencies
Name | Deps |
---|---|
catkin_virtualenv | |
dynamic_reconfigure | |
jsk_data | |
speech_recognition_msgs | |
catkin | |
audio_capture | |
audio_common_msgs | |
sound_play | |
rostest | |
roslaunch | |
System Dependencies
Dependant Packages
Name | Deps |
---|---|
jsk_3rdparty |
Launch files
- launch/speech_recognition.launch
  - launch_sound_play [default: true]: Launch the sound_play node to speak
  - launch_audio_capture [default: true]: Launch the audio_capture node to publish the audio topic from the microphone
  - audio_topic [default: /audio]: Name of the audio topic captured from the microphone
  - voice_topic [default: /speech_to_text]: Name of the text topic of recognized speech
  - n_channel [default: 1]: Number of channels of the audio topic and microphone. Run '$ pactl list short sinks' to check your hardware
  - depth [default: 16]: Bit depth of the audio topic and microphone. Run '$ pactl list short sinks' to check your hardware
  - sample_rate [default: 16000]: Frame rate of the audio topic and microphone. Run '$ pactl list short sinks' to check your hardware
  - device [default: ]: Card and device number of the microphone (e.g. hw:0,0). You can check the card and device numbers with '$ arecord -l', then use hw:[card number],[device number]
  - engine [default: Google]: Speech-to-text engine. Options include Google, GoogleCloud, Sphinx, Wit, Bing, Houndify, IBM
  - language [default: en-US]: Speech-to-text language. For Japanese, set ja-JP.
  - continuous [default: true]: If false, the /speech_recognition service is advertised. If true, the /speech_to_text topic is published.
  - auto_start [default: true]: Whether speech recognition starts automatically. This argument takes effect when continuous is true
  - self_cancellation [default: true]: Whether to ignore audio while the robot is speaking
  - tts_tolerance [default: 1.0]: Tolerance in seconds used when deciding whether the robot is speaking
  - tts_action_names [default: ['sound_play']]: TTS action names. Output from these action servers is ignored by speech recognition
- launch/parrotry.launch
  - use_google [default: true]
  - language [default: en-US]
  - confidence_threshold [default: 0.8]
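The self_cancellation and tts_tolerance arguments of speech_recognition.launch describe a simple timing rule: audio is ignored while a text-to-speech action is running and for a tolerance period afterwards. Below is a minimal sketch of that rule, with hypothetical names and plain floats standing in for ROS time. Treating the boundary as inclusive is an assumption, not something the documentation states.

```python
from typing import Optional


def should_ignore_audio(now: float, tts_running: bool,
                        last_tts_end: Optional[float],
                        tts_tolerance: float = 1.0) -> bool:
    """Return True if audio captured at `now` should be discarded."""
    if tts_running:
        # The robot is speaking right now
        return True
    if last_tts_end is None:
        # The robot has not spoken yet
        return False
    # Still within the tolerance window after speech ended (assumed inclusive)
    return (now - last_tts_end) <= tts_tolerance
```

With the default tolerance of 1.0, audio captured half a second after the robot stops speaking is still discarded, while audio captured two seconds later is passed on to recognition.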
Messages
Services
Plugins
|
Package Summary
Tags | No category tags. |
Version | 2.1.28 |
License | BSD |
Build type | CATKIN |
Use | RECOMMENDED |
Repository Summary
Checkout URI | https://github.com/jsk-ros-pkg/jsk_3rdparty.git |
VCS Type | git |
VCS Version | master |
Last Updated | 2024-07-10 |
Dev Status | DEVELOPED |
CI status | No Continuous Integration |
Released | RELEASED |
Tags | No category tags. |
Contributing |
Help Wanted (0)
Good First Issues (0) Pull Requests to Review (0) |
Package Description
Additional Links
Maintainers
- Yuki Furuta
Authors
- Yuki Furuta
ros_speech_recognition
A ROS package for speech-to-text services.
This package uses Python package SpeechRecognition as a backend.
Tutorials
Normal tutorial
- Install this package and SpeechReconition
sudo apt install ros-${ROS_DISTRO}-ros-speech-recognition
- Launch speech recognition node
roslaunch ros_speech_recognition speech_recognition.launch
- Echo
/speech_to_text
rostopic echo /speech_to_text
# you can get the recognition result
Parrotry tutorial
Parrotry mean オウム返し in Japanese
# english
roslaunch ros_speech_recognition parrotry.launch
# japanese
roslaunch ros_speech_recognition parrotry.launch language:=ja-JP
speech_recognition_node.py
Interface
Publishing Topics
-
~voice_topic
(speech_recognition_msgs/SpeechRecognitionCandidates
)Speech recognition candidates topic name.
Topic name is set by parameter
~voice_topic
, and default value isspeech_to_text
. -
sound_play
(sound_play/SoundRequestAction
)Action client to play sound on events. If the action server is not available or
~enable_sound_effect
isFalse
, no sound is played.
Subscribing Topics
-
~audio_topic
(audio_common_msgs/AudioData
)Audio stream data to be recognized.
Topis name is set by parameter
~audio_topic
and default value isaudio
.
Advertising Services
-
speech_recognition
(speech_recognition_msgs/SpeechRecognition
)Service for speech recognition
-
speech_recognition/start
(std_srvs/Empty
)Start service for speech recognition
This service is available when parameter
~contiunous
isTrue
. -
speech_recognition/start
(std_srvs/Empty
)Stop service for speech recognition
This service is available when parameter
~contiunous
isTrue
.
Parameters
-
~voice_topic
(String
, default:speech_to_text
)Publishing voice topic name
-
~audio_topic
(String
, default:audio
)Subscribing audio topic name
-
~enable_sound_effect
(Bool
, default:True
)Flag to enable or disable sound to play sound on recognition.
-
~language
(String
, default:en-US
)Language to be recognized
-
~engine
(Enum[String]
, default:Google
)Speech-to-text engine (To see full options use
dynamic_reconfigure
) -
~energy_threshold
(Double
, default:300
)Threshold for Voice activity detection
-
~dynamic_energy_threshold
(Bool
, default:True
)Adaptive estimation for
energy_threshold
-
~dynamic_energy_adjustment_damping
(Double
, default:0.15
)Damping threshold for dynamic VAD
-
~dynamic_energy_ratio
(Double
, default:1.5
)Energy ratio for dynamic VAD
-
~pause_threshold
(Double
, default:0.8
)Seconds of non-speaking audio before a phrase is considered complete
-
~operation_timeout
(Double
, default:0.0
)Seconds after an internal operation (e.g., an API request) starts before it times out
-
~listen_timeout
(Double
, default:0.0
)The maximum number of seconds that this will wait for a phrase to start before giving up
-
~phrase_time_limit
(Double
, default:10.0
)The maximum number of seconds that this will allow a phrase to continue before stopping and returning the part of the phrase processed before the time limit was reached
-
~phrase_threshold
(Double
, default:0.3
)Minimum seconds of speaking audio before we consider the speaking audio a phrase
-
~non_speaking_duration
(Double
, default:0.5
)Seconds of non-speaking audio to keep on both sides of the recording
-
~duration
(Double
, default:10.0
)Seconds of waiting for speech
-
~depth
(Int
, default:16
)Depth of audio signal
-
~n_channel
(Int
, default:1
)Total number of channels in audio data (e.g. 1: mono, 2: stereo)
-
~sample_rate
(Int
, default:16000
)Sample rate of audio signal
-
~buffer_size
(Int
, default:10240
)Maximum buffer size to store audio data for speech recognition
-
~start_signal
(String
, default:/usr/share/sounds/freedesktop/stereo/bell.ogg
)Path to sound file for bell on the start of audio caption
-
~recognized_signal
(String
, default:/usr/share/sounds/freedesktop/stereo/message.ogg
)Path to sound file for bell on the end of audio caption
-
~success_signal
(String
, default:/usr/share/sounds/freedesktop/stereo/message-new-instant.ogg
)Path to sound file for bell on getting successful recognition result
-
~timeout_signal
(String
, default:/usr/share/sounds/freedesktop/stereo/network-connectivity-lost.ogg
)Path to sound file for bell on timeout for recognition
-
~continuous (Bool, default: False): Selects whether results are delivered by topic or by service. By default, the service is used.
-
~auto_start (Bool, default: True): Start speech recognition when the node is launched.
-
~self_cancellation (Bool, default: True): Whether to ignore sound heard while any of the ~tts_action_names actions is running. This option keeps the robot's own voice out of recognition.
-
~tts_action_names (List[String], default: ['sound_play']): Text-to-speech action names used for self cancellation. The node ignores voice heard while any of these text-to-speech actions is running.
-
~tts_tolerance (Float, default: 1.0): Tolerance in seconds for self cancellation. The node keeps ignoring voice for this many seconds after the ~tts_action_names actions finish running.
-
~google_key (String, default: None): Auth key for the Google API. If None, the public key is used (with no guarantee that it will not be blocked). Valid only if ~engine is Google.
-
~google_cloud_credentials_json (String, default: None): Path to the credentials JSON file. JSK users can download it from the Google Drive link. Valid only if ~engine is GoogleCloud.
-
~google_cloud_preferred_phrases ([String], default: None): Preferred phrases parameter. Valid only if ~engine is GoogleCloud.
-
~bing_key (String, default: None): Auth key for the Bing API. Valid only if ~engine is bing.
-
~vosk_model_path (String, default: None): Path to a trained model for the Vosk API. Valid only if ~engine is Vosk. If en-US or ja is selected as ~language, you do not need to specify the path. To load other models, download them from the Model list.
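The interaction between ~self_cancellation, ~tts_action_names and ~tts_tolerance can be sketched in plain Python. This is only an illustration of the behaviour described above, not the node's actual code: the function name `should_ignore_audio` and its arguments are hypothetical, and times are plain floats in seconds.

```python
def should_ignore_audio(now, tts_active, last_tts_end,
                        tolerance=1.0, self_cancellation=True):
    """Decide whether audio heard at time `now` should be dropped.

    Hypothetical helper (not part of ros_speech_recognition):
      now            -- current time in seconds
      tts_active     -- True while any text-to-speech action is running
      last_tts_end   -- time the last TTS action finished, or None if none ran
      tolerance      -- plays the role of ~tts_tolerance
    """
    if not self_cancellation:
        # Self cancellation disabled: never drop audio.
        return False
    if tts_active:
        # The robot is speaking right now.
        return True
    if last_tts_end is None:
        # No TTS action has run yet.
        return False
    # Keep ignoring audio for `tolerance` seconds after speech ends.
    return (now - last_tts_end) <= tolerance


if __name__ == "__main__":
    print(should_ignore_audio(10.5, False, 10.0))  # inside the tolerance window
    print(should_ignore_audio(11.5, False, 10.0))  # outside the tolerance window
```

With the default tolerance of 1.0, audio arriving half a second after the robot stops speaking is still dropped, while audio arriving 1.5 seconds later is passed on to recognition.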
Author
Yuki Furuta «furushchev@jsk.imi.i.u-tokyo.ac.jp»
Changelog for package ros_speech_recognition
2.1.28 (2023-07-24)
2.1.27 (2023-06-24)
- fix package.xml/CMakeLists.txt to supress catkin_lint errors (#479)
- Contributors: Kei Okada
2.1.26 (2023-06-14)
- add LICENSE files (#476)
- Contributors: Kei Okada
2.1.25 (2023-06-08)
- [ros_speech_recognition] Add vosk engine (#474)
- Pr/use sound themes freedesktop (#472)
- add test to check if ros node is loadable (#463)
- add self.conf_thresh in __init_ function (#457)
- [ros_speech_recognition] add ubuntu-sounds dependency (#453)
- [ros_speech_recognition] Return if result is empty (#443)
- [ros_speece_recognition] Set confidence value of google (#434)
- [ros_speech_recognition] add parrotry.launch (#414)
- [ros_ speech_recognition] update default arg for speech_recognition.launch (#412)
- [ros_speech_recogniton, respeaker_ros] add confidence field (#411)
- [ros_speech_recognition] add self cancellation for speech recogntion (#413)
- [#405 and #410] Fix CI (#415)
- add ROS interface for https://cloud.google.com/natural-language (#304)
- GithubAction: add test for aarch64(melodic) / indigo (arm64) (#365)
- pgm_learner/respeaker_ros/ros_speech_recognition/rosping: increase time-limit/wait-time
- Explicit python interpreter in catkin_virtualenv (#367)
- .github/workflow: integrate all yaml to one (#338)
- [ros_speech_recognition] Fixed the behavior of launch file (#336)
- [ros_speech_recognition] add auto_start in speech_recognition_node.py (#301)
- [ros_speech_recognition] add SpeechRecognitionCandidatesToString node (#303)
- Enable sound play flag (#315)
- Contributors: Aiko Ichikura, Aoi Nakane, Kei Okada, Koki Shinjo, Naoto Tsukamoto, Naoya Yamaguchi, Shingo Kitagawa, Yoshiki Obinata, Iory Yanokura
2.1.24 (2021-07-26)
2.1.23 (2021-07-21)
2.1.22 (2021-06-10)
- enable to change topic name from speech_recognition.launch (#254)
- support SpeakerDiarization, see https://cloud.google.com/speech-to-text/docs/reference/rest/v1/speech/recognize#SpeechRecognitionAlternative (#244)
- [ros_speech_recognition] Add doc to speech_recognition.launch: add doc to args, and use rosparam for device instead of param, because 'device: ' causes the error "load_parameters: unable to set parameters (last param was [/speech_recognition/depth=16]): cannot marshal None unless allow_none is enabled"
- more exception message for self.recognize
- Use PYTHON_INTERPRETER python3 in ros_speech_recognition (#225)
- Contributors: Kei Okada, Naoya Yamaguchi, Shingo Kitagawa
2.1.21 (2020-08-19)
- add missing packages, closes https://github.com/ros/rosdistro/pull/26216 (#211)
- Contributors: Kei Okada
2.1.20 (2020-08-07)
2.1.19 (2020-07-21)
- Fixed issue #201 as requested, see https://github.com/jsk-ros-pkg/jsk_3rdparty/pull/202
- Contributors: MrMarshy
2.1.18 (2020-07-20)
- Fix for noetic (#200)
  - fix 2to3, with print, raise, exception
- [ros_speech_recognition] Enable multi channel audio recognition (#198)
  - adjust type code to the CPU platform
  - replace rosparam name: channels -> n_channel
  - add rosparam description to README
  - enable multi channel audio recognition
- Add args to ros_speech_recognition (#197)
  - Add flac as run_depend for SpeechRecognition pip package
  - Use catkin_virtualenv to use SpeechRecognition pip package
  - Add arguments and params to pass rostest
  - Add test for ros_speech_recognition
  - add args to launch
  - add pip install to tutorials
  - add param description to README
- Contributors: Kei Okada, Naoya Yamaguchi
2.1.17 (2020-04-16)
2.1.16 (2020-04-16)
2.1.15 (2019-12-12)
2.1.14 (2019-11-21)
- set SoundRequest.volume for kinetic (#173)
- Contributors: Kei Okada
2.1.13 (2019-07-10)
2.1.12 (2019-05-25)
- fixes GoogleCloud auth (#158)
- Contributors: jonasius
2.1.11 (2018-08-29)
2.1.10 (2018-04-25)
2.1.9 (2018-04-24)
2.1.8 (2018-04-17)
2.1.7 (2018-04-09)
2.1.6 (2017-11-21)
2.1.5 (2017-11-20)
- ros_speech_recognition: add continuous mode (#127)
- ros_speech_recognition: add README (#123)
- add ros_speech_recognition package (#121)
- Contributors: Yuki Furuta
2.1.4 (2017-07-16)
2.1.3 (2017-07-07)
2.1.2 (2017-07-06)
2.1.1 (2017-07-05)
2.1.0 (2017-07-02)
2.0.20 (2017-05-09)
2.0.19 (2017-02-22)
2.0.18 (2016-10-28)
2.0.17 (2016-10-22)
2.0.16 (2016-10-17)
2.0.15 (2016-10-16)
2.0.14 (2016-03-20)
2.0.13 (2015-12-15)
2.0.12 (2015-11-26)
2.0.11 (2015-10-07 14:16)
2.0.10 (2015-10-07 12:47)
2.0.9 (2015-09-26)
2.0.8 (2015-09-15)
2.0.7 (2015-09-14)
2.0.6 (2015-09-08)
2.0.5 (2015-08-23)
2.0.4 (2015-08-18)
2.0.3 (2015-08-01)
2.0.2 (2015-06-29)
2.0.1 (2015-06-19 21:21)
2.0.0 (2015-06-19 10:41)
1.0.71 (2015-05-17)
1.0.70 (2015-05-08)
1.0.69 (2015-05-05 12:28)
1.0.68 (2015-05-05 09:49)
1.0.67 (2015-05-03)
1.0.66 (2015-04-03)
1.0.65 (2015-04-02)
1.0.64 (2015-03-29)
1.0.63 (2015-02-19)
1.0.62 (2015-02-17)
1.0.61 (2015-02-11)
1.0.60 (2015-02-03 10:12)
1.0.59 (2015-02-03 04:05)
1.0.58 (2015-01-07)
1.0.57 (2014-12-23)
1.0.56 (2014-12-17)
1.0.55 (2014-12-09)
1.0.54 (2014-11-15)
1.0.53 (2014-11-01)
1.0.52 (2014-10-23)
1.0.51 (2014-10-20 16:01)
1.0.50 (2014-10-20 01:50)
1.0.49 (2014-10-13)
1.0.48 (2014-10-12)
1.0.47 (2014-10-08)
1.0.46 (2014-10-03)
1.0.45 (2014-09-29)
1.0.44 (2014-09-26 09:17)
1.0.43 (2014-09-26 01:08)
1.0.42 (2014-09-25)
1.0.41 (2014-09-23)
1.0.40 (2014-09-19)
1.0.39 (2014-09-17)
1.0.38 (2014-09-13)
1.0.37 (2014-09-08)
1.0.36 (2014-09-01)
1.0.35 (2014-08-16)
1.0.34 (2014-08-14)
1.0.33 (2014-07-28)
1.0.32 (2014-07-26)
1.0.31 (2014-07-23)
1.0.30 (2014-07-15)
1.0.29 (2014-07-02)
1.0.28 (2014-06-24)
1.0.27 (2014-06-10)
1.0.26 (2014-05-30)
1.0.25 (2014-05-26)
1.0.24 (2014-05-24)
1.0.23 (2014-05-23)
1.0.22 (2014-05-22)
1.0.21 (2014-05-20)
1.0.20 (2014-05-09)
1.0.19 (2014-05-06)
1.0.18 (2014-05-04)
1.0.17 (2014-04-20)
1.0.16 (2014-04-19 23:29)
1.0.15 (2014-04-19 20:19)
1.0.14 (2014-04-19 12:52)
1.0.13 (2014-04-19 11:06)
1.0.12 (2014-04-18 16:58)
1.0.11 (2014-04-18 08:18)
1.0.10 (2014-04-17)
1.0.9 (2014-04-12)
1.0.8 (2014-04-11)
1.0.7 (2014-04-10)
1.0.6 (2014-04-07)
1.0.5 (2014-03-31)
1.0.4 (2014-03-29)
1.0.3 (2014-03-19)
1.0.2 (2014-03-12)
1.0.1 (2014-03-07)
1.0.0 (2014-03-05)
Wiki Tutorials
Package Dependencies
Deps | Name |
---|---|
catkin_virtualenv | |
dynamic_reconfigure | |
jsk_data | |
speech_recognition_msgs | |
catkin | |
audio_capture | |
audio_common_msgs | |
sound_play | |
rostest | |
roslaunch |
System Dependencies
Dependent Packages
Name | Deps |
---|---|
jsk_3rdparty | |
jsk_nao_startup |
Launch files
- launch/speech_recognition.launch
  - launch_sound_play [default: true]: Launch the sound_play node to speak
  - launch_audio_capture [default: true]: Launch the audio_capture node to publish the audio topic from the microphone
  - audio_topic [default: /audio]: Name of the audio topic captured from the microphone
  - voice_topic [default: /speech_to_text]: Name of the text topic for recognized speech
  - n_channel [default: 1]: Number of channels of the audio topic and microphone. Run '$ pactl list short sinks' to check your hardware
  - depth [default: 16]: Bit depth of the audio topic and microphone. Run '$ pactl list short sinks' to check your hardware
  - sample_rate [default: 16000]: Frame rate of the audio topic and microphone. Run '$ pactl list short sinks' to check your hardware
  - device [default: ]: Card and device number of the microphone (e.g. hw:0,0). Check the card and device numbers with '$ arecord -l', then use hw:[card number],[device number]
  - engine [default: Google]: Speech-to-text engine. Options: Google, GoogleCloud, Sphinx, Wit, Bing, Houndify, IBM
  - language [default: en-US]: Speech-to-text language. For Japanese, set ja-JP.
  - continuous [default: true]: If false, the /speech_recognition service is advertised. If true, the /speech_to_text topic is published.
  - auto_start [default: true]: Whether speech recognition starts automatically. This parameter takes effect when continuous is true
  - self_cancellation [default: true]: Whether to skip recognition of audio while the robot is speaking.
  - tts_tolerance [default: 1.0]: Tolerance in seconds for deciding whether the robot is speaking
  - tts_action_names [default: ['sound_play']]: TTS action names. Output from these servers is ignored by speech recognition
- launch/parrotry.launch
  - use_google [default: true]
  - language [default: en-US]
  - confidence_threshold [default: 0.8]
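A confidence threshold such as the confidence_threshold argument of parrotry.launch can be applied to recognition candidates as sketched below. This is a minimal illustration, not the code used by parrotry.launch. It assumes the candidates arrive as two parallel lists, mirroring the transcript and confidence fields of speech_recognition_msgs/SpeechRecognitionCandidates, and the function name `pick_transcript` is hypothetical.

```python
def pick_transcript(transcripts, confidences, threshold=0.8):
    """Return the most confident transcript at or above `threshold`.

    Hypothetical helper: `transcripts` and `confidences` are parallel
    lists. Returns None when no candidate reaches the threshold, or when
    the engine reported no confidence values at all.
    """
    best_text = None
    best_conf = threshold
    for text, conf in zip(transcripts, confidences):
        # Keep the candidate only if it is at least as confident as the
        # best one seen so far (initially the threshold itself).
        if conf >= best_conf:
            best_text, best_conf = text, conf
    return best_text


if __name__ == "__main__":
    print(pick_transcript(["hello", "hallo"], [0.92, 0.41]))
    print(pick_transcript(["hm"], [0.3]))
```

In a ROS node, a function like this would typically be called from the callback of a subscriber on /speech_to_text, passing msg.transcript and msg.confidence.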
|
Package Summary
Tags | No category tags. |
Version | 2.1.28 |
License | BSD |
Build type | CATKIN |
Use | RECOMMENDED |
Repository Summary
Checkout URI | https://github.com/jsk-ros-pkg/jsk_3rdparty.git |
VCS Type | git |
VCS Version | master |
Last Updated | 2024-07-10 |
Dev Status | DEVELOPED |
CI status | Continuous Integration |
Released | RELEASED |
Tags | No category tags. |
Contributing |
Help Wanted (0)
Good First Issues (0) Pull Requests to Review (0) |
Package Description
Additional Links
Maintainers
- Yuki Furuta
Authors
- Yuki Furuta
ros_speech_recognition
A ROS package for speech-to-text services.
This package uses Python package SpeechRecognition as a backend.
Tutorials
Normal tutorial
- Install this package and SpeechReconition
sudo apt install ros-${ROS_DISTRO}-ros-speech-recognition
- Launch speech recognition node
roslaunch ros_speech_recognition speech_recognition.launch
- Echo
/speech_to_text
rostopic echo /speech_to_text
# you can get the recognition result
Parrotry tutorial
Parrotry mean オウム返し in Japanese
# english
roslaunch ros_speech_recognition parrotry.launch
# japanese
roslaunch ros_speech_recognition parrotry.launch language:=ja-JP
speech_recognition_node.py
Interface
Publishing Topics
-
~voice_topic
(speech_recognition_msgs/SpeechRecognitionCandidates
)Speech recognition candidates topic name.
Topic name is set by parameter
~voice_topic
, and default value isspeech_to_text
. -
sound_play
(sound_play/SoundRequestAction
)Action client to play sound on events. If the action server is not available or
~enable_sound_effect
isFalse
, no sound is played.
Subscribing Topics
-
~audio_topic
(audio_common_msgs/AudioData
)Audio stream data to be recognized.
Topis name is set by parameter
~audio_topic
and default value isaudio
.
Advertising Services
-
speech_recognition
(speech_recognition_msgs/SpeechRecognition
)Service for speech recognition
-
speech_recognition/start
(std_srvs/Empty
)Start service for speech recognition
This service is available when parameter
~contiunous
isTrue
. -
speech_recognition/start
(std_srvs/Empty
)Stop service for speech recognition
This service is available when parameter
~contiunous
isTrue
.
Parameters
-
~voice_topic
(String
, default:speech_to_text
)Publishing voice topic name
-
~audio_topic
(String
, default:audio
)Subscribing audio topic name
-
~enable_sound_effect
(Bool
, default:True
)Flag to enable or disable sound to play sound on recognition.
-
~language
(String
, default:en-US
)Language to be recognized
-
~engine
(Enum[String]
, default:Google
)Speech-to-text engine (To see full options use
dynamic_reconfigure
) -
~energy_threshold
(Double
, default:300
)Threshold for Voice activity detection
-
~dynamic_energy_threshold
(Bool
, default:True
)Adaptive estimation for
energy_threshold
-
~dynamic_energy_adjustment_damping
(Double
, default:0.15
)Damping threshold for dynamic VAD
-
~dynamic_energy_ratio
(Double
, default:1.5
)Energy ratio for dynamic VAD
-
~pause_threshold
(Double
, default:0.8
)Seconds of non-speaking audio before a phrase is considered complete
-
~operation_timeout
(Double
, default:0.0
)Seconds after an internal operation (e.g., an API request) starts before it times out
-
~listen_timeout
(Double
, default:0.0
)The maximum number of seconds that this will wait for a phrase to start before giving up
-
~phrase_time_limit
(Double
, default:10.0
)The maximum number of seconds that this will allow a phrase to continue before stopping and returning the part of the phrase processed before the time limit was reached
-
~phrase_threshold
(Double
, default:0.3
)Minimum seconds of speaking audio before we consider the speaking audio a phrase
-
~non_speaking_duration
(Double
, default:0.5
)Seconds of non-speaking audio to keep on both sides of the recording
-
~duration
(Double
, default:10.0
)Seconds of waiting for speech
-
~depth
(Int
, default:16
)Depth of audio signal
-
~n_channel
(Int
, default:1
)Total number of channels in audio data (e.g. 1: mono, 2: stereo)
-
~sample_rate
(Int
, default:16000
)Sample rate of audio signal
-
~buffer_size
(Int
, default:10240
)Maximum buffer size to store audio data for speech recognition
-
~start_signal
(String
, default:/usr/share/sounds/freedesktop/stereo/bell.ogg
)Path to sound file for bell on the start of audio caption
-
~recognized_signal
(String
, default:/usr/share/sounds/freedesktop/stereo/message.ogg
)Path to sound file for bell on the end of audio caption
-
~success_signal
(String
, default:/usr/share/sounds/freedesktop/stereo/message-new-instant.ogg
)Path to sound file for bell on getting successful recognition result
-
~timeout_signal
(String
, default:/usr/share/sounds/freedesktop/stereo/network-connectivity-lost.ogg
)Path to sound file for bell on timeout for recognition
-
~continuous
(Bool
, default: False)Selecting to use topic or service. By default, service is used.
-
~auto_start
(Bool
, default: True)Starting the speech recognition when launching.
-
~self_cancellation
(Bool
, default:True
)Whether the node recognize the sound heard when
~tts_action_names
is running or not.This options is for ignoring self voice sounds from recognition.
-
~tts_action_names
(List[String]
, default:['sound_play']
)Text-to-speech action name for self cancellation.
The node ignores the voice heard when these Text-to-speech action is running.
-
~tts_tolerance
(Float
, default:1.0
)Tolerance seconds for self cancellation.
The node ignores the voice with this tolerance seconds after
~tts_action_names
finish running. -
~google_key
(String
, default:None
)Auth Key for Google API. If
None
, use public key. (No guarantee to be blocked.)
This is valid only if~engine
isGoogle
. -
~google_cloud_credentials_json
(String
, default:None
)Path to credential json file. For JSK users, you can download from Google Drive link. This is valid only if
~engine
isGoogleCloud
. -
~google_cloud_preferred_phrases
([String]
, default:None
)Preferred phrases parameters. This is valid only if
~engine
isGoogleCloud
. -
~bing_key
(String
, default:None
)Auth key for Bing API.
This is valid only if~engine
isbing
. -
~vosk_model_path
(String
, default:None
)Path to trainded model for Vosk API. This is valid only if
~engine
isVosk
.If
en-US
orja
is selected as~language
, you do not need to specify the path. To load other models, please download them from Model list.
Author
Yuki Furuta «furushchev@jsk.imi.i.u-tokyo.ac.jp»
Changelog for package ros_speech_recognition
2.1.28 (2023-07-24)
2.1.27 (2023-06-24)
- fix package.xml/CMakeLists.txt to supress catkin_lint errors (#479)
- Contributors: Kei Okada
2.1.26 (2023-06-14)
- add LICENSE files (#476)
- Contributors: Kei Okada
2.1.25 (2023-06-08)
- [ros_speech_recognition] Add vosk engine (#474)
- Pr/use sound themes freedesktop (#472)
- add test to check if ros node is loadable (#463)
- add self.conf_thresh in __init_ function (#457)
- [ros_speech_recognition] add ubuntu-sounds dependency (#453)
- [ros_speech_recognition] Return if result is empty (#443)
- [ros_speece_recognition] Set confidence value of google (#434)
- [ros_speech_recognition] add parrotry.launch (#414)
- [ros_ speech_recognition] update default arg for speech_recognition.launch (#412)
- [ros_speech_recogniton, respeaker_ros] add confidence field (#411)
- [ros_speech_recognition] add self cancellation for speech recogntion (#413)
- [#405 and #410] Fix CI (#415)
- add ROS interface for https://cloud.google.com/natural-language (#304)
- GithubAction: add test for aarch64(melodic) / indigo (arm64)
(#365)
- pgm_learner/respeaker_ros/ros_speech_recognition/rosping: increase time-limit/wait-time
- Explicit python interpreter in catkin_virtualenv (#367)
- .github/workflow: integrate all yaml to one (#338)
- [ros_speech_recognition] Fixed the behavior of launch file (#336)
- [ros_speech_recognition] add auto_start in speech_recognition_node.py (#301)
- [ros_speech_recognition] add SpeechRecognitionCandidatesToString node (#303)
- Enable sound play flag (#315)
- Contributors: Aiko Ichikura, Aoi Nakane, Kei Okada, Koki Shinjo, Naoto Tsukamoto, Naoya Yamaguchi, Shingo Kitagawa, Yoshiki Obinata, Iory Yanokura
2.1.24 (2021-07-26)
2.1.23 (2021-07-21)
2.1.22 (2021-06-10)
- enable to change topic name from speech_recognition.launch (#254)
- support SpeakerDiarization, see
https://cloud.google.com/speech-to-text/docs/reference/rest/v1/speech/recognize#SpeechRecognitionAlternative
(#244)
- [ros_speech_recognition] Add doc to speech_recognition.launch add doc to args, and we need to use rosparm for device, not param. because 'device: ' causes load_parameters: unable to set parameters (last param was [/speech_recognition/depth=16]): cannot marshal None unless allow_none is enabled error
- more exception message for self.recognize
- Use PYTHON_INTERPRETER python3 in ros_speech_recognition (#225)
- Contributors: Kei Okada, Naoya Yamaguchi, Shingo Kitagawa
2.1.21 (2020-08-19)
- add missing packages, closes https://github.com/ros/rosdistro/pull/26216 (#211)
- Contributors: Kei Okada
2.1.20 (2020-08-07)
2.1.19 (2020-07-21)
- Fixed issue #201 as requested, see https://github.com/jsk-ros-pkg/jsk_3rdparty/pull/202
- Contributors: MrMarshy
2.1.18 (2020-07-20)
- Fix for noetic
(#200)
- fix 2to3, with print, raise, exception
- [ros_speech_recognition] Enable multi channel audio recognition
(#198)
- adjust type code to the CPU platform
- replace rosparam name: channels -> n_channel
- add rosparam description to README
- enable multi channel audio recognition
- Add args to ros_speech_recognition
(#197)
- Add flac as run_depend for SpeechRecognition pip package
- Use catkin_virtualenv to use SpeechRecognition pip package
- Add arguments and params to pass rostest
- Add test for ros_speech_recognition
- add args to launch
- add pip install to tutorials
- add param description to README
- Contributors: Kei Okada, Naoya Yamaguchi
2.1.17 (2020-04-16)
2.1.16 (2020-04-16)
2.1.15 (2019-12-12)
2.1.14 (2019-11-21)
- set SoundRequest.volume for kinetic (#173)
- Contributors: Kei Okada
2.1.13 (2019-07-10)
2.1.12 (2019-05-25)
- fixes GoogleCloud auth (#158)
- Contributors: jonasius
2.1.11 (2018-08-29)
2.1.10 (2018-04-25)
2.1.9 (2018-04-24)
2.1.8 (2018-04-17)
2.1.7 (2018-04-09)
2.1.6 (2017-11-21)
2.1.5 (2017-11-20)
- ros_speech_recognition: add continuous mode (#127)
- ros_speech_recognition: add README (#123)
- add ros_speech_recognition package (#121)
- Contributors: Yuki Furuta
2.1.4 (2017-07-16)
2.1.3 (2017-07-07)
2.1.2 (2017-07-06)
2.1.1 (2017-07-05)
2.1.0 (2017-07-02)
2.0.20 (2017-05-09)
2.0.19 (2017-02-22)
2.0.18 (2016-10-28)
2.0.17 (2016-10-22)
2.0.16 (2016-10-17)
2.0.15 (2016-10-16)
2.0.14 (2016-03-20)
2.0.13 (2015-12-15)
2.0.12 (2015-11-26)
2.0.11 (2015-10-07 14:16)
2.0.10 (2015-10-07 12:47)
2.0.9 (2015-09-26)
2.0.8 (2015-09-15)
2.0.7 (2015-09-14)
2.0.6 (2015-09-08)
2.0.5 (2015-08-23)
2.0.4 (2015-08-18)
2.0.3 (2015-08-01)
2.0.2 (2015-06-29)
2.0.1 (2015-06-19 21:21)
2.0.0 (2015-06-19 10:41)
1.0.71 (2015-05-17)
1.0.70 (2015-05-08)
1.0.69 (2015-05-05 12:28)
1.0.68 (2015-05-05 09:49)
1.0.67 (2015-05-03)
1.0.66 (2015-04-03)
1.0.65 (2015-04-02)
1.0.64 (2015-03-29)
1.0.63 (2015-02-19)
1.0.62 (2015-02-17)
1.0.61 (2015-02-11)
1.0.60 (2015-02-03 10:12)
1.0.59 (2015-02-03 04:05)
1.0.58 (2015-01-07)
1.0.57 (2014-12-23)
1.0.56 (2014-12-17)
1.0.55 (2014-12-09)
1.0.54 (2014-11-15)
1.0.53 (2014-11-01)
1.0.52 (2014-10-23)
1.0.51 (2014-10-20 16:01)
1.0.50 (2014-10-20 01:50)
1.0.49 (2014-10-13)
1.0.48 (2014-10-12)
1.0.47 (2014-10-08)
1.0.46 (2014-10-03)
1.0.45 (2014-09-29)
1.0.44 (2014-09-26 09:17)
1.0.43 (2014-09-26 01:08)
1.0.42 (2014-09-25)
1.0.41 (2014-09-23)
1.0.40 (2014-09-19)
1.0.39 (2014-09-17)
1.0.38 (2014-09-13)
1.0.37 (2014-09-08)
1.0.36 (2014-09-01)
1.0.35 (2014-08-16)
1.0.34 (2014-08-14)
1.0.33 (2014-07-28)
1.0.32 (2014-07-26)
1.0.31 (2014-07-23)
1.0.30 (2014-07-15)
1.0.29 (2014-07-02)
1.0.28 (2014-06-24)
1.0.27 (2014-06-10)
1.0.26 (2014-05-30)
1.0.25 (2014-05-26)
1.0.24 (2014-05-24)
1.0.23 (2014-05-23)
1.0.22 (2014-05-22)
1.0.21 (2014-05-20)
1.0.20 (2014-05-09)
1.0.19 (2014-05-06)
1.0.18 (2014-05-04)
1.0.17 (2014-04-20)
1.0.16 (2014-04-19 23:29)
1.0.15 (2014-04-19 20:19)
1.0.14 (2014-04-19 12:52)
1.0.13 (2014-04-19 11:06)
1.0.12 (2014-04-18 16:58)
1.0.11 (2014-04-18 08:18)
1.0.10 (2014-04-17)
1.0.9 (2014-04-12)
1.0.8 (2014-04-11)
1.0.7 (2014-04-10)
1.0.6 (2014-04-07)
1.0.5 (2014-03-31)
1.0.4 (2014-03-29)
1.0.3 (2014-03-19)
1.0.2 (2014-03-12)
1.0.1 (2014-03-07)
1.0.0 (2014-03-05)
Wiki Tutorials
Package Dependencies
Deps | Name |
---|---|
catkin_virtualenv | |
dynamic_reconfigure | |
jsk_data | |
speech_recognition_msgs | |
catkin | |
audio_capture | |
audio_common_msgs | |
sound_play | |
rostest | |
roslaunch |
System Dependencies
Dependant Packages
Name | Deps |
---|---|
jsk_3rdparty |
Launch files
- launch/speech_recognition.launch
-
- launch_sound_play [default: true] — Launch sound_play node to speak
- launch_audio_capture [default: true] — Launch audio_capture node to publish audio topic from microphone
- audio_topic [default: /audio] — Name of audio topic captured from microphone
- voice_topic [default: /speech_to_text] — Name of text topic of recognized speech
- n_channel [default: 1] — Number of channels of audio topic and microphone. '$ pactl list short sinks' to check your hardware
- depth [default: 16] — Bit depth of audio topic and microphone. '$ pactl list short sinks' to check your hardware
- sample_rate [default: 16000] — Frame rate of audio topic and microphone. '$ pactl list short sinks' to check your hardware
- device [default: ] — Card and device number of microphone (e.g. hw:0,0). you can check card number and device number by '$ arecord -l', then uses hw:[card number],[device number]
- engine [default: Google] — Speech to text engine. TTS engine, Google, GoogleCloud, Sphinx, Wit, Bing Houndify, IBM
- language [default: en-US] — Speech to text language. For Japanese, set ja-JP.
- continuous [default: true] — If false, /speech_recognition service is published. If true, /speech_to_text topic is published.
- auto_start [default: true] — Whether speech_recognition starts automatically or not. This parameter works when continuous is true
- self_cancellation [default: true] — Do not recognize the audio when robot is speaking or not.
- tts_tolerance [default: 1.0] — Tolerance second for recognizing whether robot is speaking or not
- tts_action_names [default: ['sound_play']] — tts action name. these servers outputs are ignored by sound_recognition
- launch/parrotry.launch
-
- use_google [default: true]
- language [default: en-US]
- confidence_threshold [default: 0.8]
Messages
Services
Plugins
Recent questions tagged ros_speech_recognition at Robotics Stack Exchange
|
Package Summary
Tags | No category tags. |
Version | 2.1.28 |
License | BSD |
Build type | CATKIN |
Use | RECOMMENDED |
Repository Summary
Checkout URI | https://github.com/jsk-ros-pkg/jsk_3rdparty.git |
VCS Type | git |
VCS Version | master |
Last Updated | 2024-07-10 |
Dev Status | DEVELOPED |
CI status | Continuous Integration |
Released | RELEASED |
Tags | No category tags. |
Contributing |
Help Wanted (0)
Good First Issues (0) Pull Requests to Review (0) |
Package Description
Additional Links
Maintainers
- Yuki Furuta
Authors
- Yuki Furuta
ros_speech_recognition
A ROS package for speech-to-text services.
This package uses Python package SpeechRecognition as a backend.
Tutorials
Normal tutorial
- Install this package and SpeechReconition
sudo apt install ros-${ROS_DISTRO}-ros-speech-recognition
- Launch speech recognition node
roslaunch ros_speech_recognition speech_recognition.launch
- Echo
/speech_to_text
rostopic echo /speech_to_text
# you can get the recognition result
Parrotry tutorial
Parrotry mean オウム返し in Japanese
# english
roslaunch ros_speech_recognition parrotry.launch
# japanese
roslaunch ros_speech_recognition parrotry.launch language:=ja-JP
speech_recognition_node.py
Interface
Publishing Topics
-
~voice_topic
(speech_recognition_msgs/SpeechRecognitionCandidates
)Speech recognition candidates topic name.
Topic name is set by parameter
~voice_topic
, and default value isspeech_to_text
. -
sound_play
(sound_play/SoundRequestAction
)Action client to play sound on events. If the action server is not available or
~enable_sound_effect
isFalse
, no sound is played.
Subscribing Topics
-
~audio_topic
(audio_common_msgs/AudioData
)Audio stream data to be recognized.
Topis name is set by parameter
~audio_topic
and default value isaudio
.
Advertising Services
-
speech_recognition
(speech_recognition_msgs/SpeechRecognition
)Service for speech recognition
-
speech_recognition/start
(std_srvs/Empty
)Start service for speech recognition
This service is available when parameter
~contiunous
isTrue
. -
speech_recognition/start
(std_srvs/Empty
)Stop service for speech recognition
This service is available when parameter
~contiunous
isTrue
.
Parameters
-
~voice_topic
(String
, default:speech_to_text
)Publishing voice topic name
-
~audio_topic
(String
, default:audio
)Subscribing audio topic name
-
~enable_sound_effect
(Bool
, default:True
)Flag to enable or disable sound to play sound on recognition.
-
~language
(String
, default:en-US
)Language to be recognized
-
~engine
(Enum[String]
, default:Google
)Speech-to-text engine (To see full options use
dynamic_reconfigure
) -
~energy_threshold
(Double
, default:300
)Threshold for Voice activity detection
-
~dynamic_energy_threshold
(Bool
, default:True
)Adaptive estimation for
energy_threshold
-
~dynamic_energy_adjustment_damping
(Double
, default:0.15
)Damping threshold for dynamic VAD
-
~dynamic_energy_ratio
(Double
, default:1.5
)Energy ratio for dynamic VAD
-
~pause_threshold
(Double
, default:0.8
)Seconds of non-speaking audio before a phrase is considered complete
-
~operation_timeout
(Double
, default:0.0
)Seconds after an internal operation (e.g., an API request) starts before it times out
-
~listen_timeout
(Double
, default:0.0
)The maximum number of seconds that this will wait for a phrase to start before giving up
-
~phrase_time_limit
(Double
, default:10.0
)The maximum number of seconds that this will allow a phrase to continue before stopping and returning the part of the phrase processed before the time limit was reached
-
~phrase_threshold
(Double
, default:0.3
)Minimum seconds of speaking audio before we consider the speaking audio a phrase
-
~non_speaking_duration
(Double
, default:0.5
)Seconds of non-speaking audio to keep on both sides of the recording
-
~duration
(Double
, default:10.0
)Seconds of waiting for speech
-
~depth
(Int
, default:16
)Depth of audio signal
-
~n_channel
(Int
, default:1
)Total number of channels in audio data (e.g. 1: mono, 2: stereo)
-
~sample_rate
(Int
, default:16000
)Sample rate of audio signal
-
~buffer_size
(Int
, default:10240
)Maximum buffer size to store audio data for speech recognition
-
~start_signal
(String
, default:/usr/share/sounds/freedesktop/stereo/bell.ogg
)Path to sound file for bell on the start of audio caption
-
~recognized_signal
(String
, default:/usr/share/sounds/freedesktop/stereo/message.ogg
)Path to sound file for bell on the end of audio caption
-
~success_signal
(String
, default:/usr/share/sounds/freedesktop/stereo/message-new-instant.ogg
)Path to sound file for bell on getting successful recognition result
-
~timeout_signal
(String
, default:/usr/share/sounds/freedesktop/stereo/network-connectivity-lost.ogg
)Path to sound file for bell on timeout for recognition
-
~continuous
(Bool
, default: False)Selecting to use topic or service. By default, service is used.
-
~auto_start
(Bool
, default: True)Starting the speech recognition when launching.
-
~self_cancellation
(Bool
, default:True
)Whether the node recognize the sound heard when
~tts_action_names
is running or not.This options is for ignoring self voice sounds from recognition.
-
~tts_action_names
(List[String]
, default:['sound_play']
)Text-to-speech action name for self cancellation.
The node ignores the voice heard when these Text-to-speech action is running.
-
~tts_tolerance
(Float
, default:1.0
)Tolerance seconds for self cancellation.
The node ignores the voice with this tolerance seconds after
~tts_action_names
finish running. -
~google_key
(String
, default:None
)Auth Key for Google API. If
None
, use public key. (No guarantee to be blocked.)
This is valid only if~engine
isGoogle
. -
~google_cloud_credentials_json
(String
, default:None
)Path to credential json file. For JSK users, you can download from Google Drive link. This is valid only if
~engine
isGoogleCloud
. -
~google_cloud_preferred_phrases
([String]
, default:None
)Preferred phrases parameters. This is valid only if
~engine
isGoogleCloud
. -
~bing_key
(String
, default:None
)Auth key for Bing API.
This is valid only if~engine
isbing
. -
~vosk_model_path
(String
, default:None
)Path to trainded model for Vosk API. This is valid only if
~engine
isVosk
.If
en-US
orja
is selected as~language
, you do not need to specify the path. To load other models, please download them from Model list.
Author
Yuki Furuta «furushchev@jsk.imi.i.u-tokyo.ac.jp»
Changelog for package ros_speech_recognition
2.1.28 (2023-07-24)
2.1.27 (2023-06-24)
- fix package.xml/CMakeLists.txt to supress catkin_lint errors (#479)
- Contributors: Kei Okada
2.1.26 (2023-06-14)
- add LICENSE files (#476)
- Contributors: Kei Okada
2.1.25 (2023-06-08)
- [ros_speech_recognition] Add vosk engine (#474)
- Pr/use sound themes freedesktop (#472)
- add test to check if ros node is loadable (#463)
- add self.conf_thresh in __init_ function (#457)
- [ros_speech_recognition] add ubuntu-sounds dependency (#453)
- [ros_speech_recognition] Return if result is empty (#443)
- [ros_speece_recognition] Set confidence value of google (#434)
- [ros_speech_recognition] add parrotry.launch (#414)
- [ros_ speech_recognition] update default arg for speech_recognition.launch (#412)
- [ros_speech_recogniton, respeaker_ros] add confidence field (#411)
- [ros_speech_recognition] add self cancellation for speech recogntion (#413)
- [#405 and #410] Fix CI (#415)
- add ROS interface for https://cloud.google.com/natural-language (#304)
- GithubAction: add test for aarch64(melodic) / indigo (arm64)
(#365)
- pgm_learner/respeaker_ros/ros_speech_recognition/rosping: increase time-limit/wait-time
- Explicit python interpreter in catkin_virtualenv (#367)
- .github/workflow: integrate all yaml to one (#338)
- [ros_speech_recognition] Fixed the behavior of launch file (#336)
- [ros_speech_recognition] add auto_start in speech_recognition_node.py (#301)
- [ros_speech_recognition] add SpeechRecognitionCandidatesToString node (#303)
- Enable sound play flag (#315)
- Contributors: Aiko Ichikura, Aoi Nakane, Kei Okada, Koki Shinjo, Naoto Tsukamoto, Naoya Yamaguchi, Shingo Kitagawa, Yoshiki Obinata, Iory Yanokura
2.1.24 (2021-07-26)
2.1.23 (2021-07-21)
2.1.22 (2021-06-10)
- enable to change topic name from speech_recognition.launch (#254)
- support SpeakerDiarization, see https://cloud.google.com/speech-to-text/docs/reference/rest/v1/speech/recognize#SpeechRecognitionAlternative (#244)
- [ros_speech_recognition] Add doc to speech_recognition.launch: add doc to args, and use rosparam instead of param for device, because 'device: ' causes the error "load_parameters: unable to set parameters (last param was [/speech_recognition/depth=16]): cannot marshal None unless allow_none is enabled"
- more exception message for self.recognize
- Use PYTHON_INTERPRETER python3 in ros_speech_recognition (#225)
- Contributors: Kei Okada, Naoya Yamaguchi, Shingo Kitagawa
2.1.21 (2020-08-19)
- add missing packages, closes https://github.com/ros/rosdistro/pull/26216 (#211)
- Contributors: Kei Okada
2.1.20 (2020-08-07)
2.1.19 (2020-07-21)
- Fixed issue #201 as requested, see https://github.com/jsk-ros-pkg/jsk_3rdparty/pull/202
- Contributors: MrMarshy
2.1.18 (2020-07-20)
- Fix for noetic (#200)
- fix 2to3, with print, raise, exception
- [ros_speech_recognition] Enable multi channel audio recognition (#198)
- adjust type code to the CPU platform
- replace rosparam name: channels -> n_channel
- add rosparam description to README
- enable multi channel audio recognition
- Add args to ros_speech_recognition (#197)
- Add flac as run_depend for SpeechRecognition pip package
- Use catkin_virtualenv to use SpeechRecognition pip package
- Add arguments and params to pass rostest
- Add test for ros_speech_recognition
- add args to launch
- add pip install to tutorials
- add param description to README
- Contributors: Kei Okada, Naoya Yamaguchi
2.1.17 (2020-04-16)
2.1.16 (2020-04-16)
2.1.15 (2019-12-12)
2.1.14 (2019-11-21)
- set SoundRequest.volume for kinetic (#173)
- Contributors: Kei Okada
2.1.13 (2019-07-10)
2.1.12 (2019-05-25)
- fixes GoogleCloud auth (#158)
- Contributors: jonasius
2.1.11 (2018-08-29)
2.1.10 (2018-04-25)
2.1.9 (2018-04-24)
2.1.8 (2018-04-17)
2.1.7 (2018-04-09)
2.1.6 (2017-11-21)
2.1.5 (2017-11-20)
- ros_speech_recognition: add continuous mode (#127)
- ros_speech_recognition: add README (#123)
- add ros_speech_recognition package (#121)
- Contributors: Yuki Furuta
2.1.4 (2017-07-16)
2.1.3 (2017-07-07)
2.1.2 (2017-07-06)
2.1.1 (2017-07-05)
2.1.0 (2017-07-02)
2.0.20 (2017-05-09)
2.0.19 (2017-02-22)
2.0.18 (2016-10-28)
2.0.17 (2016-10-22)
2.0.16 (2016-10-17)
2.0.15 (2016-10-16)
2.0.14 (2016-03-20)
2.0.13 (2015-12-15)
2.0.12 (2015-11-26)
2.0.11 (2015-10-07 14:16)
2.0.10 (2015-10-07 12:47)
2.0.9 (2015-09-26)
2.0.8 (2015-09-15)
2.0.7 (2015-09-14)
2.0.6 (2015-09-08)
2.0.5 (2015-08-23)
2.0.4 (2015-08-18)
2.0.3 (2015-08-01)
2.0.2 (2015-06-29)
2.0.1 (2015-06-19 21:21)
2.0.0 (2015-06-19 10:41)
1.0.71 (2015-05-17)
1.0.70 (2015-05-08)
1.0.69 (2015-05-05 12:28)
1.0.68 (2015-05-05 09:49)
1.0.67 (2015-05-03)
1.0.66 (2015-04-03)
1.0.65 (2015-04-02)
1.0.64 (2015-03-29)
1.0.63 (2015-02-19)
1.0.62 (2015-02-17)
1.0.61 (2015-02-11)
1.0.60 (2015-02-03 10:12)
1.0.59 (2015-02-03 04:05)
1.0.58 (2015-01-07)
1.0.57 (2014-12-23)
1.0.56 (2014-12-17)
1.0.55 (2014-12-09)
1.0.54 (2014-11-15)
1.0.53 (2014-11-01)
1.0.52 (2014-10-23)
1.0.51 (2014-10-20 16:01)
1.0.50 (2014-10-20 01:50)
1.0.49 (2014-10-13)
1.0.48 (2014-10-12)
1.0.47 (2014-10-08)
1.0.46 (2014-10-03)
1.0.45 (2014-09-29)
1.0.44 (2014-09-26 09:17)
1.0.43 (2014-09-26 01:08)
1.0.42 (2014-09-25)
1.0.41 (2014-09-23)
1.0.40 (2014-09-19)
1.0.39 (2014-09-17)
1.0.38 (2014-09-13)
1.0.37 (2014-09-08)
1.0.36 (2014-09-01)
1.0.35 (2014-08-16)
1.0.34 (2014-08-14)
1.0.33 (2014-07-28)
1.0.32 (2014-07-26)
1.0.31 (2014-07-23)
1.0.30 (2014-07-15)
1.0.29 (2014-07-02)
1.0.28 (2014-06-24)
1.0.27 (2014-06-10)
1.0.26 (2014-05-30)
1.0.25 (2014-05-26)
1.0.24 (2014-05-24)
1.0.23 (2014-05-23)
1.0.22 (2014-05-22)
1.0.21 (2014-05-20)
1.0.20 (2014-05-09)
1.0.19 (2014-05-06)
1.0.18 (2014-05-04)
1.0.17 (2014-04-20)
1.0.16 (2014-04-19 23:29)
1.0.15 (2014-04-19 20:19)
1.0.14 (2014-04-19 12:52)
1.0.13 (2014-04-19 11:06)
1.0.12 (2014-04-18 16:58)
1.0.11 (2014-04-18 08:18)
1.0.10 (2014-04-17)
1.0.9 (2014-04-12)
1.0.8 (2014-04-11)
1.0.7 (2014-04-10)
1.0.6 (2014-04-07)
1.0.5 (2014-03-31)
1.0.4 (2014-03-29)
1.0.3 (2014-03-19)
1.0.2 (2014-03-12)
1.0.1 (2014-03-07)
1.0.0 (2014-03-05)
Wiki Tutorials
Package Dependencies
Name |
---|
catkin_virtualenv |
dynamic_reconfigure |
jsk_data |
speech_recognition_msgs |
catkin |
audio_capture |
audio_common_msgs |
sound_play |
rostest |
roslaunch |
System Dependencies
Dependant Packages
Name | Deps |
---|---|
jsk_3rdparty |
Launch files
- launch/speech_recognition.launch
- launch_sound_play [default: true]: Launch the sound_play node to speak.
- launch_audio_capture [default: true]: Launch the audio_capture node to publish the audio topic from the microphone.
- audio_topic [default: /audio]: Name of the audio topic captured from the microphone.
- voice_topic [default: /speech_to_text]: Name of the text topic for recognized speech.
- n_channel [default: 1]: Number of channels of the audio topic and microphone. Run '$ pactl list short sinks' to check your hardware.
- depth [default: 16]: Bit depth of the audio topic and microphone. Run '$ pactl list short sinks' to check your hardware.
- sample_rate [default: 16000]: Frame rate of the audio topic and microphone. Run '$ pactl list short sinks' to check your hardware.
- device [default: ]: Card and device number of the microphone (e.g. hw:0,0). Check the card and device numbers with '$ arecord -l', then use hw:[card number],[device number].
- engine [default: Google]: Speech-to-text engine. One of Google, GoogleCloud, Sphinx, Wit, Bing, Houndify, IBM.
- language [default: en-US]: Speech-to-text language. For Japanese, set ja-JP.
- continuous [default: true]: If false, the /speech_recognition service is advertised. If true, the /speech_to_text topic is published.
- auto_start [default: true]: Whether speech recognition starts automatically. This parameter takes effect only when continuous is true.
- self_cancellation [default: true]: If true, audio is not recognized while the robot is speaking.
- tts_tolerance [default: 1.0]: Tolerance in seconds for deciding whether the robot is speaking.
- tts_action_names [default: ['sound_play']]: TTS action names. Output from these action servers is ignored by speech recognition.
- launch/parrotry.launch
- use_google [default: true]
- language [default: en-US]
- confidence_threshold [default: 0.8]
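The launch arguments listed above are passed on the command line as `name:=value`, as in the tutorials' `parrotry.launch language:=ja-JP` example. The following is a minimal sketch of assembling such a command from Python; `roslaunch_command` is a hypothetical helper introduced here for illustration, and it only builds the string without running anything.

```python
import shlex


def roslaunch_command(launch_file, **args):
    """Build a roslaunch command line for a ros_speech_recognition launch file.

    Hypothetical helper: it only formats the command and does not run it.
    """
    argv = ["roslaunch", "ros_speech_recognition", launch_file]
    for name, value in args.items():
        # Write booleans as lowercase true/false, matching the defaults
        # shown in the launch file listing.
        if isinstance(value, bool):
            value = "true" if value else "false"
        argv.append("%s:=%s" % (name, value))
    # Quote each token so the result can be pasted into a shell safely.
    return " ".join(shlex.quote(token) for token in argv)


if __name__ == "__main__":
    print(roslaunch_command("speech_recognition.launch",
                            language="ja-JP", continuous=False))
    print(roslaunch_command("parrotry.launch", confidence_threshold=0.8))
```

Keeping the arguments in a dictionary or keyword list makes it easier to reuse the same settings across `speech_recognition.launch` and `parrotry.launch`, though each launch file only accepts the arguments listed for it.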