Package Summary
Tags | No category tags. |
Version | 1.2.15 |
License | BSD |
Build type | CATKIN |
Use | RECOMMENDED |
Repository Summary
Checkout URI | https://github.com/jsk-ros-pkg/jsk_recognition.git |
VCS Type | git |
VCS Version | master |
Last Updated | 2022-12-14 |
Dev Status | DEVELOPED |
CI status | No Continuous Integration |
Released | RELEASED |
Package Description
Additional Links
Maintainers
- Naoya Yamaguchi
Authors
Sound Classification
ROS package to classify sound stream.
Contents
Setup
- Install ROS. Available OS:
- Ubuntu 16.04 (?)
- Ubuntu 18.04
- Create workspace
mkdir ~/sound_classification_ws/src -p
cd ~/sound_classification_ws/src
git clone https://github.com/jsk-ros-pkg/jsk_recognition.git
rosdep install --from-paths . --ignore-src -y -r
cd ..
catkin build sound_classification
source ~/sound_classification_ws/devel/setup.bash
- Install other packages.
- CUDA and CuPy are needed for Chainer. See the JSK installation guide.
- Using a GPU is highly recommended.
Usage
- Check and specify your microphone parameters.
- In particular, `device`, `n_channel`, `bitdepth` and `mic_sampling_rate` need to be known.
- The example bash commands to get these params are below:
# For device. In this example, card 0 and device 0, so device:="hw:0,0"
$ arecord -l
**** List of CAPTURE Hardware Devices ****
card 0: PCH [HDA Intel PCH], device 0: ALC293 Analog [ALC293 Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
# For n_channel, bitdepth and mic_sampling_rate.
# Note that sources are inputs (e.g. microphone) and sinks are outputs (e.g. speaker)
$ pactl list short sources
1 alsa_input.pci-0000_00_1f.3.analog-stereo module-alsa-card.c s16le 2ch 44100Hz SUSPENDED
- Pass these params to each launch file as arguments when launching (e.g., `device:=hw:0,0 n_channel:=2 bitdepth:=16 mic_sampling_rate:=44100`).
- If you use the `/audio` topic from another computer and do not want to publish `/audio`, set `use_microphone:=false` when launching each launch file.
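The lookup above can also be scripted. The following is a minimal sketch, assuming the `arecord -l` and `pactl list short sources` lines look like the sample output shown; the function names are hypothetical and not part of this package.

```python
import re


def _leading_int(text):
    """Return the first run of digits in text as an int (e.g. 's16le' -> 16)."""
    return int(re.search(r"\d+", text).group())


def device_from_arecord(line):
    """Build an ALSA device string such as 'hw:0,0' from one `arecord -l` line."""
    match = re.search(r"card (\d+):.*?device (\d+):", line)
    if match is None:
        raise ValueError("no card/device numbers found in: %r" % line)
    return "hw:{},{}".format(match.group(1), match.group(2))


def params_from_pactl(line):
    """Read n_channel, bitdepth and mic_sampling_rate from one
    `pactl list short sources` line."""
    # Assumed field order: index, name, driver, sample format, channels, rate, state
    fields = line.split()
    sample_format, channels, rate = fields[3:6]
    return {
        "n_channel": _leading_int(channels),
        "bitdepth": _leading_int(sample_format),
        "mic_sampling_rate": _leading_int(rate),
    }


def launch_args(device, params):
    """Format the values as roslaunch `name:=value` arguments."""
    order = ["n_channel", "bitdepth", "mic_sampling_rate"]
    parts = ["device:={}".format(device)]
    parts += ["{}:={}".format(key, params[key]) for key in order]
    return " ".join(parts)


ARECORD_LINE = "card 0: PCH [HDA Intel PCH], device 0: ALC293 Analog [ALC293 Analog]"
PACTL_LINE = ("1 alsa_input.pci-0000_00_1f.3.analog-stereo "
              "module-alsa-card.c s16le 2ch 44100Hz SUSPENDED")

print(launch_args(device_from_arecord(ARECORD_LINE), params_from_pactl(PACTL_LINE)))
```

The printed string can be appended to any of the `roslaunch` commands in this document.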
- Save environmental noise to `train_data/noise.npy`.
- By subtracting noise, spectrograms become clearer.
- While this script is running, do not make any sound near the microphone.
- You should update the noise data every time before sound recognition, because environmental sound differs every time.
- 30 noise samples are enough.
$ roslaunch sound_classification save_noise.launch
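To illustrate why subtracting noise makes spectrograms clearer, here is a minimal sketch of the idea in plain Python. It assumes the noise is stored as one averaged amplitude per frequency bin and that negative results are clamped to zero; the package's actual storage format and processing may differ, and both function names are hypothetical.

```python
def mean_spectrum(noise_samples):
    """Average several noise spectra bin by bin."""
    count = len(noise_samples)
    return [sum(bins) / count for bins in zip(*noise_samples)]


def subtract_noise(spectrum, noise):
    """Subtract the noise floor from one spectrum, clamping at zero (assumption)."""
    return [max(value - floor, 0.0) for value, floor in zip(spectrum, noise)]


# Two recorded noise spectra with three frequency bins each (made-up numbers).
noise = mean_spectrum([[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]])
cleaned = subtract_noise([5.0, 2.0, 1.0], noise)
print(noise, cleaned)
```

Only the first bin stays above the averaged noise floor, so only that bin survives in the cleaned spectrum.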
- Publish audio -> spectrum -> spectrogram topics.
- You can set the max/min frequency to be included in the spectrum by `high_cut_freq`/`low_cut_freq` args in `audio_to_spectrogram.launch`.
- If `gui:=true`, spectrum and spectrogram are visualized.
$ roslaunch sound_classification audio_to_spectrogram.launch gui:=true
- Here is an example spectrogram in a quiet environment.
- Horizontal axis is time
- Vertical axis is frequency [Hz]
|Spectrogram w/o noise subtraction|Spectrogram w/ noise subtraction|
|---|---|
|||
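The effect of `high_cut_freq` and `low_cut_freq` can be pictured as selecting a range of FFT bins. The sketch below assumes a plain FFT whose bin `k` sits at `k * mic_sampling_rate / fft_size` Hz; `fft_size` and the function name are hypothetical, and the real node may select its band differently.

```python
def band_bins(sampling_rate, fft_size, low_cut_freq, high_cut_freq):
    """Return the indices of FFT bins whose center frequency lies in the band."""
    bin_width = sampling_rate / float(fft_size)  # Hz covered by one bin
    n_bins = fft_size // 2 + 1                   # bins of a real-valued FFT
    return [k for k in range(n_bins)
            if low_cut_freq <= k * bin_width <= high_cut_freq]


# Launch-file defaults: mic_sampling_rate=44100, low_cut_freq=1, high_cut_freq=8000.
# fft_size=4410 is chosen only so that each bin is exactly 10 Hz wide.
bins = band_bins(44100, 4410, 1, 8000)
print(len(bins), bins[0], bins[-1])
```

With these numbers the DC bin (0 Hz) is dropped and everything above 8000 Hz is cut, which is why the spectrogram image only covers the configured band.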
- Collect spectrograms of the sounds you would like to classify.
- When the volume exceeds the `threshold`, the spectrogram is saved at `train_data/original_spectrogram/TARGET_CLASS`.
- You can use either a rosbag or a live stream as the sound source.
1. Rosbag version (Recommended)
- I recommend using rosbag to collect spectrograms. A rosbag makes it easy to rerun `save_sound.launch` with several parameters.
- With `target_class:=TARGET_CLASS`, you set the class name of your target sound.
- By using `use_rosbag:=true` and `filename:=PATH_TO_ROSBAG`, you can save spectrograms from a rosbag.
- By default, the rosbag is paused at first. Press the 'Space' key in the terminal to start playing the rosbag. When the rosbag ends, press 'Ctrl-c' to terminate.
- The newly saved spectrograms are appended to existing spectrograms.
- You can change the threshold for saving sound with `threshold:=xxx`. The smaller the value, the more easily sound is saved.
# Save audio to rosbag
$ roslaunch sound_classification record_audio_rosbag.launch filename:=PATH_TO_ROSBAG
# Play rosbag and collect data
$ export ROS_MASTER_URI=http://localhost:11311
$ roslaunch sound_classification save_sound.launch use_rosbag:=true \
filename:=PATH_TO_ROSBAG target_class:=TARGET_CLASS threshold:=0.5
- By setting `threshold:=0` and `save_when_sound:=false`, you can collect spectrograms of "no sound".
# Play rosbag and collect no-sound data
$ export ROS_MASTER_URI=http://localhost:11311
$ roslaunch sound_classification save_sound.launch use_rosbag:=true \
filename:=PATH_TO_ROSBAG target_class:=no_sound threshold:=0 save_when_sound:=false
2. Stream version (Not Recommended)
- You can collect spectrograms directly from the audio topic stream.
- Do not use `use_rosbag:=true`. The other args are the same as in the rosbag version above.
$ roslaunch sound_classification save_sound.launch \
save_when_sound:=true target_class:=TARGET_CLASS threshold:=0.5 save_data_rate:=5
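The interaction between `threshold` and `save_when_sound` can be summarized as a small decision function. This is only a sketch of the behavior described above, assuming that `save_when_sound:=false` means every spectrogram is saved regardless of volume; the function name is hypothetical and the node's real logic may differ.

```python
def should_save(volume, threshold=0.5, save_when_sound=True):
    """Decide whether the current spectrogram would be saved."""
    if not save_when_sound:
        # Assumption: with save_when_sound disabled, everything is saved,
        # which is how "no sound" samples can be collected.
        return True
    # Otherwise save only when the volume exceeds the threshold.
    return volume > threshold


print(should_save(0.8))                                      # loud sound
print(should_save(0.2))                                      # below default threshold
print(should_save(0.2, threshold=0.1))                       # lower threshold
print(should_save(0.0, threshold=0, save_when_sound=False))  # no-sound collection
```

Lowering `threshold` makes the third call succeed where the second does not, which matches the note that smaller values save sound more easily.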
- Create dataset for Chainer from saved spectrograms.
- Some data augmentation is executed.
- `--number 30` means to use a maximum of 30 images for each class in the dataset.
$ rosrun sound_classification create_dataset.py --number 30
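As an illustration of what a per-class cap like `--number 30` does, the sketch below limits each class to at most `number` randomly chosen files. The function name, file names and the random sampling are assumptions for illustration; how `create_dataset.py` actually picks and augments images is not shown here.

```python
import random


def cap_per_class(files_by_class, number, seed=0):
    """Keep at most `number` files per class, chosen at random."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    capped = {}
    for class_name, files in files_by_class.items():
        files = sorted(files)
        if len(files) > number:
            files = rng.sample(files, number)
        capped[class_name] = files
    return capped


# Hypothetical file lists: 50 applause images, 12 voice images.
files_by_class = {
    "applause": ["applause_{:03d}.png".format(i) for i in range(50)],
    "voice": ["voice_{:03d}.png".format(i) for i in range(12)],
}
capped = cap_per_class(files_by_class, number=30)
print({name: len(files) for name, files in capped.items()})
```

Classes with fewer files than the cap are kept whole, so the cap balances large classes without discarding small ones.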
- Visualize dataset.
- You can use `train` arg for train dataset (augmented dataset), `test` arg for test dataset.
- The spectrograms in the dataset are visualized in random order.
$ rosrun sound_classification visualize_dataset.py test # train/test
- Train with dataset.
- Default model is `NIN` (Recommended).
- If you use `vgg16`, pretrained weights of VGG16 are downloaded to `scripts/VGG_ILSVRC_16_layers.npz` the first time you run this script.
$ rosrun sound_classification train.py --epoch 30
- Classify sounds.
- It takes a few seconds for the neural network weights to be loaded.
- `use_rosbag:=true` and `filename:=PATH_TO_ROSBAG` are available if you classify sound with rosbag.
$ roslaunch sound_classification classify_sound.launch
- You can fix the color of each class name in the classification result image by specifying the order of class names, like below:
<rosparam>
target_names: [none, other, chip_bag]
</rosparam>
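Why a fixed order gives fixed colors can be sketched as follows, assuming colors are assigned by position in `target_names`. The palette and function name are hypothetical; the node's actual color table is not documented here.

```python
# Hypothetical RGB palette, indexed by class position.
PALETTE = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]


def colors_for(target_names, palette=PALETTE):
    """Map each class name to a color by its position in the list."""
    return {name: palette[index % len(palette)]
            for index, name in enumerate(target_names)}


colors = colors_for(["none", "other", "chip_bag"])
print(colors["chip_bag"])
```

Under this assumption, keeping `target_names` in the same order between runs keeps each class on the same color, while reordering the list would swap them.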
- Example classification result:
|no_sound|applause|voice|
|---|---|---|
||||
Quick demo
Sound classification demo with your laptop's built-in microphone. You can create a dataset from the rosbag files in the `sample_rosbag/` directory.
Classification example gif
Commands
$ roslaunch sound_classification save_noise.launch
- Collect spectrograms from sample rosbags. Press 'Space' to start rosbag.
- For no_sound class
$ roslaunch sound_classification save_sound.launch use_rosbag:=true \
filename:=$(rospack find sound_classification)/sample_rosbag/no_sound.bag \
target_class:=no_sound threshold:=0 save_when_sound:=false
- For applause class
$ roslaunch sound_classification save_sound.launch use_rosbag:=true \
filename:=$(rospack find sound_classification)/sample_rosbag/applause.bag \
target_class:=applause threshold:=0.5
- For voice class
$ roslaunch sound_classification save_sound.launch use_rosbag:=true \
filename:=$(rospack find sound_classification)/sample_rosbag/voice.bag \
target_class:=voice threshold:=0.5
- Create dataset
$ rosrun sound_classification create_dataset.py --number 30
- Train (takes ~10 minutes)
$ rosrun sound_classification train.py --epoch 20
- Classify sound
$ roslaunch sound_classification classify_sound.launch
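The three `save_sound.launch` invocations above differ only in class name, threshold and `save_when_sound`. A small helper can generate them from a table; this is a sketch that only builds the command strings shown in this demo, and the helper name is hypothetical.

```python
def save_sound_command(target_class, threshold, save_when_sound=True):
    """Build the save_sound.launch command line for one sample rosbag."""
    args = [
        "roslaunch", "sound_classification", "save_sound.launch",
        "use_rosbag:=true",
        "filename:=$(rospack find sound_classification)/sample_rosbag/{}.bag"
        .format(target_class),
        "target_class:={}".format(target_class),
        "threshold:={}".format(threshold),
    ]
    if not save_when_sound:
        # Only the no_sound class disables save_when_sound in this demo.
        args.append("save_when_sound:=false")
    return " ".join(args)


# (class name, threshold, save_when_sound) as used in the commands above.
DEMO_CLASSES = [("no_sound", 0, False), ("applause", 0.5, True), ("voice", 0.5, True)]

for name, threshold, save_when_sound in DEMO_CLASSES:
    print(save_sound_command(name, threshold, save_when_sound))
```

Each printed line is meant to be pasted into a shell, where `$(rospack find ...)` is expanded; press 'Space' to start the rosbag as described above.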
Launch files
- launch/audio_to_spectrogram.launch
- device [default: hw:0,0]
- n_channel [default: 2]
- bitdepth [default: 16]
- mic_sampling_rate [default: 44100]
- use_rosbag [default: false]
- filename [default: /]
- use_microphone [default: true]
- high_cut_freq [default: 8000]
- low_cut_freq [default: 1]
- spectrogram_period [default: 1]
- gui [default: false]
- pause_rosbag [default: true]
- launch/record_audio_rosbag.launch
- filename
- device [default: hw:0,0]
- n_channel [default: 2]
- bitdepth [default: 16]
- mic_sampling_rate [default: 44100]
- use_microphone [default: true]
- launch/save_noise.launch
- device [default: hw:0,0]
- n_channel [default: 2]
- bitdepth [default: 16]
- mic_sampling_rate [default: 44100]
- use_rosbag [default: false]
- filename [default: /]
- use_microphone [default: true]
- high_cut_freq [default: 8000]
- low_cut_freq [default: 1]
- spectrogram_period [default: 1]
- pause_rosbag [default: true]
- gui [default: true]
- save_data_rate [default: 10]
- launch/classify_sound.launch
- device [default: hw:0,0]
- n_channel [default: 2]
- bitdepth [default: 16]
- mic_sampling_rate [default: 44100]
- use_rosbag [default: false]
- filename [default: /]
- use_microphone [default: true]
- high_cut_freq [default: 8000]
- low_cut_freq [default: 1]
- spectrogram_period [default: 1]
- pause_rosbag [default: true]
- gpu [default: 0]
- gui [default: true]
- launch/save_sound.launch
- device [default: hw:0,0]
- n_channel [default: 2]
- bitdepth [default: 16]
- mic_sampling_rate [default: 44100]
- use_rosbag [default: false]
- filename [default: /]
- use_microphone [default: true]
- high_cut_freq [default: 8000]
- low_cut_freq [default: 1]
- spectrogram_period [default: 1]
- pause_rosbag [default: true]
- gui [default: true]
- save_data_rate [default: 5]
- target_class [default: ]
- save_when_sound [default: true]
- threshold [default: 0.5]
|
Package Summary
Tags | No category tags. |
Version | 1.2.15 |
License | BSD |
Build type | CATKIN |
Use | RECOMMENDED |
Repository Summary
Checkout URI | https://github.com/jsk-ros-pkg/jsk_recognition.git |
VCS Type | git |
VCS Version | master |
Last Updated | 2022-12-14 |
Dev Status | DEVELOPED |
CI status | No Continuous Integration |
Released | RELEASED |
Tags | No category tags. |
Contributing |
Help Wanted (0)
Good First Issues (0) Pull Requests to Review (0) |
Package Description
Additional Links
Maintainers
- Naoya Yamaguchi
Authors
Sound Classification
ROS package to classify sound stream.
Contents
Setup
-
Install ROS. Available OS:
- Ubuntu 16.04 (?)
- Ubuntu 18.04
Create workspace
mkdir ~/sound_classification_ws/src -p
cd ~/sound_classification_ws/src
git clone https://github.com/jsk-ros-pkg/jsk_recognition.git
rosdep install --from-paths . --ignore-src -y -r
cd ..
catkin build sound_classification
source ~/sound_classification_ws/devel/setup.bash
- Install other packages.
- cuda and cupy are needed for chainer. See installation guide of JSK
- Using GPU is highly recommended.
Usage
- Check and specify your microphone parameters.
- In particular,
device
,n_channel
,bitdepth
andmic_sampling_rate
need to be known. - The example bash commands to get these params are below:
- In particular,
# For device. In this example, card 0 and device 0, so device:="hw:0,0"
$ arecord -l
\**** List of CAPTURE Hardware Devices ****
card 0: PCH [HDA Intel PCH], device 0: ALC293 Analog [ALC293 Analog]
Subdevices: 1/1
Subdevice #0: subdevice #0
# For n_channel, bitdepth and sample_rate,
# Note that sources means input (e.g. microphone) and sinks means output (e.g. speaker)
$ pactl list short sources
1 alsa_input.pci-0000_00_1f.3.analog-stereo module-alsa-card.c s16le 2ch 44100Hz SUSPENDED
- Pass these params to each launch file as arguments when launching (e.g., `device:=hw:0,0 n_channel:=2 bitdepth:=16 mic_sampling_rate:=44100`).
- If you use `/audio` topic from other computer and do not want to publish `/audio`, set `use_microphone:=false` at each launch file when launching.
- Save environmental noise to
train_data/noise.npy
.- By subtracting noise, spectrograms become clear.
- During this script, you must not give any sound to the sensor.
- You should update noise data everytime before sound recognition, because environmental sound differs everytime.
- 30 noise samples are enough.
$ roslaunch sound_classification save_noise.launch
- Publish audio -> spectrum -> spectrogram topics.
- You can set the max/min frequency to be included in the spectrum by
high_cut_freq
/low_cut_freq
args inaudio_to_spectrogram.launch
. - If
gui:=true
, spectrum and spectrogram are visualized.
- You can set the max/min frequency to be included in the spectrum by
$ roslaunch sound_classification audio_to_spectrogram.launch gui:=true
- Here is an example spectrogram at quiet environment.
- Horiozntal axis is time [Hz]
- Vertical axis is frequency [Hz]
|Spectrogram w/o noise subtraction|Spectrogram w/ noise subtraction|
|---|---|
|||
- Collect spectrograms you would like to classify.
- When the volume exceeds the `threshold`, the spectrogram is saved at `train_data/original_spectrogram/TARGET_CLASS`.
- You can use a rosbag or a stream as the sound source.
1. Rosbag version (Recommended)
- Using a rosbag to collect spectrograms is recommended, because it makes it easy to rerun `save_sound.launch` with different parameters.
- With `target_class:=TARGET_CLASS`, you can set the class name of your target sound.
- By using `use_rosbag:=true` and `filename:=PATH_TO_ROSBAG`, you can save spectrograms from a rosbag.
- By default, the rosbag is paused at first. Press the 'Space' key in the terminal to start playing the rosbag. When the rosbag ends, press 'Ctrl-c' to terminate.
- The newly saved spectrograms are appended to existing spectrograms.
- You can change the threshold for saving sound with `threshold:=xxx`. The smaller the value is, the more easily sound is saved.
# Save audio to rosbag
$ roslaunch sound_classification record_audio_rosbag.launch filename:=PATH_TO_ROSBAG
# Play rosbag and collect data
$ export ROS_MASTER_URI=http://localhost:11311
$ roslaunch sound_classification save_sound.launch use_rosbag:=true \
filename:=PATH_TO_ROSBAG target_class:=TARGET_CLASS threshold:=0.5
- By setting `threshold:=0` and `save_when_sound:=false`, you can collect spectrograms of "no sound".
# Play rosbag and collect no-sound data
$ export ROS_MASTER_URI=http://localhost:11311
$ roslaunch sound_classification save_sound.launch use_rosbag:=true \
filename:=PATH_TO_ROSBAG target_class:=no_sound threshold:=0 save_when_sound:=false
2. Stream version (Not Recommended)
- You can collect spectrograms directly from the audio topic stream.
- Do not use `use_rosbag:=true`. The other args are the same as in the rosbag version described above.
$ roslaunch sound_classification save_sound.launch \
save_when_sound:=true target_class:=TARGET_CLASS threshold:=0.5 save_data_rate:=5
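The interaction between `threshold` and `save_when_sound` described in this step can be summarized with a small decision function. This is a sketch of the documented behavior only. The function name `should_save` and the exact comparison are assumptions, and the way the package measures volume is not shown here.

```python
def should_save(volume, threshold=0.5, save_when_sound=True):
    """Decide whether the current spectrogram should be saved.

    Assumption: with save_when_sound=False every spectrogram is saved,
    which is how "no sound" data can be collected.
    """
    if not save_when_sound:
        return True
    return volume > threshold


print(should_save(0.7))   # True: volume exceeds the default threshold
print(should_save(0.3))   # False: too quiet, nothing is saved
print(should_save(0.0, threshold=0, save_when_sound=False))  # True
```

A lower `threshold` makes the second case pass more often, which matches the note that smaller values save sound more easily.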
- Create dataset for chainer from saved spectrograms.
- Some data augmentation is executed.
- `--number 30` means to use at most 30 images for each class in the dataset.
$ rosrun sound_classification create_dataset.py --number 30
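The effect of `--number` can be pictured as a per-class cap on the saved images. The sketch below is hypothetical: `cap_per_class`, the random selection, and the `.png` file names are assumptions for illustration, not the behavior of `create_dataset.py`.

```python
import random


def cap_per_class(files_by_class, number, seed=0):
    """Keep at most `number` files for each class.

    Assumption: files are picked at random; the real script may differ.
    """
    rng = random.Random(seed)
    capped = {}
    for name, files in files_by_class.items():
        files = sorted(files)
        rng.shuffle(files)
        capped[name] = sorted(files[:number])
    return capped


files = {
    "applause": ["applause_{:03d}.png".format(i) for i in range(45)],
    "no_sound": ["no_sound_{:03d}.png".format(i) for i in range(12)],
}
capped = cap_per_class(files, number=30)
print(sorted((name, len(kept)) for name, kept in capped.items()))
# [('applause', 30), ('no_sound', 12)]
```

Classes with fewer images than the cap keep all of them, so very uneven recordings still produce an unbalanced dataset.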
- Visualize dataset.
- You can use the `train` arg for the train dataset (augmented dataset) and the `test` arg for the test dataset.
- The spectrograms in the dataset are visualized in random order.
$ rosrun sound_classification visualize_dataset.py test # train/test
- Train with dataset.
- Default model is `NIN` (Recommended).
- If you use `vgg16`, pretrained weights of VGG16 are downloaded to `scripts/VGG_ILSVRC_16_layers.npz` the first time you run this script.
$ rosrun sound_classification train.py --epoch 30
- Classify sounds.
- It takes a few seconds for the neural network weights to be loaded.
- `use_rosbag:=true` and `filename:=PATH_TO_ROSBAG` are available if you classify sound from a rosbag.
$ roslaunch sound_classification classify_sound.launch
- You can fix the color of each class name in the classification result image by specifying the order of class names as below:
<rosparam>
target_names: [none, other, chip_bag]
</rosparam>
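One way to see why a fixed order gives fixed colors is to assume the color is chosen by the position of the class name in `target_names`. The sketch below illustrates that assumption only. `PALETTE` and `assign_colors` are hypothetical and do not come from the package.

```python
# Hypothetical RGB palette, used only for this illustration.
PALETTE = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (255, 255, 0)]


def assign_colors(target_names, palette=PALETTE):
    """Map each class name to a color by its position in the list."""
    return {name: palette[i % len(palette)]
            for i, name in enumerate(target_names)}


colors = assign_colors(["none", "other", "chip_bag"])
print(colors["chip_bag"])  # (0, 0, 255)

# A different order would give the same class a different color.
reordered = assign_colors(["chip_bag", "none", "other"])
print(reordered["chip_bag"])  # (255, 0, 0)
```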
- Example classification result:
|no_sound|applause|voice|
|---|---|---|
||||
Quick demo
Sound classification demo with your laptop's built-in microphone. You can create a dataset from the rosbag files in the `sample_rosbag/` directory.
Classification example gif
Commands
- Save environmental noise
$ roslaunch sound_classification save_noise.launch
- Collect spectrograms from sample rosbags. Press 'Space' to start rosbag.
- For no_sound class
$ roslaunch sound_classification save_sound.launch use_rosbag:=true \
filename:=$(rospack find sound_classification)/sample_rosbag/no_sound.bag \
target_class:=no_sound threshold:=0 save_when_sound:=false
- For applause class
$ roslaunch sound_classification save_sound.launch use_rosbag:=true \
filename:=$(rospack find sound_classification)/sample_rosbag/applause.bag \
target_class:=applause threshold:=0.5
- For voice class
$ roslaunch sound_classification save_sound.launch use_rosbag:=true \
filename:=$(rospack find sound_classification)/sample_rosbag/voice.bag \
target_class:=voice threshold:=0.5
- Create dataset
$ rosrun sound_classification create_dataset.py --number 30
- Train (takes ~10 minutes)
$ rosrun sound_classification train.py --epoch 20
- Classify sound
$ roslaunch sound_classification classify_sound.launch
Dependant Packages
Name | Repo | Deps |
---|---|---|
jsk_panda_teleop | github-jsk-ros-pkg-jsk_robot |
Launch files
- launch/audio_to_spectrogram.launch
- device [default: hw:0,0]
- n_channel [default: 2]
- bitdepth [default: 16]
- mic_sampling_rate [default: 44100]
- use_rosbag [default: false]
- filename [default: /]
- use_microphone [default: true]
- high_cut_freq [default: 8000]
- low_cut_freq [default: 1]
- spectrogram_period [default: 1]
- gui [default: false]
- pause_rosbag [default: true]
- launch/record_audio_rosbag.launch
- filename
- device [default: hw:0,0]
- n_channel [default: 2]
- bitdepth [default: 16]
- mic_sampling_rate [default: 44100]
- use_microphone [default: true]
- launch/save_noise.launch
- device [default: hw:0,0]
- n_channel [default: 2]
- bitdepth [default: 16]
- mic_sampling_rate [default: 44100]
- use_rosbag [default: false]
- filename [default: /]
- use_microphone [default: true]
- high_cut_freq [default: 8000]
- low_cut_freq [default: 1]
- spectrogram_period [default: 1]
- pause_rosbag [default: true]
- gui [default: true]
- save_data_rate [default: 10]
- launch/classify_sound.launch
- device [default: hw:0,0]
- n_channel [default: 2]
- bitdepth [default: 16]
- mic_sampling_rate [default: 44100]
- use_rosbag [default: false]
- filename [default: /]
- use_microphone [default: true]
- high_cut_freq [default: 8000]
- low_cut_freq [default: 1]
- spectrogram_period [default: 1]
- pause_rosbag [default: true]
- gpu [default: 0]
- gui [default: true]
- launch/save_sound.launch
- device [default: hw:0,0]
- n_channel [default: 2]
- bitdepth [default: 16]
- mic_sampling_rate [default: 44100]
- use_rosbag [default: false]
- filename [default: /]
- use_microphone [default: true]
- high_cut_freq [default: 8000]
- low_cut_freq [default: 1]
- spectrogram_period [default: 1]
- pause_rosbag [default: true]
- gui [default: true]
- save_data_rate [default: 5]
- target_class [default: ]
- save_when_sound [default: true]
- threshold [default: 0.5]