tpl-spot-sequential
This template runs two wake word models in sequence. Use this to listen for a trigger phrase followed by a command, for example: "Voice genie, play music."
tpl-spot-sequential has task-type==phrasespot.
Expected task types:
- Slot 0: phrasespot
- Slot 1: phrasespot
tpl-spot-sequential-1.5.0.snsr
Operation
flowchart TD
start((start))
loop0{loop == 2?}
start --> loop0
loop0 -->|no| start0
loop0 -->|yes| start1
subgraph slot0[**slot 0** (phrasespot)]
start0((start))
fetch0[/samples from ->audio-pcm/]
audio0(^sample-count)
process0[process]
stop0((stop))
start0 --> fetch0
fetch0 --> audio0
audio0 --> process0
process0 --> fetch0
process0 -->|recognize| stop0
end
listenBegin(^listen-begin)
stop0 --> listenBegin
listenBegin --> start1
subgraph slot1[**slot 1** (phrasespot)]
start1((start))
fetch1[/samples from ->audio-pcm/]
audio1(^sample-count)
process1[process]
result1(^result)
stop1((stop))
loop{loop == 0?}
loop2{loop == 2?}
start1 --> fetch1
fetch1 --> audio1
audio1 --> process1
process1 --> fetch1
process1 --->|recognize| result1
process1 -->|timeout| loop2
loop2 -->|no| stop1
loop2 -->|yes| fetch1
result1 --> loop
loop -->|no| fetch1
loop -->|yes| stop1
end
listenEnd(^listen-end)
stop1 --> listenEnd
listenEnd --> start0

1. If loop == 2, skip to step 6.
2. Read audio data from ->audio-pcm.
3. Invoke ^sample-count.
4. Invoke ^result if processing detects a vocabulary phrase, else continue at step 2.
5. Invoke ^listen-begin, then start the wake word in slot 1.
6. Read audio data from ->audio-pcm.
7. Invoke ^sample-count.
8. If loop != 2 and processing does not detect a wake word within listen-window, invoke ^listen-end and restart the slot 0 wake word at step 2.
9. Invoke ^result if processing detects a vocabulary phrase, else continue at step 6.
10. If loop == 0, invoke ^listen-end and continue at step 2.
11. If loop != 0, reset the listen-window timeout and continue processing at step 6.
12. Continue processing until STREAM_END occurs on ->audio-pcm, or one of the event handlers returns a code other than OK.
Register callback handlers with setHandler only for those events you're interested in.
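The control flow above can be hard to follow from the step list alone, so here is a minimal Python sketch that simulates it. It is not SDK code and calls no SDK functions. The function `run_sequential`, the frame labels `"wake"` and `"cmd"`, and the idea of measuring listen-window in frames are all assumptions made for illustration. The sketch follows the flowchart and the sample output in Examples, where a slot 0 detection leads straight to ^listen-begin, and it leaves ^sample-count out of the log for brevity.

```python
from typing import List


def run_sequential(frames: List[str], loop: int = 0, listen_window: int = 3) -> List[str]:
    """Simulate the template's event order (hypothetical model, not the SDK).

    frames: one label per audio frame: "wake" (slot 0 phrase),
            "cmd" (slot 1 phrase), or "" (nothing detected).
    listen_window: timeout for slot 1, counted in frames (an assumption).
    Returns the list of events that would be invoked, in order.
    """
    events: List[str] = []
    slot = 1 if loop == 2 else 0          # step 1: loop == 2 runs slot 1 only
    remaining = listen_window
    for frame in frames:                  # steps 2 and 6: read audio
        if slot == 0:
            if frame == "wake":           # slot 0 spotted its phrase
                events.append("^listen-begin")   # step 5
                slot = 1
                remaining = listen_window
        elif frame == "cmd":              # step 9: slot 1 spotted a phrase
            events.append("^result")
            if loop == 0:                 # step 10: back to slot 0
                events.append("^listen-end")
                slot = 0
            else:                         # step 11: reset the timeout
                remaining = listen_window
        elif loop != 2:                   # step 8: timeout applies unless loop == 2
            remaining -= 1
            if remaining <= 0:
                events.append("^listen-end")
                slot = 0
    return events                         # step 12: stream ended


frames = ["wake", "cmd", "cmd", "", "", ""]
print(run_sequential(frames, loop=0))  # ['^listen-begin', '^result', '^listen-end']
print(run_sequential(frames, loop=1))  # ['^listen-begin', '^result', '^result', '^listen-end']
print(run_sequential(frames, loop=2))  # ['^result', '^result']
```

With loop == 0 the second "cmd" frame is ignored because focus has already returned to slot 0. With loop == 1 each recognition resets the window, so ^listen-end arrives only after three quiet frames.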
Settings
^listen-begin, ^listen-end, ^result, ^sample-count
operating-point-iterator, vocab-iterator
audio-stream, audio-stream-first, audio-stream-last
->audio-pcm, audio-stream-from, audio-stream-to, dsp-acmodel-stream, dsp-header-stream, dsp-search-stream
0, 1, audio-stream-size, delay, dsp-target, duration-ms, listen-window, loop, low-fr-operating-point, operating-point, samples-per-second, sv-threshold
live-spot.c, snsr-eval.c, PhraseSpot.java, segmentSpottedAudio.java
Notes
With loop == 0 (the default): The template runs the spotter in slot 0 until it spots, then runs slot 1 until it spots or the listen-window timeout expires, then returns to the spotter in slot 0.
With loop == 1: The template runs the spotter in slot 0 until it spots, then runs slot 1 until the listen-window timeout expires, then returns to the spotter in slot 0. It resets the expiration timer every time slot 1 recognizes.
With loop == 2 (7.6.0): The template runs only slot 1. If your application needs to listen for a wake word but also support an external trigger, such as a push-to-talk button, set loop=2 when the event occurs.
The combined model is itself a wake word model, so any application that expects a wake word model can use it without code changes.
Combined model settings refer to the model in slot 1, so operating-point refers to 1.operating-point. You can change settings for the wake word in slot 0 by prefixing the setting name with 0, for example: 0.operating-point.
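The naming rule can be summarized in a few lines of Python. This is only a sketch of the rule as described above. `resolve_setting` is a hypothetical helper, not an SDK function, and it does not check whether the setting name is valid.

```python
from typing import Tuple


def resolve_setting(name: str) -> Tuple[int, str]:
    """Map a combined-model setting name to (slot, setting name).

    Hypothetical helper: names prefixed with "0." or "1." address that slot,
    and unprefixed names refer to the model in slot 1.
    """
    prefix, dot, rest = name.partition(".")
    if dot and prefix in ("0", "1") and rest:
        return int(prefix), rest
    return 1, name


print(resolve_setting("operating-point"))    # (1, 'operating-point')
print(resolve_setting("0.operating-point"))  # (0, 'operating-point')
```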
The model invokes ^listen-begin just before audio focus switches to slot 1, and ^listen-end just before audio focus switches back to slot 0. If there is no ^result between ^listen-begin and ^listen-end, the recognizer in slot 1 timed out.
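An application can use this ordering to tell a recognized command from a timeout. The Python sketch below shows the bookkeeping on a recorded list of event names. `timed_out_windows` is a hypothetical helper, and in a real application the same flags would presumably be updated inside the event handlers instead.

```python
from typing import List


def timed_out_windows(events: List[str]) -> List[bool]:
    """Return one flag per ^listen-begin/^listen-end pair.

    True means no ^result arrived inside that window, i.e. slot 1 timed out.
    Hypothetical helper operating on a plain list of event names.
    """
    outcomes: List[bool] = []
    listening = False
    saw_result = False
    for name in events:
        if name == "^listen-begin":
            listening, saw_result = True, False
        elif name == "^result" and listening:
            saw_result = True
        elif name == "^listen-end" and listening:
            outcomes.append(not saw_result)
            listening = False
    return outcomes


log = ["^listen-begin", "^result", "^listen-end",   # command recognized
       "^listen-begin", "^listen-end"]              # nothing recognized
print(timed_out_windows(log))  # [False, True]
```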
Examples
% cd ~/Sensory/TrulyNaturalSDK/7.6.1
% bin/snsr-edit -o vg-music.snsr \
    -t model/tpl-spot-sequential-1.5.0.snsr \
    -f 0 model/spot-voicegenie-enUS-6.5.1-m.snsr \
    -f 1 model/spot-music-enUS-1.2.0-m.snsr
# say "voice genie, play music"
% bin/snsr-eval -vvt vg-music.snsr
Using live audio from default capture device. ^C to stop.
Using operating point 17.
Available operating points: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20.
Available vocabulary:
1: "play_music"
2: "previous_song"
3: "stop_music"
4: "next_song"
5: "pause_music"
3180 [^listen-begin]
phrase:
3630 4410 (1 sv) play_music
words:
3630 3900 (1 sv)
3900 4410 (1 sv) play_music
4635 [^listen-end]
^C