tpl-spot-sequential

This template runs two wake word models in sequence. Use this to listen for a trigger phrase followed by a command, for example: "Voice genie, play music."

tpl-spot-sequential has task-type==phrasespot.

Expected task types:

Slot 0: phrasespot
Slot 1: phrasespot

tpl-spot-sequential-1.5.0.snsr

Operation

flowchart TD
  start((start))
  loop0{loop == 2?}
  start --> loop0
  loop0 -->|no| start0
  loop0 -->|yes| start1

  subgraph slot0[**slot 0** &lpar;phrasespot&rpar;]
    start0((start))
    fetch0[/samples from ->audio-pcm/]
    audio0(^sample-count)
    process0[process]
    stop0((stop))
    start0 --> fetch0
    fetch0 --> audio0
    audio0 --> process0
    process0 --> fetch0
    process0 -->|recognize| stop0
  end

  listenBegin(^listen-begin)
  stop0 --> listenBegin
  listenBegin --> start1

  subgraph slot1[**slot 1** &lpar;phrasespot&rpar;]
    start1((start))
    fetch1[/samples from ->audio-pcm/]
    audio1(^sample-count)
    process1[process]
    result1(^result)
    stop1((stop))
    loop{loop == 0?}
    loop2{loop == 2?}
    start1 --> fetch1
    fetch1 --> audio1
    audio1 --> process1
    process1 --> fetch1
    process1 --->|recognize| result1
    process1 -->|timeout| loop2
    loop2 -->|no| stop1
    loop2 -->|yes| fetch1
    result1 --> loop
    loop -->|no| fetch1
    loop -->|yes| stop1
  end

  listenEnd(^listen-end)
  stop1 --> listenEnd
  listenEnd --> start0

1. If loop == 2 skip to step 6.
2. Read audio data from ->audio-pcm.
3. Invoke ^sample-count.
4. Invoke ^result if processing detects a vocabulary phrase, else continue at step 2.
5. Invoke ^listen-begin, then start the wake word in slot 1.
6. Read audio data from ->audio-pcm.
7. Invoke ^sample-count.
8. If loop != 2 and processing does not detect a wake word within listen-window, invoke ^listen-end and restart the slot 0 wake word at step 2.
9. Invoke ^result if processing detects a vocabulary phrase, else continue at step 6.
10. If loop == 0 invoke ^listen-end and continue at step 2.
11. If loop != 0 reset the listen-window timeout and continue processing at step 6.
12. Continue processing until STREAM_END occurs on ->audio-pcm, or one of the event handlers returns a code other than OK.
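
The control flow above can be approximated with a small, self-contained model. This is a sketch under stated assumptions, not SDK code: `run_sequential`, the `"wake"` and `"cmd"` chunk labels, and a listen-window counted in audio chunks are all hypothetical simplifications. It follows the flowchart, where a detection in slot 0 leads to ^listen-begin, and it leaves out the per-chunk ^sample-count events to keep the output short.

```python
def run_sequential(frames, loop=0, listen_window=3):
    """Toy model of the tpl-spot-sequential control flow (not the SDK API).

    frames: one label per audio chunk. "wake" means the slot 0 spotter would
    fire on that chunk, "cmd" means the slot 1 spotter would, anything else
    means no detection. listen_window is counted in chunks (an assumption
    made for this sketch). Returns the events invoked, in order, with
    ^sample-count omitted.
    """
    events = []
    slot = 1 if loop == 2 else 0      # loop == 2 skips slot 0 entirely
    remaining = listen_window
    for label in frames:              # read audio from ->audio-pcm
        if slot == 0:
            if label == "wake":       # slot 0 spotted: hand over to slot 1
                events.append("^listen-begin")
                slot, remaining = 1, listen_window
            continue
        if label == "cmd":            # slot 1 spotted
            events.append("^result")
            if loop == 0:             # single command, back to slot 0
                events.append("^listen-end")
                slot = 0
            else:                     # keep listening, restart the timer
                remaining = listen_window
            continue
        remaining -= 1
        if remaining <= 0 and loop != 2:   # listen-window timeout
            events.append("^listen-end")
            slot = 0
    return events
```

Feeding the model `["", "wake", "cmd", ""]` with the default `loop=0` yields ^listen-begin, ^result, ^listen-end, the same event order as the "Voice genie, play music" session in the Examples section.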

Register callback handlers with setHandler only for those events you're interested in.
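
As an illustration of this contract, the sketch below models a dispatcher in which events without a registered handler are skipped and the first handler returning a code other than OK stops the run. `EventDispatcher`, `set_handler`, `run_events`, and the `OK` constant are hypothetical names for this toy model; they are not the SDK's setHandler API, and the skip-if-unregistered behavior is an assumption drawn from the sentence above.

```python
OK = 0  # stand-in for the SDK's OK return code (assumed value for this sketch)


class EventDispatcher:
    """Toy event table: only registered events reach application code."""

    def __init__(self):
        self._handlers = {}

    def set_handler(self, event, handler):
        # handler takes one payload argument and returns a return code
        self._handlers[event] = handler

    def invoke(self, event, payload=None):
        handler = self._handlers.get(event)
        if handler is None:
            return OK          # no handler registered: nothing to do
        return handler(payload)


def run_events(dispatcher, events):
    """Dispatch (event, payload) pairs in order; stop at the first non-OK code."""
    for event, payload in events:
        rc = dispatcher.invoke(event, payload)
        if rc != OK:
            return rc
    return OK
```

An application that only cares about recognized commands would register a single ^result handler and ignore ^sample-count, ^listen-begin, and ^listen-end.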

Settings

^listen-begin, ^listen-end, ^result, ^sample-count

operating-point-iterator, vocab-iterator

audio-stream, audio-stream-first, audio-stream-last

->audio-pcm, audio-stream-from, audio-stream-to, dsp-acmodel-stream, dsp-header-stream, dsp-search-stream

0, 1, audio-stream-size, delay, dsp-target, duration-ms, listen-window, loop, low-fr-operating-point, operating-point, samples-per-second, sv-threshold

phrasespot

live-spot.c, snsr-eval.c, PhraseSpot.java, segmentSpottedAudio.java

Notes

With loop == 0 (the default): The template runs the spotter in slot 0 until it spots, then runs slot 1 until either it spots or the listen-window timeout expires, and then returns to the spotter in slot 0.

With loop == 1: The template runs the spotter in slot 0 until it spots, then runs slot 1 until the listen-window timeout expires, and then returns to the spotter in slot 0. It resets the expiration timer every time slot 1 recognizes.

With loop == 2 (since 7.6.0): The template runs only slot 1. If your application needs to listen for a wake word but also support an external trigger, such as a push-to-talk button, set loop=2 when the event occurs.

The combined model is itself a wake word model and can be used, without code changes, in any application that expects one.

Combined model settings refer to the model in slot 1, so operating-point refers to 1.operating-point. You can change settings for the wake word in slot 0 by prefixing the setting name with the slot number and a period, for example: 0.operating-point.
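
The naming rule can be written down as a small lookup. This is a minimal sketch of the rule as described here, not of how the SDK resolves names internally; `resolve_setting` is a hypothetical helper.

```python
def resolve_setting(name):
    """Map a combined-model setting name to a (slot, setting) pair.

    An explicit "0." or "1." prefix selects that slot; an unprefixed name
    refers to the model in slot 1.
    """
    prefix, sep, rest = name.partition(".")
    if sep and prefix in ("0", "1") and rest:
        return int(prefix), rest
    return 1, name
```

Under this rule `operating-point` and `1.operating-point` both resolve to slot 1, while `0.operating-point` addresses the wake word in slot 0.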

The model invokes ^listen-begin just before audio focus switches to slot 1, and ^listen-end before audio focus switches back to slot 0. If there's no ^result between ^listen-begin and ^listen-end it is because the recognizer in slot 1 timed out.
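
An application that logs these events can tell a timeout from a recognition by checking whether any ^result arrived inside the listen span. The sketch below assumes the application keeps an ordered list of event names; `listen_outcomes` and the outcome labels are hypothetical.

```python
def listen_outcomes(events):
    """Return one outcome per ^listen-begin ... ^listen-end span.

    "recognized" means at least one ^result arrived inside the span;
    "timeout" means slot 1 produced no result before ^listen-end.
    """
    outcomes = []
    results = None                    # None while slot 0 has audio focus
    for event in events:
        if event == "^listen-begin":
            results = 0               # slot 1 takes over, start counting
        elif event == "^result" and results is not None:
            results += 1
        elif event == "^listen-end" and results is not None:
            outcomes.append("timeout" if results == 0 else "recognized")
            results = None            # audio focus returns to slot 0
    return outcomes
```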

Examples

% cd ~/Sensory/TrulyNaturalSDK/7.6.1

% bin/snsr-edit -o vg-music.snsr\
    -t model/tpl-spot-sequential-1.5.0.snsr\
    -f 0 model/spot-voicegenie-enUS-6.5.1-m.snsr\
    -f 1 model/spot-music-enUS-1.2.0-m.snsr

# say "voice genie, play music"
% bin/snsr-eval -vvt vg-music.snsr
Using live audio from default capture device. ^C to stop.
Using operating point 17.
Available operating points: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20.
Available vocabulary:
  1: "play_music"
  2: "previous_song"
  3: "stop_music"
  4: "next_song"
  5: "pause_music"
  3180 [^listen-begin]
phrase:
  3630   4410 (1 sv) play_music
words:
  3630   3900 (1 sv)
  3900   4410 (1 sv) play_music

  4635 [^listen-end]
^C