tpl-spot-vad-lvcsr¶
This template runs the wake word spotter in slot 0 until it detects the wake word, segments the audio that follows with a VAD, and sends the segmented audio to the LVCSR or STT recognizer in slot 1.
This behavior is also available in the tpl-opt-spot-vad-lvcsr template, which adds an option to skip the wake word.
tpl-spot-vad-lvcsr has task-type==phrasespot.
Expected task types:
- Slot 0: phrasespot
- Slot 1: lvcsr
Model file: tpl-spot-vad-lvcsr-3.20.0.snsr. See also: tpl-opt-spot-vad-lvcsr.
Operation¶
```mermaid
flowchart TD
start((start))
start --> startWW
subgraph slot0["**slot 0** (phrasespot)"]
startWW((start))
fetchWW[/samples from ->audio-pcm/]
audioWW(^sample-count)
processWW[process]
result(0.^result)
stopWW((stop))
startWW --> fetchWW
fetchWW --> audioWW
audioWW --> processWW
processWW --> fetchWW
processWW -->|recognize| result
result --> stopWW
end
subgraph slot1["**slot 1** (lvcsr)"]
startSTT((start))
startSTTfinal((start))
stopSTT((stop))
stopSTTpartial((stop))
processSTT[process]
partialSTT(^result-partial)
intentSTT(^nlu-intent)
slotSTT(^nlu-slot)
resultSTT(^result)
nluSTT{NLU<br>match?}
slmSTT{SLM<br>included?}
generateSTT[generate]
slmstartSTT(^slm-start)
slmresultpartialSTT(^slm-result-partial)
slmresultSTT(^slm-result)
startSTT --> processSTT
processSTT ---->|hypothesis| partialSTT
partialSTT --> stopSTTpartial
startSTTfinal --> nluSTT
nluSTT -->|yes| intentSTT
nluSTT -->|no| resultSTT
intentSTT --> slotSTT
slotSTT --> resultSTT
slotSTT -->|more| intentSTT
resultSTT --> slmSTT
slmSTT -->|yes| slmstartSTT
slmSTT -->|no| stopSTT
slmstartSTT -->|OK| generateSTT
slmstartSTT -->|STOP| stopSTT
generateSTT -->|response| slmresultpartialSTT
slmresultpartialSTT --> generateSTT
generateSTT -->|done| slmresultSTT
slmresultSTT --> stopSTT
end
listenBegin(^listen-begin)
listenEnd(^listen-end)
stopWW --> listenBegin
listenBegin --> fetch0
fetch0[/samples from ->audio-pcm/]
fetch1[/samples from ->audio-pcm/]
audio0(^sample-count)
audio1(^sample-count)
silence(^silence)
begin(^begin)
END(^end)
limit(^limit)
process0[VAD process]
process1[VAD process]
final@{ shape: f-circ }
fetch0 --> audio0
audio0 --> process0
process0 --> fetch0
process0 -->|speech start| begin
process0 -->|timeout| silence
silence ~~~ final
silence --> listenEnd
begin --> fetch1
fetch1 --> audio1
audio1 --> process1
process1 --> startSTT
stopSTTpartial --> fetch1
process1 -->|speech end| END
process1 -->|speech limit| limit
END --> final
limit --> final
final --> startSTTfinal
stopSTT --> listenEnd
listenEnd --> startWW
```

1. Read audio data from ->audio-pcm.
2. Invoke ^sample-count.
3. If processing does not detect a wake word, continue at step 1.
4. Invoke 0.^result for the wake word.
5. Invoke ^listen-begin and start VAD processing.
6. Read audio data from ->audio-pcm.
7. Invoke ^sample-count.
8. If VAD processing does not detect the start of speech within the leading-silence timeout, invoke ^silence and continue at step 15.
9. Invoke ^begin if processing detects the start of speech, else continue at step 6.
10. Read audio data from ->audio-pcm.
11. Invoke ^sample-count.
12. If VAD processing detects an endpoint, invoke either ^limit or ^end and continue at step 14.
13. Process VAD-segmented audio in the LVCSR or STT recognizer:
    - Invoke ^result-partial with an interim recognition result hypothesis.
    - Continue at step 10.
14. Produce a final LVCSR or STT recognition hypothesis:
    - Invoke ^nlu-intent and ^nlu-slot for each NLU intent found.
    - Invoke ^result with the final recognition hypothesis.
    - If there is no SLM, continue at step 15.
    - Invoke ^slm-start; if the callback returns STOP, continue at step 15.
    - Generate the SLM result, invoking ^slm-result-partial on each generated token.
    - Invoke ^slm-result with the complete SLM result.
15. Invoke ^listen-end and start listening for the wake word again at step 1.
Register callback handlers with setHandler only for those events you're interested in.
Settings¶
^begin, ^end, ^limit, ^listen-begin, ^listen-end, ^nlu-intent, ^nlu-slot, ^result, ^result-partial, ^sample-count, ^silence, ^slm-result, ^slm-result-partial, ^slm-start
operating-point-iterator, vocab-iterator
audio-stream, audio-stream-first, audio-stream-last
->audio-pcm, audio-stream-from, audio-stream-to
audio-stream-size, backoff, custom-vocab, delay, duration-ms, hold-over, include-leading-silence, include-wake-word-audio, leading-silence, low-fr-operating-point, max-recording, operating-point, partial-result-interval, samples-per-second, stt-profile, sv-threshold
live-spot.c, snsr-eval.c, PhraseSpot.java
Notes¶
Use this template for command-and-control applications where commands are initiated with a wake word.
The ^result-partial and ^result events are for the LVCSR or STT recognizer in slot 1. If you need direct access to the wake word result, prefix the event with the slot path: 0.^result. Use the slot prefix to read values in the 0.^result event handler too; for example, call getString with the key 0.text to read the wake word transcription.
Set include-wake-word-audio=1 to include the wake word audio in the samples passed to the LVCSR or STT recognizer. STT hypotheses do not include the wake word text unless Sensory specifically configured the model to do so.
Examples¶
% cd ~/Sensory/TrulyNaturalSDK/7.6.1
% bin/snsr-edit -o vg-stt.snsr \
  -t model/tpl-spot-vad-lvcsr-3.20.0.snsr \
  -f 0 model/spot-voicegenie-enUS-6.5.1-m.snsr \
  -f 1 model/stt-enUS-automotive-medium-2.3.15-pnc.snsr \
  -s include-wake-word-audio=1
# Say "Voice genie, open the sunroof."
% snsr-eval -vt vg-stt.snsr
Using live audio from default capture device. ^C to stop.
P 2770 3250 (0.4166) Open the sun
P 2810 3650 (0.7161) Open the sunroof
1815 3990 [^end] VAD speech region.
NLU intent: open_window (0.9956) = open the sunroof
NLU entity: roof (0.9595) = sunroof
2810 3690 (0.4394) Open the sunroof.
^C