tpl-vad-lvcsr tnl¶
This template detects speech with a VAD and sends the segmented audio to the LVCSR or STT recognizer in slot 0.
tpl-vad-lvcsr has task-type==phrasespot.
Expected task types:
- Slot 0: lvcsr
tpl-vad-lvcsr-3.17.0.snsr, tpl-opt-spot-vad-lvcsr
Operation¶
flowchart TD
start((start))
start --> fetch0
subgraph slot0[**slot 0** (lvcsr)]
startSTT((start))
startSTTfinal((start))
stopSTT((stop))
stopSTTpartial((stop))
processSTT[process]
partialSTT(^result-partial)
intentSTT(^nlu-intent)
slotSTT(^nlu-slot)
resultSTT(^result)
nluSTT{NLU<br>match?}
slmSTT{SLM<br>included?}
generateSTT[generate]
slmstartSTT(^slm-start)
slmresultpartialSTT(^slm-result-partial)
slmresultSTT(^slm-result)
startSTT --> processSTT
processSTT ---->|hypothesis| partialSTT
partialSTT --> stopSTTpartial
startSTTfinal --> nluSTT
nluSTT -->|yes| intentSTT
nluSTT -->|no| resultSTT
intentSTT --> slotSTT
slotSTT --> resultSTT
slotSTT -->|more| intentSTT
resultSTT --> slmSTT
slmSTT -->|yes| slmstartSTT
slmSTT -->|no| stopSTT
slmstartSTT -->|OK| generateSTT
slmstartSTT -->|STOP| stopSTT
generateSTT -->|response| slmresultpartialSTT
slmresultpartialSTT --> generateSTT
generateSTT -->|done| slmresultSTT
slmresultSTT --> stopSTT
end
fetch0[/samples from ->audio-pcm/]
fetch1[/samples from ->audio-pcm/]
audio0(^sample-count)
audio1(^sample-count)
silence(^silence)
begin(^begin)
END(^end)
limit(^limit)
process0[VAD process]
process1[VAD process]
final@{ shape: f-circ }
listenEnd@{ shape: f-circ }
fetch0 --> audio0
audio0 --> process0
process0 --> fetch0
process0 -->|speech start| begin
process0 -->|timeout| silence
silence ~~~ final
silence --> listenEnd
begin --> fetch1
fetch1 --> audio1
audio1 --> process1
process1 --> startSTT
stopSTTpartial --> fetch1
process1 -->|speech end| END
process1 -->|speech limit| limit
END --> final
limit --> final
final --> startSTTfinal
stopSTT --> listenEnd
listenEnd ----> fetch0
1. Read audio data from ->audio-pcm.
2. Invoke ^sample-count.
3. If VAD processing does not detect the start of speech within the leading-silence timeout, invoke ^silence and continue at step 1.
4. Invoke ^begin if processing detects the start of speech, else continue at step 1.
5. Read audio data from ->audio-pcm.
6. Invoke ^sample-count.
7. If VAD processing detects an endpoint, invoke either ^limit or ^end and continue at step 9.
8. Process VAD-segmented audio in the LVCSR or STT recognizer.
   - Invoke ^result-partial with the interim recognition hypothesis.
   - Continue at step 5.
9. Produce a final LVCSR or STT recognition hypothesis.
10. Invoke ^nlu-intent and ^nlu-slot for each NLU intent found.
11. Invoke ^result with the final recognition hypothesis.
12. If there's no SLM, continue at step 1.
13. Invoke ^slm-start. If the callback returns STOP, continue at step 1.
14. Generate the SLM result, invoking ^slm-result-partial on each generated token.
15. Invoke ^slm-result with the complete SLM result.
16. Continue at step 1.
Register callback handlers with setHandler only for those events you're interested in.
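The event order described above, and the effect of registering only some handlers, can be illustrated with a small toy model. This is a sketch in plain Python that does not call the SDK: `ToySession`, `set_handler`, `run_utterance`, and the `OK`/`STOP` values are hypothetical names invented for illustration, and the model assumes each intent is followed by its slots and that events without a handler are simply skipped.

```python
OK, STOP = "OK", "STOP"  # illustrative return codes, not SDK constants


class ToySession:
    """Toy stand-in for a recognizer session. Not the SDK API."""

    def __init__(self):
        self._handlers = {}

    def set_handler(self, event, fn):
        # Only events registered here are delivered.
        self._handlers[event] = fn

    def _invoke(self, event, *args):
        fn = self._handlers.get(event)
        if fn is None:
            return OK  # nothing registered: skip silently
        result = fn(*args)
        return OK if result is None else result

    def run_utterance(self, partials, final, intents=(), slm_tokens=None):
        """Fire events for one utterance, from ^begin to the last result."""
        self._invoke("^begin")
        for hypothesis in partials:  # one read/process pass per interim result
            self._invoke("^sample-count")
            self._invoke("^result-partial", hypothesis)
        self._invoke("^end")  # an endpoint could also be reported as ^limit
        for intent, slots in intents:
            self._invoke("^nlu-intent", intent)
            for slot in slots:
                self._invoke("^nlu-slot", slot)
        self._invoke("^result", final)
        if slm_tokens is None:  # no SLM included in the model
            return
        if self._invoke("^slm-start") == STOP:
            return
        for token in slm_tokens:
            self._invoke("^slm-result-partial", token)
        self._invoke("^slm-result", "".join(slm_tokens))


events = []
session = ToySession()
session.set_handler("^result-partial", lambda h: events.append(("partial", h)))
session.set_handler("^result", lambda h: events.append(("final", h)))
session.run_utterance(["Turn the", "Turn the air"], "Turn the air conditioning up.")
for kind, text in events:
    print(kind, text)
```

In this sketch the application sees only the two events it registered for; ^begin, ^sample-count, ^end and the rest fire without effect. A handler for ^slm-start that returns `STOP` would end the pass before any SLM output is generated.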
Settings¶
^begin, ^end, ^limit, ^nlu-intent, ^nlu-slot, ^result, ^result-partial, ^sample-count, ^silence, ^slm-result, ^slm-result-partial, ^slm-start
none
audio-stream, audio-stream-first, audio-stream-last
->audio-pcm, audio-stream-from, audio-stream-to
audio-stream-size, backoff, custom-vocab, hold-over, include-leading-silence, leading-silence, max-recording, partial-result-interval, samples-per-second, stt-profile
live-spot.c, snsr-eval.c, PhraseSpot.java
Notes¶
Use this template for command-and-control applications where users initiate commands just by speaking.
Examples¶
% cd ~/Sensory/TrulyNaturalSDK/7.6.1
% bin/snsr-edit -o vad-stt.snsr \
    -t model/tpl-vad-lvcsr-3.17.0.snsr \
    -f 0 model/stt-enUS-automotive-medium-2.3.15-pnc.snsr
# Say, for example: "Turn the air conditioning up all the way"
% snsr-eval -t vad-stt.snsr
P 1000 1040 T
P 1000 1600 Turn the egg
P 1040 2040 Turn the air conditioner
P 1040 2320 Turn the air conditioning up
P 1040 2760 Turn the air conditioning up all the way
NLU intent: set_fan (0.9547) = turn the air conditioning up 100%
NLU entity: hvac (0.9744) = air conditioning
NLU entity: percentage_value (0.8963) = 100%
1040 2880 Turn the air conditioning up all the way.
^C
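If you want to post-process a captured transcript like the one above, the lines can be split into structured records. The following is a minimal sketch whose format is inferred only from this sample output, not from a documented specification: it assumes a leading `P` marks a partial hypothesis, that the two numbers are start and end positions, and that NLU lines have the shape `NLU <kind>: <name> (<score>) = <value>`. The names `Hyp`, `Nlu`, `parse_hypothesis`, and `parse_nlu` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Hyp(NamedTuple):
    partial: bool  # True when the line starts with "P"
    start: int
    end: int
    text: str


class Nlu(NamedTuple):
    kind: str  # "intent" or "entity"
    name: str
    score: float
    value: str


_HYP = re.compile(r"^(P\s+)?(\d+)\s+(\d+)\s+(.*)$")
_NLU = re.compile(r"^NLU (intent|entity): (\S+) \(([\d.]+)\) = (.*)$")


def parse_hypothesis(line: str) -> Optional[Hyp]:
    """Return a Hyp for partial or final result lines, else None."""
    m = _HYP.match(line.strip())
    if m is None:
        return None
    return Hyp(m.group(1) is not None, int(m.group(2)), int(m.group(3)), m.group(4))


def parse_nlu(line: str) -> Optional[Nlu]:
    """Return an Nlu record for 'NLU intent' or 'NLU entity' lines, else None."""
    m = _NLU.match(line.strip())
    if m is None:
        return None
    return Nlu(m.group(1), m.group(2), float(m.group(3)), m.group(4))


sample = [
    "P 1040 2320 Turn the air conditioning up",
    "NLU entity: hvac (0.9744) = air conditioning",
    "1040 2880 Turn the air conditioning up all the way.",
]
for line in sample:
    print(parse_hypothesis(line) or parse_nlu(line))
```

Lines that match neither pattern, such as the shell prompt or `^C`, return `None` from both functions and can be ignored.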