VAD
Models of this type find speech segments in audio data streams.
VAD models have task-type==vad and filenames that by convention match vad-*.snsr.
See the VAD models included in this distribution.
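As a small illustration of the naming convention, the following sketch filters file paths by the vad-*.snsr pattern using Python's standard library. The helper name and the example paths are hypothetical. A matching filename is only a convention, so the model's task-type remains the authoritative check.

```python
import os
from fnmatch import fnmatchcase


def looks_like_vad_model(path: str) -> bool:
    """Return True if the file name follows the vad-*.snsr convention.

    This checks the name only; it does not open the model or read task-type.
    """
    return fnmatchcase(os.path.basename(path), "vad-*.snsr")


# Hypothetical paths, used only to exercise the pattern.
candidates = ["models/vad-example.snsr", "models/spot-example.snsr"]
vad_candidates = [p for p in candidates if looks_like_vad_model(p)]
```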
Operation
flowchart TD
start((start))
fetch0[/samples from ->audio-pcm/]
fetch1[/samples from ->audio-pcm/]
audio0(^sample-count)
audio1(^sample-count)
silence(^silence)
begin(^begin)
END(^end)
limit(^limit)
process0[process]
process1[process]
out[\samples to <-audio-pcm\]
final@{ shape: f-circ }
start --> fetch0
fetch0 --> audio0
audio0 --> process0
process0 --> fetch0
process0 -->|speech start| begin
process0 -->|timeout| silence
silence --> final
begin --> fetch1
fetch1 --> audio1
audio1 --> out
out --> process1
process1 --> fetch1
process1 -->|speech end| END
process1 -->|speech limit| limit
END --> final
limit --> final
final --> fetch0

1. Read audio data from ->audio-pcm.
2. Invoke ^sample-count.
3. If speech is detected within leading-silence ms, continue at step 6.
4. If no speech is detected within leading-silence ms, invoke ^silence and restart from step 1.
5. Continue processing at step 1 until STREAM_END.
6. Invoke ^begin.
7. Read audio data from ->audio-pcm.
8. Invoke ^sample-count.
9. If pass-through == 1, write speech samples to <-audio-pcm.
10. If the end of speech is detected within max-recording ms, invoke ^end and restart from step 1.
11. If the end of speech is not detected within max-recording ms, invoke ^limit and restart from step 1.
12. Continue processing at step 7 until STREAM_END.
Register callback handlers with setHandler only for those events you're interested in.
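The steps above can be sketched as a small state machine. This is a toy model of the documented control flow, not the SDK implementation: the class `VadFlowSketch`, its `set_handler` method (a stand-in for setHandler), and the fixed 10 ms frame length are hypothetical. The speech detector is replaced by a simple amplitude threshold, and the end-of-speech rule (trailing-silence ms of consecutive non-speech frames) is an assumption about how that setting is used. The sketch also does not buffer audio that arrived before ^begin.

```python
from typing import Callable, Dict, Iterable, List, Sequence

Handler = Callable[[str], None]


def is_speech(frame: Sequence[int], threshold: float = 500.0) -> bool:
    """Stand-in detector: mean absolute amplitude above a made-up threshold."""
    return bool(frame) and sum(abs(s) for s in frame) / len(frame) > threshold


class VadFlowSketch:
    """Toy model of the documented control flow; not the SDK implementation."""

    def __init__(self, leading_silence: int, trailing_silence: int,
                 max_recording: int, pass_through: int = 1, frame_ms: int = 10):
        self.leading_silence = leading_silence    # ms
        self.trailing_silence = trailing_silence  # ms (assumed end-of-speech rule)
        self.max_recording = max_recording        # ms
        self.pass_through = pass_through
        self.frame_ms = frame_ms                  # assumed duration of each frame
        self.handlers: Dict[str, Handler] = {}
        self.output: List[int] = []               # stands in for <-audio-pcm

    def set_handler(self, event: str, handler: Handler) -> None:
        """Hypothetical stand-in for setHandler."""
        self.handlers[event] = handler

    def _invoke(self, event: str) -> None:
        handler = self.handlers.get(event)
        if handler is not None:                   # unregistered events are skipped
            handler(event)

    def run(self, frames: Iterable[Sequence[int]]) -> None:
        recording = False
        elapsed = quiet = 0                       # ms
        for frame in frames:                      # steps 1 and 7; ends at stream end
            self._invoke("^sample-count")         # steps 2 and 8
            elapsed += self.frame_ms
            speech = is_speech(frame)
            if not recording:
                if speech:                        # step 3
                    self._invoke("^begin")        # step 6
                    recording, elapsed, quiet = True, 0, 0
                elif elapsed >= self.leading_silence:
                    self._invoke("^silence")      # step 4, then restart
                    elapsed = 0
                continue
            if self.pass_through == 1:
                self.output.extend(frame)         # step 9
            quiet = 0 if speech else quiet + self.frame_ms
            if quiet >= self.trailing_silence:
                self._invoke("^end")              # step 10, then restart
                recording, elapsed = False, 0
            elif elapsed >= self.max_recording:
                self._invoke("^limit")            # step 11, then restart
                recording, elapsed = False, 0


# Usage: register handlers only for the events of interest.
events: List[str] = []
vad = VadFlowSketch(leading_silence=30, trailing_silence=20, max_recording=100)
for name in ("^silence", "^begin", "^end", "^limit"):
    vad.set_handler(name, events.append)
quiet_frame, loud_frame = [0] * 4, [1000] * 4
vad.run([quiet_frame] * 3 + [loud_frame] * 3 + [quiet_frame] * 2)
print(events)  # prints ['^silence', '^begin', '^end']
```

No handler is registered for ^sample-count in the usage example, so that event is skipped on every frame, which mirrors the advice to register only the handlers you need.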
Settings
^begin, ^end, ^limit, ^sample-count, ^silence
none
audio-stream, audio-stream-first, audio-stream-last
->audio-pcm, <-audio-pcm, audio-stream-from, skip-to-ms, skip-to-sample
backoff, hold-over, include-leading-silence, leading-silence, max-recording, pass-through, trailing-silence