VAD

Models of this type find speech segments in audio data streams.

VAD models have task-type == vad and, by convention, filenames that match vad-*.snsr.
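When scanning a model directory, the naming convention can serve as a quick first filter. The sketch below assumes a POSIX system (it uses fnmatch(), which is not available on Windows), and looks_like_vad_model is a hypothetical helper, not part of the SDK. Because the file name is only a convention, the task-type value remains the authoritative check.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if the file's base name follows the
   vad-*.snsr naming convention. A match does not prove that the
   model's task-type is vad; it is only a hint. */
bool looks_like_vad_model(const char *path) {
    const char *base = strrchr(path, '/');
    base = (base != NULL) ? base + 1 : path;   /* strip any directory part */
    return fnmatch("vad-*.snsr", base, 0) == 0;
}
```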

VAD models included in this distribution.

Operation

```mermaid
flowchart TD
    start((start))
    fetch0[/samples from ->audio-pcm/]
    fetch1[/samples from ->audio-pcm/]
    audio0(^sample-count)
    audio1(^sample-count)

    silence(^silence)
    begin(^begin)
    END(^end)
    limit(^limit)

    process0[process]
    process1[process]
    out[\samples to <-audio-pcm\]

    final@{ shape: f-circ }

    start --> fetch0
    fetch0 --> audio0
    audio0 --> process0
    process0 --> fetch0
    process0 -->|speech start| begin
    process0 -->|timeout| silence
    silence --> final

    begin --> fetch1
    fetch1 --> audio1
    audio1 --> out
    out --> process1
    process1 --> fetch1
    process1 -->|speech end| END
    process1 -->|speech limit| limit
    END --> final
    limit --> final

    final --> fetch0
```

1. Read audio data from ->audio-pcm.
2. Invoke ^sample-count.
3. If speech is detected within leading-silence ms, continue at step 6.
4. If no speech is detected within leading-silence ms, invoke ^silence and restart from step 1.
5. Continue processing at step 1 until STREAM_END.
6. Invoke ^begin.
7. Read audio data from ->audio-pcm.
8. Invoke ^sample-count.
9. If pass-through == 1, write the speech samples to <-audio-pcm.
10. If the end of speech is detected within max-recording ms, invoke ^end and restart from step 1.
11. If the end of speech is not detected within max-recording ms, invoke ^limit and restart from step 1.
12. Continue processing at step 7 until STREAM_END.
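The two phases above can be modelled as a single loop over audio frames. What follows is a simplified simulation of the documented control flow under stated assumptions, not the SDK's implementation. It assumes the audio has already been reduced to per-frame speech/non-speech decisions (the speech array). It expresses leading-silence and max-recording in frames rather than ms. It also uses a hypothetical end_frames count of consecutive non-speech frames as the end-of-speech rule, because this section does not describe the real rule. Handlers live in a table indexed by event, and a NULL entry stands for an event with no registered handler. All names (Event, Handler, VadConfig, vad_run, count_event) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event identifiers mirroring ^sample-count, ^silence,
   ^begin, ^end and ^limit. */
typedef enum {
    EV_SAMPLE_COUNT, EV_SILENCE, EV_BEGIN, EV_END, EV_LIMIT, EV_COUNT
} Event;

typedef void (*Handler)(Event ev, size_t frame, void *data);

typedef struct {
    size_t leading_silence_frames; /* stands in for leading-silence (ms) */
    size_t max_recording_frames;   /* stands in for max-recording (ms) */
    size_t end_frames;             /* hypothetical end-of-speech rule */
    int pass_through;              /* stands in for pass-through */
} VadConfig;

/* Call a handler only if one was registered for this event. */
static void emit(Handler const handlers[EV_COUNT], Event ev, size_t frame, void *data) {
    if (handlers[ev] != NULL) handlers[ev](ev, frame, data);
}

/* Simulate the documented loop over per-frame speech decisions.
   Returns the number of frames that would be written to <-audio-pcm. */
size_t vad_run(const VadConfig *cfg, const int *speech, size_t n,
               Handler const handlers[EV_COUNT], void *data) {
    assert(cfg != NULL && handlers != NULL && (speech != NULL || n == 0));
    int in_speech = 0;   /* 0: steps 1-5 (waiting), 1: steps 7-12 (in speech) */
    size_t run = 0;      /* frames spent in the current phase */
    size_t quiet = 0;    /* consecutive non-speech frames inside a segment */
    size_t passed = 0;
    for (size_t i = 0; i < n; i++) {                  /* steps 1 and 7: read a frame */
        emit(handlers, EV_SAMPLE_COUNT, i, data);     /* steps 2 and 8 */
        if (!in_speech) {
            if (!speech[i]) {
                if (++run >= cfg->leading_silence_frames) {   /* step 4 */
                    emit(handlers, EV_SILENCE, i, data);
                    run = 0;
                }
                continue;                             /* step 5 */
            }
            emit(handlers, EV_BEGIN, i, data);        /* steps 3 and 6 */
            in_speech = 1;
            run = 0;
            quiet = 0;
            /* Fall through: this frame is the first frame of the segment. */
        }
        if (cfg->pass_through) passed++;              /* step 9 */
        run++;
        quiet = speech[i] ? 0 : quiet + 1;
        if (quiet >= cfg->end_frames) {               /* step 10 */
            emit(handlers, EV_END, i, data);
            in_speech = 0;
            run = 0;
        } else if (run >= cfg->max_recording_frames) {  /* step 11 */
            emit(handlers, EV_LIMIT, i, data);
            in_speech = 0;
            run = 0;
        }
    }                                                 /* loop ends at STREAM_END */
    return passed;
}

/* Example handler: counts how often each event fires. */
void count_event(Event ev, size_t frame, void *data) {
    (void)frame;
    ((size_t *)data)[ev]++;
}
```

Leaving an entry in the handler table NULL simply skips that event, which mirrors registering handlers only for the events an application needs.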

Register callback handlers with setHandler only for those events you're interested in.

Settings

^begin, ^end, ^limit, ^sample-count, ^silence

none

audio-stream, audio-stream-first, audio-stream-last

->audio-pcm, <-audio-pcm, audio-stream-from, skip-to-ms, skip-to-sample

backoff, hold-over, include-leading-silence, leading-silence, max-recording, pass-through, trailing-silence

vad

live-segment.c, snsr-eval.c, segmentSpottedAudio.java
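Settings such as leading-silence and max-recording are given in milliseconds, while the audio itself arrives as samples. Converting between the two is simple arithmetic once the sample rate is known. The sketch below is illustrative only: it takes the rate as a parameter rather than assuming one, so pass the rate your ->audio-pcm stream actually uses. ms_to_samples and samples_to_ms are hypothetical helpers.

```c
#include <stdint.h>

/* Hypothetical helper: number of samples covered by a duration in ms. */
uint64_t ms_to_samples(uint64_t ms, uint32_t rate_hz) {
    return ms * rate_hz / 1000u;
}

/* Hypothetical helper: duration in ms covered by a sample count,
   truncated toward zero. */
uint64_t samples_to_ms(uint64_t samples, uint32_t rate_hz) {
    return samples * 1000u / rate_hz;
}
```

For example, at 16000 Hz a 500 ms window covers 8000 samples, and 8000 samples map back to 500 ms. These values only illustrate the arithmetic and are not defaults of any setting.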