Adapting wake word

Adapting wake words are fixed wake word models that continuously adapt to speakers' voices to reduce false-accept rates. They are drop-in replacements for fixed wake word models.

Continuously adapting wake word models have task-type==phrasespot and, by convention, filenames that match ca-*.snsr.
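As a small illustration of the naming convention, the sketch below filters a list of filenames with the ca-*.snsr glob pattern. The filenames and the helper name are invented for the example. A matching name is only a convention, so treat this as a quick pre-filter rather than a check of the model's actual task-type.

```python
from fnmatch import fnmatchcase

# Hypothetical filenames, used only to illustrate the convention.
candidates = [
    "ca-example-wake-word.snsr",
    "example-fixed-wake-word.snsr",
    "ca-notes.txt",
]


def looks_like_adapting_model(filename: str) -> bool:
    """Return True if the name follows the ca-*.snsr convention."""
    # fnmatchcase compares case-sensitively on every platform.
    return fnmatchcase(filename, "ca-*.snsr")


adapting = [name for name in candidates if looks_like_adapting_model(name)]
print(adapting)
```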

Adapting wake word models included in this distribution.

Operation

flowchart TD
    start((start))
    fetch[/samples from ->audio-pcm/]
    audio(^sample-count)
    process
    result(^result)
    adaptStarted(^adapt-started)
    adapted(^adapted)
    newUser(^new-user)
    start --> fetch
    fetch --> audio
    audio --> process
    process --> fetch
    process -->|recognize| result
    process -->|recognize w/ high SNR| adaptStarted
    adaptStarted --> result
    result --> fetch
    fetch -->|adapted| adapted
    adapted --> fetch
    adapted -->|new user identified| newUser
    newUser --> fetch
  1. Read audio data from ->audio-pcm.
  2. Invoke ^sample-count.
  3. Invoke ^adapt-started if processing detects a vocabulary phrase in a low-noise environment. This starts adapting the model to the speaker's voice on a background thread.
  4. Invoke ^result if processing detects a vocabulary phrase.
  5. Invoke ^adapted when the background thread has finished adding an enrollment.
    • Invoke ^new-user if adaptation detects a user it hasn't seen before.
  6. Continue processing until STREAM_END occurs on ->audio-pcm, or one of the event handlers returns a code other than OK.
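
The numbered steps above can be sketched as a toy simulation. This is not the SDK API: the run function, the OK and STOP constants, and the chunk flags (phrase, low_noise, adapted, new_user) are all assumptions made for illustration. Real detection and the background adaptation thread are replaced by flags on each simulated audio chunk, and events follow the order of the numbered list.

```python
from typing import Callable, Dict, Iterable, List

OK = "OK"      # stand-in for the OK return code
STOP = "STOP"  # stand-in for any return code other than OK

Handler = Callable[[dict], str]


def run(chunks: Iterable[dict], handlers: Dict[str, Handler]) -> List[str]:
    """Simulate the processing loop and return the events raised, in order."""
    raised: List[str] = []

    def invoke(event: str, chunk: dict) -> bool:
        raised.append(event)
        handler = handlers.get(event)
        # Events with no registered handler are treated as returning OK.
        return handler is None or handler(chunk) == OK

    for chunk in chunks:                         # step 1: read audio
        events = ["^sample-count"]               # step 2
        if chunk.get("phrase"):
            if chunk.get("low_noise"):
                events.append("^adapt-started")  # step 3
            events.append("^result")             # step 4
        if chunk.get("adapted"):
            events.append("^adapted")            # step 5
            if chunk.get("new_user"):
                events.append("^new-user")
        for event in events:
            if not invoke(event, chunk):
                return raised                    # step 6: non-OK return code
    return raised                                # step 6: end of stream


stream = [
    {},                                    # no phrase in this chunk
    {"phrase": True, "low_noise": True},   # detection in a low-noise environment
    {"adapted": True, "new_user": True},   # enrollment finished, unseen user
    {"phrase": True},                      # detection in a noisy environment
]
log = run(stream, {"^result": lambda chunk: OK})
print(log)
```

Returning STOP from the ^result handler instead ends the loop at the first detection, which mirrors the early-exit condition in step 6.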

Register callback handlers with setHandler only for those events you're interested in.
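
A minimal sketch of selective registration follows, using a hypothetical ToySession class rather than the SDK's own session object and setHandler call. The class, its set_handler and emit methods, and the OK constant are assumptions for illustration. It shows only the idea that events without a registered handler are skipped.

```python
from typing import Callable, Dict, List

OK = "OK"  # stand-in for the OK return code


class ToySession:
    """Hypothetical stand-in for a recognizer session; not the SDK API."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[], str]] = {}

    def set_handler(self, event: str, callback: Callable[[], str]) -> None:
        """Register a callback for one event name."""
        self._handlers[event] = callback

    def emit(self, event: str) -> str:
        """Call the handler for an event, or return OK if none is registered."""
        callback = self._handlers.get(event)
        return OK if callback is None else callback()


seen: List[str] = []


def on_result() -> str:
    seen.append("^result")
    return OK


session = ToySession()
session.set_handler("^result", on_result)  # the only event of interest here

for event in ["^sample-count", "^adapt-started", "^result", "^adapted"]:
    session.emit(event)

print(seen)
```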

Settings

^adapt-started, ^adapted, ^new-user, ^result, ^sample-count

operating-point-iterator, user-iterator, vocab-iterator

audio-stream, audio-stream-first, audio-stream-last

->audio-pcm, audio-stream-from, audio-stream-to, delete-user, dsp-acmodel-stream, dsp-header-stream, dsp-search-stream, rename-user

audio-stream-size, cache-file, delay, dsp-target, duration-ms, listen-window, low-fr-operating-point, operating-point, samples-per-second, sv-threshold

phrasespot

live-spot.c, snsr-eval.c, PhraseSpot.java, segmentSpottedAudio.java