Wake word enrollment

These models provide user enrollment for EFT and UDT. They produce wake-word models.

Enrollment models have task-type == enroll and, by convention, filenames that match eft-*.snsr or udt-*.snsr.
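
As a minimal sketch of the naming convention only, the Python helper below tests a filename against the two patterns quoted above. The helper name and the example filenames are hypothetical and are not part of the SDK. A matching name is a hint, not proof: the task-type setting of the loaded model is what identifies an enrollment model.

```python
from fnmatch import fnmatchcase

# Filename patterns taken from the convention described above.
ENROLL_PATTERNS = ("eft-*.snsr", "udt-*.snsr")


def looks_like_enrollment_model(filename: str) -> bool:
    """Return True if the name follows the eft-*/udt-* naming convention.

    Hypothetical helper for illustration. It inspects the name only and
    does not load the model or read its task-type.
    """
    return any(fnmatchcase(filename, pattern) for pattern in ENROLL_PATTERNS)


if __name__ == "__main__":
    for name in ("udt-example.snsr", "eft-example.snsr", "spot-example.snsr"):
        print(name, looks_like_enrollment_model(name))
```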

Wake word enrollment models included in this distribution.

Operation

Wake word enrollment has two modes: interactive for live recordings, and offline for pre-recorded enrollment audio.

Interactive

With interactive = 1, enrollment tasks expect live audio and re-record any enrollment that cannot be used.

flowchart TD
    start((start))
    fetch[/samples from ->audio-pcm/]
    audio(^sample-count)
    segment[segment audio]
    check[check audio quality]
    validate[check enrollment consistency]
    resume(^resume)
    next(^next)
    pause(^pause)
    pass(^pass)
    fail0(^fail)
    fail1(^fail)
    enrolled(^enrolled)
    progress(^progress)
    adapted(^adapted)
    done(^done)

    zeroCount[count←0]
    incrCount[count++]

    start --> next
    next --> zeroCount
    zeroCount --> resume
    resume --> fetch
    fetch --> audio
    audio --> segment
    segment --> fetch
    segment -->|endpoint| pause
    pause --> check
    check -->|good| pass
    pass ---> incrCount
    incrCount --> resume
    pass -->|count == required| validate
    validate -->|good| next
    validate -->|bad| fail1
    check -->|bad| fail0
    fail0 --> resume
    fail1 --> zeroCount

    next -->|user == NULL| enrolled
    enrolled --> enroll
    enroll --> progress
    progress --> enroll
    enroll --->|complete| adapted
    adapted --> done

    done ~~~ validate
  1. Invoke ^next.
    • If the callback set user to NULL, start model adaptation at step 8.
  2. Reset the enrollment count to 0. This tracks the number of usable enrollments for the current user or phrase.
  3. Invoke ^resume. The application should use this to restart audio recording.
  4. Make an audio recording and segment it with a VAD (UDT) or a wake word (EFT).
  5. Invoke ^pause. The application should pause audio recording.
  6. Check enrollment audio quality.
    • If good, invoke ^pass. If the enrollment count has reached req-enroll, start validation at step 7; otherwise start the next recording at step 3.
    • If bad, invoke ^fail and redo the current recording at step 3.
  7. Validate all the enrollment recordings, checking for consistency.
    • If good, start enrolling the next user or phrase at step 1.
    • If bad, invoke ^fail and restart at step 2.
  8. When all users / phrases are available, invoke ^enrolled.
  9. Train a new recognizer with the enrollments.
  10. Invoke ^adapted.
  11. Invoke ^done.
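
The order of events in the steps above can be sketched as a small standalone simulation. It does not call the SDK. The function name, its parameters, and the user name in the usage example are hypothetical; quality and consistency results are supplied by the caller instead of being computed from audio, and ^sample-count and ^progress are left out for brevity.

```python
from typing import Iterable, List


def simulate_interactive(
    users: Iterable[str],
    quality: Iterable[bool],
    consistency: Iterable[bool],
    req_enroll: int = 2,
) -> List[str]:
    """Return the events fired, in order, for one interactive session.

    users:       names the ^next callback would set, one per call. Once
                 they run out, the callback leaves user unset (None).
    quality:     one boolean per recording, True if the audio is usable.
    consistency: one boolean per validation pass, True if consistent.
    The caller must supply enough quality and consistency outcomes.
    """
    events: List[str] = []
    pending = iter(users)
    quality_results = iter(quality)
    consistency_results = iter(consistency)

    while True:
        events.append("^next")                  # step 1
        user = next(pending, None)
        if user is None:
            break                               # continue at step 8
        validated = False
        while not validated:
            count = 0                           # step 2
            while count < req_enroll:
                events.append("^resume")        # step 3
                # step 4: record and segment audio (not modelled here)
                events.append("^pause")         # step 5
                if next(quality_results):       # step 6
                    events.append("^pass")
                    count += 1
                else:
                    events.append("^fail")      # redo the recording
            validated = next(consistency_results)   # step 7
            if not validated:
                events.append("^fail")          # restart at step 2

    events.append("^enrolled")                  # step 8
    # step 9: train the recognizer (not modelled here)
    events.append("^adapted")                   # step 10
    events.append("^done")                      # step 11
    return events


if __name__ == "__main__":
    # One user, two required enrollments, the second recording is unusable.
    print(simulate_interactive(["user-a"], [True, False, True], [True]))
```

In this run the unusable second recording triggers ^fail and a further ^resume, so three recordings are made before validation succeeds.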

live-enroll.c, enrollUDT.java, Enroll.java

Offline

With interactive = 0, enrollment tasks expect pre-recorded audio and fail if any of the enrollments cannot be used.

flowchart TD
    start((start))
    fetch[/samples from ->audio-pcm/]
    audio(^sample-count)
    segment[segment audio]
    check[check audio quality]
    validate[check enrollment consistency]
    next(^next)
    pass(^pass)
    fail0(^fail)
    fail1(^fail)
    enrolled(^enrolled)
    progress(^progress)
    adapted(^adapted)
    done(^done)

    skip[discard enrollment]
    skipBad[discard bad enrollment]
    zeroCount[count←0]
    incrCount[count++]

    user0{user == NULL?}
    user1{user == NULL?}

    start --> user0
    user0 -->|yes| next
    user0 -->|no| user1
    next --> user1

    user1 -->|no| zeroCount
    zeroCount --> fetch
    fetch --> audio
    audio --> segment
    segment --> fetch
    segment -->|endpoint| check
    check --->|good| pass
    pass --> incrCount
    incrCount --> fetch
    validate --->|bad| fail1
    fail1 --> skipBad
    skipBad --> validate
    validate --> user1
    check -->|bad| fail0
    fail0 --> skip
    skip --> fetch
    fetch -->|STREAM_END| validate

    user1 ---->|yes| enrolled
    enrolled --> enroll
    enroll --> progress
    progress --> enroll
    enroll --->|complete| adapted
    adapted --> done
  1. If user == NULL, invoke ^next. The application should set user before starting enrollment, or do so in the ^next callback.
  2. If user == NULL, start model adaptation at step 7.
  3. Reset the enrollment count to 0. This tracks the number of usable enrollments for the current user or phrase.
  4. Segment audio with a VAD (UDT) or a wake word (EFT).
  5. Check enrollment audio quality.
    • If good, invoke ^pass and keep the recording.
    • If bad, invoke ^fail and discard the recording.
    • Start the next recording at step 4.
  6. Validate all the enrollment recordings, checking for consistency.
    • If no bad recordings remain, start enrolling the next user or phrase at step 2.
    • If a bad recording is found, invoke ^fail, remove that recording, and revalidate.
  7. When all users / phrases are available, invoke ^enrolled.
  8. Train a new recognizer with the enrollments.
  9. Invoke ^adapted.
  10. Invoke ^done.
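
Steps 3 to 6 above, for a single user or phrase, can be sketched as a standalone simulation that does not call the SDK. The Recording type, the function name, and the recording names are hypothetical. The per-recording consistent flag is a simplifying assumption: a real consistency check compares the recordings with each other.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Recording:
    name: str
    quality_ok: bool   # stand-in for the step 5 audio quality check
    consistent: bool   # stand-in for the step 6 consistency check


def process_offline_user(recordings: List[Recording]) -> Tuple[List[str], List[str]]:
    """Return (names of kept recordings, events fired) for one user."""
    events: List[str] = []
    kept: List[Recording] = []

    # Steps 4 and 5: segment and check each recording until the stream ends.
    for rec in recordings:
        if rec.quality_ok:
            events.append("^pass")      # keep the recording
            kept.append(rec)
        else:
            events.append("^fail")      # discard the recording

    # Step 6: remove one bad recording at a time, then revalidate.
    while True:
        bad = next((rec for rec in kept if not rec.consistent), None)
        if bad is None:
            break                       # no bad recordings remain
        events.append("^fail")
        kept.remove(bad)

    return [rec.name for rec in kept], events


if __name__ == "__main__":
    kept, events = process_offline_user([
        Recording("rec-1", quality_ok=True, consistent=True),
        Recording("rec-2", quality_ok=False, consistent=True),
        Recording("rec-3", quality_ok=True, consistent=False),
        Recording("rec-4", quality_ok=True, consistent=True),
    ])
    print(kept, events)
```

Unlike the interactive mode, nothing is re-recorded here: an unusable recording is dropped and the remaining ones carry on to training.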

spot-enroll.c

Settings

^adapted, ^done, ^enrolled, ^fail, ^next, ^pass, ^pause, ^progress, ^resume, ^sample-count

enrollment-iterator, user-iterator, vocab-iterator

none

->audio-pcm, add-context, audio-stream-from, audio-stream-to, delete-user, re-adapt, user

accuracy, enrollment-task-index, interactive, req-enroll

enroll

live-spot.c, snsr-eval.c