Wake word enrollment
These models provide user enrollment for EFT and UDT. They produce wake-word models.
Enrollment models have task-type==enroll and, by convention, filenames that match eft-*.snsr or udt-*.snsr.
wake word enrollment models included in this distribution.
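The filename convention can be checked with a shell-style glob. The following is a minimal sketch that uses POSIX fnmatch(); the helper name looks_like_enrollment_model and the example paths are hypothetical and not part of the SDK. Since the pattern is only a convention, treat this as a convenience filter: the task-type setting is what identifies an enrollment model.

```c
#include <assert.h>
#include <fnmatch.h>
#include <string.h>

/* Hypothetical helper (not an SDK function): returns 1 if the file name
 * part of `path` matches eft-*.snsr or udt-*.snsr, else 0. */
static int looks_like_enrollment_model(const char *path) {
    assert(path != NULL);
    /* Compare only the file name, not any leading directories. */
    const char *slash = strrchr(path, '/');
    const char *name = slash ? slash + 1 : path;
    /* fnmatch() returns 0 when the name matches the pattern. */
    return fnmatch("eft-*.snsr", name, 0) == 0 ||
           fnmatch("udt-*.snsr", name, 0) == 0;
}
```

For example, looks_like_enrollment_model("models/udt-example.snsr") returns 1, while a name such as "other-example.snsr" returns 0.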
Operation
Wake word enrollment has two modes: interactive for live recordings, and offline for pre-recorded enrollment audio.
Interactive
With interactive = 1, enrollment tasks expect live audio and re-record any enrollment that cannot be used.
flowchart TD
start((start))
fetch[/samples from ->audio-pcm/]
audio(^sample-count)
segment[segment audio]
check[check audio quality]
validate[check enrollment consistency]
resume(^resume)
next(^next)
pause(^pause)
pass(^pass)
fail0(^fail)
fail1(^fail)
enrolled(^enrolled)
progress(^progress)
adapted(^adapted)
done(^done)
zeroCount[count←0]
incrCount[count++]
start --> next
next --> zeroCount
zeroCount --> resume
resume --> fetch
fetch --> audio
audio --> segment
segment --> fetch
segment -->|endpoint| pause
pause --> check
check -->|good| pass
pass ---> incrCount
incrCount --> resume
pass -->|count == required| validate
validate -->|good| next
validate -->|bad| fail1
check -->|bad| fail0
fail0 --> resume
fail1 --> zeroCount
next -->|user == NULL| enrolled
enrolled --> enroll
enroll --> progress
progress --> enroll
enroll --->|complete| adapted
adapted --> done
done ~~~ validate

1. Invoke ^next.
   - If the callback set user to NULL, start model adaptation at step 8.
2. Reset the enrollment count to 0. This tracks the number of usable enrollments for the current user or phrase.
3. Invoke ^resume. Application should use this to restart audio recording.
4. Make an audio recording and segment it with a VAD (UDT) or a wake word (EFT).
   - Read audio data from ->audio-pcm.
   - Invoke ^sample-count.
   - Process and repeat until a speech segment is found.
5. Invoke ^pause. Application should pause audio recording.
6. Check enrollment audio quality.
   - If good, invoke ^pass. If we have req-enroll enrollments, start validation at step 7, else start the next recording at step 3.
   - If bad, invoke ^fail and redo the current recording at step 3.
7. Validate all the enrollment recordings, checking for consistency.
   - If good, start enrolling the next user or phrase at step 1.
   - If bad, invoke ^fail and restart at step 2.
8. When all users / phrases are available, invoke ^enrolled.
9. Train a new recognizer with the enrollments.
   - Call ^progress repeatedly until done.
10. Invoke ^adapted.
11. Invoke ^done.
Examples: live-enroll.c, enrollUDT.java, Enroll.java
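To make the order of events in interactive mode easier to follow, the control flow described above can be simulated without the SDK. This is a sketch under stated assumptions, not the library's implementation: EventLog, emit, next_outcome and run_interactive are hypothetical names, the audio quality and consistency checks are replaced by pre-scripted outcomes, and each recording is shown with a single ^sample-count and the training phase with a single ^progress.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical event log: records the order in which events would fire. */
typedef struct {
    char text[512];
} EventLog;

/* Append an event name to the log, separated by spaces (bounded). */
static void emit(EventLog *ev, const char *name) {
    size_t used = strlen(ev->text);
    snprintf(ev->text + used, sizeof ev->text - used, "%s%s",
             used ? " " : "", name);
}

/* Take the next scripted outcome, or 1 (good) once the script runs out. */
static int next_outcome(const int *list, int n, int *pos) {
    return (*pos < n) ? list[(*pos)++] : 1;
}

/* Simulate interactive enrollment of `users` users or phrases.
 * quality[]    scripted audio quality results (1 good, 0 bad)
 * consistent[] scripted consistency results   (1 good, 0 bad)
 * Returns the number of recordings made. */
static int run_interactive(int users, int required,
                           const int *quality, int nq,
                           const int *consistent, int nc,
                           EventLog *ev) {
    int qi = 0, ci = 0, recordings = 0;
    assert(users >= 0 && required > 0);
    for (int user = 0; ; user++) {
        emit(ev, "^next");
        if (user >= users) break;           /* callback left user == NULL */
        for (;;) {
            int count = 0;                  /* usable enrollments so far */
            while (count < required) {
                emit(ev, "^resume");        /* application restarts recording */
                emit(ev, "^sample-count");  /* audio read until endpoint */
                recordings++;
                emit(ev, "^pause");         /* application pauses recording */
                if (next_outcome(quality, nq, &qi)) {
                    emit(ev, "^pass");
                    count++;
                } else {
                    emit(ev, "^fail");      /* redo the current recording */
                }
            }
            if (next_outcome(consistent, nc, &ci)) break;
            emit(ev, "^fail");              /* inconsistent: reset the count */
        }
    }
    emit(ev, "^enrolled");
    emit(ev, "^progress");                  /* repeated while training */
    emit(ev, "^adapted");
    emit(ev, "^done");
    return recordings;
}
```

With one user, two required enrollments and a scripted bad second recording, the log shows three ^resume / ^pause pairs: the failed recording is redone before validation starts.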
Offline
With interactive = 0, enrollment tasks expect pre-recorded audio and fail if any of the enrollments cannot be used.
flowchart TD
start((start))
fetch[/samples from ->audio-pcm/]
audio(^sample-count)
segment[segment audio]
check[check audio quality]
validate[check enrollment consistency]
next(^next)
pass(^pass)
fail0(^fail)
fail1(^fail)
enrolled(^enrolled)
progress(^progress)
adapted(^adapted)
done(^done)
skip[discard enrollment]
skipBad[discard bad enrollment]
zeroCount[count←0]
incrCount[count++]
user0{user == NULL?}
user1{user == NULL?}
start --> user0
user0 -->|yes| next
user0 -->|no| user1
next --> user1
user1 -->|no| zeroCount
zeroCount --> fetch
fetch --> audio
audio --> segment
segment --> fetch
segment -->|endpoint| check
check --->|good| pass
pass --> incrCount
incrCount --> fetch
validate --->|bad| fail1
fail1 --> skipBad
skipBad --> validate
validate --> user1
check -->|bad| fail0
fail0 --> skip
skip --> fetch
fetch -->|STREAM_END| validate
user1 ---->|yes| enrolled
enrolled --> enroll
enroll --> progress
progress --> enroll
enroll --->|complete| adapted
adapted --> done

1. If user == NULL, invoke ^next. The application should set user before starting enrollment, or do so in the ^next callback.
2. If user == NULL, start model adaptation at step 7.
3. Reset the enrollment count to 0. This tracks the number of usable enrollments for the current user or phrase.
4. Segment audio with a VAD (UDT) or a wake word (EFT).
   - Read audio data from ->audio-pcm.
   - Invoke ^sample-count.
   - Process and repeat until a speech segment is found or STREAM_END.
   - If STREAM_END, validate at step 6.
5. Check enrollment audio quality.
6. Validate all the enrollment recordings, checking for consistency.
   - If no bad recordings remain, start enrolling the next user or phrase at step 2.
   - If bad, invoke ^fail, remove the recording and revalidate.
7. When all users / phrases are available, invoke ^enrolled.
8. Train a new recognizer with the enrollments.
   - Call ^progress repeatedly until done.
9. Invoke ^adapted.
10. Invoke ^done.
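The offline handling of a single user's pre-recorded stream can be sketched as a two-pass filter: a quality pass over each segment, then a consistency pass once the stream ends. This is an illustration only, under several assumptions: Segment, Tally and enroll_offline are hypothetical names, the quality and consistency results are supplied as precomputed flags, events are counted rather than invoked, and a segment that fails the quality check is discarded as shown in the flowchart.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SEGMENTS 32

/* Hypothetical per-segment results; a real task computes these from audio. */
typedef struct {
    int quality_ok;   /* 1 if the audio quality check passes */
    int consistent;   /* 1 if consistent with the other enrollments */
} Segment;

/* How often ^pass and ^fail would fire, and how many enrollments remain. */
typedef struct {
    int pass;
    int fail;
    int usable;
} Tally;

static Tally enroll_offline(const Segment *segments, size_t n) {
    Tally t = {0, 0, 0};
    size_t kept[MAX_SEGMENTS];
    size_t count = 0;                     /* usable enrollment count */
    assert(n <= MAX_SEGMENTS);

    /* One iteration per speech segment found before STREAM_END. */
    for (size_t i = 0; i < n; i++) {
        if (segments[i].quality_ok) {
            t.pass++;                     /* ^pass, count++ */
            kept[count++] = i;
        } else {
            t.fail++;                     /* ^fail, discard enrollment */
        }
    }

    /* STREAM_END reached: validate, removing bad recordings one at a
     * time and rechecking until none remain. */
    size_t j = 0;
    while (j < count) {
        if (segments[kept[j]].consistent) {
            j++;
            continue;
        }
        t.fail++;                         /* ^fail, discard bad enrollment */
        kept[j] = kept[count - 1];        /* remove, then recheck slot j */
        count--;
    }

    t.usable = (int)count;
    return t;
}
```

For four segments where the second fails the quality check and the third fails validation, the tally is three ^pass events, two ^fail events and two usable enrollments.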
Settings
^adapted, ^done, ^enrolled, ^fail, ^next, ^pass, ^pause, ^progress, ^resume, ^sample-count
enrollment-iterator, user-iterator, vocab-iterator
none
->audio-pcm, add-context, audio-stream-from, audio-stream-to, delete-user, re-adapt, user