Model types¶
A model's type specifies its runtime behavior: what it does, which setting keys it supports, and when it invokes event callbacks.
These SDKs support both fundamental and composed model types. Fundamental types include wake words, adapting wake words, models that create wake words through user enrollment, VAD, LVCSR, and STT. Templates add features by composition, combining multiple fundamental models into one.
The TrulyHandsfree SDK supports all wake word, wake word enrollment, and VAD models. TrulyNatural (Lite) includes TrulyHandsfree and adds support for LVCSR. TrulyNatural STT includes TrulyNatural (Lite) and adds speech-to-text.
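The shared surface is easiest to see in code. The sketch below is hypothetical, not the SDK's actual API: the names Model, set_setting, on_event, and process are invented here purely to illustrate what setting keys and event callbacks look like from the caller's side.

```python
# Hypothetical sketch of the uniform model surface; none of these names
# come from the SDK. Every model type exposes settings and event callbacks,
# but which keys are valid and which events fire depends on the type.
from typing import Any, Callable, Dict, List


class Model:
    def __init__(self) -> None:
        self._settings: Dict[str, Any] = {}
        self._callbacks: Dict[str, List[Callable[[dict], None]]] = {}

    def set_setting(self, key: str, value: Any) -> None:
        # Which keys are supported is determined by the model type.
        self._settings[key] = value

    def on_event(self, name: str, callback: Callable[[dict], None]) -> None:
        # When (and whether) an event fires is determined by the model type.
        self._callbacks.setdefault(name, []).append(callback)

    def _emit(self, name: str, payload: dict) -> None:
        for cb in self._callbacks.get(name, []):
            cb(payload)

    def process(self, samples: List[int]) -> None:
        # Each model type implements its own processing; the calling code
        # stays the same regardless of what runs inside.
        raise NotImplementedError
```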
Fundamental models¶
- Wake word
- Fixed and enrolled wake words, and keyword-spotted command sets.
- Adapting wake word
- Fixed wake word models that continuously adapt to speakers' voices to reduce false accepts.
- Wake word enrollment
- Adapts fixed (EFT) and user-defined (UDT) wake words to speakers' voices, creating wake word models specific to these speakers.
- VAD
- Finds the start and end points of speech segments in a stream of audio data (a minimal illustration follows this list).
- LVCSR [TrulyNatural (Lite)]
- These recognizers use a phonetic acoustic model and an FST vocabulary decoder. They are suitable for small- to medium-vocabulary tasks, but not for audio transcription.
- STT [TrulyNatural STT]
- Transcribes audio with transformer-based models.
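As a concrete illustration of start and end point detection, here is a minimal energy-threshold endpointer. It is not the SDK's VAD model, which is far more robust; the frame length, RMS threshold, and hangover values below are arbitrary assumptions.

```python
import math


def find_speech_segments(samples, frame_len=160, rms_threshold=500.0, hangover=5):
    """Return (start, end) sample indices of detected speech segments."""
    segments = []
    in_speech = False
    start = 0
    quiet = 0
    n_frames = len(samples) // frame_len
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len     # mean square
        loud = energy >= rms_threshold ** 2
        if loud and not in_speech:
            in_speech, start, quiet = True, f * frame_len, 0   # start point
        elif in_speech and loud:
            quiet = 0
        elif in_speech:                                        # silent frame
            quiet += 1
            if quiet >= hangover:                              # end point
                segments.append((start, (f + 1 - quiet) * frame_len))
                in_speech = False
    if in_speech:                                              # ran off the end
        segments.append((start, n_frames * frame_len))
    return segments


# Silence, then a 0.5 s tone burst, then silence (16 kHz samples).
sig = [0] * 8000 + [int(2000 * math.sin(0.3 * t)) for t in range(8000)] + [0] * 8000
print(find_speech_segments(sig))   # [(8000, 16000)]
```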
Composed models¶
Templates add behavior to the fundamental model types listed above. Use them, for example, to create a single model that waits for a keyword, runs a VAD, and then recognizes the segmented speech with an STT recognizer; a sketch of this pipeline follows. The composed model uses the same API as a simple wake word model, so it requires no changes to application code.
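The sketch below shows the composition idea in miniature. None of these class or method names come from the SDK; detect, feed, and transcribe are stand-ins for whatever the fundamental models expose internally. The point is the shape: the composed model presents a single process() call plus a result callback, exactly like a plain wake word model.

```python
from typing import Callable, List, Optional, Tuple


class WakeWordStub:
    """Pretend keyword detector: triggers on any loud frame."""
    def detect(self, frame: List[int]) -> bool:
        return max(frame, default=0) > 10_000


class VADStub:
    """Pretend endpointer: collects audio until a silent frame arrives."""
    def __init__(self) -> None:
        self._buf: List[int] = []

    def feed(self, frame: List[int]) -> Tuple[bool, Optional[List[int]]]:
        silent = max((abs(s) for s in frame), default=0) < 500
        if silent and self._buf:
            segment, self._buf = self._buf, []
            return True, segment           # end point reached
        if not silent:
            self._buf.extend(frame)
        return False, None


class STTStub:
    """Pretend recognizer: reports segment length instead of text."""
    def transcribe(self, segment: List[int]) -> str:
        return f"<transcript of {len(segment)} samples>"


class ComposedModel:
    """Wake word -> VAD -> STT pipeline behind a wake-word-style API."""
    def __init__(self, wake: WakeWordStub, vad: VADStub, stt: STTStub) -> None:
        self._wake, self._vad, self._stt = wake, vad, stt
        self._callbacks: List[Callable[[str], None]] = []
        self._armed = False                # False: waiting for the keyword

    def on_result(self, cb: Callable[[str], None]) -> None:
        self._callbacks.append(cb)

    def process(self, frame: List[int]) -> None:
        if not self._armed:
            self._armed = self._wake.detect(frame)   # keyword heard?
            return
        done, segment = self._vad.feed(frame)        # segment the speech
        if done:
            text = self._stt.transcribe(segment)     # recognize the segment
            for cb in self._callbacks:
                cb(text)
            self._armed = False                      # wait for the next keyword


model = ComposedModel(WakeWordStub(), VADStub(), STTStub())
model.on_result(print)
for frame in ([0] * 160, [20_000] * 160, [1_000] * 160, [0] * 160):
    model.process(frame)                 # prints "<transcript of 160 samples>"
```

Because the composed model keeps the wake word surface, swapping it in for a plain wake word model leaves the surrounding audio loop untouched, which is the property described above.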