Model types

A model's type specifies its runtime behavior: what it does, which setting keys it supports, and when it invokes event callbacks.
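
To make this concrete, here is a minimal sketch of the idea. It is not the SDK's actual API; every name in it (WakeWordModel, onDetect, the "detection-threshold" key) is invented for illustration. It only shows how a model's type determines the setting keys it accepts and the event callbacks it fires.

```cpp
#include <functional>
#include <iostream>
#include <map>
#include <string>

// Hypothetical illustration, not the SDK's API: the model type decides
// which setting keys are accepted and which event callbacks fire.
struct WakeWordModel {
    std::map<std::string, double> settings;           // type-specific keys
    std::function<void(const std::string&)> onDetect; // type-specific event

    void set(const std::string& key, double value) { settings[key] = value; }
};

int main() {
    WakeWordModel model;
    model.set("detection-threshold", 0.65);          // a wake word setting key
    model.onDetect = [](const std::string& phrase) { // a wake word callback
        std::cout << "detected: " << phrase << '\n';
    };
    model.onDetect("hello-computer");                // simulate a detection
}
```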

These SDKs support both fundamental and composed model types. The fundamental types are wake word, adapting wake word, wake word enrollment, VAD, LVCSR, and STT. Templates add features by composition, combining multiple fundamental models into one.

The TrulyHandsfree SDK supports all wake word, wake word enrollment, and VAD models. TrulyNatural (Lite) includes everything in TrulyHandsfree and adds support for LVCSR. TrulyNatural STT includes everything in TrulyNatural (Lite) and adds speech-to-text.

Fundamental models

Wake word
Fixed and enrolled wake words, and keyword spotted command sets.
Adapting wake word
Fixed wake word models that continuously adapt to speakers' voices, reducing the false-accept rate.
Wake word enrollment
Adapts fixed (EFT) and user-defined (UDT) wake words to speakers' voices, creating wake word models specific to these speakers.
VAD
Finds the start- and endpoints of speech segments in a stream of audio data (see the sketch after this list).
LVCSR (requires TrulyNatural Lite)
These recognizers use a phonetic acoustic model and an FST vocabulary decoder. They are suited to small- and medium-vocabulary tasks, but not to audio transcription.
STT (requires TrulyNatural STT)
Audio transcription with transformers.
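
The VAD sketch referenced above follows. It is not the SDK's API; the Vad type, its callbacks, and the toy energy gate are all invented. It only illustrates what "finding the start- and endpoints of speech segments" looks like from the consuming side: the model reports sample positions through events as audio streams through it.

```cpp
#include <cstdint>
#include <cstdlib>
#include <functional>
#include <iostream>
#include <vector>

// Hypothetical VAD wrapper, not the SDK's API: reports where speech
// starts and ends in a stream of audio samples.
struct Vad {
    std::function<void(size_t)> onSpeechStart; // sample index of startpoint
    std::function<void(size_t)> onSpeechEnd;   // sample index of endpoint

    void process(const std::vector<int16_t>& samples) {
        bool inSpeech = false;
        for (size_t i = 0; i < samples.size(); ++i) {
            bool loud = std::abs(samples[i]) > 1000; // toy energy gate, not
                                                     // a real VAD criterion
            if (loud && !inSpeech)      { inSpeech = true;  onSpeechStart(i); }
            else if (!loud && inSpeech) { inSpeech = false; onSpeechEnd(i); }
        }
    }
};

int main() {
    Vad vad;
    vad.onSpeechStart = [](size_t i) { std::cout << "speech starts at sample " << i << '\n'; };
    vad.onSpeechEnd   = [](size_t i) { std::cout << "speech ends at sample "   << i << '\n'; };
    vad.process({0, 0, 4000, 5000, 4500, 0, 0});     // silence, speech, silence
}
```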

Composed models

Templates add behavior to the fundamental model types listed above. Use them, for example, to create a single model that waits for a wake word, runs a VAD, and then recognizes the segmented speech with an STT recognizer. The composed model uses the same API as a simple wake word model, so no application code changes are required.
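
A rough sketch of this composition idea follows, assuming an invented ComposedModel type; nothing here is the SDK's API, and the three stage functions are stubs. The point is that the composed model keeps the same external surface as a plain wake word model (feed audio, receive a result callback) while internally chaining wake word, VAD, and STT.

```cpp
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical sketch, not the SDK's API: the composed model exposes the
// same interface as a simple wake word model while internally chaining
// wake word -> VAD -> STT.
struct ComposedModel {
    std::function<void(const std::string&)> onResult; // same callback shape
                                                      // as a plain wake word

    void process(const std::vector<int16_t>& audio) {
        if (!detectWakeWord(audio)) return;  // stage 1: wait for the keyword
        auto segment = segmentSpeech(audio); // stage 2: VAD endpointing
        onResult(transcribe(segment));       // stage 3: STT recognition
    }

private:
    // Stubs standing in for the real stages.
    bool detectWakeWord(const std::vector<int16_t>&) { return true; }
    std::vector<int16_t> segmentSpeech(const std::vector<int16_t>& a) { return a; }
    std::string transcribe(const std::vector<int16_t>&) { return "turn on the lights"; }
};

int main() {
    ComposedModel model;
    model.onResult = [](const std::string& text) {
        std::cout << "recognized: " << text << '\n';
    };
    model.process(std::vector<int16_t>(1600, 0)); // one audio buffer
}
```

Because the pipeline stages are hidden behind the wake word-shaped interface, application code written against a simple wake word model keeps working when the model is swapped for a composed one.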