API overview
This is a brief overview of the API design goals, the SDK's conceptual model, the two supported audio processing modes, and the C and Java language bindings.
Design goals
The TrulyNatural SDK API reflects these design goals:
- Pure C implementation.
  - Lowest common denominator, widest toolchain availability.
  - No C++ runtime overhead.
- Fast.
- Simple API.
  - Small footprint: a limited number of functions and data types.
  - Generic, independent of the inference task.
  - Fundamental data types only: floating point, integer, strings, streams, and opaque object instance handles.
  - Easier to provide bindings for languages other than C.
- Flexible configuration.
  - Hide complexity, but still allow fine-grained configuration when needed.
  - Settings are indexed by string names; the documented settings define the public API.
- One self-contained model per task.
  - The model includes a flow graph that specifies how the low-level internal modules (feature extractors, acoustic models, and so on) connect and interact.
  - The model includes all required module configurations.
- Run on a wide variety of platforms, including those without file system support.
There is a significant downside to these design choices: Discoverability is very limited. You cannot determine model behavior from function or method names alone. You must refer to the model type documentation for expected task behavior and available settings.
Conceptual model
This library uses a dataflow approach to evaluate speech recognition tasks. It uses inversion of control: The SDK invokes event handlers to report results and control task flow.
The API has two primary data types: Session, used for model inference, and Stream, an abstraction for input and output.
Sessions hold the entire state of a model instance and use streams for all input and output. For example, there is a single load function for loading a model into a session, yet it can load from a named file, an open FILE * handle, a memory segment, the code segment, or compressed assets on Android.
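Routing every source through one stream type is what lets a single load function cover so many origins. The Java sketch below illustrates the idea only: StreamLoader and its methods are hypothetical names, not part of the TrulyNatural SDK, and "loading" here just returns the raw bytes instead of parsing a model.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not TrulyNatural SDK code: one load function that
// accepts any stream, plus thin adapters that turn each source into a stream.
final class StreamLoader {
    // The single load entry point. A real loader would parse the model here;
    // this sketch just returns the raw bytes.
    static byte[] load(InputStream in) throws IOException {
        return in.readAllBytes();                 // Java 9+
    }

    // Source: a memory segment.
    static byte[] loadFromMemory(byte[] segment) throws IOException {
        return load(new ByteArrayInputStream(segment));
    }

    // Source: a named file. The stream is closed when loading finishes.
    static byte[] loadFromFile(Path path) throws IOException {
        try (InputStream in = Files.newInputStream(path)) {
            return load(in);
        }
    }
}
```

Adding another source (an asset archive, say) only needs one more adapter; the load logic itself never changes.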
Models (snsr files) define flow pipelines and session behavior. They contain the serialized content¹ of the session flow graph, including all binary models and configurations. Think of them as hierarchical key-value databases: once a model is loaded into a Session, you can query or change its settings by key with generic getter and setter functions.
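The pattern of addressing settings by string key through generic, typed getters and setters can be sketched as a small key-value store. This is a hypothetical, flat stand-in: the class Settings, its methods, and the example.* keys are invented for illustration and are not SDK API. Lookup failures throw the exception classes that the Java binding's error mapping assigns to a missing setting (IndexOutOfBoundsException) and to a wrong setting type (IllegalArgumentException), and the setters return this so calls can be chained.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a string-keyed settings store; the class and the
// key names are illustrative, not TrulyNatural SDK API.
final class Settings {
    private final Map<String, Object> values = new HashMap<>();

    // Setters return this so calls can be chained.
    Settings setInt(String key, int v) { values.put(key, v); return this; }
    Settings setDouble(String key, double v) { values.put(key, v); return this; }
    Settings setString(String key, String v) { values.put(key, v); return this; }

    int getInt(String key) { return lookup(key, Integer.class); }
    double getDouble(String key) { return lookup(key, Double.class); }
    String getString(String key) { return lookup(key, String.class); }

    private <T> T lookup(String key, Class<T> type) {
        Object v = values.get(key);
        if (v == null) {
            // Analogous to SETTING_NOT_FOUND -> IndexOutOfBoundsException.
            throw new IndexOutOfBoundsException("setting not found: " + key);
        }
        if (!type.isInstance(v)) {
            // Analogous to INCORRECT_SETTING_TYPE -> IllegalArgumentException.
            throw new IllegalArgumentException("incorrect setting type: " + key);
        }
        return type.cast(v);
    }
}
```

A real model stores its keys hierarchically; the dotted keys here only hint at that structure.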
Processing modes
We support two modes for audio processing:
- Pull mode, where the run function reads audio from the configured input stream. It blocks on read until new data are available, and returns only when the stream runs out of data (for example, at the end of a file), an event handler tells it to stop, or an error occurs.
- Push mode, where the application repeatedly calls the push function with small chunks of audio data. The push function returns once it has processed or buffered these data. The application eventually calls stop to flush and process any remaining buffered data.
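Both modes can feed the same processing core; they differ only in who drives the data. The Java sketch below is a toy under stated assumptions: FrameProcessor is a hypothetical class, the 4-sample frame size is arbitrary, and "processing" only counts frames and samples. Its push/stop pair buffers partial frames and flushes them, while run pulls little-endian 16-bit samples from a stream until it runs out of data.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical toy processor, not SDK code. It consumes 16-bit audio in
// fixed-size frames and only counts what it sees; real recognition would
// read buf inside processFrame.
final class FrameProcessor {
    static final int FRAME = 4;                  // samples per frame (illustrative)
    private final short[] buf = new short[FRAME];
    private int fill = 0;                        // samples currently buffered
    int frames = 0;
    int samples = 0;

    // Push mode: accept a chunk of any size, process every complete frame,
    // and keep the remainder buffered for the next push.
    FrameProcessor push(short[] chunk) {
        for (short s : chunk) {
            buf[fill++] = s;
            if (fill == FRAME) processFrame();
        }
        return this;
    }

    // Push mode: flush and process a final, partially filled frame.
    FrameProcessor stop() {
        if (fill > 0) processFrame();
        return this;
    }

    // Pull mode: read little-endian 16-bit samples until the stream runs
    // out of data, then flush. InputStream.read() blocks until data arrive.
    FrameProcessor run(InputStream in) throws IOException {
        int lo;
        while ((lo = in.read()) != -1) {
            int hi = in.read();
            if (hi == -1) break;                 // ignore a trailing odd byte
            push(new short[] {(short) ((hi << 8) | lo)});
        }
        return stop();
    }

    private void processFrame() {
        frames++;
        samples += fill;                         // stand-in for real processing
        fill = 0;
    }
}
```

Because pull mode is built on the push path here, both modes produce identical results for the same audio, whatever the chunk sizes.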
Model evaluation typically follows this recipe:
- Create a new session instance with new.
- Load a task model into the instance.
- Set the input source stream.
- Register one or more event handlers.
- Enter the main loop by calling run. The library processes the input streams and invokes event handlers at appropriate times. The main loop continues until a terminating condition is reached, such as an event handler returning an error code.
- Release the session instance.
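The recipe's inversion of control, where the library owns the loop and calls application handlers that decide whether it continues, can be sketched in a few lines. Everything here is hypothetical: ToySession, LoopRc, EventHandler, and the event names are illustrative, a list of strings stands in for the input stream, and the model-loading step is omitted.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the recipe's inversion of control; ToySession,
// LoopRc, and the event names are illustrative, not SDK API.
enum LoopRc { OK, STOP }

@FunctionalInterface
interface EventHandler {
    LoopRc onEvent(ToySession session, String event);
}

final class ToySession {
    private final Map<String, EventHandler> handlers = new HashMap<>();
    private List<String> input = List.of();      // stands in for an input stream

    ToySession setInput(List<String> events) { input = events; return this; }

    ToySession setHandler(String event, EventHandler h) {
        handlers.put(event, h);
        return this;
    }

    // The library, not the application, owns this loop: it walks the input
    // and calls back into whichever handlers are registered.
    LoopRc run() {
        for (String event : input) {
            EventHandler h = handlers.get(event);
            if (h == null) continue;             // nobody listens for this event
            LoopRc rc = h.onEvent(this, event);
            if (rc != LoopRc.OK) return rc;      // a handler ended the loop
        }
        return LoopRc.OK;                        // input ran out
    }

    void release() { handlers.clear(); }         // drop handler references
}
```

A handler that returns STOP after its second "result" event ends run early, even if more input remains.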
Language bindings
This version of the TrulyNatural SDK supports two language bindings: C and Java. There is a one-to-one mapping between C functions and Java methods, with the following idiomatic differences.
Naming. A C function snsrXxx(SnsrSession s, ...) becomes a Java method Session.xxx(...): the snsr prefix and the SnsrSession first argument are absorbed into the receiver. For example, snsrSetHandler(s, key, c) ↔ s.setHandler(key, c).
Return values and error handling. The C binding uses a latched-error model: every function returns SnsrRC, and once a session enters an error state, every subsequent call short-circuits with the same code until clearRC (or reset) is called. Read the latched code with rC and the human-readable detail with errorDetail.
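The latched-error behavior can be modeled in a few lines. This Java sketch only imitates the C semantics described above: LatchedSession, LatchedRc, and setThreshold are hypothetical names, not SDK API. After the first failure, every call returns the same code without doing any work, until clearRC is called. One consequence of this design is that a caller can issue a sequence of calls and check the code once at the end.

```java
// Hypothetical model of a latched-error API; names are illustrative.
enum LatchedRc { OK, INVALID_ARG }

final class LatchedSession {
    private LatchedRc rc = LatchedRc.OK;
    private String detail = "";
    private int threshold;

    LatchedRc setThreshold(int value) {
        if (rc != LatchedRc.OK) return rc;       // short-circuit on a latched error
        if (value < 0) return fail(LatchedRc.INVALID_ARG, "threshold must be >= 0");
        threshold = value;
        return LatchedRc.OK;
    }

    LatchedRc rC() { return rc; }                // read the latched code
    String errorDetail() { return detail; }      // human-readable detail
    int threshold() { return threshold; }

    void clearRC() {                             // leave the error state
        rc = LatchedRc.OK;
        detail = "";
    }

    private LatchedRc fail(LatchedRc code, String why) {
        rc = code;
        detail = why;
        return code;
    }
}
```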
The Java binding does not surface latched errors to callers. Each Java method either completes successfully or throws an exception describing the failure; subsequent method calls on the same session start fresh. Eight methods that perform I/O (Session.load, Session.run, Stream.copy, Stream.getDelim, Stream.open, Stream.read, Stream.skip, and Stream.write) declare throws java.io.IOException, a checked exception. All other exceptions are unchecked: java.lang.OutOfMemoryError, which is an Error, and subclasses of java.lang.RuntimeException. The mapping from SnsrRC to Java exception class is part of the binding's contract:
| Java exception class | Thrown for SnsrRC codes such as | Typical cause |
|---|---|---|
| java.io.IOException (checked) | EOF, STREAM, STREAM_END, NOT_OPEN, BUFFER_OVERRUN, BUFFER_UNDERRUN, DELIM_NOT_FOUND, LIBRARY_TOO_OLD | Stream I/O failed or reached end-of-data. Thrown only from the eight methods that declare throws IOException; the same conditions raised elsewhere surface as RuntimeException. |
| java.lang.OutOfMemoryError | NO_MEMORY, NOT_ENOUGH_SPACE | Allocation failed. |
| java.lang.IllegalArgumentException | INVALID_ARG, INVALID_HANDLE, INCORRECT_SETTING_TYPE, SETTING_IS_READ_ONLY, FORMAT_NOT_SUPPORTED, VERSION_MISMATCH | The caller passed a value of the wrong type or format, or one that is otherwise unacceptable. |
| java.lang.IndexOutOfBoundsException | SETTING_NOT_FOUND, SETTING_NOT_AVAILABLE, VALUE_NOT_SET, ARG_OUT_OF_RANGE, NAME_NOT_UNIQUE, ITERATION_LIMIT | A lookup by name, index, or port missed, or a numeric or iteration limit was exceeded. |
| java.lang.RuntimeException | All other non-OK codes, such as ERROR, NOT_IMPLEMENTED, CONFIGURATION_*, ELEMENT_*, LICENSE_*, NO_MODEL, NOT_INITIALIZED, NOT_SUPPORTED, and TIMED_OUT. | Misconfiguration, a license issue, an internal API violation, or any other failure (the catch-all). |
Callbacks that need to control the run loop without raising an exception may return any of OK, STREAM_END, STOP, SKIP, REPEAT, or TIMED_OUT; any other return value from a callback is translated into an exception by the same mapping.
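The table and the callback rule together amount to a lookup from code to exception class, with a small set of codes exempt when a callback returns them. The sketch below encodes that mapping for a representative subset of codes. It is a hypothetical illustration of the documented contract, not the binding's implementation: the Rc enum, RcMapper, and the ioMethod flag are invented names.

```java
import java.io.IOException;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the documented SnsrRC -> exception mapping, using a
// representative subset of codes. Not the binding's actual implementation.
enum Rc { OK, EOF, STREAM_END, NO_MEMORY, INVALID_ARG, SETTING_NOT_FOUND,
          STOP, SKIP, REPEAT, TIMED_OUT, ERROR, NOT_IMPLEMENTED }

final class RcMapper {
    // Codes a callback may return to steer the run loop without an exception.
    static final Set<Rc> CALLBACK_CONTROL =
        EnumSet.of(Rc.OK, Rc.STREAM_END, Rc.STOP, Rc.SKIP, Rc.REPEAT, Rc.TIMED_OUT);

    // Returns the exception to throw, or null for OK. ioMethod is true for
    // the methods that declare throws IOException.
    static Throwable toException(Rc rc, String detail, boolean ioMethod) {
        switch (rc) {
            case OK:
                return null;
            case EOF: case STREAM_END:
                return ioMethod ? new IOException(detail) : new RuntimeException(detail);
            case NO_MEMORY:
                return new OutOfMemoryError(detail);
            case INVALID_ARG:
                return new IllegalArgumentException(detail);
            case SETTING_NOT_FOUND:
                return new IndexOutOfBoundsException(detail);
            default:
                return new RuntimeException(detail);     // the catch-all row
        }
    }
}
```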
Java methods that have no out-parameters in C return the Session instance instead, so calls can be chained, as in s.set(KEY, value).run().
When an exception is thrown, the underlying SnsrRC code remains available on the Session (or Stream) for the duration of the catch block: call rC to read it programmatically, or errorDetail for the human-readable message. (The exception's getMessage() is set to the same errorDetail text.) This is useful when the exception class alone is too coarse — for example, distinguishing a missing setting from an unsupported one when both surface as IndexOutOfBoundsException:
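```java
// The enclosing method must catch or declare IOException, because run()
// declares throws IOException.
try {
    s.set(KEY, value).run();
} catch (IndexOutOfBoundsException e) {
    if (s.rC() == SnsrRC.SETTING_NOT_FOUND) {
        // Treat as a config-file typo and fall back to a default.
    } else {
        throw e;
    }
}
```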
The catch block does not need to reset the session: as noted above, the next method call on the same session starts fresh.
Memory management. The C binding uses reference counting on every Session, Stream, and Callback handle (and on the C string returned by getString); manage lifetimes with retain and release. The Java binding uses standard garbage collection. Explicit retain / release are not exposed in Java and are not needed; Session.release() is provided for callers who want to free native resources promptly without waiting for GC, but is otherwise optional.
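For readers coming from Java, the C binding's retain/release discipline can be modeled as a counter that frees the resource when the last owner lets go. This sketch is hypothetical: RefCounted is not SDK code, a Runnable stands in for freeing the native object, and it assumes a new handle starts with one reference owned by its creator; check the C reference for the SDK's actual convention.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of C-style reference counting, not SDK code.
// Assumption: a new handle starts with one reference owned by its creator.
final class RefCounted {
    private final AtomicInteger refs = new AtomicInteger(1);
    private final Runnable onFree;               // stands in for the native free

    RefCounted(Runnable onFree) { this.onFree = onFree; }

    // Another owner takes a reference.
    RefCounted retain() {
        refs.incrementAndGet();
        return this;
    }

    // An owner gives up its reference; the last release frees the resource.
    void release() {
        int left = refs.decrementAndGet();
        if (left == 0) onFree.run();
        else if (left < 0) throw new IllegalStateException("released too often");
    }

    int count() { return refs.get(); }
}
```

In Java, the garbage collector takes over this bookkeeping, which is why the binding does not expose retain and release.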
¹ Similar in concept to protocol buffers, but with streamed unpacking into native data structures in RAM, no need for accessor functions, and additional features such as conversion to code for running from the text segment.