API Documentation

This page documents the API of the THF-Micro™ library.

Commonly encountered error codes are described in detail under each function below. Refer to the 'Error Codes' section for the complete list of error codes.

SensoryInfo

errors_t SensoryInfo(infoStruct_T* isp);

Description: Populates isp->version with the THF-Micro™ library version number.
Parameters: Pointer to an existing infoStruct_T.
Returns: Always returns ERR_OK.
Comments: Use of this function is optional in an application.

SensoryInfo Example

infoStruct_T isp;
SensoryInfo(&isp);
u32 major = (isp.version >> 20) & 0x00000fff;
u32 minor = (isp.version >> 12) & 0x000000ff;
u32 point = isp.version & 0x00000fff;
printf("THF-Micro Version = %u.%u.%u\n", (unsigned)major, (unsigned)minor, (unsigned)point);
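The shifts and masks in the example imply a packed layout of 12 bits for the major number, 8 bits for the minor number, and 12 bits for the point number. The following is a minimal sketch of that decoding as reusable helpers, assuming the version field is a 32-bit word. The names thf_version_t, thf_unpack_version, and thf_pack_version are hypothetical and are not part of the library.

```c
#include <stdint.h>

/* Decoded version fields. Hypothetical helper type, not part of the library. */
typedef struct {
    uint32_t major; /* bits 31..20 */
    uint32_t minor; /* bits 19..12 */
    uint32_t point; /* bits 11..0  */
} thf_version_t;

/* Split a packed version word using the same shifts and masks
 * as the SensoryInfo example. */
static thf_version_t thf_unpack_version(uint32_t packed)
{
    thf_version_t v;
    v.major = (packed >> 20) & 0xfffu;
    v.minor = (packed >> 12) & 0xffu;
    v.point = packed & 0xfffu;
    return v;
}

/* Inverse of thf_unpack_version: build a packed word, for example to
 * compare isp.version against a minimum required version. */
static uint32_t thf_pack_version(uint32_t major, uint32_t minor, uint32_t point)
{
    return ((major & 0xfffu) << 20) | ((minor & 0xffu) << 12) | (point & 0xfffu);
}
```

Because the major number occupies the most significant bits, two packed values can be compared directly with the usual integer operators.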

SensoryLibraryLicenseInfo

BOOL SensoryLibraryLicenseInfo(unsigned* seconds, unsigned* events);

Description: Gets the license limits for the THF-Micro™ library.
Parameters:

seconds: Pointer to register to store time limit for continuous speech recognition.
events: Pointer to register to store event limit for continuous speech recognition.

Returns: TRUE if the THF-Micro™ library has a valid license and FALSE if it does not.
Comments: If seconds and events are 0, the library has no limits on continuous speech recognition.
Notes: If the THF-Micro™ library does not have a valid license, SensoryProcessInit should fail with error code ERR_LICENSE. Recognition will not run in this case.
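The return value and the two limits combine into three cases: no valid license, a valid license without limits, and a valid license with limits. The sketch below classifies them from values the application has already read. The enum and function names are hypothetical, and treating "at least one non-zero limit" as a limited license is an assumption drawn from the Comments line above.

```c
/* License states described in the text. Hypothetical names,
 * not part of the library. */
typedef enum {
    LICENSE_INVALID,   /* the license call returned FALSE            */
    LICENSE_UNLIMITED, /* valid, and both seconds and events are 0   */
    LICENSE_LIMITED    /* valid, with at least one non-zero limit    */
} license_state_t;

/* valid:   the BOOL returned by the license call (non-zero means TRUE)
 * seconds: time limit written by the call
 * events:  event limit written by the call */
static license_state_t classify_license(int valid, unsigned seconds, unsigned events)
{
    if (!valid) {
        return LICENSE_INVALID;
    }
    if (seconds == 0 && events == 0) {
        return LICENSE_UNLIMITED;
    }
    return LICENSE_LIMITED;
}
```

C does not define the order in which function arguments are evaluated, so store the return value of SensoryLibraryLicenseInfo in a variable first and only then pass it to the helper together with seconds and events.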

SensoryModelLicenseInfo

BOOL SensoryModelLicenseInfo(t2siStruct* t, unsigned* seconds, unsigned* events);

Description: Gets the license limits for the model.
Parameters:

t: Pointer to t2siStruct.
seconds: Pointer to store time limit for continuous speech recognition.
events: Pointer to store event limit for continuous speech recognition.

Returns: TRUE if the model has a valid license and FALSE if it does not.
Comments: If seconds and events return 0, the model has no limits on continuous speech recognition.
Notes:

If the model does not have a valid license, SensoryProcessInit should fail with error code ERR_LICENSE. Recognition will not run in this case.
Call SensoryModelLicenseInfo only after the t2siStruct has been initialized with the net and grammar.

SensoryAlloc

errors_t SensoryAlloc(t2siStruct* t, unsigned int* size);

Description: Calculates the SPP size needed for speech recognition.
Parameters:

t: Pointer to t2siStruct.
size: Pointer to return SPP size needed.

Returns: Should return ERR_OK if successful. Other codes may indicate bad net or grammar.
Comments: Unit for size is bytes. SensoryAlloc stores the size in t->size.
Notes:

Before the call to SensoryAlloc, t->net and t->gram must point to the net and grammar data, respectively.
It may be useful to experiment with t->maxTokens. Application developers can use t->maxTokensUsed and t->tokensPruned during development to determine if a higher or lower number than the default MAX_TOKENS is needed.
If t->outOfMemory or t->tokensPruned is TRUE during the recognition process, then the search was limited by the number of search tokens. Increase t->maxTokens in this case.
Optionally, any of the other t2siStruct input fields can be customized. If a field is zero, the recognizer uses its default value. It is good practice to zero the entire t2siStruct at the start of the application; otherwise, every input field must be initialized before calling SensoryAlloc.

SensoryAlloc Example

t2siStruct app;
t2siStruct *t = &app;
memset(t, 0, sizeof(t2siStruct));
unsigned int size;

t->net = (intptr_t) NET_ADDR;
t->gram = (intptr_t) GRAM_ADDR;
errors_t error = SensoryAlloc(t, &size);
if (error) {
    printf("SensoryAlloc failed with error 0x%x\n", error);
    panic();
} 
t->spp = (void*)malloc(size);
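The notes on t->maxTokens describe a tuning loop: run recognition, inspect t->maxTokensUsed, t->tokensPruned, and t->outOfMemory, then adjust t->maxTokens for the next run. One possible way to turn those observations into a new value is sketched below. The function name, the 50% growth step, and the 25% headroom are all arbitrary assumptions, and the field types are assumed to fit in an unsigned int.

```c
/* Suggest a maxTokens value for the next development run.
 *
 * current: the t->maxTokens value used for the run just finished
 * used:    t->maxTokensUsed observed after that run
 * limited: non-zero if t->outOfMemory or t->tokensPruned was TRUE
 *
 * Hypothetical heuristic, not part of the library. */
static unsigned suggest_max_tokens(unsigned current, unsigned used, int limited)
{
    if (limited) {
        /* The search ran out of tokens: grow by about 50%. */
        return current + current / 2 + 1;
    }

    /* The search was not limited: keep about 25% headroom above the
     * observed peak, but never suggest more than the current value. */
    unsigned with_headroom = used + used / 4 + 1;
    return with_headroom < current ? with_headroom : current;
}
```

Like the other input fields, the suggested value would be written to t->maxTokens before SensoryAlloc is called.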

SensoryAllocMulti

errors_t SensoryAllocMulti(t2siStruct* t, unsigned int* size, int channels, int depth);

Description: Calculates the size of SPP needed for speech recognition when the number of channels (C) or the depth (D) is greater than 1.
Parameters:

t: Pointer to t2siStruct.
size: Pointer to return SPP size needed.
channels: Number of channels to be processed at once.
depth: Number of frames in one channel to be processed at once.

Returns: Should return ERR_OK if successful. Other returned codes may indicate bad net or grammar.
Comments: After the call to SensoryAllocMulti, each call to SensoryProcessMultiData processes (channels * depth) frames.

SensoryAllocMulti Example

t2siStruct app;
t2siStruct *t = &app;
memset(t, 0, sizeof(t2siStruct));
unsigned int size;

t->gram = (intptr_t) GRAM_ADDR;
t->net = (intptr_t) NET_ADDR;
errors_t error = SensoryAllocMulti(t, &size, 1, 2); // One channel, two frames at once
if (error) {
    printf("SensoryAllocMulti failed with error 0x%x\n", error);
    panic();
} 
t->spp = (void*)malloc(size);

SensoryProcessInit

errors_t SensoryProcessInit(t2siStruct* t);

Description: Initializes SPP for speech recognition.
Parameters: Pointer to t2siStruct.
Returns: Should return ERR_OK if successful.
Comments: Calling this function is required whenever the net or grammar changes.
Notes:

Before the call to SensoryProcessInit, t->spp must contain a pointer to the SPP. In other words, SensoryAlloc must be called successfully beforehand.
In operation, SensoryProcessInit should always return ERR_OK. Below are some other commonly encountered error codes that must be corrected before speech recognition can be performed.
ERR_LICENSE means that the THF-Micro™ library does not have a valid license.
ERR_T2SI_PSTORE means that t->spp is NULL.
ERR_T2SI_NN_BAD_VERSION means that t->net is corrupted or does not point to a valid net file.
ERR_T2SI_BAD_VERSION means that t->gram is corrupted or does not point to a valid grammar file.
Potential reasons for encountering 'invalid' model files include outdated models that are no longer supported and incompatible target formats.
ERR_T2SI_BAD_SETUP means that t->net is NULL and/or t->gram is NULL.
ERR_T2SI_NN_MISMATCH means that the net and grammar are not paired. These two files are generated together and they must be used together; it is not appropriate to pair any net with any grammar.
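During bring-up it can help to print a hint next to the numeric error code. The sketch below maps the codes listed above to short messages. So that it compiles on its own, it declares a stand-in errors_t enum: the names come from this page, but the numeric values are placeholders and not the library's real values (only ERR_OK being zero is implied by the `if (error)` checks in the examples). The function name init_error_hint is hypothetical.

```c
/* Stand-in declarations so this sketch compiles without the library.
 * The numeric values are placeholders, NOT the real ones. When the
 * library header is included, delete this enum and use the header's. */
typedef enum {
    ERR_OK = 0,
    ERR_LICENSE,
    ERR_T2SI_PSTORE,
    ERR_T2SI_NN_BAD_VERSION,
    ERR_T2SI_BAD_VERSION,
    ERR_T2SI_BAD_SETUP,
    ERR_T2SI_NN_MISMATCH
} errors_t;

/* Return a short hint for an error code returned by SensoryProcessInit. */
static const char *init_error_hint(errors_t e)
{
    switch (e) {
    case ERR_OK:
        return "ok";
    case ERR_LICENSE:
        return "no valid license";
    case ERR_T2SI_PSTORE:
        return "t->spp is NULL; allocate the SPP first";
    case ERR_T2SI_NN_BAD_VERSION:
        return "t->net is corrupted or not a valid net file";
    case ERR_T2SI_BAD_VERSION:
        return "t->gram is corrupted or not a valid grammar file";
    case ERR_T2SI_BAD_SETUP:
        return "t->net and/or t->gram is NULL";
    case ERR_T2SI_NN_MISMATCH:
        return "net and grammar were not generated together";
    default:
        return "see the 'Error Codes' section";
    }
}
```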

SensoryProcessInit Example

// t->net, t->gram, t->spp already set by user
if (t->spp == NULL)
{
    printf("No memory left for SPP\n");
    panic();
}

errors_t error = SensoryProcessInit(t);
if (error) {
    printf("SensoryProcessInit failed with error 0x%x\n", error);
    panic();
}

SensoryProcessData

RecoResult* SensoryProcessData(t2siStruct *t, SAMPLE *brick);

Description: Processes one brick of audio samples.
Parameters:

t: Pointer to t2siStruct.
brick: Pointer to brick of audio samples to process.

Returns: Pointer to a RecoResult structure, containing information about recognition results for the processed brick.
Comments: SensoryProcessData is called once every 15 msec, as each new brick of data becomes available, and is called repeatedly until recognition succeeds or fails.
Notes:

When a recognition occurs, the error field of the RecoResult structure should be ERR_OK. Below are some other commonly encountered error codes:
ERR_NOT_FINISHED means that the recognition process is still in progress.
ERR_RECOG_FAIL means that recognition failed. Occurs only with non-spotted vocabularies.
ERR_RECOG_LOW_CONF means that the recognizer found a potential recognition, but it is doubtful (low-confidence). Occurs only with non-spotted vocabularies.
ERR_RECOG_MID_CONF means that the recognizer found a potential recognition, but it is a 'maybe' (mid-confidence). Occurs only with non-spotted vocabularies.
ERR_DATACOL_TIMEOUT means no recognition occurred before timeout. Occurs only when t->timeOut has been specified.
ERR_T2SI_TOO_MANY_RESULTS means t->maxResults is too small. Increase the value of t->maxResults.
ERR_NULL_POINTER means t is NULL.

SensoryProcessData Example

// File-based audio input example
// In an actual application, real-time audio input is captured on-device
const char* audioFile = "audio.wav";
FILE* file = fopen(audioFile, "rb");
if (file == NULL) {
    printf("Cannot open audio file '%s'\n", audioFile);
    panic(); // Do not continue without an input file
}

// Keep calling SensoryProcessData while THF-Micro is getting audio frames
s16 brick[BRICK_SIZE_SAMPLES];
while (fread(brick, sizeof(brick), 1, file) == 1) { // Stop at the first incomplete brick
    RecoResult *r = SensoryProcessData(t, &brick[0]);
    if (r->error == ERR_NOT_FINISHED) {
        continue; // THF-Micro processing ongoing, but no recognition
    }
    if (r->error == ERR_OK) { // THF-Micro recognized a phrase
        printf("Recognized wordID = %d\n", r->wordID);
    } else {
        printf("SensoryProcessData failed with error 0x%x\n", (unsigned)r->error);
        panic();
    }
}
fclose(file);

SensoryProcessDataSamples

RecoResult* SensoryProcessDataSamples(t2siStruct* t, SAMPLE* samples, int count);

Description: Processes bricks of audio smaller than the standard 15 msec, such as 5 or 10 msec.
Parameters:

t: Pointer to t2siStruct.
samples: Pointer to the audio samples to process.
count: Number of samples in a brick (must be less than 240).

Returns: Pointer to a RecoResult structure, containing information about recognition results for the processed brick.
Comments: Works just like SensoryProcessData, but takes smaller sized bricks of audio than standard.
Notes:

The audio buffer size must be a multiple of both 240 and count.
Do not use variable-size blocks.
Only works if channels and depth are both 1.
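The size constraints above reduce to simple arithmetic. Taking a standard brick to be 240 samples lasting 15 msec (an inference from the figures on this page, giving 16 samples per millisecond), the sketch below converts a brick duration into a sample count and computes the smallest buffer size that is a multiple of both 240 and count. The macro and function names are hypothetical.

```c
#define BRICK_SAMPLES 240 /* samples in a standard brick (from this page)  */
#define BRICK_MSEC     15 /* duration of a standard brick (from this page) */

/* Number of samples in msec milliseconds of audio, or 0 for invalid input. */
static int samples_for_msec(int msec)
{
    if (msec <= 0) {
        return 0;
    }
    return msec * (BRICK_SAMPLES / BRICK_MSEC); /* 16 samples per msec */
}

/* Greatest common divisor of two positive integers (Euclid's algorithm). */
static int gcd_int(int a, int b)
{
    while (b != 0) {
        int r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* Smallest buffer size, in samples, that is a multiple of both
 * BRICK_SAMPLES and count. Returns 0 if count is not in 1..239. */
static int min_buffer_samples(int count)
{
    if (count <= 0 || count >= BRICK_SAMPLES) {
        return 0;
    }
    return BRICK_SAMPLES / gcd_int(BRICK_SAMPLES, count) * count;
}
```

For 5 msec bricks this gives 80 samples per call and a 240-sample minimum buffer; for 10 msec bricks it gives 160 samples per call and a 480-sample minimum buffer.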

SensoryProcessMultiData

errors_t SensoryProcessMultiData(t2siStruct* t, SAMPLE** samples);

Description: Processes multiple frames at once, saving memory access overhead.
Parameters:

t: Pointer to t2siStruct.
samples: Array of pointers to audio samples in each channel.

Returns: Should return ERR_OK when recognition occurs or ERR_NOT_FINISHED while the recognition process is still in progress. Refer to the notes on other error codes in the SensoryProcessData section.
Comments: See SensoryGetResult(channel, depth) below for getting the recognition results for each frame in each channel.
Notes:

This function takes an array of pointers to audio samples; samples[0] is for the first channel, samples[1] for the second channel, and so on.
Each pointer must point to D frames' worth of samples, where D is the depth, that is, (D * 240) samples.
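One way to satisfy this layout is a single contiguous allocation that is sliced per channel. The sketch below builds the array of channel pointers for a given number of channels and depth. The names sample_t, multi_channel_offset, and alloc_multi_buffers are hypothetical; sample_t stands in for the library's SAMPLE type and is assumed to be a 16-bit integer.

```c
#include <stdlib.h>

#define FRAME_SAMPLES 240 /* samples per frame, per the notes above */

typedef short sample_t;   /* stand-in for SAMPLE (assumed 16-bit) */

/* Offset, in samples, of a channel's data inside the shared block. */
static size_t multi_channel_offset(int channel, int depth)
{
    return (size_t)channel * (size_t)depth * FRAME_SAMPLES;
}

/* Allocate one zeroed block holding channels * depth frames and set
 * ptrs[c] to the start of channel c's (depth * FRAME_SAMPLES) samples.
 * ptrs must have room for `channels` entries.
 * Returns the block (release it with free) or NULL on failure. */
static sample_t *alloc_multi_buffers(int channels, int depth, sample_t **ptrs)
{
    if (channels <= 0 || depth <= 0 || ptrs == NULL) {
        return NULL;
    }
    sample_t *block = calloc(multi_channel_offset(channels, depth), sizeof *block);
    if (block == NULL) {
        return NULL;
    }
    for (int c = 0; c < channels; c++) {
        ptrs[c] = block + multi_channel_offset(c, depth);
    }
    return block;
}
```

After each channel's region has been filled with new audio, the ptrs array is what would be passed as the samples argument.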

SensoryGetResult

RecoResult* SensoryGetResult(t2siStruct* t, int channel, int depth);

Description: Gets the results for the brick at (channel, depth), produced by SensoryProcessMultiData.
Parameters:

t: Pointer to t2siStruct.
channel: Index of the channel.
depth: Index of the frame within the channel, from 0 to (D - 1).

Returns: Pointer to a RecoResult structure, containing information about recognition results for the brick at (channel, depth).
Comments: Indices are 0-based. The 'older' results come first (depth = 0 is the 'oldest', depth = (D - 1) is the 'newest').

SensoryProcessRestart

errors_t SensoryProcessRestart(t2siStruct* t, int channel);

Description: Re-initializes SPP for recognition on a given channel.
Parameters:

t: Pointer to t2siStruct.
channel: Index of the channel.

Returns: Always returns ERR_OK.
Comments: Calling SensoryProcessRestart after recognition success or error is now optional.
Notes:

If the net and grammar have not changed, then SensoryProcessRestart can be used to restart recognition, and is faster than the full initialization done by SensoryProcessInit.
One call must have been made to SensoryProcessInit before any call to SensoryProcessRestart.
SensoryProcessRestart does not check to see that the net and grammar have not changed; the application must guarantee that.
SensoryProcessRestart does no error checking to ensure that the t2siStruct is still properly initialized.
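Since SensoryProcessRestart performs no checks of its own, the application has to decide between a restart and a full initialization. The sketch below shows one possible bookkeeping scheme that remembers which net and grammar addresses the last full initialization used. All names are hypothetical, and comparing addresses is an assumption: it does not detect model data that was overwritten in place at the same address.

```c
/* Remembers the net and grammar used by the last full initialization.
 * Hypothetical helper type, not part of the library. */
typedef struct {
    const void *net;
    const void *gram;
    int initialized; /* non-zero once a full initialization was recorded */
} init_tracker_t;

/* Returns 1 if a full SensoryProcessInit is needed, or 0 if
 * SensoryProcessRestart is sufficient. When 1 is returned, the tracker
 * records the new net and grammar; if the initialization then fails,
 * the caller should clear tr->initialized. */
static int needs_full_init(init_tracker_t *tr, const void *net, const void *gram)
{
    if (!tr->initialized || tr->net != net || tr->gram != gram) {
        tr->net = net;
        tr->gram = gram;
        tr->initialized = 1;
        return 1;
    }
    return 0;
}
```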

SensoryFeatureCompatible

BOOL SensoryFeatureCompatible(t2siStruct* src, t2siStruct* dst);

Description: Checks whether two t2siStruct instances are feature-compatible.
Parameters: Pointers to two initialized t2siStruct instances.
Returns: TRUE if features from src can be used for dst or FALSE otherwise.
Comments: Used in case of multiple recognizers on the same audio stream.
Notes: The THF-Micro™ library can be built without this API, upon request, for small savings in code size.

SensoryConnectFeatures

errors_t SensoryConnectFeatures(t2siStruct* src, t2siStruct* dst);

Description: Sets up dst to process features from src.
Parameters: Pointers to two initialized t2siStruct.
Returns: Should return ERR_OK if connected successfully or ERR_T2SI_FEATURE_MISMATCH otherwise.
Comments: Used in case of multiple recognizers on the same audio stream.
Notes: The THF-Micro™ library can be built without this API, upon request, for small savings in code size.

SensoryProcessFeatures

RecoResult* SensoryProcessFeatures(t2siStruct* dst);

Description: Makes dst process the features produced by the feature source that was connected with SensoryConnectFeatures.
Parameters: Pointer to initialized t2siStruct.
Returns: Pointer to a RecoResult structure, containing information about recognition results for the processed brick.
Comments: Used in case of multiple recognizers on the same audio stream.
Notes:

The feature source must have had SensoryProcessData called on it.
The THF-Micro™ library can be built without this API, upon request, for small savings in code size.

SensoryAudioRewind

errors_t SensoryAudioRewind(t2siStruct* t, int rewind);

Description: Rewinds the audio input pointer.
Parameters:

t: Pointer to t2siStruct.
rewind: Number of milliseconds to rewind.

Returns: Should return ERR_OK if rewind is successful. If rewind is not successful, another error code will be returned.
Comments: Used after wakeword recognition for wake-to-command use-cases.

SensoryAudioFastForward

errors_t SensoryAudioFastForward(t2siStruct* t);

Description: Fast forwards a rewound audio input pointer to the current frame.
Parameters: Pointer to t2siStruct.
Returns: Should return ERR_OK if fast forward is successful. If fast forward is not successful, another error code will be returned.
Comments: Used after command recognition for wake-to-command use-cases.
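Taken together, the comments on SensoryAudioRewind and SensoryAudioFastForward describe a two-phase cycle: rewind after the wakeword is recognized, fast forward after the command is recognized. The sketch below models only that decision as a small state machine and leaves the actual library calls to the application. The enum and function names are hypothetical, and how timeouts or failed command recognitions are handled is not covered.

```c
/* Which recognizer is currently listening. Hypothetical names. */
typedef enum { PHASE_WAKEWORD, PHASE_COMMAND } phase_t;

/* What the application should do with the audio input pointer. */
typedef enum { ACT_NONE, ACT_REWIND, ACT_FAST_FORWARD } action_t;

/* Advance the wake-to-command cycle.
 *
 * phase:      current phase, updated in place
 * recognized: non-zero if the active recognizer just reported a recognition
 *
 * Returns ACT_REWIND after a wakeword recognition (call SensoryAudioRewind),
 * ACT_FAST_FORWARD after a command recognition (call SensoryAudioFastForward),
 * and ACT_NONE otherwise. */
static action_t next_action(phase_t *phase, int recognized)
{
    if (!recognized) {
        return ACT_NONE;
    }
    if (*phase == PHASE_WAKEWORD) {
        *phase = PHASE_COMMAND;
        return ACT_REWIND;
    }
    *phase = PHASE_WAKEWORD;
    return ACT_FAST_FORWARD;
}
```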