API Documentation¶
This page contains documentation of the API to the THF-Micro™ library.
Error codes that are commonly encountered with the API are listed below in detail. Refer to section 'Error Codes' for the complete list of error codes.
SensoryInfo¶
Description: Populates isp->version with the THF-Micro™ library version number.
Parameters: Pointer to an existing infoStruct.
Returns: Always returns ERR_OK.
Comments: Use of this function is optional in an application.
infoStruct_T isp;
SensoryInfo(&isp);
u32 major = (isp.version >> 20) & 0x00000fff;
u32 minor = (isp.version >> 12) & 0x000000ff;
u32 point = isp.version & 0x00000fff;
printf("THF-Micro Version = %u.%u.%u\n", (unsigned) major, (unsigned) minor, (unsigned) point);
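The shifts and masks in the example above imply a 12/8/12-bit layout for the version word. The sketch below makes that layout explicit. It is only an illustration: `ThfVersion` and `thf_decode_version` are hypothetical names that are not part of the THF-Micro™ API, and `uint32_t` stands in for the library's `u32` type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type, not part of the THF-Micro API. */
typedef struct {
    uint32_t major; /* bits 31..20 (12 bits) */
    uint32_t minor; /* bits 19..12 (8 bits)  */
    uint32_t point; /* bits 11..0  (12 bits) */
} ThfVersion;

/* Unpack a version word using the same shifts and masks as the example above. */
static void thf_decode_version(uint32_t packed, ThfVersion *out)
{
    assert(out != NULL);
    out->major = (packed >> 20) & 0x00000fffu;
    out->minor = (packed >> 12) & 0x000000ffu;
    out->point = packed & 0x00000fffu;
}
```

An application could pass `isp.version` to this helper after calling SensoryInfo, assuming `u32` is a 32-bit unsigned type.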
SensoryLibraryLicenseInfo¶
Description: Get license limits for the THF-Micro™ library.
Parameters:
- seconds: Pointer to register to store time limit for continuous speech recognition.
- events: Pointer to register to store event limit for continuous speech recognition.
Returns: TRUE if the THF-Micro™ library has a valid license and FALSE if it does not.
Comments: If seconds and events are 0, the library has no limits on continuous speech recognition.
Notes: If the THF-Micro™ library does not have a valid license, SensoryProcessInit should fail with error code ERR_LICENSE. Recognition will not run in this case.
SensoryModelLicenseInfo¶
Description: Get license limits for the model.
Parameters:
- t: Pointer to t2siStruct.
- seconds: Pointer to store time limit for continuous speech recognition.
- events: Pointer to store event limit for continuous speech recognition.
Returns: TRUE if the model has a valid license and FALSE if it does not.
Comments: If seconds and events return 0, the model has no limits on continuous speech recognition.
Notes:
- If the model does not have a valid license, SensoryProcessInit should fail with error code ERR_LICENSE. Recognition will not run in this case.
- Call after t2siStruct has been initialized with net and grammar.
SensoryAlloc¶
Description: Calculates the SPP size needed for speech recognition.
Parameters:
- t: Pointer to t2siStruct.
- size: Pointer to return SPP size needed.
Returns: Should return ERR_OK if successful. Other codes may indicate bad net or grammar.
Comments: Unit for size is bytes. SensoryAlloc stores the size in t->size.
Notes:
- Before the call to SensoryAlloc, t->net and t->gram must point to the net and grammar data, respectively.
- It may be useful to experiment with t->maxTokens. Application developers can use t->maxTokensUsed and t->tokensPruned during development to determine if a higher or lower number than the default MAX_TOKENS is needed.
- If t->outOfMemory or t->tokensPruned are TRUE during the recognition process, then the search was limited by the number of search tokens. Increase t->maxTokens in this case.
- Optionally, any of the other t2siStruct input fields can be customized. If they are zero, then the recognizer will use default values. It is good practice to zero the entire t2siStruct at the start of the application. If not, then all input fields need to be initialized before calling SensoryAlloc.
t2siStruct app;
t2siStruct *t = &app;
memset(t, 0, sizeof(t2siStruct));
unsigned int size;
t->net = (intptr_t) NET_ADDR;
t->gram = (intptr_t) GRAM_ADDR;
errors_t error = SensoryAlloc(t, &size);
if (error) {
    printf("SensoryAlloc failed with error 0x%x\n", error);
    panic();
}
t->spp = (void*)malloc(size);
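The notes on t->maxTokens describe a development-time tuning loop: run recognition, inspect the token statistics, and adjust the budget. The sketch below shows one way such a check could look. Everything in it is an assumption made for illustration: `TokenStats` is a stand-in for the relevant t2siStruct fields (whose real types come from the THF-Micro™ headers), `suggest_max_tokens` is a hypothetical helper, maxTokensUsed is assumed to report peak usage, and the doubling and 25% headroom figures are arbitrary choices rather than library recommendations.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the token-related fields of t2siStruct named in the notes
 * above. Field types are assumptions, not the library's definitions. */
typedef struct {
    unsigned maxTokens;     /* input: token budget for the search          */
    unsigned maxTokensUsed; /* output: assumed to be the peak tokens used  */
    int      tokensPruned;  /* output: nonzero if tokens were pruned       */
    int      outOfMemory;   /* output: nonzero if the search ran out       */
} TokenStats;

/* Hypothetical heuristic: suggest a maxTokens value for the next test run. */
static unsigned suggest_max_tokens(const TokenStats *s)
{
    assert(s != NULL && s->maxTokens > 0);
    if (s->outOfMemory || s->tokensPruned) {
        /* The search was limited by the token budget: raise it. */
        return s->maxTokens * 2;
    }
    if (s->maxTokensUsed < s->maxTokens / 2) {
        /* Less than half the budget was used: shrink to usage plus ~25%. */
        return s->maxTokensUsed + s->maxTokensUsed / 4 + 1;
    }
    /* The current budget looks adequate. */
    return s->maxTokens;
}
```

In practice the statistics would be copied from the t2siStruct after a representative recognition run, and the suggested value would be set before the next call to SensoryAlloc.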
SensoryAllocMulti¶
Description: Calculates the size of SPP needed for speech recognition when the number of channels (C) or the depth (D) is greater than 1.
Parameters:
- t: Pointer to t2siStruct.
- size: Pointer to return SPP size needed.
- channels: Number of channels to be processed at once.
- depth: Number of frames in one channel to be processed at once.
Returns: Should return ERR_OK if successful. Other returned codes may indicate bad net or grammar.
Comments: After the call to SensoryAllocMulti, each call to SensoryProcessMultiData processes (channels * depth) frames.
Notes:
t2siStruct app;
t2siStruct *t = &app;
memset(t, 0, sizeof(t2siStruct));
unsigned int size;
t->gram = (intptr_t) GRAM_ADDR;
t->net = (intptr_t) NET_ADDR;
errors_t error = SensoryAllocMulti(t, &size, 1, 2); // One channel, two frames at once
if (error) {
    printf("SensoryAllocMulti failed with error 0x%x\n", error);
    panic();
}
t->spp = (void*)malloc(size);
SensoryProcessInit¶
Description: Initializes SPP for speech recognition.
Parameters: Pointer to t2siStruct.
Returns: Should return ERR_OK if successful.
Comments: Calling this function is required whenever the net or grammar changes.
Notes:
- Before the call to SensoryProcessInit, t->spp must contain a pointer to the SPP. In other words, SensoryAlloc must be called successfully beforehand.
- In operation, SensoryProcessInit should always return ERR_OK. Below are some other commonly encountered error codes that must be corrected before speech recognition can be performed.
- ERR_LICENSE means that the THF-Micro™ library does not have a valid license.
- ERR_T2SI_PSTORE means that t->spp is NULL.
- ERR_T2SI_NN_BAD_VERSION means that t->net is corrupted or does not point to a valid net file.
- ERR_T2SI_BAD_VERSION means that t->gram is corrupted or does not point to a valid grammar file.
- Potential reasons for encountering 'invalid' model files: outdated models that are no longer supported, incompatible target formats, etc.
- ERR_T2SI_BAD_SETUP means that t->net is NULL and/or t->gram is NULL.
- ERR_T2SI_NN_MISMATCH means that the net and grammar are not paired. These two files are generated together and they must be used together; it is not appropriate to pair any net with any grammar.
// t->net, t->gram, t->spp already set by user
if (t->spp == NULL) {
    printf("No memory left for SPP\n");
    panic();
}
errors_t error = SensoryProcessInit(t);
if (error) {
    printf("SensoryProcessInit failed with error 0x%x\n", error);
    panic();
}
SensoryProcessData¶
Description: Processes one brick of audio samples.
Parameters:
- t: Pointer to t2siStruct.
- brick: Pointer to brick of audio samples to process.
Returns: Pointer to a RecoResult structure, containing information about recognition results for the processed brick.
Comments: SensoryProcessData is called once every 15 msec, as each new brick of data becomes available; it is called repeatedly until recognition success or failure.
Notes:
- When a recognition occurs, the error field of the RecoResult structure should be ERR_OK. Below are some other commonly encountered error codes:
- ERR_NOT_FINISHED means that the recognition process is still going.
- ERR_RECOG_FAIL means that recognition failed. Occurs only with non-spotted vocabularies.
- ERR_RECOG_LOW_CONF means that the recognizer found a potential recognition, but it is doubtful (low-confidence). Occurs only with non-spotted vocabularies.
- ERR_RECOG_MID_CONF means that the recognizer found a potential recognition, but it is a 'maybe' (mid-confidence). Occurs only with non-spotted vocabularies.
- ERR_DATACOL_TIMEOUT means no recognition occurred before timeout. Occurs only when t->timeOut has been specified.
- ERR_T2SI_TOO_MANY_RESULTS means t->maxResults is too small. Increase the value of t->maxResults.
- ERR_NULL_POINTER means t is NULL.
// File-based audio input example
// In an actual application, real-time audio input is captured on-device
// Note: any WAV header bytes are read as audio here; a real reader would skip the header first
const char* audioFile = "audio.wav";
FILE* file = fopen(audioFile, "rb");
if (file == NULL) {
    printf("Cannot open audio file '%s'\n", audioFile);
    panic();
}
// Keep calling SensoryProcessData while full bricks of audio are available
s16 brick[BRICK_SIZE_SAMPLES];
while (fread(brick, sizeof(brick), 1, file) == 1) {
    RecoResult *r = SensoryProcessData(t, &brick[0]);
    if (r->error == ERR_NOT_FINISHED) {
        continue; // THF-Micro processing ongoing, but no recognition
    }
    if (r->error == ERR_OK) { // THF-Micro recognized a phrase
        printf("Recognized wordID = %d\n", r->wordID);
    } else {
        printf("SensoryProcessData failed with error 0x%lx\n", (unsigned long) r->error);
        panic();
    }
}
fclose(file);
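When audio comes from a file, the reading loop can be separated from the recognizer so that it is easy to test on a host machine. The sketch below is one possible shape for that loop, under stated assumptions: the file holds raw 16-bit PCM with no header, a brick is 240 samples (the figure this page uses for SensoryProcessDataSamples and SensoryProcessMultiData), and `BrickHandler`, `for_each_brick`, and `sum_first_sample` are hypothetical names that are not part of the THF-Micro™ API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define BRICK_SAMPLES 240 /* assumed standard brick size (15 msec) */

/* Hypothetical callback type: receives one complete brick of samples. */
typedef void (*BrickHandler)(const int16_t *brick, void *userData);

/* Reads raw 16-bit PCM from `file` and passes each complete brick to
 * `handler`. A trailing partial brick is dropped rather than processed.
 * Returns the number of complete bricks delivered. */
static unsigned long for_each_brick(FILE *file, BrickHandler handler, void *userData)
{
    int16_t brick[BRICK_SAMPLES];
    unsigned long bricks = 0;

    assert(file != NULL && handler != NULL);
    /* fread returns the number of samples read; anything short of a full
     * brick means end of file or a read error. */
    while (fread(brick, sizeof(brick[0]), BRICK_SAMPLES, file) == BRICK_SAMPLES) {
        handler(brick, userData);
        bricks++;
    }
    return bricks;
}

/* Example handler: accumulates the first sample of each brick into a long. */
static void sum_first_sample(const int16_t *brick, void *userData)
{
    *(long *)userData += brick[0];
}
```

In an application, the handler would presumably wrap the call to SensoryProcessData and inspect the returned RecoResult, while the example handler here only exists so the loop can be exercised without the library.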
SensoryProcessDataSamples¶
Description: Processes smaller bricks of audio than standard, such as 5 or 10 msec.
Parameters:
- t: Pointer to t2siStruct.
- brick: Pointer to brick of audio samples to process.
- count: Number of samples in a brick (must be less than 240).
Returns: Pointer to a RecoResult structure, containing information about recognition results for the processed brick.
Comments: Works just like SensoryProcessData, but takes smaller sized bricks of audio than standard.
Notes:
- The audio buffer size must be a multiple of both 240 and count.
- Don’t use variable-size blocks.
- Only works if channels and depth are both 1.
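The sizing rules above reduce to simple arithmetic. A standard brick of 240 samples covers 15 msec, which is 16 samples per msec, so a 5 msec brick is 80 samples and a 10 msec brick is 160 samples. The smallest buffer that is a multiple of both 240 and count is their least common multiple. The sketch below works this out; the helper names are hypothetical and not part of the THF-Micro™ API.

```c
#include <assert.h>

#define STANDARD_BRICK_SAMPLES 240u /* samples in one standard brick   */
#define STANDARD_BRICK_MSEC     15u /* duration of one standard brick  */

/* Samples in a small brick of `msec` milliseconds (240 / 15 = 16 per msec). */
static unsigned samples_for_msec(unsigned msec)
{
    return msec * (STANDARD_BRICK_SAMPLES / STANDARD_BRICK_MSEC);
}

/* Greatest common divisor by Euclid's algorithm. */
static unsigned gcd_u(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* Smallest buffer size, in samples, that is a multiple of both 240 and count. */
static unsigned min_buffer_samples(unsigned count)
{
    assert(count > 0 && count < STANDARD_BRICK_SAMPLES);
    return STANDARD_BRICK_SAMPLES / gcd_u(STANDARD_BRICK_SAMPLES, count) * count;
}
```

For 10 msec bricks this gives a minimum buffer of 480 samples (two standard bricks, three small bricks), whereas 5 msec bricks fit evenly into a single 240-sample buffer.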
SensoryProcessMultiData¶
Description: Processes multiple frames at once, saving memory access overhead.
Parameters:
- t: Pointer to t2siStruct.
- samples: Array of pointers to audio samples in each channel.
Returns: Should return ERR_OK when recognition occurs or ERR_NOT_FINISHED when the recognition process is still going. Refer to the notes about other error codes in the section about SensoryProcessData.
Comments: See SensoryGetResult(channel, depth) below for getting the recognition results for each frame in each channel.
Notes:
- This function takes an array of pointers to audio samples; samples[0] is for the first channel, samples[1] for the second channel, and so on.
- Each pointer must point to depth D frames worth of samples, that is, (D * 240) samples.
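The buffer layout described in these notes can be sketched as follows. This is an illustration under assumptions, not the library's required allocation scheme: samples are taken to be 16-bit (as in the `s16` brick of the SensoryProcessData example), all channels are placed in one contiguous block purely for convenience, and `alloc_multi_buffers` is a hypothetical helper that is not part of the THF-Micro™ API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FRAME_SAMPLES 240u /* samples per frame, as stated in the note above */

/* Hypothetical helper: allocates one zeroed block holding channels * depth
 * frames and sets samples[c] to the start of channel c's (depth * 240)
 * samples. Returns the block (release with free) or NULL on failure. */
static int16_t *alloc_multi_buffers(unsigned channels, unsigned depth,
                                    int16_t **samples)
{
    size_t perChannel = (size_t)depth * FRAME_SAMPLES;
    int16_t *block;
    unsigned c;

    assert(channels > 0 && depth > 0 && samples != NULL);
    block = calloc((size_t)channels * perChannel, sizeof(int16_t));
    if (block == NULL) {
        return NULL;
    }
    for (c = 0; c < channels; c++) {
        samples[c] = block + (size_t)c * perChannel;
    }
    return block;
}
```

With two channels and a depth of three, each `samples[c]` points at 720 samples, and the resulting pointer array is what would be handed to SensoryProcessMultiData once the audio has been copied in.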
SensoryGetResult¶
Description: Get the results for the brick at (channel, depth), produced by SensoryProcessMultiData.
Parameters:
- t: Pointer to t2siStruct.
- channel: Index of channel.
- depth: Index of depth.
Returns: Pointer to a RecoResult structure, containing information about recognition results for the brick at (channel, depth).
Comments: Indices are 0-based. The 'older' results come first (depth = 0 is the 'oldest', depth = (D - 1) is the 'newest').
SensoryProcessRestart¶
Description: Re-initializes SPP for recognition on a given channel.
Parameters:
- t: Pointer to t2siStruct.
- channel: Index of channel.
Returns: Always returns ERR_OK.
Comments: Calling this function is now optional after a recognition success or error.
Notes:
- If the net and grammar have not changed, then SensoryProcessRestart can be used to restart recognition, and is faster than the full initialization done by SensoryProcessInit.
- One call must have been made to SensoryProcessInit before any call to SensoryProcessRestart.
- SensoryProcessRestart does not check to see that the net and grammar have not changed; the application must guarantee that.
- SensoryProcessRestart does no error checking to ensure that the t2siStruct is still properly initialized.
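Because SensoryProcessRestart performs no checks of its own, the application has to track whether a restart is safe. One possible application-side guard is sketched below. `ModelState`, `InitChoice`, `choose_init`, and `record_init` are hypothetical names that are not part of the THF-Micro™ API, and the net and grammar are assumed to be held as `intptr_t` addresses, as in the SensoryAlloc example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application-side record of the model that was last passed
 * to a successful full initialization. */
typedef struct {
    intptr_t net;
    intptr_t gram;
    int initialized; /* nonzero once a full init has succeeded */
} ModelState;

typedef enum { NEED_FULL_INIT, CAN_RESTART } InitChoice;

/* Decide whether a fast restart is safe for the requested net and grammar. */
static InitChoice choose_init(const ModelState *last, intptr_t net, intptr_t gram)
{
    assert(last != NULL);
    if (!last->initialized || last->net != net || last->gram != gram) {
        return NEED_FULL_INIT;
    }
    return CAN_RESTART;
}

/* Record a successful full initialization. */
static void record_init(ModelState *last, intptr_t net, intptr_t gram)
{
    assert(last != NULL);
    last->net = net;
    last->gram = gram;
    last->initialized = 1;
}
```

This guard only compares addresses, so it cannot detect model data that was overwritten in place; an application that reloads models into the same memory would need its own change tracking.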
SensoryFeatureCompatible¶
Description: Checks two t2siStruct for feature compatibility.
Parameters: Pointers to two initialized t2siStruct.
Returns: TRUE if features from src can be used for dst or FALSE otherwise.
Comments: Used in case of multiple recognizers on the same audio stream.
Notes: The THF-Micro™ library can be built without this API, upon request, for small savings in code size.
SensoryConnectFeatures¶
Description: Sets up dst to process features from src.
Parameters: Pointers to two initialized t2siStruct.
Returns: Should return ERR_OK if connected successfully or ERR_T2SI_FEATURE_MISMATCH otherwise.
Comments: Used in case of multiple recognizers on the same audio stream.
Notes: The THF-Micro™ library can be built without this API, upon request, for small savings in code size.
SensoryProcessFeatures¶
Description: dst will process features from the feature source from SensoryConnectFeatures.
Parameters: Pointer to initialized t2siStruct.
Returns: Pointer to a RecoResult structure, containing information about recognition results for the processed brick.
Comments: Used in case of multiple recognizers on the same audio stream.
Notes:
- The feature source must have had SensoryProcessData called on it.
- The THF-Micro™ library can be built without this API, upon request, for small savings in code size.
SensoryAudioRewind¶
Description: Rewinds the audio input pointer.
Parameters:
- t: Pointer to t2siStruct.
- rewind: Number of milliseconds to rewind.
Returns: Should return ERR_OK if rewind is successful. If rewind is not successful, another error code will be returned.
Comments: Used after wakeword recognition for wake-to-command use-cases.
SensoryAudioFastForward¶
Description: Fast forwards a rewound audio input pointer to the current frame.
Parameters: Pointer to t2siStruct.
Returns: Should return ERR_OK if fast forward is successful. If fast forward is not successful, another error code will be returned.
Comments: Used after command recognition for wake-to-command use-cases.