spot-eval-batch
This tool runs a phrase spotter over a (typically large) set of audio files and measures its performance in terms of the false accept (FA) rate and the false reject (FR) ratio.
Test data requirements
Audio files used for FR (in-vocabulary) testing:
- Must contain a single target phrase utterance per file.
- Must contain lead-in ambient audio before the target phrase begins.
- In most cases one second of ambient audio will suffice.
- For custom spotters, refer to the documentation delivered with the model for the exact requirements.
- Most models created after May 2020 include a setting, `min-in-vocab-duration`, which specifies the minimum required lead-in time in milliseconds. You can query this with `snsr-edit -t model.snsr -q min-in-vocab-duration`.
- Recognition events that happen during the required lead-in time are counted as errors. See `INVFA` in the log file format table below.
- You can override the minimum lead-in requirement on the command line (with `-s min-in-vocab-duration=0`), but note that doing so means you will be testing the model outside of its intended operating environment.
- The FR ratio is calculated as the fraction of the in-vocabulary files in which the spotter model did not find the phrase, expressed as a percentage. Example: Out of 2000 files, 120 did not trigger the spotter. The false-reject ratio is therefore 6.0%.
Audio files used for FA testing:
- Should be much longer than the in-vocabulary examples.
- Should contain a selection of noise expected to be encountered during regular use.
- Must not contain explicit instances of the target phrase.
- The FA rate is calculated as the average number of times the spotter model mistakenly triggered per hour. Example: Out of 120 hours of audio, the spotter triggered 60 times. The false-accept rate is 0.5 / hour.
- If you run `spot-eval-batch` with the `-u` flag, unexpected recognition events from the FR testing files are included in the false accept totals. These unexpected events include:
- Spots that happen during the required lead-in period.
- The second and all subsequent spots, as each in-vocabulary file must contain only a single target phrase utterance.
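The two metrics defined above reduce to simple arithmetic. The following is a minimal Python sketch, not part of the tool itself; the function names `fr_ratio_percent` and `fa_rate_per_hour` are hypothetical and exist only for illustration. It reproduces the two worked examples from this section.

```python
def fr_ratio_percent(total_inv_files: int, missed_files: int) -> float:
    """Percentage of in-vocabulary files in which the phrase was not spotted."""
    if total_inv_files <= 0:
        raise ValueError("need at least one in-vocabulary file")
    return 100.0 * missed_files / total_inv_files


def fa_rate_per_hour(false_accepts: int, audio_hours: float) -> float:
    """Average number of false accepts per hour of FA test audio."""
    if audio_hours <= 0:
        raise ValueError("audio duration must be positive")
    return false_accepts / audio_hours


# Worked examples from the text above.
print(f"{fr_ratio_percent(2000, 120):.1f}% FR")   # 6.0% FR
print(f"{fa_rate_per_hour(60, 120.0):.2f} / hr")  # 0.50 / hr
```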
Usage
Runs a TrulyNatural SDK wake word model file on test data
and reports the false accept rate, false reject ratio, and execution speed.
usage: spot-eval-batch -t task [options]
options:
-f setting filename : load filename into task setting
-i filename : in-vocabulary (FR) filename list
-j threads : number of concurrent jobs (default: 1)
-l filename : log output file (default: <task>.log)
-o filename : out-of-vocabulary (FA) filename list
-s setting=value : override a task setting
-t task : specify task filename (required)
-u : count in-vocabulary FAs
-v [-v [-v]] : increase verbosity
At least one of -i and -o is required.
Settings are strings used as keys to query or change task behavior.
Most frequently used for wake words and command sets is operating-point.
Refer to the TrulyNatural SDK documentation for a complete list and
descriptions of all supported settings.
- The files specified by the `-i` and `-o` options must contain exactly one audio file path per line, with no extraneous whitespace. The line separator is the newline character, `\n`.
- `-i`, `-o`, or both must be specified.
- `-u` counts unexpected phrase spots in the in-vocabulary (FR) data towards the false accept total. This only has an effect for spotters that require a significant lead-in time to stabilize.
- The `-j` option determines the number of concurrent threads to start. For multi-core CPUs this can significantly speed up the evaluation.
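Because the list files must hold exactly one path per line with no stray whitespace and `\n` separators, it can help to generate them with a script. The sketch below is one possible way to do that in Python; `write_file_list` is a hypothetical helper, not something shipped with the SDK. It assumes that a single newline after the last path is acceptable, which the text above does not state explicitly.

```python
def write_file_list(paths, list_path):
    """Write one audio file path per line, newline-separated.

    Leading and trailing whitespace is stripped and empty entries are dropped.
    Returns the number of paths written.
    """
    cleaned = [str(p).strip() for p in paths]
    cleaned = [p for p in cleaned if p]  # drop empty entries
    # newline="\n" stops Python from translating "\n" to "\r\n" on Windows.
    with open(list_path, "w", encoding="utf-8", newline="\n") as f:
        for p in cleaned:
            f.write(p + "\n")  # assumption: a trailing newline is accepted
    return len(cleaned)
```

For example, `write_file_list(sorted(pathlib.Path("inv").glob("*.wav")), "inv.txt")` would build a list for `-i` from a hypothetical `inv/` directory of WAV files.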
Example
% spot-eval-batch -v -v -v -t alexa-fr.snsr \
-i inv.txt -o oov.txt -j 6 -s operating-point=10
Writing log to "alexa-fr.log"
INV: 2612 files, 23.128 hr, 23:07:39.285
OOV: 686 files, 142.984 hr, 142:59:01.345
Total: 3298 files, 166.111 hr, 166:06:40.630
Using operating point 10.
Available operating points: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20.
3298 files, 166.111 hr, 118 FA 0.83/hr, 3.33% FR, 2525 TA, 658.9x RT
- 3298 files processed.
- 166.111 hours of audio processed.
- 118 false accept spots, which is an FA rate of 0.83 per hour over the 142.984 hours of out-of-vocabulary audio.
- 3.33% false reject ratio.
- 2525 true accept spots on in-vocabulary test audio.
- A real-time factor of 658.9: the audio was processed 658.9 times faster than real time.
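The summary line can also be checked against the `INV` and `OOV` totals printed earlier in the same run. The following Python sketch parses the summary with a regular expression written to match the example line above; the pattern and the `parse_summary` name are assumptions based on that single example, not a documented output format. The FR cross-check further assumes that each in-vocabulary file contributes at most one true accept, as the test data requirements imply.

```python
import re

# Pattern inferred from the one example summary line shown above (assumption).
SUMMARY_RE = re.compile(
    r"(?P<files>\d+) files, (?P<hours>[\d.]+) hr, "
    r"(?P<fa>\d+) FA (?P<fa_rate>[\d.]+)/hr, "
    r"(?P<fr>[\d.]+)% FR, (?P<ta>\d+) TA, (?P<rt>[\d.]+)x RT"
)


def parse_summary(line):
    """Return the numeric fields of a summary line as a dict."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized summary line: {line!r}")
    g = m.groupdict()
    return {
        "files": int(g["files"]),
        "hours": float(g["hours"]),
        "fa": int(g["fa"]),
        "fa_rate": float(g["fa_rate"]),
        "fr_percent": float(g["fr"]),
        "ta": int(g["ta"]),
        "rt_factor": float(g["rt"]),
    }


s = parse_summary(
    "3298 files, 166.111 hr, 118 FA 0.83/hr, 3.33% FR, 2525 TA, 658.9x RT"
)

# Totals taken from the INV and OOV lines of the example run.
inv_files, oov_hours = 2612, 142.984

print(round(s["fa"] / oov_hours, 2))                      # 0.83
print(round(100 * (inv_files - s["ta"]) / inv_files, 2))  # 3.33
```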
Log file format
spot-eval-batch produces a log file in plain text format. Each line in this file follows the same pattern: KEY [subkey] [detail].
| KEY | subkey | detail | notes |
|---|---|---|---|
| ERROR | error message | | unexpected error encountered |
| FACOUNT | integer | | total number of false-accept spots |
| FARATE | double | / hr | overall false-accept rate |
| FRRATIO | double | % | overall false-reject ratio |
| INFO | start-time | YYYY-MM-DD HH:MM:SS.sss UTC | job start time in UTC |
| INFO | completion-time | YYYY-MM-DD HH:MM:SS.sss UTC | job end time in UTC |
| INFO | duration | double | total job duration in seconds |
| INFO | sdk-name | "TrulyHandsfree" or "TrulyNatural" | |
| INFO | sdk-version | version-string | spot-eval-batch SDK version |
| INFO | command-line | command-line arguments | includes argv[0] |
| INFO | operating-point | integer | selected operating point |
| INFO | inv-files | integer | number of in-vocabulary (FR) test files |
| INFO | inv-seconds | integer | seconds of in-vocabulary audio |
| INFO | inv-hours | HHH:MM:SS.sss | inv-seconds as hours, minutes, seconds |
| INFO | oov-files | integer | number of out-of-vocabulary (FA) test files |
| INFO | oov-seconds | integer | seconds of out-of-vocabulary audio |
| INFO | oov-hours | HHH:MM:SS.sss | oov-seconds as hours, minutes, seconds |
| INFO | inv/oov-seconds | integer | seconds of OOV audio in FR test files |
| INFO | inv/oov-hours | HHH:MM:SS.sss | inv/oov-seconds as hours, minutes, seconds |
| INFO | real-time-factor | double | total duration of audio processed divided by the processing time |
| INVFA | "filename" | start-ms end-ms "phrase" sv-score score | FA in in-vocabulary test file. This is a spot that happened during the min-in-vocab-duration lead-in period, or an additional, spurious, spot recognized in the in-vocabulary file. |
| INVFR | "filename" | | false reject, no spot in this file |
| INVTA | "filename" | start-ms end-ms "phrase" sv-score score | true accept |
| INVTX | "filename" | N spots | more than one spot in this file |
| OOVFA | "filename" | start-ms end-ms "phrase" sv-score score | FA in out-of-vocabulary test file |
| REJECT | "filename" | reason | filename was rejected as unusable |
| TACOUNT | integer | | total number of true-accept spots |
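Since every log line follows the `KEY [subkey] [detail]` pattern, the log lends itself to simple post-processing. The Python sketch below tallies lines per key and collects the files that were false rejects. It assumes that fields are separated by whitespace and that filenames and phrases are enclosed in double quotes, as the table suggests; the exact on-disk layout is not specified here, and the sample lines are invented for illustration rather than taken from a real run.

```python
import shlex
from collections import Counter


def parse_log_line(line):
    """Split one log line into (key, subkey, detail_fields), or None if blank."""
    parts = line.strip().split(None, 1)
    if not parts:
        return None
    key = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    if key == "ERROR":
        # Free-form message: keep it whole instead of tokenizing it.
        return key, rest, []
    fields = shlex.split(rest)  # honours double-quoted filenames and phrases
    subkey = fields[0] if fields else None
    return key, subkey, fields[1:]


def summarize(lines):
    """Count lines per KEY and list the files reported as false rejects."""
    counts = Counter()
    missed = []
    for line in lines:
        parsed = parse_log_line(line)
        if parsed is None:
            continue
        key, subkey, _ = parsed
        counts[key] += 1
        if key == "INVFR":
            missed.append(subkey)
    return counts, missed


# Hypothetical sample lines, shaped after the table above.
sample = [
    'INVTA "inv/001.wav" 1200 1850 "alexa" 0 0.91',
    'INVFR "inv/002.wav"',
    'OOVFA "oov/noise 01.wav" 53210 53900 "alexa" 0 0.47',
    "FACOUNT 1",
]
counts, missed = summarize(sample)
print(counts["INVTA"], counts["OOVFA"], missed)  # 1 1 ['inv/002.wav']
```

Using `shlex.split` keeps filenames that contain spaces intact, which a plain `str.split` would break apart.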