Create an Annotation Set for Automatic Speech Recognition (ASR)
The Automatic Speech Recognition (ASR) Evaluation tool allows you to test audio files to measure the ASR accuracy of your skills. Before you run an ASR evaluation, you create a set of sample audio utterances. This set of utterances is called an annotation set.
To create your annotation set for ASR testing, you must have the following items:
- An Amazon developer account.
To create an account, see Create Your Amazon Developer Account.
- A set of sample utterances for testing. You have two options for these utterances:
- Compress pre-recorded audio files of your utterances into a single .zip file.
- Record your utterances directly from your computer when you create your annotation set.
- (Optional) A CSV or JSON file of utterance transcriptions for your annotation set. You can upload this file of transcriptions to avoid having to manually add the expected transcription for each utterance in your annotation set.
The .zip file of utterances has the following requirements:
- The compressed .zip file can't be larger than 10 MB.
- The audio files must be in one of the following formats:
- The .zip file can't contain more than 1000 files.
- Individual audio files can't be larger than 3 MB in size.
- Individual audio filenames can't contain non-ASCII characters.
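The size, count, and filename limits above can be checked locally before you upload. The following is a minimal sketch, not part of the developer console or any Alexa API: the function name `check_utterance_zip` is hypothetical, and it assumes that 1 MB means 1024 * 1024 bytes. It doesn't check audio formats.

```python
import zipfile
from pathlib import Path

# Limits taken from the requirements list above.
# Assumption: 1 MB is treated as 1024 * 1024 bytes.
MAX_ZIP_BYTES = 10 * 1024 * 1024
MAX_FILE_BYTES = 3 * 1024 * 1024
MAX_FILE_COUNT = 1000


def check_utterance_zip(zip_path):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if Path(zip_path).stat().st_size > MAX_ZIP_BYTES:
        problems.append("zip file is larger than 10 MB")
    with zipfile.ZipFile(zip_path) as archive:
        # Skip directory entries so that only actual files are counted.
        files = [info for info in archive.infolist() if not info.is_dir()]
    if len(files) > MAX_FILE_COUNT:
        problems.append(f"zip contains {len(files)} files (limit {MAX_FILE_COUNT})")
    for info in files:
        if info.file_size > MAX_FILE_BYTES:
            problems.append(f"{info.filename} is larger than 3 MB")
        # info.filename is the full path inside the archive, folders included.
        if not info.filename.isascii():
            problems.append(f"{info.filename} has non-ASCII characters in its name")
    return problems
```

For example, `check_utterance_zip("utterances.zip")` returns an empty list when none of these limits are exceeded.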
Create an automated annotation set of audio files
Before running ASR testing on a set of sample utterances, take the following steps to generate a set of pre-recorded audio utterances for testing.
To create an automated annotation set of audio files
- Sign in to the Alexa developer console.
- On the Skills tab, under SKILL NAME, find your skill.
- Under ACTIONS, from the drop-down menu in your skill's row, select Edit.
- Under CUSTOM > Interaction Model > Intents, click Annotation Sets.
- On the NLU Evaluation Page, click ASR Evaluation.
- Under Automated Test Sets, click the Generate Test Set button.
Select the data source for your test set:
- Interaction Model – Use sample utterances in your skill's interaction model to create the test set.
- Frequent Utterances – Use utterances frequently spoken to your skill to create the test set.
- Utterances Recommendation Engine – Generate grammatical variations of sample utterances to create the test set.
Click Generate Test Set and wait for your test set to generate. Note: Test sets take varying amounts of time to generate. While you wait for a test set to generate, you can start to review previously generated test sets.
Review the values for Filename and Expected Transcription in your generated test sets.
Click the speaker icon to review the audio. The file to be played maps to the filePathInUpload value from the Update Annotation Set Annotations for Automatic Speech Recognition (ASR) API. The expectedTranscription value maps to Expected Transcription in the developer console.
In the upper-right corner, click Evaluate Model to run the evaluation.
Review and troubleshoot issues with your skill models. The following list describes the expected pass rate and recommendations for improvements for each data source:
- Interaction Model – An Interaction Model test set should have a pass rate greater than 95%. One common cause of errors is conflicting utterances across similar intents.
- Frequent Utterances – A Frequent Utterances test set should have a pass rate over 80%. Because this test set contains the utterances that your users are saying to your skill, you can use this test set to review how your development model responds (or will respond when pushed to production) to live customer utterances.
- Utterances Recommendation Engine – The Utterances Recommendation Engine test set should have a medium pass rate. The utterances in this set are variations of your sample utterances and could anticipate what a user might say to your skill and your skill's expected response. Review the utterances in this test set, and remove utterances that aren't relevant to your skill. After updating your test set, review all utterances that map to AMAZON.FallbackIntent, if enabled, to find possible unsupported use cases.
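If you track evaluation results outside the console, the two numeric thresholds above can be applied with a small helper. This is an illustrative sketch only: it assumes the pass rate is the fraction of utterances that passed, and the names `pass_rate` and `meets_expectation` are hypothetical. The Utterances Recommendation Engine is left out because the text gives no numeric threshold for it.

```python
# Minimum pass rates stated in the list above.
EXPECTED_MIN_PASS_RATE = {
    "Interaction Model": 0.95,
    "Frequent Utterances": 0.80,
}


def pass_rate(results):
    """results: list of booleans, True for each utterance that passed.

    Assumption: the pass rate is passed utterances divided by all utterances.
    """
    if not results:
        raise ValueError("results must not be empty")
    return sum(results) / len(results)


def meets_expectation(data_source, results):
    """Return True if the pass rate is above the threshold for the data source."""
    return pass_rate(results) > EXPECTED_MIN_PASS_RATE[data_source]
```

For example, 96 passes out of 100 utterances meets the Interaction Model expectation, while exactly 95 out of 100 does not, because the pass rate should be greater than 95%.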
Create an annotation set of audio files manually
As an alternative to generating an annotation set, you can create the annotation set manually. You can either upload a .zip file of pre-recorded utterances or record your utterances as part of creating your annotation set. If you have already uploaded your audio files to an Amazon S3 bucket, you can also upload a CSV file of expected utterance transcriptions and weights to create your annotation set.
To create an annotation set of audio files
- With your Amazon developer credentials, log in to the Alexa developer console.
- From the developer console, navigate to the Build tab.
- In the left navigation, under Custom, click Annotation Sets to display the NLU Evaluation page.
- On the NLU Evaluation Page, click the ASR Evaluation tab to go to the ASR Annotation Sets page.
- Under User Defined Test Sets, click the + Annotation Set button to create a new annotation set.
At the prompt, name your annotation set. The page refreshes and displays your newly named annotation set.
Add utterances to your annotation set by using one of the following options:
- Record your audio utterances straight from the developer console. See Record audio utterances for an annotation set.
- Upload a .zip file of pre-recorded utterances. See Upload a pre-recorded set of audio utterances.
- Upload a CSV or JSON file of file paths for utterances, expected transcriptions, and utterance weights for an already-uploaded set of audio files. See Upload a CSV or JSON file of expected transcriptions for an annotation set.
You can save partially completed annotation sets with audio and file paths or expected transcriptions. However, you cannot evaluate partial sets until the sets are complete.
- In the upper-left corner, click the Save Annotation Set button to save your annotation set.
When you have finished adding utterances to your annotation set, you can edit your utterance metadata. See Edit metadata for an utterance.
Record audio utterances for an annotation set
You can record audio utterances for your annotation set directly in the developer console.
To record audio utterances on the developer console
- On the ASR Evaluation tab, under User Defined Test Sets, click +Annotation Set.
- Under FILE NAME, click Press and Hold to record.
- Speak your utterance.
- Release the button when you've finished recording.
After recording your utterance, you can edit its metadata. For more details, see Edit metadata for an utterance.
Upload a pre-recorded set of audio utterances
If you have already recorded a set of audio utterances and compressed them into a .zip file, you can do a batch upload of your utterances.
To upload a pre-recorded set of audio utterances
- On the ASR Evaluation tab, under User Defined Test Sets, click +Annotation Set.
- Click the Upload button.
- In the file navigator, navigate to and select the .zip file that contains your utterances.
- Click Open to upload the file to an Amazon S3 bucket.
After you finish the upload, you can edit the metadata for individual utterances. See Edit metadata for an utterance.
Upload a CSV or JSON file of expected transcriptions for an annotation set
If you've already uploaded an annotation set of audio files to an Amazon S3 bucket, you can bulk edit the metadata for those files. To avoid having to manually add expected transcriptions for each utterance in your annotation set, you can upload a CSV or JSON file to your annotation set to bulk upload all of your transcriptions at one time.
To upload a CSV or JSON file of expected transcriptions for an annotation set
- Create your file with three fields:
- filePathInUpload – Path in the uploaded zip file for the utterance. For example, consider a zip file that contains a folder named 'folder', with an audio file named audio.mp3 in that folder. The path is folder/audio.mp3. Use a forward slash ('/') to separate directories.
- expectedTranscription – Expected transcription for the utterance.
- evaluationWeight – Assigned weight indicating the importance of the utterance in evaluation.
The following image shows an example CSV file with valid column headings:
The following image shows an example JSON file with valid field names:
At the right side of the page for your annotation set, click the Bulk Edit button to open an Upload Annotation Set prompt.
Navigate to your CSV or JSON file and click Open.
On the Upload prompt, click Submit and Save.
The Expected Transcription and Weight fields automatically populate with the values from your CSV or JSON file.
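A CSV file with the three fields described above can be produced with a short script. The following is a minimal sketch: it assumes the column headings are exactly the three field names, the helper name `build_transcription_csv` is hypothetical, and the sample row is made-up data. The JSON layout isn't described in the text, so the sketch doesn't attempt it.

```python
import csv
import io

# Column headings, assumed to match the three field names listed above.
FIELDS = ["filePathInUpload", "expectedTranscription", "evaluationWeight"]


def build_transcription_csv(rows):
    """rows: iterable of (path, transcription, weight) tuples. Returns CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(FIELDS)
    for path, transcription, weight in rows:
        # Paths inside the zip file use forward slashes between directories.
        writer.writerow([path.replace("\\", "/"), transcription, weight])
    return buffer.getvalue()


# Hypothetical sample data for illustration only.
csv_text = build_transcription_csv([
    ("folder/audio.mp3", "order a large coffee", 10),
])
print(csv_text)
```

Writing `csv_text` to a file with a .csv extension gives you a file to select in the Upload Annotation Set prompt.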
Edit metadata for an utterance
After creating your annotation set, you can edit the metadata for each utterance to help improve the accuracy of your ASR evaluation results.
To edit metadata for an utterance
From the page for your annotation set, you can perform the following tasks:
- Listen to an utterance.
- Add the expected transcription.
- Assign a weight to the utterance for ASR evaluations.
To listen to an utterance, click the speaker icon next to the utterance.
To add the expected transcription, click the Expected Transcription field for the utterance, and enter the actual text transcription for the utterance.
To assign a weight to the utterance, choose a numeric weight from the Weight drop-down list for the utterance.
The weight for the utterance indicates the importance of the utterance. For example, if you expect the word "coffee" to be important to users of your skill, assign a higher weight to utterances that contain the word "coffee." Weight values are on a scale of 1 to 10, with 10 granting the highest weight to an utterance.
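If you prepare weights in bulk rather than through the Weight drop-down list, the "coffee" example can be expressed as a simple rule. This is an illustrative sketch and not how the ASR Evaluation tool assigns weights: the helper `assign_weight`, the default weight of 1, and the whole-word matching rule are all assumptions.

```python
import re

# Weight scale stated above.
MIN_WEIGHT, MAX_WEIGHT = 1, 10


def assign_weight(transcription, important_words, high=10, default=1):
    """Return `high` if the transcription contains an important word, else `default`.

    Hypothetical helper: matching is case-insensitive and on whole words.
    """
    for weight in (high, default):
        if not MIN_WEIGHT <= weight <= MAX_WEIGHT:
            raise ValueError(f"weight {weight} is outside {MIN_WEIGHT}-{MAX_WEIGHT}")
    # Extract lowercase words, ignoring punctuation such as a trailing period.
    words = set(re.findall(r"[a-z']+", transcription.lower()))
    if words & {word.lower() for word in important_words}:
        return high
    return default
```

For example, `assign_weight("Order a large coffee.", {"coffee"})` returns 10, and `assign_weight("play music", {"coffee"})` returns 1.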
You can now run your ASR Evaluation. See Run an Automatic Speech Recognition (ASR) Evaluation.