Understand the AVS SpeechRecognizer

Important: Alexa Voice Service (AVS) developer tools are no longer generally available for Alexa Built-in. Please visit the Works with Alexa program if you are interested in building devices that connect to Alexa.

The SpeechRecognizer interface is the core interface of the Alexa Voice Service (AVS) and exposes directives and events for capturing and interacting with user speech. This page discusses the concepts and process flows for the SpeechRecognizer interface.

- SpeechRecognizer functionality
- State diagram
- Wake words

SpeechRecognizer functionality

Every user utterance uses SpeechRecognizer, which supports the following interaction types between an Alexa Built-in device and AVS:

User Speech Capture: Captures user speech from an Alexa Built-in device.
User Speech Prompting: Prompts a user for more speech input when needed by Alexa to deliver an appropriate response.
Interaction Initiation Communication: Enables a device to inform AVS of how a user initiated an Alexa interaction, such as press-and-hold, tap-and-release, or voice-initiated/wake word enabled. See Device Form Factor and Alexa Interaction.
ASR Profile Selection: SpeechRecognizer chooses the appropriate Automatic Speech Recognition (ASR) profile for your product, which allows Alexa to understand user speech and respond with precision. See Automatic Speech Recognition (ASR) profile.
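
As an illustration of how a device might communicate the interaction type and ASR profile, the following Python sketch assembles a Recognize-style event. The field names (`profile`, `initiator`, `dialogRequestId`), the initiator labels, and the profile values are assumptions that this page does not define; verify them against the SpeechRecognizer interface reference before relying on them. The helper `build_recognize_event` is hypothetical.

```python
import json
import uuid

# Assumed mapping from the interaction types named above to initiator labels.
# The exact strings are not given on this page; treat them as placeholders.
INITIATOR_TYPES = {
    "press-and-hold": "PRESS_AND_HOLD",
    "tap-and-release": "TAP",
    "wake word": "WAKEWORD",
}

# Assumed ASR profile names; check the ASR profile documentation.
ASR_PROFILES = {"CLOSE_TALK", "NEAR_FIELD", "FAR_FIELD"}


def build_recognize_event(interaction: str, asr_profile: str, dialog_request_id: str) -> dict:
    """Return a hypothetical event telling AVS how the user started the interaction."""
    if interaction not in INITIATOR_TYPES:
        raise ValueError(f"unknown interaction type: {interaction!r}")
    if asr_profile not in ASR_PROFILES:
        raise ValueError(f"unknown ASR profile: {asr_profile!r}")
    return {
        "event": {
            "header": {
                "namespace": "SpeechRecognizer",
                "name": "Recognize",
                "messageId": str(uuid.uuid4()),  # unique per message
                "dialogRequestId": dialog_request_id,
            },
            "payload": {
                "profile": asr_profile,
                "initiator": {"type": INITIATOR_TYPES[interaction]},
            },
        }
    }


if __name__ == "__main__":
    event = build_recognize_event("tap-and-release", "NEAR_FIELD", "dialog-1")
    print(json.dumps(event, indent=2))
```

The sketch rejects unknown interaction types and profiles early so that a misconfigured device fails locally rather than sending a malformed event.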

State diagram

The following diagram illustrates state changes driven by SpeechRecognizer components. Boxes represent SpeechRecognizer states, and the connectors represent state transitions.

SpeechRecognizer has the following states:

IDLE: When not actively processing speech, the SpeechRecognizer is in an "idle" state. The idle state occurs under the following conditions:
- Before capturing user speech.
- After concluding a speech interaction with Alexa.
- After the timeout window elapses and an ExpectSpeechTimedOut event is sent.
Note: In a multi-turn Alexa interaction, if Alexa requires more user speech input, SpeechRecognizer should transition from the idle state to the ExpectSpeech state without the user starting a new interaction.
RECOGNIZING: When a user begins interacting with your client, specifically when the client streams captured audio to AVS, SpeechRecognizer should transition from the idle state to the Recognize state. SpeechRecognizer should remain in the Recognize state until the client stops recording speech or finishes streaming, at which point your SpeechRecognizer component should transition from the Recognize state to the "busy" state.
BUSY: While processing the speech request, SpeechRecognizer should be in the "busy" state. You cannot start another speech request until SpeechRecognizer transitions out of the busy state. From the busy state, SpeechRecognizer transitions to the idle state if Alexa processes and completes the request, or to the ExpectSpeech state if Alexa requires more speech input from the user.
EXPECTING SPEECH: SpeechRecognizer should be in the ExpectSpeech state when Alexa requires more audio input from a user. From the ExpectSpeech state, SpeechRecognizer should transition to the Recognize state when a user interaction occurs or when the interaction is automatically started on behalf of the user. It should transition to the idle state if Alexa detects no user interaction within the specified timeout window.
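
The state descriptions above can be sketched as a small transition table. This is a minimal illustration rather than a reference implementation: the trigger names (`start_streaming`, `stop_streaming`, `request_completed`, `expect_speech`, `timeout`) and the class `SpeechRecognizerStateMachine` are hypothetical labels chosen for this example, not AVS identifiers.

```python
from enum import Enum


class State(Enum):
    IDLE = "IDLE"
    RECOGNIZING = "RECOGNIZING"
    BUSY = "BUSY"
    EXPECTING_SPEECH = "EXPECTING_SPEECH"


# (current state, trigger) -> next state, following the descriptions above.
# Trigger names are hypothetical.
TRANSITIONS = {
    (State.IDLE, "start_streaming"): State.RECOGNIZING,
    (State.IDLE, "expect_speech"): State.EXPECTING_SPEECH,  # multi-turn note
    (State.RECOGNIZING, "stop_streaming"): State.BUSY,
    (State.BUSY, "request_completed"): State.IDLE,
    (State.BUSY, "expect_speech"): State.EXPECTING_SPEECH,
    (State.EXPECTING_SPEECH, "start_streaming"): State.RECOGNIZING,
    (State.EXPECTING_SPEECH, "timeout"): State.IDLE,
}


class SpeechRecognizerStateMachine:
    """Tracks the current state and rejects transitions not listed above."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def fire(self, trigger: str) -> State:
        key = (self.state, trigger)
        if key not in TRANSITIONS:
            raise ValueError(f"{trigger!r} is not allowed in state {self.state.name}")
        self.state = TRANSITIONS[key]
        return self.state


if __name__ == "__main__":
    machine = SpeechRecognizerStateMachine()
    for trigger in ("start_streaming", "stop_streaming", "expect_speech", "timeout"):
        print(trigger, "->", machine.fire(trigger).name)
```

Because the table has no entry for `start_streaming` in the busy state, the sketch enforces the rule that a new speech request cannot start until SpeechRecognizer leaves that state.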

The following diagram illustrates the expected transitions among the four SpeechRecognizer states:

Figure: SpeechRecognizer State Diagram

Wake words

Through the SetWakeWords, WakeWordsReport, and WakeWordsChanged messages, the list of wake words informs Alexa of the valid wake words that a device might be set to listen for.

Currently, the only wake word available for Alexa Built-in devices is ALEXA, which applies to every possible locale for a device. Therefore, specify ALEXA in the global DEFAULT scope.
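
As a rough illustration, a wake word list that places ALEXA in the DEFAULT scope might be assembled as follows. The structure shown (a `wakeWords` list whose entries carry `scopes` and `values`) is an assumption for this sketch and is not defined on this page; confirm the exact shape against the SpeechRecognizer interface reference. The helper `build_wake_words_config` is hypothetical.

```python
import json

# Per the text above, ALEXA is the only wake word currently available.
SUPPORTED_WAKE_WORDS = {"ALEXA"}


def build_wake_words_config(wake_word: str = "ALEXA", scope: str = "DEFAULT") -> dict:
    """Return a hypothetical wake word list for the given scope."""
    if wake_word not in SUPPORTED_WAKE_WORDS:
        raise ValueError(f"unsupported wake word: {wake_word!r}")
    # Assumed layout: each entry pairs a list of scopes with the wake word
    # combinations valid in those scopes.
    return {"wakeWords": [{"scopes": [scope], "values": [[wake_word]]}]}


if __name__ == "__main__":
    print(json.dumps(build_wake_words_config(), indent=2))
```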

Last updated: Nov 27, 2023