About the Alexa Voice Service (AVS) Interaction Model
A device that interacts with the Alexa Voice Service (AVS) encounters events/directives that produce competing audio. For example, a user might ask a question when Alexa is speaking, or a scheduled alarm plays when music is already streaming. The rules that govern the prioritization and handling of these inputs and outputs make up the AVS interaction model.
Implement InteractionModel 1.2 to enable Alexa Routines for your product. InteractionModel 1.2 includes the NewDialogRequest directive and modifications to the AVS interaction model voice request lifecycle.
Device vs. AVS-initiated audio interactions
Either the device or the AVS might begin an audio interaction:
- Device-initiated interactions – In a device-initiated interaction, the device sends an event to AVS. AVS processes the event and then returns any appropriate directives to the device in response. For example, when a user asks Alexa, "What time is it?", the device streams the captured user audio to AVS. After AVS processes the event, AVS returns a directive instructing the device to output speech, such as, "It's 10:00 AM."
- AVS-initiated interactions – In an AVS-initiated interaction, the device receives directives without any preceding device events. For example, when a user adjusts device volume from the Amazon Alexa app, the device doesn't send an event to Alexa. Alexa interprets the action taken in the Amazon Alexa app and sends a directive to the device, which the device then acts upon.
Send each event to AVS in its own event stream. AVS might return directives and corresponding audio attachments in the same stream or in a separate downchannel stream. The downchannel stream delivers AVS-initiated directives to your device. The downchannel remains open in a half-closed state from the device and open from the Alexa Voice Service for the life of a connection. You have several options for implementing event and downchannel streams, depending on transport protocol. For more details on establishing both event and downchannel streams over HTTP/2, see Managing an HTTP/2 Connection.
Voice request lifecycle
When a device sends events to AVS, make sure that the device enforces the following rules:
- Your device must create a unique dialogRequestId for each Recognize event the device sends to AVS. The dialogRequestId correlates the Recognize event with directives sent to your device from AVS.
- Don't reuse any dialogRequestId within a session.
- Include the dialogRequestId in the header of each Recognize event.
- Keep track of the active dialogRequestId. The dialogRequestId remains active until the device sends the next Recognize event to AVS. After the device sends the next Recognize event, cancel any directives associated with older dialogRequestIds.
If AVS initiates an interaction, AVS sends a NewDialogRequest directive with a dialogRequestId in the payload. This dialogRequestId replaces any older dialogRequestIds. Cancel any directives associated with the previous dialogRequestId.
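The ID bookkeeping described above can be sketched in a few lines. This is an illustrative sketch only, not code from the AVS Device SDK: the class DialogRequestTracker and its methods are hypothetical names, the header it builds is simplified (a real event also carries a payload and context), and using UUIDs as IDs is an assumption.

```python
import uuid


class DialogRequestTracker:
    """Hypothetical helper: tracks the active dialogRequestId for one session."""

    def __init__(self):
        self.active_id = None
        self._used_ids = set()   # every ID seen so far in this session
        self._pending = {}       # dialogRequestId -> names of queued directives

    def _activate(self, new_id):
        """Make new_id active and cancel directives queued under older IDs."""
        cancelled = [name
                     for old_id, names in self._pending.items()
                     if old_id != new_id
                     for name in names]
        self._pending = {new_id: self._pending.get(new_id, [])}
        self._used_ids.add(new_id)
        self.active_id = new_id
        return cancelled

    def new_recognize_header(self):
        """Build a simplified Recognize event header with a fresh ID."""
        new_id = str(uuid.uuid4())
        while new_id in self._used_ids:   # never reuse an ID within a session
            new_id = str(uuid.uuid4())
        self._activate(new_id)
        return {
            "namespace": "SpeechRecognizer",
            "name": "Recognize",
            "messageId": str(uuid.uuid4()),
            "dialogRequestId": new_id,
        }

    def on_new_dialog_request(self, payload):
        """AVS-initiated interaction: adopt the ID from the directive payload."""
        return self._activate(payload["dialogRequestId"])

    def queue_directive(self, dialog_request_id, name):
        """Remember a directive that is waiting to be handled."""
        self._pending.setdefault(dialog_request_id, []).append(name)

    def pending(self, dialog_request_id):
        """Return the directives still queued under the given ID."""
        return list(self._pending.get(dialog_request_id, []))


tracker = DialogRequestTracker()
first = tracker.new_recognize_header()
tracker.queue_directive(first["dialogRequestId"], "Speak")
second = tracker.new_recognize_header()   # the queued Speak is cancelled here
```

Keeping the activation logic in one place means a device-sent Recognize event and an AVS-sent NewDialogRequest directive cancel older work in the same way.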
When AVS sends directives to a device, make sure that the device enforces the following rules:
- Process the directives with a dialogRequestId in the header that matches the active dialogRequestId.
- Set the dialogRequestId in the payload of InteractionModel.NewDialogRequest directives to active, and then implement the directives.
- Process directives with a dialogRequestId in the header that matches the
- Implement directives without a dialogRequestId.
- Your device must send an ExceptionEncountered event to AVS when it encounters new or unknown directives.
- If your device receives a Speak directive, you must fully play back the associated audio before processing subsequent directives.
For an example, see AudioInputProcessor.cpp in the AVS Device SDK.
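As a rough illustration of the directive rules above, the following sketch decides what to do with one incoming directive. It is not taken from the AVS Device SDK: the function route_directive, the state dictionary, the set of known directives, and the three return labels are all assumptions made for the example. Dropping a directive whose ID doesn't match is this sketch's reading of "cancel any directives associated with older dialogRequestIds", and the Speak playback ordering rule isn't modeled.

```python
def route_directive(directive, state, known_directives):
    """Return "process", "drop", or "exception" for one directive.

    state is a dict holding the active dialogRequestId under "active_id".
    known_directives is a set of (namespace, name) pairs the device supports.
    """
    header = directive["header"]
    key = (header["namespace"], header["name"])

    # New or unknown directive: report it with an ExceptionEncountered event.
    if key not in known_directives:
        return "exception"

    # NewDialogRequest carries the new active ID in its payload.
    if key == ("InteractionModel", "NewDialogRequest"):
        state["active_id"] = directive["payload"]["dialogRequestId"]
        return "process"

    request_id = header.get("dialogRequestId")

    # Directives without a dialogRequestId are implemented as they arrive.
    if request_id is None:
        return "process"

    # Otherwise the header ID must match the active ID.
    return "process" if request_id == state["active_id"] else "drop"


# Hypothetical set of supported directives for the example.
KNOWN = {
    ("SpeechSynthesizer", "Speak"),
    ("Speaker", "SetVolume"),
    ("InteractionModel", "NewDialogRequest"),
}

state = {"active_id": "request-1"}
speak = {
    "header": {"namespace": "SpeechSynthesizer", "name": "Speak",
               "dialogRequestId": "request-1"},
    "payload": {},
}
decision = route_directive(speak, state, KNOWN)
```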
Channels help your device to determine priority for audio inputs and outputs. A channel can be either active or inactive and can be in the foreground or in the background at any given time.
Types of channels
Organize all audio handled by your device into three types of channels:
- Dialog channels – Active when either a user or Alexa is speaking.
- Alerts channels – Active when a timer or alarm is sounding.
- Content channels – Active when your device is playing media, such as audio streams.
Each channel maps to one or more AVS interfaces, and only one interface can be active on a given channel at a time. For example, the Dialog channel maps to the SpeechSynthesizer interface. When AVS returns a Speak directive to your device, the Dialog channel becomes active and remains active until Alexa has finished responding. Similarly, when a timer goes off, the Alerts channel becomes active and remains active until a user cancels the timer.
The following table shows which interfaces map to each channel:
Multiple channels might be active concurrently. For instance, if a user is listening to music and asks Alexa a question, the Content and Dialog channels are concurrently active as long as the user or Alexa is speaking.
Foreground vs. background channels
Channels can be either in the foreground or in the background. At any given time, a device can have only one channel in the foreground. When a channel in the foreground becomes inactive, the next active channel in the priority order moves into the foreground. If multiple channels are active, use the following priority order for your channels, from highest to lowest: Dialog, Alerts, Content.
The following rules govern how channels interact:
- Inactive channels are always in the background.
- The Dialog channel is always in the foreground when active.
- The Alerts channel is in the foreground when the Dialog channel is inactive.
- The Content channel is in the foreground when all other channels are inactive.
- When a channel in the foreground becomes inactive, the next active channel in the priority order moves into the foreground.
- When the Content channel is in the background, the device pauses or attenuates audio playback.
If AVS returns an ExpectSpeech directive in response to a Recognize event, prompting a user for additional speech, the Dialog channel should remain active until all directives associated with the request/response scenario are processed.
How to handle a directive for a given interface depends on the state of the associated channel. Is the channel active or inactive? Is the channel in the foreground or background? For example, if the Dialog channel is in the foreground, and an alarm sounds, the alarm should play in short alert mode as long as the Dialog channel is active. If an alarm sounds and the Dialog channel is inactive, a long alert should play.
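The channel rules above reduce to a small priority lookup. The sketch below is one possible way to express them, assuming the device tracks the set of currently active channels. The function names and the "short"/"long" labels are illustrative and don't correspond to any AVS API.

```python
# Priority order from highest to lowest, as described by the channel rules.
PRIORITY = ("Dialog", "Alerts", "Content")


def foreground(active_channels):
    """Return the highest-priority active channel, or None if all are inactive."""
    for channel in PRIORITY:
        if channel in active_channels:
            return channel
    return None


def channel_states(active_channels):
    """Map each channel to "foreground" or "background".

    Inactive channels are always in the background, and at most one
    channel is in the foreground at a time.
    """
    front = foreground(active_channels)
    return {
        channel: "foreground" if channel == front else "background"
        for channel in PRIORITY
    }


def alert_mode(active_channels):
    """Pick the alert style: short while Dialog is active, long otherwise."""
    return "short" if "Dialog" in active_channels else "long"


# Music is playing and the user asks a question: Dialog takes the foreground.
states = channel_states({"Content", "Dialog"})
```

Deriving the foreground channel from the active set on demand, rather than storing it, avoids a stale state when a channel becomes inactive.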
For more details on how to handle each directive, see AVS API Overview.
Test the interaction model
Run the following test scenarios to verify that your implementation of the AVS interaction model works as expected. You can test these scenarios on an Amazon Echo device or with the AVS Device SDK.
Test Alert and Dialog channel interactions
To test Alert and Dialog channel interactions
- Ask Alexa to set a timer for five seconds.
- After Alexa notifies you that the timer was set, ask Alexa for the weather forecast.
As Alexa provides you with the forecast, the timer should go off as a short alert until Alexa has finished speaking. This behavior indicates that the Dialog channel is active, and the Alerts channel must be in the background. After Alexa finishes speaking, the Alerts channel moves to the foreground, and a long alert should continue to play until you stop the timer.
Test Content channel interactions
To test Content channel interactions
- Ask Alexa to set a timer for one minute.
- When Alexa notifies you that the timer has been set, ask Alexa to play your favorite song.
The song begins playing, and one minute into playback, the music should be sent to the background as your timer plays. This behavior occurs because the Content channel can only be in the foreground if the other channels are inactive. The music should remain in the background until you stop the timer, at which point, your favorite song should return to normal volume, or resume from a paused state.
Test Dialog channel interactions
To test Dialog channel interactions
- Ask Alexa to play your favorite song.
- After playback begins, ask Alexa for local news.
The music should be sent to the background for your entire voice request and the response from Alexa. This behavior occurs because the Dialog channel is always in the foreground when active. When Alexa finishes responding, the music should return to normal volume or resume from a paused state.
- Alerts Overview
- AudioPlayer Overview
- Display Cards Overview
- Notifications Overview
- Recommended Media Support