About Alexa Voice Service (AVS) Focus Management
Because some devices support actions that are independent of Alexa, such as offline audio playback or screen-based interactions, the Alexa Voice Service (AVS) might not be able to accurately determine what's happening on a device at a given time.
Focus management improves the accuracy of responses from Alexa when a user makes an ambiguous request. Consider the following scenario, where a user asks Alexa to "pause" and "resume" an activity without explicitly stating which one:
- A device is playing music.
- The user asks Alexa to pause music playback: "Alexa, pause."
- The user asks Alexa a question.
- The user asks Alexa to resume the previous activity without specifying music playback: "Alexa, resume."
- The device resumes music playback.
In this scenario, the device is responsible for reporting audio and visual activity to AVS so that Alexa can determine the current focus and respond accurately to the user's request.
Channel focus
Channels govern how a device should prioritize audio and visual inputs and outputs. Each channel maps to at least one AVS interface and can be either active or inactive at a given time. The device tells Alexa which interface has focus of each audio or visual channel, along with any applicable idle time, by sending this state information in the context container under the AudioActivityTracker and VisualActivityTracker interfaces.
Multiple channels can be active simultaneously. For instance, if a user is listening to music and asks Alexa a question, the Content and Dialog channels are concurrently active as long as the user or Alexa is speaking.
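In that situation, the AudioActivityTracker portion of the context might resemble the following sketch. The interface names shown here ("SpeechSynthesizer" for the dialog channel and "AudioPlayer" for the content channel) are illustrative assumptions; report whichever interface actually holds focus of each channel on your device:

{
  "header": {
    "namespace": "AudioActivityTracker",
    "name": "ActivityState"
  },
  "payload": {
    "dialog": {
      "interface": "SpeechSynthesizer",
      "idleTimeInMilliseconds": 0
    },
    "content": {
      "interface": "AudioPlayer",
      "idleTimeInMilliseconds": 0
    }
  }
}

Both channels report an idle time of 0 because both are active at the moment the device sends the context.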
For more details about channels and channel prioritization, see About the AVS Interaction Model.
Use cases
These use cases highlight the benefits of focus management.
Audio
The following example illustrates what happens when a sounding timer interrupts music playback. Alexa uses activity state to determine which content a user is attempting to stop:
- A device is playing music.
- A timer sounds.
- The user asks Alexa to stop the timer: "Alexa, stop."
- In the context of the Recognize event, the device informs Alexa that the timer has focus of the audio channel (see the sketch after this list).
- The timer stops and music playback resumes.
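The AudioActivityTracker entry sent with that Recognize event might look like the following sketch. The interface names and idle value are illustrative assumptions: the alert channel is active (idle time 0), while the content channel reports how long the music has been interrupted:

{
  "header": {
    "namespace": "AudioActivityTracker",
    "name": "ActivityState"
  },
  "payload": {
    "alert": {
      "interface": "Alerts",
      "idleTimeInMilliseconds": 0
    },
    "content": {
      "interface": "AudioPlayer",
      "idleTimeInMilliseconds": 7000
    }
  }
}

Because the alert channel is active, Alexa interprets "stop" as targeting the timer rather than the music.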
Bluetooth
This example illustrates how Alexa uses activity state to determine which directive to send to stop music:
- A user connects their phone to a paired Alexa device through Bluetooth: "Alexa, connect my phone."
- The phone initiates music playback, and the device plays music.
- The user says, "Alexa, stop." The device receives a Bluetooth.Stop directive from Alexa, which stops the music streaming from the phone over Bluetooth.
- The user says, "Alexa, play Duran Duran on Amazon Music," which prompts Alexa to send an AudioPlayer.Play directive to the device. In this example, the content originates from an Alexa music provider rather than the paired phone.
- The user says, "Alexa, stop."
- The context for the Recognize event tells Alexa that the AudioPlayer interface has focus of the audio channel, so the device receives an AudioPlayer.Stop directive (see the sketch after this list). Without focus management, the device could have erroneously received a Bluetooth.Stop directive instead.
- Music playback stops.
Visual
Display Cards
Visual focus for AVS expires after eight seconds. Therefore, if a user makes a request after that window has elapsed, Alexa might be unaware of the visual activity state of the device. The following example shows what can happen without focus management:
- The user makes a request for movie times: "Alexa, show me movie times for 'Star Wars'."
- The device displays the first page of movie times for Star Wars.
- The user waits 25 seconds and says: "Alexa, next page."
- Because visual focus in the cloud expires after eight seconds, Alexa is unaware that the display card still has visual focus on the device. Alexa doesn't know how to respond.
With focus management enabled, a device reports the activity state for each supported audio and visual channel within the context object. Because all Recognize events require a context object, if the user says "Alexa, next page," Alexa recognizes that the TemplateRuntime interface has focus of the visual channel and sends the correct directive.
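With focus management, the VisualActivityTracker entry in that context might resemble the following sketch, reporting that the TemplateRuntime interface holds visual focus:

{
  "header": {
    "namespace": "VisualActivityTracker",
    "name": "ActivityState"
  },
  "payload": {
    "focused": {
      "interface": "TemplateRuntime"
    }
  }
}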
Report ActivityState
Report the ActivityState as part of the context for both the AudioActivityTracker and VisualActivityTracker interfaces.
- AudioActivityTracker – Specifies which interface is active for each audio channel and the time elapsed since the last activity occurred on each channel.
- VisualActivityTracker – Indicates that visual metadata from the TemplateRuntime interface is currently displayed to the user. Only applicable to devices with screens.
Idle time
Report the idleTimeInMilliseconds for each channel in AudioActivityTracker. If a channel is active at the time that a device reports context, idleTimeInMilliseconds must be empty or set to 0.
VisualActivityTracker doesn't track idle time. The device must report the TemplateRuntime interface as being in focus if the device is displaying visual metadata from Alexa, for example, a display card.
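For example, a context sent while Alexa is speaking on the dialog channel and the content channel last played audio 15 seconds earlier might include an AudioActivityTracker entry like this sketch (the interface names and times are illustrative assumptions):

{
  "header": {
    "namespace": "AudioActivityTracker",
    "name": "ActivityState"
  },
  "payload": {
    "dialog": {
      "interface": "SpeechSynthesizer",
      "idleTimeInMilliseconds": 0
    },
    "content": {
      "interface": "AudioPlayer",
      "idleTimeInMilliseconds": 15000
    }
  }
}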
Example context
The following example message includes context for AudioActivityTracker and VisualActivityTracker:
{ "context": [ { "header": { "namespace": "AudioPlayer", "name": "PlaybackState" }, "payload": { "token": "{{STRING}}", "offsetInMilliseconds": {{LONG}}, "playerActivity": "{{STRING}}" } }, { "header": { "namespace": "SpeechRecognizer", "name": "RecognizerState" }, "payload": { "wakeword": "ALEXA" } }, { "header": { "namespace": "Notifications", "name": "IndicatorState" }, "payload": { "isEnabled": {{BOOLEAN}}, "isVisualIndicatorPersisted": {{BOOLEAN}} } }, { "header": { "namespace": "Alerts", "name": "AlertsState" }, "payload": { "allAlerts": [ { "token": "{{STRING}}", "type": "{{STRING}}", "scheduledTime": "{{STRING}}" } ], "activeAlerts": [ { "token": "{{STRING}}", "type": "{{STRING}}", "scheduledTime": "{{STRING}}" } ] } }, { "header": { "namespace": "Speaker", "name": "VolumeState" }, "payload": { "volume": {{LONG}}, "muted": {{BOOLEAN}} } }, { "header": { "namespace": "SpeechSynthesizer", "name": "SpeechState" }, "payload": { "token": "{{STRING}}", "offsetInMilliseconds": {{LONG}}, "playerActivity": "{{STRING}}" } }, { "header": { "namespace": "AudioActivityTracker", "name": "ActivityState" }, "payload": { "dialog": { "interface": "{{STRING}}", "idleTimeInMilliseconds": {{LONG}} }, "alert": { "interface": "{{STRING}}", "idleTimeInMilliseconds": {{LONG}} }, "content": { "interface": "{{STRING}}", "idleTimeInMilliseconds": {{LONG}} } } }, { "header": { "namespace": "VisualActivityTracker", "name": "ActivityState" }, "payload": { "focused": { "interface": "{{STRING}}", } } } ], "event": { "header": { "namespace": "SpeechRecognizer", "name": "Recognize", "messageId": "{{STRING}}", "dialogRequestId": "{{STRING}}" }, "payload": { "profile": "{{STRING}}", "format": "{{STRING}}", "initiator": { "type": "{{STRING}}", "payload": { "wakeWordIndices": { "startIndexInSamples": {{LONG}}, "endIndexInSamples": {{LONG}} } } } } } }
Last updated: Dec 17, 2020