About Alexa Voice Service (AVS) Focus Management


Because some devices support actions that are independent of Alexa, such as offline audio playback or screen-based interactions, the Alexa Voice Service (AVS) might not be able to accurately determine what's happening on a device at a given time.

Focus management improves the accuracy of responses from Alexa when a user makes an ambiguous request. Consider the following scenario where a user asks Alexa to "pause" and "resume" an action without explicitly stating which specific action:

  1. A device is playing music.
  2. The user asks Alexa to pause music playback:

    "Alexa, pause."

  3. The user asks Alexa an unrelated question.

  4. The user asks Alexa to resume the previous activity without specifying music playback:

    "Alexa, resume."

  5. The device resumes music playback.

Here, the device becomes responsible for reporting audio or visual activity to AVS so that AVS can determine the current focus and accurately respond to the user request.

Channel focus

Channels govern how a device should prioritize audio and visual inputs and outputs. Each channel maps to at least one AVS interface and is either active or inactive at a given time. The device tells Alexa which interface has focus of each audio or visual channel, along with any applicable idle time, by sending this state information in the context container under the AudioActivityTracker and VisualActivityTracker interfaces.

Multiple channels can be active simultaneously. For instance, if a user is listening to music and asks Alexa a question, the Content and Dialog channels are concurrently active as long as the user or Alexa is speaking.

For more details about channels and channel prioritization, see About the AVS Interaction Model.
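As a rough illustration of the prioritization described above, the following sketch models per-channel focus on the device. The channel names and their priority order (Dialog above Alerts above Content) follow the AVS Interaction Model, but the `ChannelState` class and `highest_priority_active` helper are illustrative, not part of any AVS SDK:

```python
from dataclasses import dataclass
from typing import Optional

# Lower number = higher priority, per the AVS Interaction Model.
CHANNEL_PRIORITY = {"dialog": 0, "alert": 1, "content": 2}

@dataclass
class ChannelState:
    name: str
    interface: Optional[str] = None  # AVS interface currently holding the channel
    active: bool = False

def highest_priority_active(channels):
    """Return the active channel that should take audio focus, if any."""
    active = [c for c in channels if c.active]
    if not active:
        return None
    return min(active, key=lambda c: CHANNEL_PRIORITY[c.name])

# Music is playing and the user is mid-dialog with Alexa: both channels
# are active at once, and Dialog outranks Content.
channels = [
    ChannelState("content", interface="AudioPlayer", active=True),
    ChannelState("dialog", interface="SpeechSynthesizer", active=True),
    ChannelState("alert", active=False),
]
print(highest_priority_active(channels).name)  # dialog
```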

Use cases

These use cases highlight the benefits of focus management.

Audio

The following example illustrates what happens when a sounding timer interrupts music playback. Alexa uses activity state to determine which content a user is attempting to stop:

  1. A device is playing music.
  2. A timer sounds.
  3. The user asks Alexa to stop the timer:

    "Alexa, stop."

  4. In the context of the Recognize event, the device informs Alexa that the timer has focus of the audio channel.
  5. The timer stops and music playback resumes.
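The device-side report for the timer scenario above can be sketched as follows. The `audio_activity_state` helper name is an assumption, but the JSON shape it produces matches the AudioActivityTracker context entry shown later in this document:

```python
def audio_activity_state(dialog_idle_ms, alert_idle_ms, content_idle_ms):
    """Build the AudioActivityTracker.ActivityState context entry (illustrative)."""
    return {
        "header": {"namespace": "AudioActivityTracker", "name": "ActivityState"},
        "payload": {
            "dialog": {"interface": "SpeechSynthesizer",
                       "idleTimeInMilliseconds": dialog_idle_ms},
            # The sounding timer holds the alert channel, so its idle time is 0.
            "alert": {"interface": "Alerts",
                      "idleTimeInMilliseconds": alert_idle_ms},
            # Music is attenuated behind the timer, but AudioPlayer still
            # owns the content channel.
            "content": {"interface": "AudioPlayer",
                        "idleTimeInMilliseconds": content_idle_ms},
        },
    }

state = audio_activity_state(dialog_idle_ms=4000, alert_idle_ms=0,
                             content_idle_ms=0)
```

Because the alert channel reports an idle time of 0, Alexa can infer that "Alexa, stop" refers to the timer rather than the music.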

Bluetooth

This example illustrates how Alexa uses activity state to determine which directive to send to stop music:

  1. A user connects their phone to a paired Alexa device through Bluetooth: "Alexa, connect my phone".
  2. The phone initiates music playback, and the device plays music.
  3. The user says, "Alexa, stop." The device receives a Bluetooth.Stop directive from Alexa and stops music playback on the phone over Bluetooth.
  4. The user says, "Alexa, play Duran Duran on Amazon Music," which prompts Alexa to send an AudioPlayer.Play directive to the device. In this example, the content originates from an Alexa music provider rather than the paired device.
  5. The user says, "Alexa, stop."
  6. The context for the Recognize event tells Alexa that the AudioPlayer interface has focus of the audio channel. The device receives an AudioPlayer.Stop directive.

    Without focus management, the device could have erroneously received a Bluetooth.Stop directive instead.

  7. Music playback stops.
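The deciding detail in this scenario is which interface the device reports as holding the content channel. The mapping below is a deliberate simplification of cloud-side behavior to show the effect, not an AVS API:

```python
# Alexa resolves an ambiguous "stop" against whichever interface the device
# reports as holding the content channel (simplified illustration).
STOP_DIRECTIVE_FOR_INTERFACE = {
    "Bluetooth": "Bluetooth.Stop",      # phone streaming over Bluetooth
    "AudioPlayer": "AudioPlayer.Stop",  # Alexa music provider playback
}

def directive_for_stop(content_interface: str) -> str:
    return STOP_DIRECTIVE_FOR_INTERFACE[content_interface]

# After step 4, the device reports AudioPlayer on the content channel, so:
print(directive_for_stop("AudioPlayer"))  # AudioPlayer.Stop
```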

Visual

Display Cards

Visual focus for AVS expires after eight seconds. If a user makes a request after that window has elapsed, Alexa might be unaware of the visual activity state of the device. The following example shows what can happen without focus management:

  1. The user makes a request for movie times:

    "Alexa, show me movie times for 'Star Wars'."

  2. The device displays the first page of movie times for Star Wars.
  3. The user waits 25 seconds and says: "Alexa, next page."

  4. Because visual focus in the cloud expires after eight seconds, Alexa is unaware that the display card still has visual focus on the device. Alexa doesn't know how to respond.

With focus management enabled, a device reports the activity state for each supported audio and/or visual channel within the context object. Because all Recognize events require a context object, if the user says "Alexa, next page", Alexa recognizes that the TemplateRuntime interface has focus of the visual channel and sends the correct directive.
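A device-side sketch of this report might look as follows. The `visual_activity_state` helper name and the decision to omit the entry when nothing is on screen are assumptions; the payload shape matches the VisualActivityTracker context entry shown later in this document:

```python
def visual_activity_state(card_on_screen: bool):
    """Build the VisualActivityTracker.ActivityState context entry (illustrative)."""
    if not card_on_screen:
        # Nothing from Alexa is displayed, so there is no visual focus to report.
        return None
    return {
        "header": {"namespace": "VisualActivityTracker", "name": "ActivityState"},
        "payload": {"focused": {"interface": "TemplateRuntime"}},
    }

# While a display card is on screen, every Recognize event carries this
# context, so "Alexa, next page" resolves correctly even after the cloud's
# eight-second visual focus has expired.
entry = visual_activity_state(card_on_screen=True)
```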

Report ActivityState

Report the ActivityState as part of Context for both the AudioActivityTracker and VisualActivityTracker interfaces.

  • AudioActivityTracker – Specifies which interface is active for each audio channel and the time elapsed since the last activity occurred on each channel.
  • VisualActivityTracker – Indicates that visual metadata from the TemplateRuntime interface is currently displayed to the user. Only applicable to devices with screens.

Idle time

Report the idleTimeInMilliseconds for each channel in AudioActivityTracker. If a channel is active at the time that a device reports context, idleTimeInMilliseconds must be empty or set to 0.

VisualActivityTracker doesn't track idle time. The device must report the TemplateRuntime interface as being in focus if the device is displaying visual metadata from Alexa, for example, a display card.
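The idle-time rule above can be sketched for a single audio channel as follows, assuming a monotonic clock. The `ChannelIdleTracker` class and its method names are illustrative, not part of an AVS SDK:

```python
import time

class ChannelIdleTracker:
    """Track idleTimeInMilliseconds for one audio channel (illustrative)."""

    def __init__(self):
        self.active = False
        self._last_activity = time.monotonic()

    def mark_active(self):
        self.active = True

    def mark_idle(self):
        self.active = False
        self._last_activity = time.monotonic()

    def idle_time_in_milliseconds(self) -> int:
        if self.active:
            return 0  # active channels must report 0 (or empty)
        return int((time.monotonic() - self._last_activity) * 1000)
```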

Example context

The following example message includes context for AudioActivityTracker and VisualActivityTracker:

{
    "context": [
        {
            "header": {
                "namespace": "AudioPlayer",
                "name": "PlaybackState"
            },
            "payload": {
                "token": "{{STRING}}",
                "offsetInMilliseconds": {{LONG}},
                "playerActivity": "{{STRING}}"
            }
        },
        {
            "header": {
                "namespace": "SpeechRecognizer",
                "name": "RecognizerState"
            },
            "payload": {
                "wakeword": "ALEXA"
            }
        },
        {
            "header": {
                "namespace": "Notifications",
                "name": "IndicatorState"
            },
            "payload": {
                "isEnabled": {{BOOLEAN}},
                "isVisualIndicatorPersisted": {{BOOLEAN}}
            }
        },
        {
            "header": {
                "namespace": "Alerts",
                "name": "AlertsState"
            },
            "payload": {
                "allAlerts": [
                    {
                        "token": "{{STRING}}",
                        "type": "{{STRING}}",
                        "scheduledTime": "{{STRING}}"
                    }
                ],
                "activeAlerts": [
                    {
                        "token": "{{STRING}}",
                        "type": "{{STRING}}",
                        "scheduledTime": "{{STRING}}"
                    }
                ]
            }
        },
        {
            "header": {
                "namespace": "Speaker",
                "name": "VolumeState"
            },
            "payload": {
                "volume": {{LONG}},
                "muted": {{BOOLEAN}}
            }
        },
        {
            "header": {
                "namespace": "SpeechSynthesizer",
                "name": "SpeechState"
            },
            "payload": {
                "token": "{{STRING}}",
                "offsetInMilliseconds": {{LONG}},
                "playerActivity": "{{STRING}}"
            }
        },
        {
            "header": {
                "namespace": "AudioActivityTracker",
                "name": "ActivityState"
            },
            "payload": {
               "dialog": {
                    "interface": "{{STRING}}",
                    "idleTimeInMilliseconds": {{LONG}}
               },
               "alert": {
                    "interface": "{{STRING}}",
                    "idleTimeInMilliseconds": {{LONG}}
               },
               "content": {
                    "interface": "{{STRING}}",
                    "idleTimeInMilliseconds": {{LONG}}
               }
            }
        },
        {
            "header": {
                "namespace": "VisualActivityTracker",
                "name": "ActivityState"
            },
            "payload": {
                "focused": {
                    "interface": "{{STRING}}",
                }
            }
        }
    ],
    "event": {
        "header": {
            "namespace": "SpeechRecognizer",
            "name": "Recognize",
            "messageId": "{{STRING}}",
            "dialogRequestId": "{{STRING}}"
        },
        "payload": {
            "profile": "{{STRING}}",
            "format": "{{STRING}}",
            "initiator": {
                "type": "{{STRING}}",
                "payload": {
                    "wakeWordIndices": {
                        "startIndexInSamples": {{LONG}},
                        "endIndexInSamples": {{LONG}}
                    }   
                }
            }
        }
    }
}

Last updated: Nov 27, 2023