Functional Requirements for AVS Products
Customers who purchase a product with Amazon Alexa expect a familiar experience. This document provides functional and design requirements and recommendations to help you meet user expectations and avoid issues as you develop, prototype, and prepare your product with Alexa Voice Service (AVS) for commercial release.
The AVS functional and design guidelines apply to device makers planning to implement the general AVS APIs and SDKs. If your AVS implementation is more specialized, see the following specific documentation for your requirements:
- Alexa for Auto – If you are implementing AVS in an automotive accessory, see the Alexa Automotive Documentation.
- Alexa for Business – If you are building with Alexa for Business, see Build with Alexa for Business and the Alexa for Business Requirements.
- Alexa Mobile Accessory (AMA) Kit – To verify that your Alexa Mobile Accessory (AMA) meets the Amazon Alexa UX and functional requirements, see the AMA Kit Functional Requirements.
Requirements subject to change
As Amazon introduces new Alexa features and functionality, these guidelines are periodically improved and updated.
These current guidelines were published on March 9, 2022.
This document consistently uses the following terms to signify requirements and recommendations:
- SHALL: Items preceded by SHALL are requirements for all commercial product releases.
- SHOULD: Items preceded by SHOULD are recommendations for all commercial product releases and improve the Alexa user experience.
This document consistently uses the following terms to describe Alexa features and concepts:
- Voice-initiated: Products activated by user speech for a hands-free experience. These products may also support activation by user touch.
- Touch-initiated: Products activated by a user physically touching a control on the product. These products don't support voice-initiated interactions.
- Tap-to-talk: Touch-initiated products activated by the customer pushing and releasing a button before speaking.
- Hold-to-talk: Touch-initiated products activated by the customer holding down a button when speaking.
- Attention states: The parts of an Alexa conversation flow, including Listening and Thinking.
- Action button: A button used to wake or interrupt Alexa. It can be a hardware or GUI button on a device, a button on a remote control for a device, or a GUI button in a companion app.
- Device control: A control used to adjust product settings or interact with media. The control can be a hardware or GUI button on a device, a button on a remote control for a device, or a GUI button in a companion app.
- Visual cues: Visual cues are LEDs or GUI elements that provide feedback to the user on the current Alexa state.
- Audio cues: Audio cues are sounds that provide feedback to the user on transitions between Alexa attention states.
- Multi-turn: A multi-turn interaction refers to situations where Alexa requests additional spoken information from the customer to complete an interaction. Multi-turn situations are initiated when your product receives an ExpectSpeech Directive from AVS.
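Multi-turn handling can be sketched as a small directive handler that reopens the microphone when an ExpectSpeech directive arrives, without requiring the wake word or a button press again. The sketch below is illustrative, not the AVS Device SDK API: the `Microphone` class and handler are hypothetical, though the directive namespace and name (`SpeechRecognizer` / `ExpectSpeech`) follow the AVS interface documentation.

```python
# Hypothetical sketch of multi-turn handling: when AVS sends an
# ExpectSpeech directive, the device reopens its microphone directly.

class Microphone:
    def __init__(self):
        self.is_open = False

    def open(self):
        self.is_open = True


def handle_directive(directive, mic):
    """Reopen the mic for the user's follow-up answer on ExpectSpeech."""
    header = directive["header"]
    if header["namespace"] == "SpeechRecognizer" and header["name"] == "ExpectSpeech":
        mic.open()          # capture the follow-up without a new wake word
        return "LISTENING"  # the device should also show the Listening cue
    return "IDLE"


mic = Microphone()
state = handle_directive(
    {"header": {"namespace": "SpeechRecognizer", "name": "ExpectSpeech"},
     "payload": {"timeoutInMilliseconds": 8000}},
    mic,
)
```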
1. Core requirements and recommendations
The following requirements and recommendations are applicable to all products with Alexa Built-in.
1.1. Your product SHALL be capable of audio input (i.e. capturing customer speech via one or more microphones) and streaming captured speech to the cloud per the specs provided in the SpeechRecognizer Interface.
1.1.1. The microphone ON/OFF control SHALL be hardware-based. Microphones SHALL be turned OFF by removing power.
1.1.2. The product SHALL use a dedicated red LED to indicate microphone OFF state.
1.2. Wearable devices, such as smart watches, SHOULD be capable of audio output. All other devices SHALL be capable of audio output, such as speaker, headphones, line out, or Bluetooth.
1.2.1. If your product provides audio output, it SHALL provide on-device controls for adjusting volume.
1.2.2. Wearable devices, such as smart watches, with no audio output capability SHALL provide haptic feedback in place of audio cues. Wearable devices with audio output SHOULD provide haptic feedback when audio output is turned off.
1.3. A voice-initiated product with an integrated touch-screen user interface SHOULD have an Action button. All other products SHALL have an Action button. For more details about Action buttons, see UX for Product Buttons.
1.3.1. The Action button SHALL enable customers to initiate an Alexa interaction.
1.3.2. The Action button SHALL enable customers to interrupt an Alexa output (e.g. media playback, Alexa voice responses, or Alerts). See UX Interrupt Guidance for more.
1.3.3. The Action button SHALL be easily accessible to your customer.
1.3.4. The Action button SHOULD have the single purpose of initiating Alexa interactions.
1.3.5. A voice-initiated product with an integrated touch-screen interface that does not have an Action button SHALL display an on-screen Alert dismissal prompt.
1.4. Your product SHALL clearly convey core Alexa attention states to the customer using visual and audio cues. The core attention states are Listening, Thinking, Speaking, Microphone ON/OFF, Alerts, Notifications, and Do Not Disturb. See the AVS UX Design Overview for further information about the Alexa attention states.
1.4.1. The visual cues your product uses to satisfy Requirement 1.4 SHOULD be prominent. See the AVS UX Design Overview for further information about visual cue prominence.
1.4.2. For products that do not have prominent visual cues, Start of Request and End of Request audio cues SHALL be on by default.
1.4.3. Visual, audio, and haptic cues SHALL be synchronized to indicate when the Alexa Listening state starts and when it stops.
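The attention-state requirements above amount to a lookup from state to cues. A minimal sketch follows; the state names come from this document, but the specific LED colors and tones are illustrative assumptions, not mandated values (except that Requirement 1.1.2 calls for a dedicated red LED for Microphone OFF).

```python
# Illustrative mapping of core Alexa attention states to cues.
# Cue values are example choices; only the red mic-off LED is required (1.1.2).
ATTENTION_CUES = {
    "LISTENING":      {"visual": "solid blue",    "audio": "start-of-request tone"},
    "THINKING":       {"visual": "pulsing blue",  "audio": None},
    "SPEAKING":       {"visual": "animated blue", "audio": None},
    "MICROPHONE_OFF": {"visual": "solid red",     "audio": "mic-off tone"},
    "DO_NOT_DISTURB": {"visual": "purple flash",  "audio": None},
}


def cues_for(state):
    """Return the cues a device should present for an attention state."""
    return ATTENTION_CUES[state]
```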
1.5. Your product SHALL use the same methods for conveying the start and end of the Listening attention state for all multi-turn interactions as for the initial interaction.
2. Voice-initiated products
The following guidelines are specific to voice-initiated products and extend the Core Requirements and Recommendations for those products.
2.1. Your product SHALL use only approved Amazon Alexa wake words, such as "Alexa".
2.1.1. Your product SHALL support cloud-based wake word verification.
2.1.2. Your product SHALL automatically activate its microphones without waiting for the wake word in multi-turn interactions. See also Requirement 1.5.
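Cloud-based wake word verification works by streaming audio that includes the wake word, with indices marking where the wake word sits in the stream so the cloud can verify it. A sketch of the Recognize event payload follows; the field names follow the SpeechRecognizer interface documentation, while the profile and index values are illustrative.

```python
import uuid

def build_recognize_event(profile, wake_start, wake_end):
    """Build a SpeechRecognizer.Recognize event whose initiator carries the
    wake word sample indices, enabling cloud-based wake word verification."""
    return {
        "event": {
            "header": {
                "namespace": "SpeechRecognizer",
                "name": "Recognize",
                "messageId": str(uuid.uuid4()),
                "dialogRequestId": str(uuid.uuid4()),
            },
            "payload": {
                "profile": profile,  # e.g. NEAR_FIELD or FAR_FIELD
                "format": "AUDIO_L16_RATE_16000_CHANNELS_1",
                "initiator": {
                    "type": "WAKEWORD",
                    "payload": {
                        "wakeWordIndices": {
                            "startIndexInSamples": wake_start,
                            "endIndexInSamples": wake_end,
                        }
                    },
                },
            },
        }
    }


event = build_recognize_event("FAR_FIELD", 0, 8000)
```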
2.2. Your product SHALL enable customers to use voice to interrupt Alexa output (e.g. media playback, Alexa voice responses, and Alerts). See also Requirement 1.3.2.
2.3. Your product SHALL provide an always-available control to disable the Alexa wake word or device microphones, putting your product into the Microphone Off state. For more, see UX Attention System.
2.3.1. You SHALL provide audio cues to indicate when a user activates or deactivates the Microphone Off attention state.
2.3.2. Your product SHALL use visual cues to convey clearly and continually to the customer that the Alexa Microphone Off attention state is active.
2.4. Your product SHALL support enabling/disabling microphones when internet connectivity is unavailable. See also Requirement 1.6.
2.5. The microphones used for Alexa interactions on your product SHALL have +/- 1 dB sensitivity matching.
2.6. Your product SHALL send either Wake Word Diagnostics Information (WWDI) or wake word detection metadata as part of its Alexa Service API calls. Request a copy of the WWDI Integration Guide from your AVS contact. WWDI implementation is required for AVS certification unless your product uses a third-party wake word engine, in which case you are not required to implement WWDI.
3. Touch-initiated products
The following guidelines are specific to touch-initiated products and extend the Core Requirements and Recommendations for those products. Unless noted, the guidelines apply to both tap-to-talk and hold-to-talk products.
3.1. Your product SHALL NOT require the use of a wake word as part of the user utterance.
3.2. The microphones on your device SHALL remain disabled until a user initiates an Alexa interaction.
3.3. Your product SHALL automatically activate its microphones without waiting for a touch interaction in multi-turn situations. The sole exception is hold-to-talk devices where the microphones are only activated when the customer holds down a device control. See also Requirement 1.5.
3.4. Your product SHALL enable customers to interrupt an Alexa output (e.g. media playback, Alexa voice responses, and Alerts) using the Action button. See also Requirement 1.3.2.
3.5. Your product SHALL use audio cues to indicate the start and end of the Listening attention state.
4. Media services
The following guidelines apply to all products that support media services, such as Amazon Music, TuneIn, iHeartRadio, Audible, and Flash Briefing. These media service guidelines apply to both voice-initiated and touch-initiated products.
For more details about handling competing audio outputs, review the Alexa Voice Service Interaction Model.
4.1. Your product SHALL pause or lower the speaker volume for audio output when a customer initiates an Alexa interaction during media playback.
4.1.1. Your product SHALL pause Audible content playback when interrupted by a customer.
4.1.2. If your product pauses media because of a customer interruption, it SHOULD resume playback automatically.
4.2. Your product SHALL allow customers to resume paused media through a voice request or a device control.
4.3. Your product SHOULD sufficiently buffer media so that short interruptions in internet connectivity don't disrupt playback.
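The media-handling rules above amount to a small audio-focus policy: when an Alexa interaction begins, audiobook content pauses (4.1.1) while music may pause or duck (4.1), and content paused by the interruption may resume automatically afterward (4.1.2). A hypothetical sketch, with class and method names that are illustrative rather than part of any AVS API:

```python
# Hypothetical audio-focus policy for Alexa barge-in during media playback.

class Player:
    def __init__(self, content_type):
        self.content_type = content_type  # "music" or "audiobook"
        self.state = "PLAYING"
        self.paused_by_alexa = False

    def on_alexa_interaction_start(self):
        if self.state == "PLAYING":
            # Audiobooks must pause (4.1.1); music may pause or duck (4.1).
            self.state = "PAUSED" if self.content_type == "audiobook" else "DUCKED"
            self.paused_by_alexa = True

    def on_alexa_interaction_end(self):
        if self.paused_by_alexa:
            self.state = "PLAYING"  # resume automatically (4.1.2)
            self.paused_by_alexa = False


book = Player("audiobook")
book.on_alexa_interaction_start()   # barge-in: audiobook pauses
book.on_alexa_interaction_end()     # interaction over: playback resumes

music = Player("music")
music.on_alexa_interaction_start()  # barge-in: music ducks instead
```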
5. Alerts
The following guidelines apply to delivering and controlling alerts, such as timers or alarms, and extend Requirement 1.6 of the Core Requirements and Recommendations.
5.1. Your product SHALL always deliver scheduled alerts to customers even when internet connectivity is unavailable.
5.1.1. If alerts are delivered while internet connectivity is unavailable, your product SHALL send the appropriate events for the delivered alerts to Alexa when an internet connection is reestablished. For additional information, see Alerts Overview.
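Requirement 5.1.1 implies buffering alert lifecycle events while offline and flushing them once connectivity returns. A minimal sketch follows; the AlertStarted/AlertStopped event names follow the Alerts interface, while the queue class and `send` callable are hypothetical.

```python
class AlertEventQueue:
    """Buffers Alerts interface events while offline and flushes them,
    in order, once internet connectivity is restored (requirement 5.1.1)."""

    def __init__(self, send):
        self.send = send      # callable that posts one event to Alexa
        self.online = False
        self.pending = []

    def emit(self, name, token):
        event = {"header": {"namespace": "Alerts", "name": name},
                 "payload": {"token": token}}
        if self.online:
            self.send(event)
        else:
            self.pending.append(event)  # hold until reconnect

    def on_connected(self):
        self.online = True
        while self.pending:
            self.send(self.pending.pop(0))


sent = []
q = AlertEventQueue(sent.append)
q.emit("AlertStarted", "timer-1")  # offline: queued; alert still sounds locally
q.emit("AlertStopped", "timer-1")
q.on_connected()                   # flush both events in delivery order
```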
5.2. Your product SHALL support the use of the Action button to stop sounding alerts. For more details, see Requirement 1.3.2.
5.3. Your product SHALL play alerts that contain voice responses, such as Reminders, at the same volume as other Alexa voice responses.
5.4. Your product SHOULD support independent volume control for alerts that do not contain voice responses, such as Timers. When a customer adjusts the device volume for Alexa voice responses, it SHOULD NOT affect the volume for these alerts.
6. Notifications
The following guidelines apply to delivering and controlling notifications.
6.1. Your product SHALL download and use the audio asset specified in the directive payload for the notification.
6.1.1. If the download fails or times out, your product SHALL use the Notification sound provided by Amazon.
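Requirements 6.1 and 6.1.1 reduce to a download-with-fallback step: try the audio asset specified in the directive payload, and on failure or timeout fall back to the Amazon-provided default sound. In this sketch the downloader is injected so the logic is testable; the function and placeholder names are illustrative.

```python
DEFAULT_NOTIFICATION_SOUND = b"<amazon-default-chime>"  # placeholder bytes

def notification_sound(asset_url, download, timeout_s=5.0):
    """Return the audio to play for a notification: the asset from the
    directive payload if it downloads, else the Amazon default sound."""
    try:
        return download(asset_url, timeout_s)
    except Exception:  # download failed or timed out (6.1.1)
        return DEFAULT_NOTIFICATION_SOUND


def flaky_download(url, timeout_s):
    """Stand-in downloader that always fails, to exercise the fallback."""
    raise TimeoutError(f"could not fetch {url}")


sound = notification_sound("https://example.com/chime.mp3", flaky_download)
```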
6.2. Your product SHOULD implement the visual Notification indicator patterns as defined in the Attention System guidance.
7. Visual displays
The following requirements are specific to screen-based devices, and apply to both voice-initiated and touch-initiated products. Display Alexa visual responses by implementing the Alexa Presentation Language (APL), Display Cards, or both.
- For more details about APL and device viewports, including the Alexa.Presentation.APL and VisualCharacteristics interfaces, see the APL Tech Docs.
- For more details about Display Cards, see the AVS UX Design Guidelines, TemplateRuntime Interface, and PlayBackController Interface.
7.1. Your product SHALL display visual Alexa responses if it uses a pixel-based screen, such as Smart TVs, Set-Top Boxes, AVRs, and MVPDs.
7.2. If you implement support for APL visual responses in your product, you SHALL implement an appropriate viewport or viewhost window for your device that allows all Alexa response content to render legibly.
7.2.1. Your product SHALL NOT add, remove, or alter the data supplied by APL directives.
7.3. If you implement Display Cards you SHALL follow all requirements regarding their implementation.
7.3.1. Your product SHALL render all visual metadata to the specification for its screen size and SHALL NOT add, remove, or alter the metadata in any way when presented to the user.
7.3.2. If media is enabled, your product SHALL display all playback controls provided.
7.3.3. Any Display Cards your product uses SHOULD conform to the Display Card Design Guidelines.
7.4. If your product has a camera, the camera ON/OFF control SHALL be hardware-based.
7.4.1. The camera SHALL be turned OFF by removing power.
7.4.2. The product SHALL visibly indicate camera ON/OFF state.
7.4.3. The product SHOULD have a physical cover or shutter for the camera.
8. Setup and authentication
The Alexa setup process communicates the value of Alexa to your users and helps them connect your product to their Amazon account. Ideally, the Alexa setup flow should be incorporated into the setup or first run experience on your product. See the AVS UX Design Overview for more branding and style information for the Alexa setup and authentication experience.
8.1. Your product SHALL use Login With Amazon (LWA) to authenticate the customer. See the Authorization section in the Alexa Voice Service API Overview for additional information.
8.2. Your product SHALL have an Alexa setup/sign in experience that follows the Setup and Authentication guidelines in the AVS UX Design Overview.
8.2.1. Your product SHALL have a Splash Screen before the customer enters the Login With Amazon (LWA) authentication flow. If your product doesn't use Code-Based Linking, your product SHALL use the AVS Hosted Splash Screen provided through the Login With Amazon authentication flow. Your Splash Screen SHALL include the required elements as defined in the AVS UX Setup and Authentication guidelines.
8.2.2. Your product SHALL have a Things to Try screen after the customer exits the Login With Amazon authentication flow. Your Things To Try screen SHALL include the required elements as defined in the AVS UX Setup and Authentication guidelines.
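For products using Code-Based Linking, the LWA flow starts with a code-pair request; the device then displays the returned user code and polls the token endpoint until the customer authorizes. The sketch below builds the initial request parameters; the parameter names and `alexa:all` scope follow the LWA Code-Based Linking documentation, while the client ID, product ID, and serial number are placeholders.

```python
import json

# Documented LWA endpoint for Code-Based Linking code-pair requests.
LWA_CODEPAIR_URL = "https://api.amazon.com/auth/o2/create/codepair"

def build_codepair_request(client_id, product_id, device_serial):
    """Build the LWA Code-Based Linking code-pair request parameters."""
    scope_data = {
        "alexa:all": {
            "productID": product_id,
            "productInstanceAttributes": {"deviceSerialNumber": device_serial},
        }
    }
    return {
        "response_type": "device_code",
        "client_id": client_id,
        "scope": "alexa:all",
        "scope_data": json.dumps(scope_data),
    }


params = build_codepair_request(
    "amzn1.application-oa2-client.EXAMPLE",  # placeholder client ID
    "MyAVSProduct",                          # placeholder product ID
    "SN-0001",                               # placeholder serial number
)
```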
8.3. Your device SHOULD allow the customer to choose an Alexa language.
8.3.1. Your device SHOULD include language selection as part of product setup.
8.3.2. Your device SHOULD include an Alexa language selector in the companion app Settings.
8.3.3. If your device supports multilingual mode, it SHALL include a language selection screen which follows the format described in the locale combinations API.
8.4. Your product SHALL support logout by the customer.
8.5. You SHOULD include information about Alexa setup and use in your product's instructional materials.
9. Bluetooth
These requirements are specific to products that use the Bluetooth interface:
9.1. Your product SHOULD support the Advanced Audio Distribution Profile (A2DP) Bluetooth profile.
9.1.1. If your product supports A2DP, it SHALL support receiving digital audio streams from an A2DP SOURCE device.
9.1.2. If your product supports A2DP, it SHALL support the Audio/Video Remote Control Profile (AVRCP) Bluetooth profile.
9.2. If your product uses the Bluetooth interface, it SHALL use the Bluetooth connect and disconnect sounds provided by Amazon.
10. Reporting requirements
Reporting requirements provide information to AVS about the distribution of AVS software versions. The following reporting requirements apply to all new products entering the AVS certification process after January 2, 2021:
10.1. Your product SHALL report a valid product firmware version to Alexa through the SoftwareInfo event in the System interface.
10.2. Your product SHALL report all capability interface versions that devices support through the Alexa.Discovery interface.
10.3. Your product SHALL declare support for the Alexa.SoftwareComponentReporter interface.
10.3.1. Products that implement the AVS Device SDK SHALL use the Alexa.SoftwareComponentReporter interface to report their AVS Device SDK version to Alexa with the component name com.amazon.alexa.deviceSDK. Products that don't use the AVS Device SDK SHALL still assert support for the Alexa.SoftwareComponentReporter interface and omit the entry for the com.amazon.alexa.deviceSDK component.
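Requirements 10.2 and 10.3 can be illustrated with a capabilities payload that declares the Alexa.SoftwareComponentReporter interface and, for AVS Device SDK products, reports the SDK version under com.amazon.alexa.deviceSDK. The envelope shape follows the Capabilities API; treat the exact field names inside `configurations` as assumptions for illustration.

```python
def build_capabilities(sdk_version=None):
    """Build a capabilities payload declaring Alexa.SoftwareComponentReporter.
    If sdk_version is given (AVS Device SDK products), include the
    com.amazon.alexa.deviceSDK component; otherwise omit that entry."""
    capability = {
        "type": "AlexaInterface",
        "interface": "Alexa.SoftwareComponentReporter",
        "version": "1.0",
    }
    if sdk_version is not None:
        # The "configurations" field names are an assumption for illustration.
        capability["configurations"] = {
            "softwareComponents": [
                {"name": "com.amazon.alexa.deviceSDK", "version": sdk_version}
            ]
        }
    return {"envelopeVersion": "20160207", "capabilities": [capability]}


with_sdk = build_capabilities("3.0.0")
without_sdk = build_capabilities()
```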