In the vehicle, customers can invoke Alexa by saying the wake word or pressing a button to begin speech dialogue. To communicate that she is listening, Alexa uses sound cues and visuals (voice chrome). Voice chrome also indicates when Alexa is thinking and speaking.
There are two primary ways to invoke Alexa in the vehicle. Both are required.
- Saying the wake word “Alexa”, also known as Alexa Hands-Free
- Pressing the Push-to-talk (PTT) button or an onscreen Tap-to-Talk (TTT) button to directly invoke Alexa without the wake word.
(Required) Support Alexa Hands-Free (wake word) to invoke Alexa.
The wake word provides hands-free, voice-forward experiences with Alexa. Minimizing the need for drivers to view or touch the screen helps to reduce the visual (eyes off of the road) and manual (hands off the wheel) distractions in the car. Customers can turn off Alexa Hands-Free (wake word) in settings (see Menu and settings for details). Customers must first enable Alexa before they can begin speaking with her.
(Required) Enable invocation of Alexa only after the customer has completed Alexa setup.
To ensure customer privacy, don’t enable wake word or dialogue with Alexa until the customer has enabled Alexa in setup. See Setup for details.
(Required) Provide customers a way to disable Alexa hands-free under the Alexa menu.
Customers may choose to disable the Alexa wake word. However, they can still access Alexa using a button press (PTT). See Menu and settings for details.
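The gating rules above (no invocation before setup, wake word only while Alexa Hands-Free is on, button invocation always available after setup) can be sketched as a single check. This is a minimal illustration, not Alexa Auto SDK code: `InvocationMethod`, `AlexaSettings`, and `may_invoke` are hypothetical names, and treating TTT the same way as PTT is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class InvocationMethod(Enum):
    WAKE_WORD = "wake_word"
    PTT = "ptt"
    TTT = "ttt"


@dataclass
class AlexaSettings:
    """Hypothetical settings record kept by the IVI system."""
    setup_complete: bool      # customer has finished Alexa setup
    hands_free_enabled: bool  # Alexa Hands-Free (wake word) toggle


def may_invoke(method: InvocationMethod, settings: AlexaSettings) -> bool:
    """Return True if this invocation method should currently wake Alexa."""
    if not settings.setup_complete:
        # No wake word or dialogue before the customer has enabled Alexa.
        return False
    if method is InvocationMethod.WAKE_WORD:
        return settings.hands_free_enabled
    # Button invocation stays available even when hands-free is off
    # (assumption: TTT follows the same rule as PTT).
    return True
```

With hands-free turned off after setup, `may_invoke(InvocationMethod.PTT, ...)` returns `True` while `may_invoke(InvocationMethod.WAKE_WORD, ...)` returns `False`.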
(Required) Indicate to the customer when Alexa Hands-Free is disabled.
To indicate that Alexa Hands-Free (AHF) is off:
- Show the red Alexa voice chrome for five seconds when the customer turns off AHF.
- Display the AHF Off icon in the status bar of your IVI screen.
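As a rough sketch, the two indicators can be derived from the time elapsed since the customer turned AHF off. The names `AhfOffIndicators` and `ahf_off_indicators` are hypothetical, and keeping the status bar icon visible for as long as AHF stays off is an assumption.

```python
from dataclasses import dataclass

RED_CHROME_SECONDS = 5.0  # red voice chrome duration from the guideline


@dataclass(frozen=True)
class AhfOffIndicators:
    show_red_chrome: bool   # red Alexa voice chrome
    show_status_icon: bool  # AHF Off icon in the IVI status bar


def ahf_off_indicators(seconds_since_turned_off: float) -> AhfOffIndicators:
    """Indicators to display while Alexa Hands-Free is off."""
    return AhfOffIndicators(
        # The red chrome is shown only for the first five seconds.
        show_red_chrome=0 <= seconds_since_turned_off < RED_CHROME_SECONDS,
        # Assumption: the icon persists for as long as AHF remains off.
        show_status_icon=True,
    )
```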
PTT is another way customers can invoke Alexa. If Alexa Hands-Free has been turned off, customers should still be able to use PTT to speak to Alexa in their vehicle.
(Required) If the vehicle offers a PTT button, customers must be able to invoke Alexa via PTT without a wake word.
If a customer assigns Alexa as the default voice assistant, use a short press on the PTT button to invoke Alexa without the wake word. Alexa should be invoked immediately (within 250 ms) after the PTT button has been pressed and released.
If Alexa is not set as the default assistant for PTT, it’s recommended to still allow the customer to say “Alexa” after pressing the PTT button as another way to speak with Alexa.
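The PTT behavior above can be sketched as a routing decision plus a latency check. This is an illustrative sketch only: `PttRoute`, `route_short_press`, and `within_latency_budget` are hypothetical names, and measuring the 250 ms budget from button release to the start of listening is an assumption about how the requirement would be verified.

```python
from enum import Enum

PTT_LATENCY_BUDGET_MS = 250  # "within 250 ms" from the guideline


class PttRoute(Enum):
    INVOKE_ALEXA = "invoke Alexa without the wake word"
    AWAIT_WAKE_WORD = "allow the customer to say the wake word after the press"


def route_short_press(alexa_is_default: bool) -> PttRoute:
    """Decide what a short PTT press does."""
    if alexa_is_default:
        return PttRoute.INVOKE_ALEXA
    # Recommended, not required: still accept "Alexa" after the press.
    return PttRoute.AWAIT_WAKE_WORD


def within_latency_budget(released_at_ms: float, listening_at_ms: float) -> bool:
    """Check that Alexa started listening within the budget after release."""
    elapsed = listening_at_ms - released_at_ms
    return 0 <= elapsed <= PTT_LATENCY_BUDGET_MS
```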
A customer can use the Tap-to-talk button to invoke Alexa immediately with one tap and without needing to say “Alexa”. The Tap-to-talk button should behave similarly to the Push-to-talk button.
(Required) If PTT is not available, TTT (Tap-to-Talk) via an on-screen button can be used to invoke Alexa. This must be accessible at all times.
Examples of Tap-to-Talk images
(Required) Place the TTT button persistently in the same location with a consistent style and ample space for minimal chance of accidentally pressing the wrong button.
See the Visual Language section for details on how to display the Tap-to-Talk button.
(Required) The TTT button must be the Alexa Talk Bubble button in the Design toolkit.
(Required) Allow Alexa to be invoked while mobile projection applications are running or other assistants are present (e.g. Android Auto or Apple CarPlay).
Example of projection mode while Alexa is available
(Required) Allow customers to interrupt Alexa when she is speaking (barge-in).
Customers must be able to interrupt Alexa with all available invocation methods. When interrupted, Alexa will stop speaking and start listening. For example, when Alexa is speaking about the weather, the customer can barge in with the wake word, PTT, or TTT and say “will it rain tomorrow?”
(Required) Allow customers to cancel listening and speaking.
Customers can stop Alexa from listening by saying “cancel” or by pressing the PTT or TTT button during the listening state. When Alexa is speaking and the customer closes a display card, Alexa’s speech should be stopped. See the table below for more interruption behaviors.
(Required) Implement interruption behaviors as described in the table below:
This table shows how interruptions are to be implemented for interactions where Alexa is already listening or speaking.
|Customer action||Idle||Listening||Thinking||Speaking|
|Wake word||Start listening||No change||Barge-in||Barge-in|
|PTT press||Start listening||Cancel listening||Barge-in||Barge-in|
|TTT press||Start listening||Cancel listening||Barge-in||Barge-in|
|Presses a cancel, back or close button||-||Cancel listening||Cancel dialog||Cancel dialog|
|Dismisses the display card||-||-||Cancel dialog||Cancel dialog|
|Touches the screen (e.g. to scroll text or launch an app)||-||Listening continues||Thinking continues||Speech continues|
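One way to implement the interruption table is as a lookup keyed by customer action and current attention state. The sketch below is illustrative, with hypothetical names (`State`, `Action`, `Outcome`, `resolve`). It assumes the four state columns are Idle, Listening, Thinking, and Speaking, in that order, and it maps both "-" and the "continues" cells to `NO_CHANGE`.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    LISTENING = 1
    THINKING = 2
    SPEAKING = 3


class Action(Enum):
    WAKE_WORD = "wake word"
    PTT_PRESS = "PTT press"
    TTT_PRESS = "TTT press"
    CANCEL_BUTTON = "cancel, back or close button"
    DISMISS_CARD = "dismiss display card"
    SCREEN_TOUCH = "touch screen"


class Outcome(Enum):
    START_LISTENING = "start listening"
    CANCEL_LISTENING = "cancel listening"
    BARGE_IN = "barge-in"
    CANCEL_DIALOG = "cancel dialog"
    NO_CHANGE = "no change"  # also used for "-" and "... continues" cells


S, C, B, D, N = (
    Outcome.START_LISTENING,
    Outcome.CANCEL_LISTENING,
    Outcome.BARGE_IN,
    Outcome.CANCEL_DIALOG,
    Outcome.NO_CHANGE,
)

# One row per customer action; columns follow the State order above.
_TABLE = {
    Action.WAKE_WORD:     (S, N, B, B),
    Action.PTT_PRESS:     (S, C, B, B),
    Action.TTT_PRESS:     (S, C, B, B),
    Action.CANCEL_BUTTON: (N, C, D, D),
    Action.DISMISS_CARD:  (N, N, D, D),
    Action.SCREEN_TOUCH:  (N, N, N, N),
}


def resolve(action: Action, state: State) -> Outcome:
    """Look up the required behavior for an action in a given state."""
    return _TABLE[action][state.value]
```

For example, `resolve(Action.PTT_PRESS, State.LISTENING)` yields `Outcome.CANCEL_LISTENING`, while the same press during `State.SPEAKING` yields `Outcome.BARGE_IN`.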
The Alexa attention system
Alexa is a single personality that is coherent and familiar to customers across many devices. While the physical devices might be different, the attention system ensures Alexa behaves predictably and with familiarity. This consistency creates customer trust and strengthens the customer’s understanding of Alexa.
Alexa’s attention system consists of non-verbal audio and visual components that work together to communicate all of Alexa’s different states to the customer. Color, sound, and animation are critical for effectively communicating Alexa's state. Audio and visual cues must be synced so that Alexa’s state change indicators occur simultaneously as the customer wakes, speaks to, and listens to Alexa.
Start of Request (wake) and End of Request (endpointing) sounds give customers confidence and clarity about when Alexa is listening without them needing to take their eyes off the road. All sounds mentioned here are provided in the Alexa Automotive Design Toolkit.
(Required) Play the Start of Request sound immediately after the wake word is detected.
This allows the customer to know when Alexa is listening without looking at the screen. This sound is required to play when visual cues display the Listening state.
(Required) Play the Touch Start of Request sound immediately after a press of the PTT or TTT button.
This allows the customer to know the system is listening without looking at the screen. This sound is required to play when visual cues display the Listening state.
(Required) Play the End of Request sound at the end of speech input.
This sound allows the customer to know your assistant has heard their request without looking at the screen. This sound is distinct from the Start of Request sounds, and is required to play when the visual cues exit the Listening state.
(Required) Allow customers to turn off the Start and End of Request sounds under the Settings menu.
See Menu and settings for more info.
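The sound cue requirements above reduce to a small mapping from microphone events to cue names, gated by the customer's setting. This is a sketch under stated assumptions: `MicEvent`, `CUES`, and `sound_cue` are hypothetical names, the strings are the cue names used in this guideline rather than asset file names, and a single on/off setting for all three cues is assumed.

```python
from enum import Enum
from typing import Optional


class MicEvent(Enum):
    OPENED_BY_WAKE_WORD = "wake word detected"
    OPENED_BY_TOUCH = "PTT or TTT pressed"
    CLOSED = "end of speech input"


# Cue names as used in the guideline; the audio itself ships in the
# Alexa Automotive Design Toolkit.
CUES = {
    MicEvent.OPENED_BY_WAKE_WORD: "Start of Request",
    MicEvent.OPENED_BY_TOUCH: "Touch Start of Request",
    MicEvent.CLOSED: "End of Request",
}


def sound_cue(event: MicEvent, cues_enabled: bool = True) -> Optional[str]:
    """Return the cue to play for a mic event, or None if cues are off."""
    if not cues_enabled:
        # Assumption: one setting turns off both start and end sounds.
        return None
    return CUES[event]
```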
(Required) Use Alexa’s sounds only for Alexa features.
Don’t use Alexa's sound cues for any other interactions, including other speech systems or voice assistants.
(Required) Display the Alexa voice chrome when the customer invokes Alexa.
Voice chrome is a visual indicator of Alexa’s attention system and is displayed whenever the customer interacts with Alexa by voice. Use linear voice chrome, as it works best with Alexa’s Display Cards and does not obscure other on-screen content.
Voice chrome should reflect that Alexa is seamlessly integrated into the vehicle’s IVI and is not limited to a single app. Place voice chrome along the bottom edge of the screen as an overlay that does not cover the entire display. This provides a less jarring experience when invoking Alexa, and makes for a more seamless integration with the vehicle.
- Place linear voice chrome along an edge of the screen, preferably at the bottom.
- Don’t use a full-screen overlay or popup with voice chrome.
- Overlay voice chrome on top of whichever IVI screen is currently displayed, e.g. Navigation.
(Required) Use only Alexa brand graphics to indicate that Alexa is listening.
Except for physical PTT buttons (e.g. on the steering wheel), don't use additional icons to invoke or represent Alexa. Use only Alexa icons and voice chrome to represent Alexa.
Attention system states
Attention states address the personality of Alexa at a high level across all domains. The core Alexa states are: Idle, Listening, Thinking, and Speaking. For products with visual cues, these states must be distinguishable from each other.
The Idle state can be considered Alexa’s default state. No visual voice chrome elements are displayed in this state, in contrast with all other states. This communicates Alexa is passively waiting for a request and not actively communicating.
The Listening state starts when Alexa has been invoked via wake word, PTT or TTT and the microphone begins streaming the customer’s request to the Alexa Voice Service. There are three stages to the Listening state:
- Start Listening - Alexa transitions from Idle to the Listening state and waits for a request from the customer.
- Active Listening - When the customer begins speaking, Alexa transitions into an Active Listening state. If she doesn’t hear anything from the customer, Alexa returns to the Idle state.
- End Listening - When the customer's end of speech is identified, Alexa transitions out of Listening state.
(Required) In multi-turn interactions, the Start of Request sound must play each time the mic opens during the interaction. The End of Request sound must play each time the mic closes.
When a customer completes a request, Alexa enters the Thinking state. This state lets the customer know the microphone is no longer active and Alexa is processing their request.
The Speaking state is displayed when Alexa is responding to a request with text to speech (TTS). This state is not displayed when Alexa is responding with long running mixable media such as music, books and Flash Briefings.
(Required) Do not duplicate Alexa voice chrome or supplement with other attention state signifiers.
The Alexa voice chrome is a branded and established design pattern to convey attention states. Do not surface multiple instances of voice chrome or use other signifiers (such as icons and text for different states).
Colors: Blue #214CFB, Cyan #05FEFE, Red #FC361D
|State||Description||Voice chrome colors||Sound cues|
|Idle||Alexa is available through invocation methods. No visuals are displayed on screen.||None (no visual indicators)||-|
|Listening Start||Voice chrome appears and a sound cue plays once when the customer wakes Alexa and the microphone becomes active.||Blue, Cyan||If woken by touch: Touch Start of Request sound. If woken by voice: Start of Request sound.|
|Listening Active||Voice chrome persists while Alexa is capturing speech from the customer. When end of speech is detected, a sound cue plays and voice chrome transitions to Thinking state.||Blue, Cyan||At the end of listening: End of Request sound|
|Thinking||Voice chrome plays in a loop while Alexa is processing, or 'thinking about' what the customer has said. Displaying this state ensures that the customer understands that the interaction has not ended.||Blue, Cyan||-|
|Speaking||Voice chrome plays in a loop while Alexa is responding to the customer via TTS.||Blue, Cyan||-|
|Microphone Off||Indicates that the customer has turned off hands-free listening. Display voice chrome for 5 seconds after customer turns off Alexa Hands-Free (AHF).||Red||-|
Note: Alexa voice chrome is available as part of the Alexa Auto SDK.
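For teams that drive the display from configuration, the color information in the state table can be captured in a small lookup. The sketch below is illustrative and is not part of the Alexa Auto SDK: `PALETTE`, `ChromeSpec`, `VOICE_CHROME`, and `hex_colors` are hypothetical names, and the two Listening stages are merged into a single entry because they share the same colors.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hex values as listed in the state table.
PALETTE: Dict[str, str] = {
    "Blue": "#214CFB",
    "Cyan": "#05FEFE",
    "Red": "#FC361D",
}


@dataclass(frozen=True)
class ChromeSpec:
    colors: Tuple[str, ...]             # names from PALETTE; empty means no chrome
    duration_s: Optional[float] = None  # None: shown for as long as the state lasts


VOICE_CHROME: Dict[str, ChromeSpec] = {
    "Idle": ChromeSpec(colors=()),
    "Listening": ChromeSpec(colors=("Blue", "Cyan")),
    "Thinking": ChromeSpec(colors=("Blue", "Cyan")),
    "Speaking": ChromeSpec(colors=("Blue", "Cyan")),
    "Microphone Off": ChromeSpec(colors=("Red",), duration_s=5.0),
}


def hex_colors(state: str) -> List[str]:
    """Return the hex colors the voice chrome uses in the given state."""
    return [PALETTE[name] for name in VOICE_CHROME[state].colors]
```

Because Listening, Thinking, and Speaking share the same palette, the required visual distinction between those states has to come from the voice chrome animation rather than from color.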
(Required) Vehicles with restricted modes must disable Alexa in those modes (e.g. valet mode or for guest drivers).
Customers expect Alexa to protect their privacy. Disable invocation and access to Alexa when restricted modes are activated in the vehicle’s system (e.g. valet mode).
Example: The driver pulls up to a hotel and quickly turns on “valet mode”. The valet gets into the vehicle and is unable to use Alexa because “valet mode” is enabled. This ensures the valet cannot access the customer's private information via Alexa.
This requirement does not apply to vehicles that do not have restricted modes.
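The restricted mode requirement can be sketched as a check against the vehicle's currently active modes. The mode names in `RESTRICTED_MODES` and the function `alexa_allowed` are hypothetical; a real vehicle platform would expose its own mode identifiers.

```python
from typing import AbstractSet

# Hypothetical identifiers for modes in which Alexa must be disabled.
RESTRICTED_MODES = frozenset({"valet", "guest"})


def alexa_allowed(active_modes: AbstractSet[str]) -> bool:
    """Return False while any restricted mode is active in the vehicle."""
    return RESTRICTED_MODES.isdisjoint(active_modes)
```

A vehicle without restricted modes never reports one as active, so `alexa_allowed(set())` is `True` and the check has no effect.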