Design with Alexa Conversations
Alexa Conversations is offered as a beta and may change as we receive feedback and iterate on the feature. Alexa Conversations currently supports
Alexa Conversations is an AI-driven approach to dialog management. The technology uses your ideal dialog paths as training data to simulate a wide range of customer responses and unexpected turns in a conversation. Instead of building your skill with an interaction model that requires intent-based dialog management, you let the AI in Alexa Conversations do the work for you. The AI manages the conversational context and carries it across dialog turns, supporting the choices and corrections that the user makes. Users can interact with Alexa as they would with another person, using the phrases they prefer in the order they prefer, for more natural conversational patterns. The following sections provide design best practices for using Alexa Conversations to create skills.
Benefits of Alexa Conversations
People don’t speak in flowcharts. Conversation between two or more parties is nuanced, unscripted, and can span many topics. Designing for many potential paths that a user can take in your skill is a manual, labor-intensive process. With Alexa Conversations, Alexa's AI can determine variations in phrasing, context, and corrections for you. This way, you can focus on designing the foundational pieces of dialog for your skill.
Alexa Conversations and intent-based models
Alexa Conversations - Useful for skills where the conversation can go in different directions, allowing users to interact with Alexa naturally using the phrases they prefer. Rather than manually designing and building out every possible conversational path, you provide sample dialogs and indicate which application programming interfaces (APIs) to call. Alexa Conversations uses them to train the AI, which comes up with the variations for you. Alexa Conversations tracks the context of the conversation, gathering the information needed to complete a task while keeping the conversation flowing naturally.
Intent-based model - Better suited to tightly directed, sequential multi-turn skills, where the user must complete certain steps in order to reach their goal. For example, in a sequential game, the user has to find a lantern, go north on a trail, and then slay a dragon to win their reward. If they can’t find the lantern, they can’t get to the trail, and so on. Choose the intent-based model when you need to control every turn to make sure that the user follows the correct narrative arc to reach a goal.
Hybrid modeling - Mixed approach where Alexa Conversations works in tandem with an intent-based model for specific requests. Consider this option for adding Alexa Conversations to existing skills. Hybrid modeling works in both directions: you can hand off requests from Alexa Conversations to an intent and vice versa. For example, in a pizza ordering skill the user can choose from a preset menu of specialty pizzas or customize their own. Because the specialty pizzas have little ambiguity, the intent-based model works well for this side of the skill. For the customized orders, the user can present many possible scenarios and permutations that Alexa Conversations can effectively manage.
Context carryover and pronouns
When people speak, they often jump ahead in a conversation or return to an earlier point. As the conversation goes on, they use pronouns such as "it" or "that" to refer to a previous topic without repeating it. After you train Alexa Conversations on your sample dialogs, the AI infers this context so that the dialog continues in a natural flow. The following example from a pizza ordering skill shows context switching and the use of pronouns.
User: I'd like a medium 2-topping pizza.
Alexa: Medium 2-topping. Got it. On what type of crust?
User: How many people can that feed?
Alexa: The medium usually feeds around 2 adults.
User: Make it a large.
Alexa: Large 2-topping, got it. What type of crust would you like?
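Alexa Conversations learns this behavior from your sample dialogs, so you don't write code for it. A toy model can still make the idea concrete. The following Python sketch shows a hypothetical dialog state that remembers slot values across turns and resolves "it" or "that" to the most recently discussed item. The `DialogContext` class and its methods are illustrative only and are not part of any Alexa SDK.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical, fixed set of pronouns the toy model knows how to resolve.
PRONOUNS = {"it", "that", "this"}


@dataclass
class DialogContext:
    """Toy dialog state that carries slot values across turns."""
    slots: dict = field(default_factory=dict)
    last_entity: Optional[str] = None  # most recently discussed item

    def mention(self, entity: str, **values: str) -> None:
        # Remember what the user just talked about and merge any new slot values.
        self.last_entity = entity
        self.slots.update(values)

    def resolve(self, word: str) -> str:
        # Map a pronoun back to the last entity; leave other words unchanged.
        if word.lower() in PRONOUNS and self.last_entity is not None:
            return self.last_entity
        return word


ctx = DialogContext()
# "I'd like a medium 2-topping pizza."
ctx.mention("pizza", size="medium", topping_count="2")
# "How many people can that feed?" ("that" refers to the pizza)
print(ctx.resolve("that"))   # pizza
# "Make it a large." ("it" is the pizza; only the size slot changes)
ctx.mention(ctx.resolve("it"), size="large")
print(ctx.slots)             # {'size': 'large', 'topping_count': '2'}
print("crust" in ctx.slots)  # False, so Alexa still asks for the crust
```

A real conversational model resolves references statistically rather than from a fixed pronoun list. The sketch only shows why carried-over context lets "Make it a large" change one slot while keeping the rest.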
Navigate without complex queries
Navigation and browsing can become awkward and unnatural in a voice interface. Users have to rephrase natural questions as precise, formulaic queries to get what they want. Alexa Conversations enables progressive filtered search, which narrows the results step by step so that users don’t have to repeat themselves. Here’s an example of a skill that helps users decide what movie to watch.
User: What adventure movies do you have?
Alexa: Here are some adventure movies: Adrift, Jumanji: Welcome to the Jungle, and Black Panther.
User: Which ones are rated PG-13?
Alexa: The PG-13 rated ones are: Jumanji: Welcome to the Jungle and Black Panther.
User: Which ones have The Rock in them?
Alexa: The Rock stars in Jumanji: Welcome to the Jungle.
In the example, Alexa Conversations remembered the previous search results (Adrift, Jumanji: Welcome to the Jungle, and Black Panther) and applied each new filter, first the rating ("PG-13") and then the actor ("The Rock"), only to the results from the previous turn.
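The key idea is that each follow-up filter runs against the previous results rather than the whole catalog. Here is a minimal Python sketch of that progressive narrowing. It assumes a hypothetical in-memory catalog with made-up titles and cast names, so it isn't real movie data, and `FilteredSearch` is not an Alexa API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Movie:
    title: str
    genre: str
    rating: str
    cast: tuple


# Illustrative catalog with made-up titles and actors; not real movie data.
CATALOG = [
    Movie("River Run", "adventure", "PG", ("Actor A",)),
    Movie("Jungle Quest", "adventure", "PG-13", ("Actor B", "Actor C")),
    Movie("Sky Kingdom", "adventure", "PG-13", ("Actor D",)),
    Movie("Quiet Night", "drama", "R", ("Actor B",)),
]


class FilteredSearch:
    """Keeps the last result set so that each new filter narrows it further."""

    def __init__(self, catalog):
        self.results = list(catalog)

    def narrow(self, predicate):
        # Apply the new filter to the previous results only, not the full catalog.
        self.results = [m for m in self.results if predicate(m)]
        return [m.title for m in self.results]


search = FilteredSearch(CATALOG)
print(search.narrow(lambda m: m.genre == "adventure"))  # ['River Run', 'Jungle Quest', 'Sky Kingdom']
print(search.narrow(lambda m: m.rating == "PG-13"))     # ['Jungle Quest', 'Sky Kingdom']
print(search.narrow(lambda m: "Actor B" in m.cast))     # ['Jungle Quest']
```

The drama "Quiet Night" also features Actor B but never appears in the results, because the genre filter removed it two turns earlier.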
Handle over- or under-answering
Many voice experiences are inflexible: they are limited to one topic or question per turn. Users can find this time-consuming and frustrating because they want to give extra information or they anticipate related questions. Alexa Conversations handles users who under-answer or over-answer by allowing the skill to fill some or all of the slots in a single turn. Because the user has already filled some slots, Alexa can jump to the next logical question or prompt. The following example shows how an ordering skill handles over-answering. The skill asks for the size, but the user gives more information (size and toppings). With Alexa Conversations, the skill accepts all of the user’s input and then moves on to gathering the next piece of information.
Alexa: What size pizza do you want?
User: I want a large pizza with pepperoni, mushrooms, and extra cheese.
Alexa: Large pie with pepperoni, mushrooms, and extra cheese - got it. Do you want a thick or thin crust?
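Conceptually, over-answering works because the skill accepts any slot the user supplies and then asks only for what's still missing. The following Python sketch models that with a hypothetical, fixed slot order for the pizza skill. `REQUIRED_SLOTS`, `fill_slots`, and `next_prompt` are illustrative names and are not part of Alexa Conversations, which learns slot-filling behavior from your annotated dialogs.

```python
from typing import Optional

# Hypothetical slots the pizza skill needs, in the order Alexa asks for them.
REQUIRED_SLOTS = ["size", "toppings", "crust"]

PROMPTS = {
    "size": "What size pizza do you want?",
    "toppings": "What toppings would you like?",
    "crust": "Do you want a thick or thin crust?",
}


def fill_slots(filled: dict, new_values: dict) -> dict:
    # Accept every known slot the user supplied, even ones Alexa didn't ask for.
    updated = dict(filled)
    updated.update({k: v for k, v in new_values.items() if k in REQUIRED_SLOTS})
    return updated


def next_prompt(filled: dict) -> Optional[str]:
    # Skip slots that are already filled and ask for the first missing one.
    for slot in REQUIRED_SLOTS:
        if slot not in filled:
            return PROMPTS[slot]
    return None  # every slot is filled; the skill can call its API


# "I want a large pizza with pepperoni, mushrooms, and extra cheese."
state = fill_slots({}, {"size": "large",
                        "toppings": "pepperoni, mushrooms, extra cheese"})
print(next_prompt(state))  # Do you want a thick or thin crust?
```

The same logic covers under-answering: if the user gives only a size, `next_prompt` asks for toppings next.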
Make corrections without starting over
Instead of requiring you to specify where and how a correction can take place in the flow, Alexa Conversations handles corrections for you by using contextual memory. Users can course-correct in a single step. The following example shows a correction in a booking skill.
Alexa: I have you checking into the hotel on Tuesday at 1:30 PM. Will you need transportation?
User: Actually, make that 9 AM.
Alexa: I have you checking into the hotel on Tuesday at 9 AM. Will you need transportation?
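A one-step correction amounts to overwriting the corrected value while every other value carries over. This Python sketch models the booking example with a hypothetical `Booking` record. It illustrates the behavior only and does not reflect how Alexa Conversations stores context.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Booking:
    """Hypothetical booking state; the field names are illustrative."""
    day: str
    time: str


def confirm(b: Booking) -> str:
    # Restate the full booking so the user hears the corrected value in context.
    return (f"I have you checking into the hotel on {b.day} at {b.time}. "
            "Will you need transportation?")


def apply_correction(b: Booking, **changes: str) -> Booking:
    # Overwrite only the corrected fields; everything else carries over.
    return replace(b, **changes)


booking = Booking(day="Tuesday", time="1:30 PM")
booking = apply_correction(booking, time="9 AM")  # "Actually, make that 9 AM."
print(confirm(booking))
```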
How to design for Alexa Conversations
Alexa Conversations helps you design great conversations so that you can focus on the highest-value parts of your experience. Here are some best practices and ways to get started with Alexa Conversations that can help you design your workflow, write dialogs, and set user expectations.
Outline your skill's functionality
First, define some core traits of your skill, which you will use as your roadmap when you start your build. Define the following traits:
- What’s the purpose of your skill? The purpose is the guiding principle that the dialogs you use to train the AI should align with. For example, the pizza skill lets users order custom pizza pies delivered to their door from the comfort of home.
- What is the user’s goal? What do they get out of this experience? Goals help you identify your APIs. For example, users can order a pizza pie with preset options or customize a pie to their liking.
- What actions can the user take in your skill? These actions help you identify slots and slot types. For example, users can choose set combinations or customize the number of pizzas. They can also choose the size, crust, and toppings.
From these goals and actions, you can focus on the primary use cases: the happy path and a few alternatives. You don’t need to find every permutation and edge case.
Design flexible flows
When you design flows, you think through the paths that a user can take during the conversation. Focus on what success looks like: the happy path dialog and the user goals. Your dialog helps you determine which utterances and responses are part of the experience. The user goals outline which APIs you must call, as well as the criteria, or slots, for fulfilling the API requests. Alexa Conversations handles context switching and corrections, giving the skill the flexibility to adapt to many situations. However, you must also outline where a user might diverge from the happy path and need help getting back to the optimal experience. Keep the following questions in mind when you build flows:
- Where is the user most likely to diverge from the happy path, and how should you design for (and train the AI with) that scenario?
- What if the user says the opposite of what you expect (for example, "yes" instead of "no")?
- What happens if the API responds in a way that fails to achieve the goal?
- What if the user doesn’t respond to a prompt? How should the skill re-prompt before the session ends?
Dialog writing tips
Write dialogs that cover the core experience of your skill
You don’t need to write out every possible variation. Your dialogs are samples of what a user might say to help train the AI for a more natural, flexible conversation that can switch context easily. The AI figures out the variations for you.
Don’t hand-hold through the experience
Users often expect voice experiences to be constrained and step-based. They expect to play a game of 20 questions before they eventually reach their goal. Change those expectations by asking open-ended questions.
Train Alexa Conversations to do the work for you
Spend your time designing the foundational dialogs of your skill that train the AI. Add dialogs, test, and iterate.
Don’t design for corrections or context switching
The AI handles these elements for you as long as you provide sample dialogs to train it. If something’s wrong, go back and fix the core dialogs that you use to train the AI.
End the prompt in a question
This technique tells users that it’s their turn to speak and guides them toward the goals of your skill.
Alexa: “We have plain, wheat, and garlic crusts. Which kind of crust would you like?”
Don’t end a prompt with options the user can pick
The user might interrupt before Alexa lists the options.
Alexa: “Which kind of pizza crust would you like? Plain, wheat, or garlic.”
Keep lists under four items.
On average, humans can only retain four objects within their working memory at a time.
Alexa: “What toppings would you like on your pizza?”
User: “I’d like pepperoni, sausage, and olives.”
Alexa: “Would you like black or green olives?”
User: “Black olives, thanks.”
Alexa: “Alright, I’ve got a thin crust pizza with pepperoni, sausage, and black olives. Anything else?”
Don’t overwhelm the user with options
Users get fatigued when they try to remember everything.
Alexa: “What toppings would you like? We have pepperoni, sausage, extra cheese, anchovies, onions, extra garlic, ham, white mushrooms, cremini mushrooms, peppers, hot peppers, pineapple, broccoli, spinach, green olives, and black olives.”
User: “Uh... What?”
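You can combine the guidance to end prompts in a question and to keep lists under four items in a single prompt helper. The following Python sketch is hypothetical: the three-item limit comes from the "under four items" guideline, and `build_prompt` and `join_options` are illustrative names, not Alexa APIs.

```python
MAX_SPOKEN_OPTIONS = 3  # "Keep lists under four items."


def join_options(options):
    # Produces "plain, wheat, and garlic" (serial comma, as in the sample prompts).
    if len(options) == 1:
        return options[0]
    if len(options) == 2:
        return f"{options[0]} and {options[1]}"
    return ", ".join(options[:-1]) + f", and {options[-1]}"


def build_prompt(category, options, question):
    """State a short list of options first, then end the prompt with the question.

    When there are too many options to remember, skip the list and ask an
    open-ended question instead.
    """
    if len(options) > MAX_SPOKEN_OPTIONS:
        return question
    return f"We have {join_options(options)} {category}. {question}"


print(build_prompt("crusts", ["plain", "wheat", "garlic"],
                   "Which kind of crust would you like?"))
# We have plain, wheat, and garlic crusts. Which kind of crust would you like?
```

For a long topping list, the helper drops the list and falls back to the open-ended question, as in the sample dialog.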
Always explicitly acknowledge a request
Users need to know that Alexa heard them correctly.
Alexa: “Your flight leaves at 7:30 AM from SeaTac Airport. Do you want me to set a reminder for you?”
User: “How far away is Seattle from SeaTac?”
Alexa: “Seattle is 20 miles away.”
Alexa should avoid using pronouns to confirm requests
Although Alexa Conversations lets users refer to things with pronouns in their utterances, use pronouns sparingly when Alexa speaks.
Alexa: “OK so I’ve got a Brooklyn pie and a family size Caesar salad. Anything else?”
User: “How much is the family size?”
Alexa: “It’s 12.95. Do you want a different one?”
Help users discover new capabilities
Teach the user about what’s possible, and then let the user discover more natural conversational patterns. Give hints for first-time users, when a user needs help, or when you add new features.
Alexa: “You can change your city name at any time.”
Don’t prescribe exactly what to say
Don’t teach users how to say utterances in a particular way. Doing so sets the expectation that this is the only way to interact in the skill.
Alexa: “You can change your city by saying ‘Alexa, change my city to’ followed by your city name.”
Proactively offer more when a goal is completed
With Alexa Conversations, you can proactively tell users about other features that they might not discover on their own.
Don’t offer more during the experience
Nobody likes to be hassled. Don’t complicate things when a user is already working toward a goal.
Video: Best practices for VUI design in Alexa Conversations
Set the right expectations
Alexa Conversations comes with many new benefits and capabilities that users might not be aware of. You need to shift expectations and offer ways for users to interact with these new features.
Drop hints - Design flows for the first time that a user tries one-step corrections, pronouns, topic changes, and similar features. Users might need a little direction at first, so provide either a hint or an example dialog snippet of that experience. In each of the following examples, Alexa includes a hint about what the user can do.
User: Alexa, I'd like a pizza.
Alexa: Great. Let’s add some toppings. Feel free to add multiple toppings by listing them out. For instance, “I’d like sausage, pepperoni, and mushrooms.” What toppings would you like?
Alexa: We’ve chosen the destination of Maui. Remember, you can change your destination at any time. Now what hotel would you like?
Reset expectations - If you add Alexa Conversations to an existing skill, your users might already have expectations about how your skill works. Prepare them by briefing them on the new features. For instance, in a movie recommendation skill, you can quickly drop a hint that something’s different.
Alexa: Welcome back, I’m trying something new to get you to recommendations quicker.
First-time and repeat experiences - A first-time user needs onboarding into the skill to understand its goals and capabilities. After the user learns the system, there’s no reason to repeat the onboarding: repeat users don’t need dialog hints and snippets to learn the system. That content wastes valuable voice time and screen real estate. Keep interactions brief and relevant to the situation at hand.
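One way to show a hint only once is to remember, for each user, which features that user has already been introduced to. In this Python sketch, an in-memory dictionary stands in for persistent per-user storage. That is an assumption: a real skill would keep this state in a database or similar store. `prompt_with_hint` is a hypothetical helper.

```python
from collections import defaultdict

# Stand-in for persistent per-user storage (assumption: a real skill would
# persist this between sessions, for example in a database).
seen_features = defaultdict(set)


def prompt_with_hint(user_id, feature, hint, prompt):
    """Include the hint only the first time this user reaches `feature`."""
    if feature in seen_features[user_id]:
        return prompt  # repeat user: skip the onboarding hint
    seen_features[user_id].add(feature)
    return f"{hint} {prompt}"


hint = "Remember, you can change your destination at any time."
# First visit: the hint is included before the question.
print(prompt_with_hint("user-1", "change_destination", hint,
                       "Now what hotel would you like?"))
# Later visit: only the question.
print(prompt_with_hint("user-1", "change_destination", hint,
                       "Now what hotel would you like?"))
```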
Use simple responses first
Write out your user utterances and Alexa’s responses as simple text directly in the developer portal first. The dialog is the core of your experience, so it’s better to start in low fidelity, without worrying about the features of Alexa Presentation Language (APL) for Audio, such as adding background music or changing the pitch. After you have written the back-and-forth of your high-value experiences as dialog, you’re ready to fine-tune the dialog for richer audio. For details, see the APL for Audio (APL-A) reference.
From design to build
The following table shows the building blocks of Alexa Conversations. These building blocks teach the AI how your skill works so that it can fill in the gaps for you.
Dialogs
The conversation between the user and Alexa: a scripted set of turns back and forth. Each conversational turn starts with the user and ends with Alexa. For details, see Write Dialogs for Alexa Conversations.
Utterance sets
An outline of the various ways a user could make requests to your skill. For instance, a user who asks about the weather might say, "What's the weather in Seattle?" Or, "Can you tell me the weather in Seattle?" Perhaps even, "How about Seattle's weather?" The utterance set should contain a number of variants for how users might phrase a request or response. All utterances in the same set must contain the same slots (or none at all). In the previous example, you use the
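Because every utterance in a set must contain the same slots, it can help to check an utterance set before you add it. The following Python sketch assumes slots are written as `{slotName}` placeholders and uses a hypothetical `cityName` slot. It is an illustrative check, not a feature of the developer console.

```python
import re

# Assumption: slots appear in utterances as {slotName} placeholders.
SLOT_PATTERN = re.compile(r"\{(\w+)\}")


def slot_names(utterance: str) -> frozenset:
    # Collect the set of slot names referenced by one utterance.
    return frozenset(SLOT_PATTERN.findall(utterance))


def check_utterance_set(utterances: list) -> bool:
    """Return True if every utterance in the set references the same slots."""
    return len({slot_names(u) for u in utterances}) <= 1


weather_set = [
    "What's the weather in {cityName}?",
    "Can you tell me the weather in {cityName}?",
    "How about {cityName}'s weather?",
]
print(check_utterance_set(weather_set))                            # True
print(check_utterance_set(weather_set + ["What's the weather?"]))  # False
```

The second check fails because "What's the weather?" has no slot while the other utterances have one, so it belongs in a separate utterance set.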
Responses
The output from Alexa to the user as an audio response. You create these responses by using APL for Audio, which supports TTS, SSML, and multi-layered audio that can mix audio files and TTS. On devices with screens, you use Alexa Presentation Language (APL) to specify visual and touch responses as part of an Alexa response. For details, see Define Responses from Alexa for Alexa Conversations.
Slots and slot types
When you want to collect certain data elements within a conversation, you can set them as slots. For example, if a user asks, "What's the weather in Seattle tomorrow?", the skill would identify two slots, "Seattle" and "tomorrow", which would map to the
APIs
If your skill calls an application programming interface (API) for data, you must define what kinds of requests it can handle in the experience. APIs map to actions that a user can take in your skill, and you must define the input and output slots for each API. For example, in the pizza ordering skill, you must define an API that orders a pizza and gives information about daily deals. That API request might be called
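To see how an API's input slots relate to the slots that the dialog collects, here is a Python sketch of an API definition expressed as plain data. The API name, slot names, and slot type names are all hypothetical, and this isn't the format the developer console uses. The sketch only shows that an API is called once all of its input slots are filled.

```python
from dataclasses import dataclass


@dataclass
class ApiDefinition:
    """Hypothetical description of one API: its input slots and output type."""
    name: str
    inputs: dict      # input slot name -> slot type name
    output_type: str  # type of the value the API returns

    def missing_inputs(self, filled: dict) -> list:
        # Input slots Alexa still has to collect before it can call the API.
        return [slot for slot in self.inputs if slot not in filled]

    def ready(self, filled: dict) -> bool:
        # The API is called only after every input slot is filled.
        return not self.missing_inputs(filled)


# Every name below is illustrative; none of them comes from Alexa Conversations.
order_pizza = ApiDefinition(
    name="ExampleOrderPizza",
    inputs={"size": "PizzaSize", "crust": "CrustType", "toppings": "ToppingList"},
    output_type="OrderConfirmation",
)

filled = {"size": "large", "toppings": "pepperoni, mushrooms"}
print(order_pizza.missing_inputs(filled))  # ['crust']
print(order_pizza.ready(filled))           # False
```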
Put it all together
After you outline your design, you can start building it in Alexa Conversations directly.
- Write out a few sample dialogs, outlining a few variations in those dialog paths between the user and Alexa.
- Annotate your dialogs to mark the first turn, when Alexa asks for information, when the user provides information, and when Alexa Conversations invokes an API.
- Split your dialog into utterance sets and responses.
- Identify slots and slot types in your dialogs.
- Define your APIs that the skill calls when slots are filled.
For details, see Tutorial: Annotate a Dialog for Alexa Conversations.
The Alexa Conversations AI takes over from there, extrapolating the phrasing variations and their respective dialog paths for you. The AI also covers your edge cases and takes on dialog state, context management and carryover, list management, corrections, and API invocation.
Video: Getting started with Alexa Conversations dialog annotation
Get started using Alexa Conversations
Take the Alexa Conversations Tutorial for a practice run, or check out some Alexa Conversations skill code samples on GitHub. When you’re ready to create a new skill or modify an existing one with Alexa Conversations, visit the Alexa Developer Console to enable Alexa Conversations and publish skills in the en_US locale.
- Alexa Conversations Technical Documentation
- Video: Alexa Live 2020 - Build Conversational Interfaces Faster with Alexa Conversations
- Video: Alexa Conversations Library
- The Science behind Alexa Conversations
- Baddeley, A. (1994). "The magical number seven: Still magic after all these years?" Psychological Review, 101(2), 353–356.