Receive Voice Input to a Gadget Skill
Unlike typical custom skills, skills for Alexa Gadgets must be able to handle two types of input from the user: the user's voice, and gadget input. For example, a robot gadget might enable the user to physically raise the robot's arm or, alternatively, say, "Raise my robot's arm."
To monitor gadget input, your skill must send a CustomInterfaceController.StartEventHandler directive in response to any request from Alexa. Your skill must also be prepared to handle voice input at any time. This topic describes how your gadget skill should respond to Alexa based on whether the skill specifically expects voice input.
Types of voice input
When a skill is in session, there are two ways that users can provide voice input to the skill. If the microphone is open, the user can speak to the skill directly, without prefacing the speech with "Alexa." If the microphone is closed, the user can still provide voice input, but must preface their request with "Alexa."
Listening for voice input
This section contains information about how your skill can prepare to receive voice input and gadget input.
Opening the microphone
If your skill specifically expects voice input, your skill should do the following, for the best user experience:
- Include text-to-speech (TTS) that asks the user a question.
- Set shouldEndSession to false. This preserves the current session and opens the microphone at the end of the response, so that it is ready for the user to speak another intent. If the user doesn't respond, Alexa speaks the reprompt (if you provided one). If the user still doesn't respond, the session closes.
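As a minimal sketch of the two steps above, the following Python builds a raw skill response as a plain dictionary rather than through an SDK. The function name build_voice_prompt_response and the sample wording are hypothetical; the field names follow the standard Alexa response JSON.

```python
import json


def build_voice_prompt_response(question, reprompt):
    """Build a response that asks a question and opens the microphone."""
    return {
        "version": "1.0",
        "response": {
            # The TTS that asks the user a question.
            "outputSpeech": {"type": "SSML", "ssml": f"<speak>{question}</speak>"},
            # Spoken only if the user stays silent after the question.
            "reprompt": {
                "outputSpeech": {"type": "SSML", "ssml": f"<speak>{reprompt}</speak>"}
            },
            # False keeps the session alive and opens the microphone.
            "shouldEndSession": False,
        },
    }


if __name__ == "__main__":
    response = build_voice_prompt_response(
        "Should I raise the robot's arm?",
        "Say yes to raise the arm, or no to leave it down.",
    )
    print(json.dumps(response, indent=2))
```

Building the dictionary by hand keeps the sketch independent of any SDK; a real skill would more likely use an SDK response builder to produce the same JSON.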
Without opening the microphone
It is common for gadget skills to reach a point at which they need to keep the session open, but do not want the microphone to open and attempt to recognize speech. For example, a skill might give the user time to solve a puzzle by pressing lighted buttons on a gadget.
If your skill expects gadget input but does not specifically expect voice input, your skill should do the following, for the best user experience:
- Include text-to-speech (TTS) that lets the user know what to do.
- Make sure that the response doesn't include a value for shouldEndSession.
- Include a CustomInterfaceController.StartEventHandler directive to start an event handler to monitor gadget input.
- If the expiration.durationInMilliseconds specified in your CustomInterfaceController.StartEventHandler directive is longer than 5 seconds, the response must also include an audio file so that the user knows that the skill is still in session. The following example shows how to play a 30-second ticking sound, which is available for you to use:
  <speak> Ready, set, go! <audio src="https://s3.amazonaws.com/ask-soundlibrary/foley/amzn_sfx_rhythmic_ticking_30s_01.mp3" /> </speak>
Sound effect: Rhythmic ticking (30s)
SSML: <audio src='https://s3.amazonaws.com/ask-soundlibrary/foley/amzn_sfx_rhythmic_ticking_30s_01.mp3'/>
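The checklist above can be sketched in Python as follows. This is an illustration under stated assumptions, not a definitive implementation: the function name, the Custom.Robot namespace, and the filter expression are hypothetical placeholders, and the response is built as a plain dictionary rather than through an SDK.

```python
import uuid

TICKING_AUDIO = (
    '<audio src="https://s3.amazonaws.com/ask-soundlibrary/foley/'
    'amzn_sfx_rhythmic_ticking_30s_01.mp3" />'
)
AUDIO_REQUIRED_ABOVE_MS = 5000  # durations longer than 5 seconds need audio


def build_gadget_wait_response(speech, duration_ms, namespace="Custom.Robot"):
    """Keep the session open for gadget input without opening the microphone."""
    ssml = speech
    if duration_ms > AUDIO_REQUIRED_ABOVE_MS:
        # Let the user hear that the skill is still in session.
        ssml = f"{speech} {TICKING_AUDIO}"

    directive = {
        "type": "CustomInterfaceController.StartEventHandler",
        "token": str(uuid.uuid4()),  # identifies this event handler
        "eventFilter": {
            # Hypothetical filter: accept events from one custom namespace.
            "filterExpression": {"==": [{"var": "header.namespace"}, namespace]},
            "filterMatchAction": "SEND_AND_TERMINATE",
        },
        "expiration": {"durationInMilliseconds": duration_ms},
    }

    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": f"<speak>{ssml}</speak>"},
            "directives": [directive],
            # shouldEndSession is deliberately left out of the response.
        },
    }


if __name__ == "__main__":
    response = build_gadget_wait_response("Ready, set, go!", 30000)
    print(response["response"]["outputSpeech"]["ssml"])
```

Leaving the shouldEndSession key out entirely, rather than setting it to a value, is the point of the sketch: the session stays open for gadget events while the microphone stays closed.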
Example
The following example is a typical occurrence within a trivia game that uses lighted push-button gadgets. Alexa asks a trivia question, a user presses their button to buzz in, and then the user answers the question. In this case, the following interactions occur:
- Alexa asks the trivia question – To set this up, the skill's response includes speech and a CustomInterfaceController.StartEventHandler directive. The response doesn't include a value for shouldEndSession. This setup waits for gadget input without opening the microphone.
- A user buzzes in – Alexa sends a CustomInterfaceController.EventsReceived request to the skill to notify it of the gadget input.
- Alexa prompts the user for an answer – The skill responds to the CustomInterfaceController.EventsReceived request with a response that sets shouldEndSession to false to open the microphone for the user to say the answer. The skill might also include speech such as "Player 1, what's your answer?"
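The trivia flow above might be sketched as a single request handler in Python. The handler name, the sample question, the Custom.Buzzer namespace, and the hard-coded "Player 1" are all hypothetical; a real skill would also inspect the events in the request to work out which gadget buzzed in.

```python
import uuid


def speech(text):
    """Wrap plain text in an SSML outputSpeech object."""
    return {"type": "SSML", "ssml": f"<speak>{text}</speak>"}


def start_event_handler(duration_ms):
    """Return a StartEventHandler directive; filter values are placeholders."""
    return {
        "type": "CustomInterfaceController.StartEventHandler",
        "token": str(uuid.uuid4()),
        "eventFilter": {
            # Hypothetical namespace for the buzzer's custom interface.
            "filterExpression": {"==": [{"var": "header.namespace"}, "Custom.Buzzer"]},
            "filterMatchAction": "SEND_AND_TERMINATE",
        },
        "expiration": {"durationInMilliseconds": duration_ms},
    }


def handle_request(event):
    """Route a request to the matching step of the trivia flow."""
    request_type = event["request"]["type"]

    if request_type == "CustomInterfaceController.EventsReceived":
        # A user buzzed in: open the microphone for the spoken answer.
        return {
            "version": "1.0",
            "response": {
                "outputSpeech": speech("Player 1, what's your answer?"),
                "shouldEndSession": False,
            },
        }

    # Any other request in this sketch: ask the question and wait for a
    # buzz. No shouldEndSession value, so the microphone stays closed.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": speech("What is the capital of France?"),
            "directives": [start_event_handler(5000)],
        },
    }
```

The 5000 ms duration is chosen so that the sketch does not need the audio file required for longer event handlers.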
Event handlers and reprompts
As with any Alexa skill, your gadget skill can specify a reprompt in its response. If the microphone has been open for a few seconds without user input, Alexa speaks the reprompt, and then the microphone opens for a few more seconds.
If, after the reprompt, the user still doesn't respond, the microphone turns off and the session closes. To notify the skill about the session closure, Alexa sends the skill a SessionEndedRequest, but the skill isn't given a chance to reopen the session or interact with the user in any other way. If the skill didn't specify a reprompt at all, the skill exits after the initial few seconds without user input.
With gadget skills, there are additional things that you need to consider when working with reprompts:
- A gadget event of any type (CustomInterfaceController.EventsReceived or CustomInterfaceController.Expired) cancels any pending reprompt.
- You cannot rely solely on the reprompt feature of the text or SSML response to reprompt the user to interact with the gadget. You must send a CustomInterfaceController.StartEventHandler directive to monitor for gadget input. You can send a response to the CustomInterfaceController.Expired request to reprompt the user for input.
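One way to reprompt for gadget input, sketched in Python under stated assumptions: when the CustomInterfaceController.Expired request arrives, the skill speaks a reminder and starts a new event handler. The repromptCount key, the MAX_REPROMPTS limit, the Custom.Buzzer namespace, and the wording are hypothetical; the sketch assumes the counter travels in the directive's expirationPayload and comes back in the Expired request.

```python
import uuid

MAX_REPROMPTS = 2  # hypothetical limit before the skill gives up


def handle_expired(event):
    """Respond to CustomInterfaceController.Expired with a gadget reprompt."""
    payload = event["request"].get("expirationPayload") or {}
    count = payload.get("repromptCount", 0)

    if count >= MAX_REPROMPTS:
        # Stop waiting and end the session.
        return {
            "version": "1.0",
            "response": {
                "outputSpeech": {"type": "SSML", "ssml": "<speak>Goodbye!</speak>"},
                "shouldEndSession": True,
            },
        }

    directive = {
        "type": "CustomInterfaceController.StartEventHandler",
        "token": str(uuid.uuid4()),
        "eventFilter": {
            # Hypothetical namespace for the gadget's custom interface.
            "filterExpression": {"==": [{"var": "header.namespace"}, "Custom.Buzzer"]},
            "filterMatchAction": "SEND_AND_TERMINATE",
        },
        "expiration": {
            # Not longer than 5 seconds, so no audio file is needed.
            "durationInMilliseconds": 5000,
            # Carry the counter forward to the next Expired request.
            "expirationPayload": {"repromptCount": count + 1},
        },
    }
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                "type": "SSML",
                "ssml": "<speak>I'm still waiting. Press your button to buzz in.</speak>",
            },
            "directives": [directive],
            # No shouldEndSession value: the microphone stays closed.
        },
    }
```

Carrying the counter in the payload avoids a separate session store for this one value; storing it in session attributes would work equally well.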
Last updated: Feb 14, 2022