Add Voice Control and Speech to the Game


You can use the Alexa Web API for Games to add Alexa speech and voice commands to your web-based game. For more details about the Alexa Web API for Games, see About Alexa Web API for Games.

Add Alexa interactions to your game

You can add speech to your web-based game so that Alexa talks to the user while they interact with the game. Alexa's speech might respond to user interactions or share information about what's happening in the game. Alexa can also prompt the user for a spoken response, as described later.

For example:

User touches the "Fire" button on the screen.
Alexa: Firing the torpedoes…. (sound effects)…. Sorry, looks like you missed. You'll have to wait until your next turn to try again! (As Alexa speaks, the display on the web app changes.)
Web app presents new graphics and waits for the user's touch input.

To make Alexa speak to the user

  1. In the web app, call alexa.skill.sendMessage() to send the skill a message.
  2. In your skill code, create a handler for the Alexa.Presentation.HTML.Message request generated by the sendMessage() call.
    This handler returns a response with:
    • The outputSpeech Alexa should say.
    • The shouldEndSession property left undefined (not set).

    This response tells Alexa to speak the text and leave the session open without opening the microphone.

  3. In the web app, register listener functions to respond to Alexa events.
    Alexa notifies your app when speech starts and stops, as shown in the sketch after these steps.
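
For example, here is a minimal sketch of both sides of this exchange. It assumes the Alexa JavaScript API (alexa-html.js) is loaded in the web app and the ASK SDK for Node.js in the skill; the message payload (type: 'fire') and the game functions pauseGameAudio and resumeGameAudio are hypothetical names for your own game logic.

    // Web app: send a message when the user touches "Fire" (step 1)
    // and react to Alexa's speech events (step 3).
    Alexa.create({ version: '1.1' })
        .then(({ alexa }) => {
            document.getElementById('fire-button').addEventListener('click', () => {
                alexa.skill.sendMessage({ type: 'fire' });
            });
            alexa.speech.onStarted(() => pauseGameAudio());  // hypothetical game function
            alexa.speech.onStopped(() => resumeGameAudio()); // hypothetical game function
        })
        .catch((error) => console.error('Alexa API unavailable', error));

On the skill side (step 2), the handler speaks without opening the microphone by leaving shouldEndSession unset:

    // Skill: handle the Alexa.Presentation.HTML.Message request.
    const FireMessageHandler = {
        canHandle(handlerInput) {
            const request = handlerInput.requestEnvelope.request;
            return request.type === 'Alexa.Presentation.HTML.Message'
                && request.message && request.message.type === 'fire';
        },
        handle(handlerInput) {
            // No reprompt() and no withShouldEndSession(): Alexa speaks,
            // the microphone stays closed, and the session stays open.
            return handlerInput.responseBuilder
                .speak('Firing the torpedoes. Sorry, looks like you missed!')
                .getResponse();
        },
    };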

Prompt the user for voice input

Your web app can make Alexa prompt the user for voice input during the game, such as in response to a button press in the game. For example:

User touches the "Fire" button on the screen.
Alexa: Firing the torpedoes… (sound effects)…. Sorry, looks like you missed. Do you want to try that again?
Alexa opens the microphone to listen to the user's response.
User: Yes (The skill gets a normal intent from the interaction model, such as AMAZON.YesIntent.)

Game continues….

When you decide how to prompt the user for voice input, consider what methods they have for initiating speech on their own. If the device has a button that gives the user push-to-talk functionality, it might be more natural for them to initiate conversations by pressing, or pressing and holding, that button. If the device supports wake-word activation, the user might prefer to play your game hands-free by using the wake word. You can determine which methods the device supports through the capabilities interface, as in the sketch below.
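
For example, a sketch of a capability check in the web app. It assumes the alexa.capabilities object exposes a microphone entry with supportsPushToTalk and supportsWakeWord flags (verify the exact field names against the API reference for your library version); showHint is a hypothetical UI helper.

    // Choose a prompt strategy based on how the device lets the
    // user initiate speech.
    const microphone = alexa.capabilities.microphone;
    if (microphone && microphone.supportsWakeWord) {
        showHint('Say "Alexa, fire!" at any time.');
    } else if (microphone && microphone.supportsPushToTalk) {
        showHint('Press and hold the action button to talk.');
    }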

To prompt the user for voice input

  1. In the web app, call alexa.skill.sendMessage() to send the skill a message.
  2. In your skill code, create a handler for the Alexa.Presentation.HTML.Message request. This handler returns a response with:
    • The outputSpeech Alexa should say.
    • A reprompt to use if the user doesn't respond.
    • The shouldEndSession property set to false.

    This response tells Alexa to speak the text, and then open the microphone for the user's response.

  3. In the web app, register listener functions to respond to Alexa events. Alexa notifies your app when speech starts/stops and when the microphone opens/closes, as shown in the sketch after these steps.
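
A sketch of both sides, under the same assumptions as the earlier example; the game functions in the listeners are hypothetical.

    // Web app: react to speech and microphone events (step 3).
    alexa.speech.onStarted(() => dimGameAudio());
    alexa.speech.onStopped(() => restoreGameAudio());
    alexa.voice.onMicrophoneOpened(() => showListeningIndicator());
    alexa.voice.onMicrophoneClosed(() => hideListeningIndicator());

On the skill side (step 2), including a reprompt and setting shouldEndSession to false tells Alexa to open the microphone after speaking:

    // Skill: prompt the user for a response after the "fire" message.
    const FirePromptHandler = {
        canHandle(handlerInput) {
            const request = handlerInput.requestEnvelope.request;
            return request.type === 'Alexa.Presentation.HTML.Message'
                && request.message && request.message.type === 'fire';
        },
        handle(handlerInput) {
            return handlerInput.responseBuilder
                .speak('Sorry, looks like you missed. Do you want to try that again?')
                .reprompt('Should I fire the torpedoes again?')
                .withShouldEndSession(false) // open the microphone for the reply
                .getResponse();
        },
    };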

These steps trigger a normal Alexa skill interaction. Alexa speaks the outputSpeech, and then opens the microphone for a few seconds to listen for the user's response. If the user doesn't respond, or the response isn't understood, Alexa speaks the reprompt, and then opens the microphone again. If the user still doesn't respond, Alexa closes the microphone, but keeps the session open because the web app is still displayed on the screen.

After the user responds to the prompt with an utterance that resolves to an intent in your model, your skill gets an IntentRequest. An intent handler in your skill should handle this request. For example, your intent handler might return a response that contains:

  • An Alexa.Presentation.HTML.HandleMessage directive that tells the web app relevant information from the user's spoken response.
  • (Optional) outputSpeech if you want Alexa to say something to the user.
  • The shouldEndSession property set to undefined (when you don't need to open the microphone for another response) or false (when you do want to open the microphone for additional spoken input).

Finally, in your web app, call alexa.skill.onMessage() to register a callback to respond to the incoming message.
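
A sketch of this round trip, assuming AMAZON.YesIntent is in your interaction model; the message payload (type: 'fireAgain') and startFiringAnimation are hypothetical.

    // Skill: relay the user's "yes" back to the web app.
    const YesIntentHandler = {
        canHandle(handlerInput) {
            const request = handlerInput.requestEnvelope.request;
            return request.type === 'IntentRequest'
                && request.intent.name === 'AMAZON.YesIntent';
        },
        handle(handlerInput) {
            return handlerInput.responseBuilder
                .speak('Firing again!')
                .addDirective({
                    type: 'Alexa.Presentation.HTML.HandleMessage',
                    message: { type: 'fireAgain' },
                })
                .getResponse(); // shouldEndSession stays undefined
        },
    };

    // Web app: respond to the incoming message from the skill.
    alexa.skill.onMessage((message) => {
        if (message.type === 'fireAgain') {
            startFiringAnimation();
        }
    });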

Get user-initiated voice input

When your web app is on the screen, the user can use the wake word to speak to Alexa at any time. Your skill should expect user-initiated voice input while the web app is active.

User touches the screen to select several targets. Web app responds with normal sound effects and graphics.
User: Alexa, fire at the targets! (Because the skill session is open, the user can invoke an intent in your skill with just the wake word and an utterance.)

Skill receives an IntentRequest corresponding to the "fire at the targets" utterance.
Alexa: Roger. Firing the torpedoes now!
Your web app responds with sound effects and graphics.

To get user-initiated voice input

  1. In your skill's interaction model, add intents with sample utterances that users might speak when playing your game.
  2. In your intent handlers for these intents, return a response with the following (see the sketch after these steps):
    • An Alexa.Presentation.HTML.HandleMessage directive that tells the web app relevant information from the user's spoken request.
    • (Optional) outputSpeech if you want Alexa to say something to the user.
    • The shouldEndSession property set to undefined (when you don't need to open the microphone for another response) or false (when you do want to open the microphone for additional spoken input).
  3. In your web app, call alexa.skill.onMessage() to register a callback to respond to the incoming message.
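
A sketch of such an intent handler, assuming a custom FireIntent with sample utterances like "fire at the targets"; the message payload is hypothetical.

    // Skill: forward a user-initiated voice request to the web app.
    const Alexa = require('ask-sdk-core');

    const FireIntentHandler = {
        canHandle(handlerInput) {
            return Alexa.getRequestType(handlerInput.requestEnvelope) === 'IntentRequest'
                && Alexa.getIntentName(handlerInput.requestEnvelope) === 'FireIntent';
        },
        handle(handlerInput) {
            return handlerInput.responseBuilder
                .speak('Roger. Firing the torpedoes now!')
                .addDirective({
                    type: 'Alexa.Presentation.HTML.HandleMessage',
                    message: { type: 'fire', initiatedBy: 'voice' },
                })
                .getResponse(); // session stays open while the web app is on screen
        },
    };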

Use transformers to render voice natively in HTML

The Alexa.Presentation.HTML.Start and Alexa.Presentation.HTML.HandleMessage directives take an optional transformers array. A transformer converts either Speech Synthesis Markup Language (SSML) or plain text into an audio stream and provides your web app with a URL to that stream. In the web app, you can use the fetchAndDemuxMP3 method in the Alexa JavaScript API to extract the audioBuffer and speechMarks from the output speech. By synchronizing visuals and web audio with the speechMarks, you get finer control over timing than is possible with the speech callbacks alone.
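
A sketch of both halves. The transformer fields (inputPath, transformer, outputName) and the alexa.utils.speech.fetchAndDemuxMP3 utility are assumptions based on the APL-style transformer convention, so verify them against the API reference; the payload names are hypothetical.

    // Skill: send SSML through a transformer so the web app receives
    // a URL to rendered audio instead of raw SSML.
    handlerInput.responseBuilder.addDirective({
        type: 'Alexa.Presentation.HTML.HandleMessage',
        message: {
            prompts: {
                missed: '<speak>Sorry, looks like you missed.</speak>',
            },
        },
        transformers: [{
            inputPath: 'prompts.missed',
            transformer: 'ssmlToSpeech',
            outputName: 'missedSpeechUrl',
        }],
    });

    // Web app: demux the MP3 into an audio buffer plus speech marks,
    // then play it through Web Audio and sync visuals to the marks.
    alexa.skill.onMessage(async (message) => {
        const url = message.prompts.missedSpeechUrl;
        const { audioBuffer, speechMarks } =
            await alexa.utils.speech.fetchAndDemuxMP3(url);
        const context = new AudioContext();
        const source = context.createBufferSource();
        source.buffer = await context.decodeAudioData(audioBuffer);
        source.connect(context.destination);
        source.start();
        // Each speech mark carries a time offset you can use to drive
        // lip sync or captions (format assumed similar to Polly marks).
    });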

Last updated: Nov 23, 2023