Best Practices for Text Responses


A skill response typically includes a text string that Alexa converts to speech and speaks to the user. Review the following best practices for text responses and reprompts to tell Alexa what to do next.

Text-to-speech guidelines

To return a voice response, you include the text that the Alexa service converts to speech. To create a natural speaking rhythm, Alexa uses punctuation to alter the intonation of words and add a slight delay after the punctuation mark.

If you want more control over how Alexa generates speech from the text in your response, you can add Speech Synthesis Markup Language (SSML) tags to the text. You can use SSML to direct Alexa to play longer pauses, speak with emphasis, change pronunciation, and more. For details about SSML capabilities, see SSML Reference.

Use the following guidelines to build your text string:

  • The text string must be less than 8000 characters.
  • Use supported punctuation marks: comma, period, and question mark.
  • Don't include special characters, such as HTML or XML markup.
  • Escape quotation marks used to surround SSML attributes, or use an appropriate mix of single and double quotation marks.
  • Use plain language and simple prompts.
  • Keep responses brief. Use the fewest words to convey the most meaning.
  • Use active voice.
  • Use contractions.
  • If you're presenting more than one option, keep the options short and easy to understand.
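The two mechanical guidelines in this list (the length limit and the ban on markup in plain text) can be checked before you send a response. The following is a minimal sketch, not part of any Alexa SDK: the helper name check_plain_text and the choice of characters to flag are assumptions made for illustration.

```python
import re

# Limit taken from the guideline above: the text must be less than 8000 characters.
MAX_SPEECH_CHARS = 8000


def check_plain_text(text):
    """Return a list of guideline problems found in a plain text response.

    Hypothetical helper for illustration; it is not an Alexa SDK function.
    """
    problems = []
    if len(text) >= MAX_SPEECH_CHARS:
        problems.append("text must be less than 8000 characters")
    # Assumption: treat <, >, and & as signs of HTML or XML markup.
    if re.search(r"[<>&]", text):
        problems.append("text contains markup characters (<, >, or &)")
    return problems


print(check_plain_text("Today will be sunny. Do you want tomorrow's forecast?"))
print(check_plain_text("Today will be <b>sunny</b>."))
```

An empty list means the text passed both checks. The style guidelines, such as active voice and brevity, still need a human review.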

Punctuation and pauses

Alexa uses punctuation marks, such as the comma (,), period (.), and question mark (?), to pause appropriately within and between sentences. The comma produces the shortest pause, and the period produces a slightly longer one. The question mark creates a pause similar to the period, but also influences the intonation of the sentence, making it rise or fall depending on the type of question. In this way, Alexa simulates the change in speech that people naturally make when they ask a question and wait for a response. You can also add longer pauses and breaks with SSML.
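As a rough illustration of a longer pause, the sketch below joins two sentences with an SSML break tag. The helper name with_pause and the default of 500 milliseconds are assumptions chosen for this example. See the SSML Reference for the break values that Alexa supports.

```python
def with_pause(first, second, pause_ms=500):
    """Join two sentences with an SSML break of pause_ms milliseconds.

    Hypothetical helper for illustration; it is not an Alexa SDK function.
    """
    # Single quotation marks surround the attribute value, so the result
    # can sit inside a double-quoted string without escaping.
    return "<speak>{} <break time='{}ms'/> {}</speak>".format(
        first, pause_ms, second
    )


ssml = with_pause("Here is your forecast.", "Today will be sunny.", 700)
print(ssml)
```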

For more details on voice design best practices, see the Alexa Design Guide.

Examples

To return a voice response, you include the outputSpeech property in the response. For more details, see OutputSpeech object.

Plain text example

The following example shows a response that includes plain text in the output speech. Alexa speaks the last sentence as a question.

This code example uses the Alexa Skills Kit SDK for Node.js (v2).

Use the speak method on the ResponseBuilder object to define the speech text. The getResponse() method returns the response with the specified properties.

return handlerInput.responseBuilder
  .speak("Today will be sunny with a high of 62 degrees. Do you want to know the weather for tomorrow?")
  .getResponse();

This code example uses the Alexa Skills Kit SDK for Python.

Use the speak method on the ResponseFactory object to define the speech text in the response.

from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name
from ask_sdk_core.handler_input import HandlerInput
from ask_sdk_model import Response 
 
class HelloWorldIntentHandler(AbstractRequestHandler):
    """Handler for Hello World Intent."""
    def can_handle(self, handler_input):
        # type: (HandlerInput) -> bool
        return is_intent_name("HelloWorldIntent")(handler_input)
 
    def handle(self, handler_input):
        # type: (HandlerInput) -> Response
 
        speech_text = "Today will be sunny with a high of 62 degrees. Do you want to know the weather for tomorrow?"
     
        return handler_input.response_builder.speak(speech_text).response

This code example uses the Alexa Skills Kit SDK for Java.

Use the withSpeech() method on the ResponseBuilder object to define the speech text. The build() method returns the response with the specified properties.

@Override
public Optional<Response> handle(HandlerInput handlerInput, IntentRequest intentRequest) {

    String speechText = "Today will be sunny with a high of 62 degrees. Do you want to know the weather for tomorrow?";

    return handlerInput.getResponseBuilder()
        .withSpeech(speechText)
        .build();
}

This JSON response shows how you return a simple plain text outputSpeech string.

{
    "outputSpeech": {
      "type": "PlainText",
      "text": "Today will be sunny with a high of 62 degrees. Do you want to know the weather for tomorrow?"
    }
}
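If you build the JSON yourself instead of using an SDK, the outputSpeech object sits inside the response body, and a reprompt carries its own outputSpeech object. The following sketch assumes the standard response envelope fields (version, response, reprompt, and shouldEndSession). The helper name build_response and the reprompt wording are made up for illustration.

```python
import json


def build_response(speech, reprompt):
    """Build a response body with plain text speech and a reprompt.

    Hypothetical helper for illustration; it is not an Alexa SDK function.
    """
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech},
            "reprompt": {
                "outputSpeech": {"type": "PlainText", "text": reprompt}
            },
            # Keep the session open so the user can answer the question.
            "shouldEndSession": False,
        },
    }


body = build_response(
    "Today will be sunny with a high of 62 degrees. "
    "Do you want to know the weather for tomorrow?",
    "Do you want tomorrow's forecast?",
)
print(json.dumps(body, indent=2))
```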

SSML example

The following example shows a response with SSML tags in the output speech. In this example, Alexa speaks the numbers as a phone number.

This code example uses the Alexa Skills Kit SDK for Node.js (v2).

Use the speak method on the ResponseBuilder object to define the speech text. In this example, the speech string contains SSML tags. The getResponse() method returns the response with the specified properties.

return handlerInput.responseBuilder
  .speak("<speak><p>You can call the restaurant to order.</p><p>Their number is <say-as interpret-as='telephone'>2025551212</say-as>.</p></speak>")
  .getResponse();

This code example uses the Alexa Skills Kit SDK for Python.

Use the speak method on the ResponseFactory object to define the speech text in the response. In this example, the speech_text string contains SSML tags.

from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name
from ask_sdk_core.handler_input import HandlerInput
from ask_sdk_model import Response 
 
class HelloWorldIntentHandler(AbstractRequestHandler):
    """Handler for Hello World Intent."""
    def can_handle(self, handler_input):
        # type: (HandlerInput) -> bool
        return is_intent_name("HelloWorldIntent")(handler_input)
 
    def handle(self, handler_input):
        # type: (HandlerInput) -> Response
 
        speech_text = "<speak><p>You can call the restaurant to order.</p><p>Their number is <say-as interpret-as='telephone'>2025551212</say-as>.</p></speak>"
     
        return handler_input.response_builder.speak(speech_text).response

This code example uses the Alexa Skills Kit SDK for Java.

Use the withSpeech() method on the ResponseBuilder object to define the speech text. In this example, the speechText string contains SSML tags. The build() method returns the response with the specified properties.

@Override
public Optional<Response> handle(HandlerInput handlerInput) {

    String speechText = "<speak><p>You can call the restaurant to order.</p><p>Their number is <say-as interpret-as='telephone'>2025551212</say-as>.</p></speak>";

    return handlerInput.getResponseBuilder()
        .withSpeech(speechText)
        .build();
}

This JSON response shows how you return an SSML outputSpeech string. For SSML, set the type to SSML and provide the marked-up text in the ssml property instead of the text property.

{
    "outputSpeech": {
      "type": "SSML",
      "ssml": "<speak><p>You can call the restaurant to order.</p><p>Their number is <say-as interpret-as='telephone'>2025551212</say-as>.</p></speak>"
    }
}

When you provide SSML, enclose the text within <speak> tags, and use single quotation marks around attribute values inside the SSML so that they don't conflict with the double quotation marks that surround the string. In the following example, Alexa pauses between digits.
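A minimal sketch of such a response follows. It assumes the say-as tag with the digits interpretation, which speaks each digit separately. The helper name ssml_output_speech and the sample number are made up for illustration.

```python
import json


def ssml_output_speech(ssml_body):
    """Wrap SSML markup in <speak> tags and return an outputSpeech object.

    Hypothetical helper for illustration; it is not an Alexa SDK function.
    """
    return {"type": "SSML", "ssml": "<speak>{}</speak>".format(ssml_body)}


# Single quotation marks surround the attribute value inside the SSML.
body = "Your code is <say-as interpret-as='digits'>4821</say-as>."
output_speech = ssml_output_speech(body)

# The serialized JSON uses double quotation marks around the string,
# so the single-quoted attribute needs no escaping.
print(json.dumps(output_speech, indent=2))
```

If the attribute used double quotation marks instead, the JSON serializer would have to escape each one with a backslash, which makes the string harder to read and maintain.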



Last updated: May 30, 2023