Stream Long-Form Audio with AudioPlayer
You can add long-form audio, such as podcasts, news stories, and live streams, to a custom skill and monitor its playback by using the AudioPlayer interface. You provide a URL to the audio stream, and Alexa plays the audio to the user. Along with the audio, you can provide a background image to show on Alexa-enabled devices with a screen. You can send audio directives to play and stop the audio. Alexa can also provide your skill with information about the playback state, such as when playback starts and stops, when the track is nearly complete, or when the user pauses the audio.
Complete the following steps to add the AudioPlayer interface to your skill.
For other audio options for custom skills, see Add Audio to a Custom Skill.
Example utterances for long-form audio
An ideal audio skill should play error-free and uninterrupted audio files. The skill should fulfill the customer request and play content relevant to the skill description. The following example shows a high-quality experience with an audio skill.
User: Alexa, open My Radio Player.
My Radio Player: Welcome to My Radio Player. To listen to live music, say, "Play live music". To view all available playlists, say, "View playlists".
User: Play live music.
The skill plays live music.
When your skill sends a Play directive to begin playback, Alexa plays the audio stream at the specified URL. During audio streaming, users can control playback without the skill invocation name. In the response that includes the Play directive, set the shouldEndSession flag to true to end the session. If you set this flag to false, Alexa sends the stream to the device for playback, and then immediately pauses the stream to listen for the user's response.
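As a minimal sketch of such a response, the following Python builds the response body as a plain dictionary. The function name build_play_response, the example URL, and the token value are hypothetical and chosen for illustration only. The field names are intended to follow the AudioPlayer.Play directive format, so check them against the AudioPlayer Interface Reference before you rely on them.

```python
import json


def build_play_response(url, token, offset_ms=0):
    """Return a skill response that starts playback and ends the session."""
    return {
        "version": "1.0",
        "response": {
            "directives": [
                {
                    "type": "AudioPlayer.Play",
                    # REPLACE_ALL starts this stream right away and
                    # discards anything already queued.
                    "playBehavior": "REPLACE_ALL",
                    "audioItem": {
                        "stream": {
                            "url": url,
                            "token": token,
                            "offsetInMilliseconds": offset_ms,
                        }
                    },
                }
            ],
            # End the session so that Alexa does not pause the stream
            # to listen for a reply.
            "shouldEndSession": True,
        },
    }


if __name__ == "__main__":
    # Hypothetical stream URL and token.
    body = build_play_response("https://example.com/live.mp3", "live-1")
    print(json.dumps(body, indent=2))
```

The token is an opaque string that your skill chooses. Alexa includes it in later playback requests, which lets the skill identify the stream.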
Amazon recommends that your skill persist attributes related to the audio stream and the skill session, such as the audio stream file. In later requests, your skill can use the context object to get details about playback.
If the skill session ends during audio streaming, Alexa remembers that your skill started the audio stream and sends voice and tap playback requests to your skill. However, if the user does one of the following actions, Alexa no longer remembers that your skill played the previous audio stream and the user must use the skill name again:
- Invokes audio playback with a different skill.
- Invokes another service that streams audio, such as the built-in music service or a Flash Briefing.
- Reboots the device.
The following example for a custom skill called "My Radio Player" defines an intent, PlayLatestEpisode, mapped to the sample utterance "play the latest episode."
User: Alexa, ask My Radio Player to play the latest episode.
Alexa opens a new skill session and sends the My Radio Player skill the normal request for the PlayLatestEpisode intent.
My Radio Player sends a Play directive. The skill session closes and audio begins playing.
User: Alexa, next. (No invocation name used.)
Alexa opens a new skill session and sends the My Radio Player skill the request for "next".
My Radio Player takes the appropriate action for "next" and closes the skill session.
User: Alexa, pause. (Again, no invocation name.)
Alexa opens a new skill session and sends the AMAZON.PauseIntent to the skill.
My Radio Player sends a Stop directive, and then closes the skill session. Alexa stops the audio streaming.
At this point the audio isn't playing and there is no current session. However, the Alexa service continues to track "My Radio Player" as the last skill that streamed audio. As long as the device remains on and the user doesn't use any other audio streaming skills or services, the next example can take place at a later time without the skill name invocation.
User: Alexa, resume. (No invocation name used.)
Alexa opens a new skill session and sends the AMAZON.ResumeIntent to the My Radio Player skill.
My Radio Player determines the previously playing track and sends a new Play directive to restart playback.
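The pause and resume exchange above can be sketched as follows. This is an illustration under stated assumptions, not the skill's actual implementation. The in-memory dictionary, the handler names, and the catalog argument are hypothetical. The sketch also assumes that the request's context object carries an AudioPlayer section with token and offsetInMilliseconds fields.

```python
# Hypothetical in-memory store keyed by user ID. A real skill would use a
# persistent database, because each request may reach a fresh process.
_saved_playback = {}


def _directive_response(directives):
    """Wrap AudioPlayer directives in a response that ends the session."""
    return {
        "version": "1.0",
        "response": {"directives": directives, "shouldEndSession": True},
    }


def handle_pause(envelope):
    """Remember the current token and offset, then stop the stream."""
    user_id = envelope["context"]["System"]["user"]["userId"]
    player = envelope["context"].get("AudioPlayer", {})
    _saved_playback[user_id] = {
        "token": player.get("token"),
        "offset_ms": player.get("offsetInMilliseconds", 0),
    }
    return _directive_response([{"type": "AudioPlayer.Stop"}])


def handle_resume(envelope, catalog):
    """Restart the saved track at the saved offset.

    `catalog` is a hypothetical dict that maps stream tokens to URLs.
    """
    user_id = envelope["context"]["System"]["user"]["userId"]
    saved = _saved_playback.get(user_id)
    if saved is None or saved["token"] not in catalog:
        return _directive_response([])  # Nothing to resume.
    play = {
        "type": "AudioPlayer.Play",
        "playBehavior": "REPLACE_ALL",
        "audioItem": {
            "stream": {
                "url": catalog[saved["token"]],
                "token": saved["token"],
                "offsetInMilliseconds": saved["offset_ms"],
            }
        },
    }
    return _directive_response([play])
```

Because the session closes after each step, the resume handler cannot rely on session attributes. It has to read the track and offset back from storage.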
AMAZON.PauseIntentinstead of the
Audio player on Alexa-enabled devices with a screen
By default, during audio streaming, Alexa-enabled devices with a screen show an audio player with a plain background and the skill name.
You can customize the screen by including album art, a background image, track title, and subtitle metadata.
For both the default and custom backgrounds, when the user taps the screen, the screen shows tap controls, such as next, previous, and pause.
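As a rough sketch, display metadata can be attached to the audio item of a Play directive. The helper name add_display_metadata and the example URLs are hypothetical. The metadata field names (title, subtitle, art, backgroundImage) reflect the AudioPlayer.Play directive format as understood here, so verify them against the AudioPlayer Interface Reference.

```python
def add_display_metadata(play_directive, title, subtitle,
                         art_url=None, background_url=None):
    """Return a copy of a Play directive with display metadata attached."""
    metadata = {"title": title, "subtitle": subtitle}
    if art_url:
        metadata["art"] = {"sources": [{"url": art_url}]}
    if background_url:
        metadata["backgroundImage"] = {"sources": [{"url": background_url}]}
    # Copy the directive and its audio item so the input is left unchanged.
    audio_item = dict(play_directive["audioItem"], metadata=metadata)
    return dict(play_directive, audioItem=audio_item)


if __name__ == "__main__":
    # Hypothetical directive and image URLs.
    directive = {
        "type": "AudioPlayer.Play",
        "playBehavior": "REPLACE_ALL",
        "audioItem": {
            "stream": {
                "url": "https://example.com/ep1.mp3",
                "token": "ep-1",
                "offsetInMilliseconds": 0,
            }
        },
    }
    print(add_display_metadata(
        directive, "Episode 1", "My Radio Player",
        art_url="https://example.com/art.png",
        background_url="https://example.com/background.png",
    ))
```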
To use the AudioPlayer interface, your custom skill must meet the following prerequisites.
Audio stream URL requirements
To use the AudioPlayer interface, your audio stream URL must meet the following requirements:
- You must host the audio file at an Internet-accessible HTTPS endpoint on port 443.
- The web server must present a valid and trusted SSL certificate. Self-signed certificates aren't allowed. Content hosting services, such as Amazon S3, provide valid and trusted SSL certificates.
- If the stream is a playlist container that references additional streams, you must host each stream within the playlist at an Internet-accessible HTTPS endpoint on port 443 with a valid and trusted SSL certificate.
- Your audio file must be in one of the following formats: AAC/MP4, MP3, PLS, M3U/M3U8, HLS.
- Your audio stream must support bit rates of 16 to 384 kilobits per second (kbps).
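Some of these requirements can be checked before you send a URL to Alexa. The following sketch is a heuristic pre-flight check, not an official validator. The function name check_stream_url and the list of file extensions are assumptions. The sketch doesn't verify the SSL certificate or the bit rate, and a valid live stream might have no file extension at all.

```python
from urllib.parse import urlparse

# Assumed extensions for the formats listed above. HLS playlists
# commonly use the .m3u8 extension.
KNOWN_EXTENSIONS = (".aac", ".mp4", ".mp3", ".pls", ".m3u", ".m3u8")


def check_stream_url(url):
    """Return a list of problems found in an audio stream URL."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("URL must use HTTPS")
    # An HTTPS URL without an explicit port uses port 443.
    if parsed.port not in (None, 443):
        problems.append("URL must use port 443")
    if not parsed.path.lower().endswith(KNOWN_EXTENSIONS):
        problems.append("file extension is not a recognized audio format")
    return problems


if __name__ == "__main__":
    for candidate in (
        "https://example.com/stream.mp3",
        "http://example.com/stream.mp3",
        "https://example.com:8443/playlist.m3u8",
    ):
        print(candidate, check_stream_url(candidate))
```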
Image requirements and recommendations
To customize the background on Alexa-enabled devices with a screen, your image must meet the following requirements and recommendations:
- You must host the image at an Internet-accessible HTTPS endpoint, and the image must be available 24 hours a day, seven days a week.
- The image must be in JPEG or PNG format, with the appropriate file extensions.
- (Recommended) For best results, make sure that images are transparent. Images with a transparent background work well on a wide range of shapes and sizes.
Note: Only images in PNG format can be transparent.
- The image must be at least the minimum recommended size. If you provide a smaller image, the device must scale the image up, which can make the image appear blurry.
- The image size must not exceed 3 MB. If you send multiple images in a response, the combined image size must not exceed 3 MB.
- (Recommended) Keep image sizes small to reduce latency and provide a better customer experience.
- (Recommended) For best results, use a square or rectangle image. If the image isn't square, it might display with extra black space on the device. The Echo Spot crops the image to a circle shape.
- Apply a 70 percent opacity black layer for optimal contrast between the image and text.
- (Recommended) Use background images with slight patterns or gradients to provide a consistent, high-quality appearance.
|Image||Recommended minimum size||Echo Show and Fire TV Cube||Echo Spot|
|Album art||480 x 480 pixels||Scaled to 300 x 300 and displayed as album art.||Scaled to 480 x 480, cropped to a circle, and displayed as the background image with 70 percent opacity black scrim.|
|Background image||1024 x 640 pixels||Scaled to 1024 x 640 and displayed as a background image. Your image is displayed without change on the Echo Show or Fire TV Cube. Apply any fading effects in your source image if needed.|| |
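The checkable image requirements above can be sketched as a small helper. This is an illustration only. The function name check_images and its input shape are hypothetical, and the sketch assumes that 1 MB means 1,048,576 bytes. It checks the URL scheme, the file extension, and the combined size, but not the pixel dimensions or availability.

```python
from urllib.parse import urlparse

MAX_TOTAL_BYTES = 3 * 1024 * 1024  # Assumes 1 MB = 1,048,576 bytes.
ALLOWED_EXTENSIONS = (".jpg", ".jpeg", ".png")


def check_images(images):
    """Check a list of (url, size_in_bytes) pairs sent in one response."""
    problems = []
    for url, _size in images:
        parsed = urlparse(url)
        if parsed.scheme != "https":
            problems.append(f"{url}: must use HTTPS")
        if not parsed.path.lower().endswith(ALLOWED_EXTENSIONS):
            problems.append(f"{url}: must be a JPEG or PNG file")
    # The 3 MB limit applies to all images in the response combined.
    total = sum(size for _url, size in images)
    if total > MAX_TOTAL_BYTES:
        problems.append(f"combined size {total} bytes exceeds 3 MB")
    return problems


if __name__ == "__main__":
    # Hypothetical image URLs and sizes.
    print(check_images([
        ("https://example.com/art.png", 200_000),
        ("https://example.com/background.jpg", 900_000),
    ]))
```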
Steps to add long-form audio to your skill
Complete the following steps to add the AudioPlayer interface to your custom skill.
- Enable the AudioPlayer interface.
- Implement AudioPlayer directives and requests.
- Implement intents for audio playback.
- Support audio on Alexa-enabled devices with a screen.
Step 1: Enable the audio player interface
Configure your skill to indicate that it implements the AudioPlayer interface.
To enable the audio player interface in the Alexa developer console
- Sign in to the Alexa developer console.
- From the skill list, locate your custom skill, and then, in the dropdown under ACTIONS, select Edit.
- In the left pane, click CUSTOM, and then click Interfaces.
- To enable the AudioPlayer interface, toggle the Audio Player option, and then click Save Interfaces.
The console adds the required built-in intents for audio playback to your interaction model.
- To rebuild your custom interaction model, on the Build page, click Build Model.
Step 2: Implement AudioPlayer directives and requests
Implement the following AudioPlayer directives and requests in your custom skill to start and stop long-form audio streaming.
Include the following directives in a response to Alexa:
- AudioPlayer.Play – Requests Alexa to stream the specified audio file.
- AudioPlayer.Stop – Requests Alexa to stop the current audio stream.
- AudioPlayer.ClearQueue – Requests Alexa to clear the queue of all audio streams.
Handle the following requests that Alexa sends to report playback status of the audio stream:
- AudioPlayer.PlaybackStarted – Sent to your skill when Alexa starts the audio stream specified in a Play directive. This request lets your skill verify that playback began successfully.
- AudioPlayer.PlaybackFinished – Sent when the stream comes to an end on its own.
- AudioPlayer.PlaybackStopped – Sent when Alexa stops playing an audio stream in response to a voice request or an AudioPlayer directive.
- AudioPlayer.PlaybackNearlyFinished – Sent when the currently playing stream is nearly complete and the device is ready to receive a new stream.
- AudioPlayer.PlaybackFailed – Sent when an error occurs during an attempt to play a stream.
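One way to handle these requests is a single function that branches on the request type. The following is a minimal sketch under stated assumptions. The function name and the next_track argument are hypothetical. The sketch assumes that each AudioPlayer request carries the token of the current stream, and that an enqueued stream names that token in expectedPreviousToken.

```python
import logging

logger = logging.getLogger("my_radio_player")  # Hypothetical logger name.


def handle_audio_player_request(request, next_track=None):
    """Build a response to an AudioPlayer playback request.

    `next_track` is a hypothetical dict with "url" and "token" keys, or
    None when nothing should play after the current stream.
    """
    directives = []
    request_type = request["type"]
    if request_type == "AudioPlayer.PlaybackNearlyFinished" and next_track:
        # Queue the next stream so that it starts when this one ends.
        directives.append({
            "type": "AudioPlayer.Play",
            "playBehavior": "ENQUEUE",
            "audioItem": {
                "stream": {
                    "url": next_track["url"],
                    "token": next_track["token"],
                    "expectedPreviousToken": request.get("token"),
                    "offsetInMilliseconds": 0,
                }
            },
        })
    elif request_type == "AudioPlayer.PlaybackFailed":
        logger.error("Playback failed: %s", request.get("error"))
    # This sketch returns directives only and no speech for these requests.
    return {"version": "1.0", "response": {"directives": directives}}
```

Requests such as PlaybackStarted and PlaybackFinished fall through to an empty response here. A fuller skill might use them to update the stored playback state.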
Step 3: Implement intents for audio playback
In your skill code, implement the following required built-in intents to pause and resume audio:
- AMAZON.PauseIntent
- AMAZON.ResumeIntent
In addition, Amazon recommends that you implement the following built-in intents for playback control:
If your skill is playing audio, or was playing audio most recently, Alexa sends these intents to your skill. Your skill code should handle the intents without error.
If any of these intents don't apply to your skill, handle the intent in a graceful way. For example, for a podcast skill, on receipt of the AMAZON.ShuffleOnIntent, your skill might return, "I can't shuffle a podcast." Or, version 1.0 of a music skill that doesn't support playlists and shuffling might return, "Sorry, I can't shuffle music yet."
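Graceful handling can be sketched as a lookup table of spoken replies for intents that the skill doesn't support. The handler name, the table, and the fallback message are hypothetical. Only the shuffle message comes from the example above.

```python
# Replies for intents that this example podcast skill does not support.
UNSUPPORTED_MESSAGES = {
    "AMAZON.ShuffleOnIntent": "I can't shuffle a podcast.",
}

# Hypothetical reply for any other intent this sketch does not handle.
FALLBACK_MESSAGE = "Sorry, I can't do that yet."


def handle_playback_intent(intent_name):
    """Return a response for a built-in playback intent."""
    if intent_name == "AMAZON.PauseIntent":
        response = {"directives": [{"type": "AudioPlayer.Stop"}]}
    else:
        text = UNSUPPORTED_MESSAGES.get(intent_name, FALLBACK_MESSAGE)
        response = {"outputSpeech": {"type": "PlainText", "text": text}}
    # End the session in every case so that playback is not interrupted
    # by Alexa listening for a reply.
    response["shouldEndSession"] = True
    return {"version": "1.0", "response": response}


if __name__ == "__main__":
    print(handle_playback_intent("AMAZON.ShuffleOnIntent"))
```

The point of the table is that every intent returns a valid response. An intent that the skill hasn't planned for produces a short spoken reply instead of an error.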
You need an Echo device to test the playback requests from Alexa. The Alexa simulator doesn't render audio playback, but the Skill I/O section of the simulator shows the AudioPlayer directives sent from your skill. For details, see Test your skill with the simulator.
Step 4: Support audio on Alexa-enabled devices with a screen
If the user touches the device screen while your skill is streaming audio, an Alexa-enabled device with a screen shows audio tap controls for a short time. These controls provide access to (next), (previous), (pause), and (play) actions. Implement skill code to handle these actions appropriately.
In response to the next, previous, and play actions, Alexa sends your skill one of the following PlaybackController requests:
- PlaybackController.NextCommandIssued – Sent when the user uses (next) to skip to the next audio item.
- PlaybackController.PlayCommandIssued – Sent when the user uses (play) to start or resume playback.
- PlaybackController.PreviousCommandIssued – Sent when the user uses (previous) to go back to the previous audio item.
When the user taps the (pause) control, Alexa stops playback, but doesn't send a request to your skill. However, your skill should still handle PlaybackController.PauseCommandIssued, because other devices, such as hardware remotes, do send this request.
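The tap controls can be handled with one function that maps each PlaybackController request to an AudioPlayer directive. This is a simplified sketch under stated assumptions. The function names, the playlist structure, and the choice to stay on the first or last track at either end are hypothetical. The sketch always starts a track from the beginning, where a fuller skill would resume from a saved offset, and it returns directives only and no speech.

```python
def _play(track):
    """Build a Play directive for a hypothetical track dict."""
    return {
        "type": "AudioPlayer.Play",
        "playBehavior": "REPLACE_ALL",
        "audioItem": {
            "stream": {
                "url": track["url"],
                "token": track["token"],
                "offsetInMilliseconds": 0,
            }
        },
    }


def handle_playback_controller(request_type, playlist, index):
    """Return (response, new_index) for a PlaybackController request.

    `playlist` is a non-empty list of dicts with "url" and "token" keys,
    and `index` is the position of the current track.
    """
    if request_type == "PlaybackController.NextCommandIssued":
        index = min(index + 1, len(playlist) - 1)  # Stay on the last track.
        directives = [_play(playlist[index])]
    elif request_type == "PlaybackController.PreviousCommandIssued":
        index = max(index - 1, 0)  # Stay on the first track.
        directives = [_play(playlist[index])]
    elif request_type == "PlaybackController.PlayCommandIssued":
        directives = [_play(playlist[index])]
    elif request_type == "PlaybackController.PauseCommandIssued":
        directives = [{"type": "AudioPlayer.Stop"}]
    else:
        directives = []
    return {"version": "1.0", "response": {"directives": directives}}, index
```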
To get started, review the following audio player skill code samples on GitHub:
- Build Your Skill
- How to Handle Touch-Screen Controls for Audio Skills on Echo Show and Echo Spot
- AudioPlayer Interface Reference
- PlaybackController Interface Reference
Last updated: May 03, 2023