API Reference Overview (VSK Echo Show)
Video skill directives are sent from Alexa to your Lambda function. Directives are JSON messages that contain instructions for performing a specific action, such as getting metadata for a video. There are various directives, each with its own name and payload. Your Lambda function must handle the incoming directives and return a response that conforms to the expected JSON. Overall, the basic model is a JSON request sent from Alexa (called a directive) and a JSON response sent by your Lambda. JSON in, JSON out.
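The "JSON in, JSON out" model can be sketched as a small Python Lambda handler. This is an illustration only, not the documented schema: the envelope keys (`directive`, `header`, `name`, `payload`, `event`) and the convention of appending `Response` to the directive name are assumptions made for the sketch, so check the reference page for each directive before relying on them.

```python
import json
import uuid


def lambda_handler(event, context):
    """Read the incoming directive (JSON in) and return a response (JSON out)."""
    # Assumed request envelope: {"directive": {"header": {...}, "payload": {...}}}
    directive = event.get("directive", {})
    header = directive.get("header", {})
    name = header.get("name", "Unknown")

    # Assumed response envelope: an "event" object whose header mirrors the
    # request header, with "Response" appended to the directive name.
    response = {
        "event": {
            "header": {
                "namespace": header.get("namespace"),
                "name": name + "Response",
                "messageId": str(uuid.uuid4()),  # fresh ID for the response
                "payloadVersion": header.get("payloadVersion"),
            },
            # Directive-specific fields (usually content IDs) would go here.
            "payload": {},
        }
    }

    # Round-trip through json to confirm the response is serializable.
    return json.loads(json.dumps(response))
```

Lambda passes the directive to the handler as an already-parsed dictionary, so the handler only needs to build and return another dictionary.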
Supported Utterances on Multimodal Devices
As described in the Introduction, multimodal devices support the following utterances:
- Search: Users can search using criteria like GenreName, etc., by querying the content providers for content to display.
- Browse: Users can browse a watch list, video library, recordings, etc., on the device.
- Pagination: Users can go to the next page of search results, or scroll beyond the viewable search results on the screen.
- Channel Navigation: Users can play a live stream or recording of a channel.
- Play from Search Results: Users can play a video from the search results shown on the device. The user can play using voice or simply by tapping on an item on the screen.
- Recommendations: Users can search for recommendations by explicitly asking for them.
- Landing Page: Users can go to a landing page for your content (analogous to an app home screen). Alexa provides a landing page template that gets populated by your response.
Directives Alexa Sends
When users say any of the supported utterances, Alexa converts these utterances into directives that it sends to your Lambda function. Utterances that users say for browse, search, channel change, quick play, recommendations, etc., might use similar directives (populated in different ways).
Despite the variety of utterances and scenarios, there are only a handful of directives that your Lambda must respond to. As such, when you code your Lambda for a multimodal device, focus on handling the directives listed in the following table.
||Alexa sends this directive when the user requests to play a video on the device. You can search content based on the criteria in the directive and return the entity ID for the video corresponding to the request. Your Lambda's response should favor videos that the user is entitled to play through their subscription (if applicable). If there are no videos for that request, return an appropriate error response code.|
||Alexa sends this directive when the user requests to go to a specific channel. When the view changes to the channel, content from the channel begins playing.|
||Alexa sends this directive to obtain additional metadata for results you previously returned through
||Alexa sends this directive when the user requests to search videos and view results on the device. You can search content based on the criteria in the directive and return entity IDs for the videos corresponding to the search request. If there is no content found for the request, then an appropriate error response should be returned.|
||Alexa sends this directive to obtain additional metadata for results you previously returned via
||Alexa sends this directive to get grouped content in search results. Search results on multimodal devices can be merged into browse nodes with multiple layers. Users can choose to drill down on those grouped result items by clicking on them to see more results under that grouping. Your Lambda's response to a
||Alexa sends this directive when the user scrolls to view more results than are currently displayed on the screen. This directive is called to dynamically fetch more results to show on the screen, once the metadata for those results has been fetched.|
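Because only a handful of directives need handling, one common way to organize the Lambda is a lookup table from directive name to handler function. The sketch below is an assumption-laden illustration: the `directive`/`header`/`name`/`payload` keys are assumed, the handler body is a placeholder, and the `UNSUPPORTED_DIRECTIVE` error shape is hypothetical rather than taken from the API reference.

```python
from typing import Any, Callable, Dict

Payload = Dict[str, Any]
Handler = Callable[[Payload], Payload]


def get_displayable_items_metadata(payload: Payload) -> Payload:
    # Placeholder: a real handler would look up metadata for the requested items.
    return {"handled": "GetDisplayableItemsMetadata"}


# Map directive names to handlers. Only one name is registered here;
# add one entry for each directive your Lambda supports.
HANDLERS: Dict[str, Handler] = {
    "GetDisplayableItemsMetadata": get_displayable_items_metadata,
}


def dispatch(event: Dict[str, Any]) -> Payload:
    """Route an incoming directive to its handler by name."""
    directive = event.get("directive", {})
    name = directive.get("header", {}).get("name")
    handler = HANDLERS.get(name)
    if handler is None:
        # Hypothetical error shape; use the error format the API reference defines.
        return {"error": "UNSUPPORTED_DIRECTIVE", "directive": name}
    return handler(directive.get("payload", {}))
```

Keeping the routing in a dictionary means supporting another directive is a one-line registration, and unknown names fall through to a single error path.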
Expected Lambda Responses
When Alexa sends one of the directives (a JSON message) described above to your Lambda, your Lambda must respond with a JSON message that provides the needed information (usually content IDs) and that conforms to the expected response fields and schema. The following diagrams show the Alexa directives and expected Lambda responses.
Comparison with Fire TV Directives
Implementing the Video Skills Kit for Fire TV also involves interpreting and responding to directives from Alexa, as described in Directives and Responses (VSK FTV). The directives aren't the same as those used for multimodal devices, but they are similar:
SearchAndPlay(FTV). These directives support Play utterances.
SearchAndDisplayResults(FTV). These directives support Search utterances.
However, note that multimodal devices have two directives that are made for each play and search utterance type (
GetDisplayableItemsMetadata), because the fundamental interaction model is different. With multimodal devices, your Lambda feeds the information back to Alexa in the response. With Fire TV apps, your Lambda pushes the needed information directly to your app through Amazon Device Messaging.
Terminology – Requests versus Directives
The terms "request" and "directive" are mostly synonymous in the video skills documentation. "Request" is a more general term for any message Alexa sends to your Lambda. With video skills, the messages are labeled as a directive in the code, so we refer to the requests as "directives." This aligns with terminology used in other Alexa Skills Kit documentation.
Additionally, the term "directive" provides some differentiation between the user's utterance (e.g., a request to play a movie) and the information that Alexa sends to your Lambda.