API Reference Overview (VSK Fire TV)

Warning: Video Skills Kit (VSK) is no longer supported. For questions about existing integrations, please contact your Amazon technical account manager.

Alexa converts the utterances users say (for example, utterances to search for a TV show or to watch a movie) into directives. A directive is a set of data and instructions, expressed in JSON, that Alexa sends to your app or Lambda. Video skills for Fire TV apps can support a variety of directives, such as SearchAndPlay, SearchAndDisplayResults, and more.

Your Lambda must interpret and handle each directive: it sends a response back to Alexa and takes the appropriate action to fulfill the user's request.
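As a rough illustration of this flow, the following Python sketch shows one way a Lambda could route an incoming directive and return a response. It is an assumption-laden outline, not the reference implementation: the `directive`/`header`/`payload` envelope and the `Alexa.RemoteVideoPlayer` namespace follow the general Alexa directive shape, while `send_to_app`, `SENT_TO_APP`, and the contents of the action dictionaries are hypothetical placeholders for your own app logic.

```python
import uuid

# Hypothetical stand-in for delivering an action to the Fire TV app.
# A real skill would push this to the device instead of storing it.
SENT_TO_APP = []


def send_to_app(action):
    SENT_TO_APP.append(action)


def handle_search_and_play(payload):
    # Assumed action shape; your app defines what it actually expects.
    return {"action": "play", "entities": payload.get("entities", [])}


def handle_search_and_display(payload):
    return {"action": "search", "entities": payload.get("entities", [])}


# Route on (namespace, name) taken from the directive header.
HANDLERS = {
    ("Alexa.RemoteVideoPlayer", "SearchAndPlay"): handle_search_and_play,
    ("Alexa.RemoteVideoPlayer", "SearchAndDisplayResults"): handle_search_and_display,
}


def build_response(request_header, name, payload):
    # Assumed response envelope: an "event" with its own header and payload.
    return {
        "event": {
            "header": {
                "namespace": "Alexa",
                "name": name,
                "messageId": str(uuid.uuid4()),
                "payloadVersion": request_header.get("payloadVersion", "3"),
            },
            "payload": payload,
        }
    }


def lambda_handler(event, context):
    directive = event["directive"]
    header = directive["header"]
    handler = HANDLERS.get((header["namespace"], header["name"]))
    if handler is None:
        # Unknown directive: report an error instead of failing silently.
        return build_response(
            header,
            "ErrorResponse",
            {"type": "INVALID_DIRECTIVE", "message": "Unsupported directive"},
        )
    send_to_app(handler(directive.get("payload", {})))
    return build_response(header, "Response", {})
```

Keeping the routing in a lookup table means that supporting another directive only requires adding one entry and one handler function.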

Available Directives
Targeting Your Video Skill
Comparison with Multimodal Directives
Terminology – Requests versus Directives

Available Directives

Alexa sends the following directives to video skills for Fire TV apps.

Directive	Description
`RemoteVideoPlayer - SearchAndPlay`	Sent when users ask Alexa to play specific video content.
`RemoteVideoPlayer - SearchAndDisplayResults`	Sent when users ask Alexa to search for video content.
`PlaybackController`	Sent when users request to play, stop, or navigate playback of video content.
`SeekController`	Sent when users request to fast-forward (or skip) or rewind to a specific duration.
`ChannelController`	Sent when users request to change the channel.
`KeypadController`	Sent when users request to scroll right or left, page up or down, or select the item in focus.

Each directive's reference page describes the directive in detail, along with the utterances that trigger it.

Targeting Your Video Skill

To target your video skill with the utterance, do the following:

Say the utterance with your app open.
Make your video skill's name explicit in the request, such as "Play [X] Show on XYZ" rather than just "Play [X] Show." (This is called an explicit utterance.)

Comparison with Multimodal Directives

Implementing Video Skills Kit for Multimodal Devices also involves interpreting and responding to directives from Alexa, as described in Directives Reference Overview. The directives aren't the same as those used for Fire TV apps, but they are similar:

SearchAndPlay (FTV) is similar to GetPlayableItems (multimodal). These directives support play utterances.
SearchAndDisplayResults (FTV) is similar to GetDisplayableItems (multimodal). These directives support search utterances.

However, multimodal devices use a pair of directives for each of the previous items (GetPlayableItems with GetPlayableItemsMetadata, and GetDisplayableItems with GetDisplayableItemsMetadata), because the fundamental interaction model is different. With multimodal devices, your Lambda feeds the information back to Alexa in the response. With Fire TV apps, your Lambda pushes the needed information directly to your app through Amazon Device Messaging.
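To make the Fire TV side of this difference concrete, here is a minimal Python sketch of how a Lambda might package a directive for delivery through Amazon Device Messaging (ADM). It only builds the message body and does not send it. The `data` and `expiresAfter` fields reflect the general shape of an ADM message as I understand it, and the `directive` key inside `data`, the `build_adm_message` name, and the 60-second lifetime are assumptions made for illustration; check the ADM documentation for the exact requirements.

```python
import json


def build_adm_message(directive, ttl_seconds=60):
    """Wrap a directive in an ADM-style message body (assumed shape).

    The directive is serialized to a JSON string because ADM data
    values are treated here as plain strings.
    """
    return {
        # "directive" is a hypothetical key; the app decides what it reads.
        "data": {"directive": json.dumps(directive, separators=(",", ":"))},
        # Assumed: how long the message is kept if the device is offline.
        "expiresAfter": ttl_seconds,
    }


# Usage: the app would parse the string back into JSON on receipt.
sample = {"header": {"namespace": "Alexa.RemoteVideoPlayer", "name": "SearchAndPlay"}}
message = build_adm_message(sample)
```

A short lifetime is used in this sketch because a voice request is only useful while the user is waiting for it; a stale "play" message delivered minutes later would likely be unwelcome.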

Terminology – Requests versus Directives

The terms "request" and "directive" are mostly synonymous in the video skills documentation. "Request" is a more general term for any message Alexa sends to your Lambda. With video skills, the messages are labeled as directives in the code, so we refer to the requests as "directives." This aligns with terminology used in other Alexa Skills Kit documentation.

Additionally, the term "directive" provides some differentiation between the user's utterance (e.g., a request to watch a movie) and the information that Alexa sends to your Lambda.