
AV Synchronization in Android Applications (Fire TV)

Precise audio and video synchronization is one of the key performance measurements for media playback. In general, audio and video recorded at the same time on the recording device need to be played back at the same time on playback devices (for example, on TVs and monitors). Follow these guidelines to ensure correct audio-video synchronization on devices running Android API level 19+.

Theory of Audio-Video Synchronization

The smallest unit of data handled together is called a frame. Both the audio and video streams are sliced into frames, and each frame is marked to be shown at a specific timestamp. Audio and video can be downloaded and decoded independently, but the audio and video frames with matching timestamps should be presented together. In theory, if you want to match your audio and video processing, there are three AV sync solutions available:

  • Play audio frames continuously: Use the audio playback position as the primary time reference and match the video playback position to it.
  • Use a system time as reference: Match both audio and video playback to the system time.
  • Use video playback as reference: Let audio match video.

The first option is the only one with a continuous flow of audio data without any adjustments in presentation time, playback speed, or duration of the audio frames. Any adjustment of these parameters is easily noticed by the human ear and may lead to disturbing audio glitches unless the audio is resampled. Consequently, general multimedia applications should use the audio playback position as the primary time reference. The following paragraphs discuss this solution. (The other two options are outside the scope of this document.)

Maintaining Audio-Video Sync in Applications

The pipelines for audio and video must render frames with identical timestamps at the same time. The audio playback position is used as the primary time reference, while the video pipeline simply outputs the video frames that match the latest rendered audio frame. For all possible implementations, accurately calculating the last rendered audio timestamp is essential. Android provides several APIs to query the audio timestamps and latencies at various stages of the audio pipeline. The following guidance describes best practices.
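Conceptually, whichever API is used, the video pipeline follows the audio clock along these lines. This is a simplified sketch, not taken from any particular player; getCurrentAudioPositionUs(), VideoFrame, and the frame queue are hypothetical placeholders for your own pipeline's equivalents.

// Conceptual sketch: release each video frame once the audio clock reaches its
// presentation timestamp. All names below are placeholders for your own pipeline.
while (isPlaying()) {
    long audioPositionUs = getCurrentAudioPositionUs(); // last rendered audio timestamp
    VideoFrame frame = videoFrameQueue.peek();
    if (frame != null && frame.presentationTimeUs <= audioPositionUs) {
        renderVideoFrame(videoFrameQueue.poll()); // the frame is due: render it now
    }
    // Otherwise keep decoding and re-check on the next iteration.
}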

1. Using ExoPlayer

It is highly recommended to use ExoPlayer for media playback on Fire OS. Amazon's port of ExoPlayer is compatible with all generations of Fire TV devices, provides many additional fixes, and also avoids changing the original ExoPlayer behavior on non-Amazon platforms.

In terms of AV sync, Amazon's port of ExoPlayer uses the methods described in the next sections to maintain correct audio latency calculations for Amazon devices running API levels below 21. When this port is used as the media player, it automatically performs the synchronization. You do not need to manually adjust timestamps for latency.
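For reference, a minimal ExoPlayer setup might look like the following sketch. It assumes ExoPlayer 2.12+ and a PlayerView in your layout; the media URI is a placeholder.

// Minimal ExoPlayer playback sketch (assumes ExoPlayer 2.12+).
// AV sync, including the latency handling described below, is performed by the player itself.
SimpleExoPlayer player = new SimpleExoPlayer.Builder(context).build();
playerView.setPlayer(player); // PlayerView defined in your layout

MediaItem mediaItem = MediaItem.fromUri("https://example.com/media.mpd"); // placeholder URI
player.setMediaItem(mediaItem);
player.prepare();
player.play();

// Release the player when playback is no longer needed:
// player.release();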

2. If You Are Using a Custom-Made Media Player

In a custom-made media player, the app has full control of audio and video data flows and knows how long it takes to decode audio and video packets. The application also has the freedom to increase or decrease the amount of buffered video data in order to maintain continuous playback. The video pipeline needs to be adjusted to the timestamps rendered by the audio pipeline. The following two APIs should be used:

2.1 AudioTrack.getTimestamp() (API level 19+)

If the audio pipeline supports querying the latest rendered timestamp, the getTimestamp() method provides a simple way to determine the value you are looking for. If a timestamp is available, the AudioTimestamp instance is filled in with a position in frame units, together with the estimated time when that frame was presented. This information can be used to control the video pipeline to match video frames to audio frames; see the sketch after the notes below.

Note the following:

  • The recommended timestamp querying frequency is once every 10 seconds. Slight skew is possible but sudden changes are not expected, so there is no need to query the timestamp more frequently. Querying it more often might increase CPU and battery usage, which can be a concern for battery-powered devices.
  • When timestamps are correctly returned, the application should trust the values without adding any hard-coded offset. Adding experimental values is strongly discouraged: they are not platform independent, and the pipeline might be updated at any time (for example, when a Bluetooth sink is connected), making any previously correct values inaccurate.
  • The AudioTrack.getTimestamp() API might return a position of 0 and might not update the timestamp value continuously during the initial warm-up period of the audio pipeline. This transient period might take several seconds, so in order to avoid audio-video synchronization issues at the beginning of playback, you need to fall back to the AudioTrack.getPlaybackHeadPosition() API described in the next section.

See the getTimestamp() method in the Android documentation for details, including available parameters and return values.
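As an illustration, querying the timestamp and deriving the current audio position could look like the following sketch. The sampleRate variable and the video-release step are assumptions; substitute your own pipeline's values and hooks.

// Sketch: derive the current audio position from AudioTrack.getTimestamp().
// sampleRate is assumed to be the sample rate of the track in Hz.
AudioTimestamp audioTimestamp = new AudioTimestamp();
if (audioTrack.getTimestamp(audioTimestamp)) {
    // Convert the reported frame position to microseconds.
    long framePositionUs = (audioTimestamp.framePosition * 1_000_000L) / sampleRate;
    // Extrapolate from the moment the timestamp was taken to now.
    long elapsedUs = (System.nanoTime() - audioTimestamp.nanoTime) / 1000;
    long currentAudioPositionUs = framePositionUs + elapsedUs;
    // Render the video frames whose presentation timestamps are <= currentAudioPositionUs.
} else {
    // No timestamp available yet (for example, during warm-up);
    // fall back to getPlaybackHeadPosition() as described in the next section.
}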

2.2 AudioTrack.getPlaybackHeadPosition() (API level 3+)

When the audio pipeline does not support querying the latest rendered audio timestamps as described in the previous section, an alternative approach is needed.

This solution consists of two parts that use two separate functions of the AudioTrack class. The first part calculates the latest audio timestamp based on the current head position, expressed in frames, returned by the getPlaybackHeadPosition() method:

// Converts a frame count to a duration in microseconds. sampleRate is the sample rate
// of the audio track in Hz, and C.MICROS_PER_SECOND equals 1,000,000.
private long framesToDurationUs(long frameCount) {
    return (frameCount * C.MICROS_PER_SECOND) / sampleRate;
}

long timestamp = framesToDurationUs(audioTrack.getPlaybackHeadPosition());

The timestamp calculated above does not take into account the latency introduced by the lower layers, so some adjustment is needed.

The second part of the solution determines the missing latency values using the function getLatency(). Because the getLatency() method is a hidden member of the AudioTrack class (not part of the public SDK), reflection is needed to access it:

Method getLatencyMethod;
if (Util.SDK_INT >= 18) {
    try {
        getLatencyMethod =
            android.media.AudioTrack.class.getMethod("getLatency", (Class<?>[]) null);
    } catch (NoSuchMethodException e) {
        // There's no guarantee this method exists. Do nothing.
    }
}

The returned value includes the latency of the mixer, audio hardware driver, and also the latency introduced by the AudioTrack buffer. To get the latency of the layers only under AudioTrack, the latency introduced by the buffers (bufferSizeUs) must be subtracted.

long bufferSizeUs = isOutputPcm ? framesToDurationUs(bufferSize / outputPcmFrameSize) : C.TIME_UNSET;
long audioLatencyUs =
    (Integer) getLatencyMethod.invoke(audioTrack, (Object[]) null) * 1000L - bufferSizeUs;

Using the two parts together, the full solution to calculate the closest approximation of the last timestamp rendered by the audio pipeline is as follows:

long latestAudioFrameTimestampUs = framesToDurationUs(audioTrack.getPlaybackHeadPosition()) - audioLatencyUs;
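Tying sections 2.1 and 2.2 together, a combined helper along the lines of the conceptual getCurrentAudioPositionUs() used earlier might look like the sketch below. The audioLatencyUs value is assumed to be maintained as shown above and to be meaningful only when the reflective getLatency() call succeeded.

// Sketch: prefer getTimestamp() and fall back to getPlaybackHeadPosition() plus latency.
private long getCurrentAudioPositionUs() {
    AudioTimestamp audioTimestamp = new AudioTimestamp();
    if (audioTrack.getTimestamp(audioTimestamp)) {
        long elapsedUs = (System.nanoTime() - audioTimestamp.nanoTime) / 1000;
        return framesToDurationUs(audioTimestamp.framePosition) + elapsedUs;
    }
    // Warm-up or unsupported pipeline: use the head position adjusted by the measured latency.
    return framesToDurationUs(audioTrack.getPlaybackHeadPosition()) - audioLatencyUs;
}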

A sample implementation can be seen in the AudioTrackPositionTracker class of the Amazon port of ExoPlayer.

3. Using Standard Android MediaPlayer

The standard Android MediaPlayer classes that handle audio and video playback are supported on Fire OS. These media classes can handle basic media playback with AV sync requirements. However, their capabilities are limited in multiple ways. Using the Amazon port of ExoPlayer (or one of the paid media player options) is highly recommended instead.
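For completeness, a basic MediaPlayer setup might look like the following sketch; mediaUri and surfaceHolder are placeholders for your own content URI and SurfaceView holder.

// Basic MediaPlayer usage sketch; MediaPlayer keeps audio and video in sync internally.
try {
    MediaPlayer mediaPlayer = new MediaPlayer();
    mediaPlayer.setDataSource(context, mediaUri);        // mediaUri is a placeholder Uri
    mediaPlayer.setDisplay(surfaceHolder);               // render video into your SurfaceView
    mediaPlayer.setOnPreparedListener(mp -> mp.start()); // start once prepared
    mediaPlayer.prepareAsync();                          // prepare without blocking the UI thread
} catch (IOException e) {
    // Handle the error (for example, log it and show an error state).
}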

4. Using the OpenSL ES Framework of the Android NDK

When OpenSL ES queries audio latency through its standard APIs, it only obtains the audio hardware latency reported by AudioFlinger. Any latency introduced by software (mainly by AudioTrack buffering) is not included. To get accurate audio latency values that include both hardware and software audio delays, the OpenSL ES API android_audioPlayer_getConfig() was updated in Fire OS 6 and 7 to report the complete audio latency.

The following code samples demonstrate usage of these functions to calculate the latency value introduced by both software and hardware layers.

4.1 When the Audio Player Object Is of Type CAudioPlayer

// Sample code for getting Fire OS software+hardware audio latency when using the OpenSL framework.
SLuint32 audioLatency = 0;
SLuint32 valueSize = 0;

// The variable ap is the audio player object of type CAudioPlayer.
if (android_audioPlayer_getConfig((CAudioPlayer *) &ap, (const SLchar *) "androidGetAudioLatency",
        (SLuint32 *) &valueSize, (void *) &audioLatency) == SL_RESULT_SUCCESS) {
    // The hardware + software audio latency is filled into the SLuint32 variable audioLatency.
} else {
    // Call your current get_audio_latency API. You will only query the hardware audio latency value.
}

4.2 If Your Audio Player Object Is Created by the SLEngineItf Interface API CreateAudioPlayer()

If your audio player was created in the following way:

result = (*engine)->CreateAudioPlayer(engine, &playerObject, &audioSrc, &audioSink, NUM_INTERFACES, ids, req);

You can use the sample code below to get the total audio latency for the created audio player. Note that the variable playerObject must refer to the same instance that CreateAudioPlayer() was called with.

// Sample code for getting Fire OS software+hardware audio latency when using OpenSL framework.

// Create playerObject with latency query interface support
// SL_IID_ANDROIDCONFIGURATION is requested to be included on CreateAudioPlayer
const SLInterfaceID ids[] = { SL_IID_ANDROIDCONFIGURATION };
const SLboolean req[] = { SL_BOOLEAN_TRUE };
SLint32 result = 0;

result = (*engine)->CreateAudioPlayer(engine, &playerObject,
              &audioSrc, &audioSink, 1 /* size of ids & req array */, ids, req);

if (result != SL_RESULT_SUCCESS) {
    ALOGE("CreateAudioPlayer failed with result %d", result);
    return;
}

SLAndroidConfigurationItf playerConfig;
SLuint32 audioLatency = 0;
SLuint32 paramSize = sizeof(SLuint32);

// Realizing playerObject is required before latency query
result = (*playerObject)->Realize(playerObject, SL_BOOLEAN_FALSE);

if (result != SL_RESULT_SUCCESS) {
    ALOGE("playerObject realize failed with result %d", result);
    return;
}

// Get the audio player's interface
result = (*playerObject)->GetInterface(playerObject,
                                       SL_IID_ANDROIDCONFIGURATION,
                                       &playerConfig);
if (result != SL_RESULT_SUCCESS) {
    ALOGE("config GetInterface failed with result %d", result);
    return;
}

// Get the audio player's latency
result = (*playerConfig)->GetConfiguration(playerConfig,
          (const SLchar *) "androidGetAudioLatency", &paramSize, &audioLatency);

if (result == SL_RESULT_SUCCESS) {
    // The hardware+software audio latency is filled into the SLuint32 variable audioLatency.
} else {
    // Call your current get_audio_latency API. You will only get the hardware audio latency value.
}

Last updated: Oct 29, 2020