Spatial audio

Spatial audio gives the creator the ability to place sound around the user. Unlike traditional mono/stereo/surround audio, it responds to head rotation in sync with video. While listening to spatial audio content, the user receives a real-time binaural rendering of the audio stream.

Output from Jump Assembler contains a stereo (not spatial) scratch audio track recorded from one of the camera’s onboard microphones. This scratch audio track is useful for audio/video synchronization.

This page describes the type of spatial audio supported by Jump and YouTube, as well as how to capture and produce spatial audio.


The type of spatial audio supported by the Jump platform is ambisonic audio, also referred to as ambisonics. The specific ambisonic scheme used is ACN/SN3D, that is, ACN channel ordering with SN3D normalization.
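
To make the ACN part of that scheme concrete, here is a minimal sketch of the commonly used ACN formula, which numbers a spherical harmonic of degree l and index m as l² + l + m. The function names are hypothetical and are not part of any Jump or YouTube tool.

```python
import math
from typing import Tuple


def acn_index(degree: int, index: int) -> int:
    """Return the ACN channel number for degree l and index m (-l <= m <= l)."""
    if degree < 0 or abs(index) > degree:
        raise ValueError("require degree >= 0 and -degree <= index <= degree")
    return degree * degree + degree + index


def acn_to_degree_index(acn: int) -> Tuple[int, int]:
    """Invert acn_index: recover (degree, index) from an ACN channel number."""
    if acn < 0:
        raise ValueError("ACN channel numbers are non-negative")
    degree = math.isqrt(acn)
    return degree, acn - degree * degree - degree


# Conventional letter names for the four first-order channels, listed in
# ACN order. Note the order is W, Y, Z, X rather than W, X, Y, Z.
FIRST_ORDER_NAMES = {
    acn_index(0, 0): "W",
    acn_index(1, -1): "Y",
    acn_index(1, 0): "Z",
    acn_index(1, 1): "X",
}
```

Under this numbering, a first-order stream carries its channels in the order W, Y, Z, X, which is worth checking when importing audio from tools that use a different ordering.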

The ambisonic order of an ambisonic audio stream corresponds to the spatial fidelity of the signal. As the ambisonic order increases, the number of channels increases quadratically. For example:

  • zeroth-order ambisonics (1 channel) is a purely omni-directional signal containing no directional information.
  • first-order ambisonics (4 channels) contains directional information, but sources can be localized only coarsely.
  • third-order ambisonics (16 channels) contains dramatically more directional information than first-order; sources can be localized with considerable accuracy.
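
The channel counts in the list above follow from the rule that a full-sphere ambisonic stream of order N has (N + 1)² channels. The sketch below, with hypothetical helper names, computes the count for a given order and the highest order that fits in a given number of channels.

```python
import math


def channels_for_order(order: int) -> int:
    """Number of channels in a full-sphere ambisonic stream of the given order."""
    if order < 0:
        raise ValueError("ambisonic order must be non-negative")
    return (order + 1) ** 2


def max_order_for_channels(channels: int) -> int:
    """Highest full-sphere ambisonic order that fits in the given channel count."""
    if channels < 1:
        raise ValueError("at least one channel is required")
    return math.isqrt(channels) - 1


for order in (0, 1, 3, 7):
    print(order, channels_for_order(order))
# prints:
# 0 1
# 1 4
# 3 16
# 7 64
```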

Why Ambisonics?

Ambisonics has many advantages over traditional surround sound formats:

  • It is not biased toward a particular listener or speaker configuration. For example, 7.1 surround sound recordings are made for a specific, fixed arrangement of speakers.
  • It provides height information. Most traditional surround sound formats (5.1, 7.1, etc.) contain only horizontal sound information.
  • Its spatial quality is extensible without a fixed upper limit; storing more channels of information increases the spatial precision of the audio.
  • It is rotationally invariant; arbitrary rotations of the audio do not cause information loss. This is especially important for VR applications, where spatial audio must be smoothly rotated as the listener turns their head.
  • It is a scene-based, rather than object-based, coding. This means that, as the number of sources grows in a scene, the data necessary to represent the scene remains constant; with an object-based coding, the size of the data typically grows linearly with the number of sources.
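
As an illustration of the rotation property above, the following sketch rotates a first-order stream about the vertical axis. It assumes frames stored in ACN order (W, Y, Z, X), an axis convention of X forward, Y left, and Z up, and a positive angle meaning a counter-clockwise turn when viewed from above. The function name is hypothetical, and this is not the renderer used by Jump or YouTube.

```python
import math
from typing import Iterable, List, Tuple

Frame = Tuple[float, float, float, float]  # one sample of (W, Y, Z, X)


def rotate_yaw(frames: Iterable[Frame], yaw_radians: float) -> List[Frame]:
    """Rotate a first-order sound field about the vertical (Z) axis.

    W (omnidirectional) and Z (height) are unchanged by a yaw rotation;
    X and Y are mixed by a standard 2D rotation matrix.
    """
    c, s = math.cos(yaw_radians), math.sin(yaw_radians)
    rotated = []
    for w, y, z, x in frames:
        new_x = x * c - y * s
        new_y = x * s + y * c
        rotated.append((w, new_y, z, new_x))
    return rotated


# A source directly in front (X = 1, Y = 0) moves to the left (Y = 1)
# after a quarter turn counter-clockwise.
front = [(1.0, 0.0, 0.0, 1.0)]
left = rotate_yaw(front, math.pi / 2)

# Rotating back by the opposite angle recovers the original frame up to
# floating-point rounding, which is the sense in which no information is lost.
restored = rotate_yaw(left, -math.pi / 2)
```

For head tracking, a renderer would typically rotate the scene by the opposite of the listener's head yaw so that sources stay fixed in the world; that usage is an assumption here and is not described in the text above.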

Spatial audio support

YouTube supports spatial audio on Android and desktop. Support for other platforms is in development. On unsupported platforms, a static, non-head-tracked stereo downmix is delivered to the user.

YouTube on Android currently supports first-order ambisonics (4 channels of audio). It will support higher-order ambisonics (9+ channels of audio) in the future.

For more information, see Use spatial audio in 360-degree and VR videos in YouTube Help.

Note: Support for higher-order ambisonics is currently experimental.

Capturing ambisonics

Several microphones are capable of recording ambisonics.

A Zoom H2n is included with the GoPro Odyssey camera rig. The Zoom H2n is an easy-to-use ambisonic microphone, and it can record horizontal spatial audio compatible with the Jump platform when updated with firmware version 2.00 or newer. For more information, see Recording Spatial Audio with the Zoom H2n.

As with traditional film audio production, a single-microphone approach only goes so far with ambisonics: an ambisonic microphone picks up all audio in a space, and it is not ideal for mixing or equalizing multiple sources.

For the best audio experience, use traditional recording techniques, such as close-miking, in addition to recording with an ambisonic microphone. The close-miked sources can be spatialized in post-production (see the following section), and the ambisonic microphone can be mixed in for ambient audio or used as a spatial audio scratch track.
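
To show what spatializing a close-miked source involves, here is a minimal sketch that pans a mono signal into a first-order ACN/SN3D stream. It assumes azimuth is measured counter-clockwise from straight ahead and elevation upward from the horizontal plane. The function name is hypothetical; in practice this step is normally done with an encoder plugin in a DAW rather than hand-written code.

```python
import math
from typing import List, Sequence, Tuple

Frame = Tuple[float, float, float, float]  # one sample of (W, Y, Z, X)


def encode_first_order(
    samples: Sequence[float],
    azimuth_deg: float,
    elevation_deg: float = 0.0,
) -> List[Frame]:
    """Pan a mono signal to a direction as first-order ambisonics (ACN/SN3D)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # SN3D first-order gains: W is omnidirectional with unit gain; X, Y, Z
    # are the Cartesian components of the unit vector toward the source.
    gain_w = 1.0
    gain_y = math.sin(az) * math.cos(el)
    gain_z = math.sin(el)
    gain_x = math.cos(az) * math.cos(el)
    return [(s * gain_w, s * gain_y, s * gain_z, s * gain_x) for s in samples]


# Place a short mono signal 90 degrees to the listener's left.
mono = [0.0, 0.5, 1.0, 0.5]
to_the_left = encode_first_order(mono, azimuth_deg=90.0)
```

Several encoded sources, and an ambisonic microphone recording in the same channel order and normalization, can then be combined by summing them channel by channel.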

Producing ambisonics

Several tools are available for manipulating and generating ambisonic audio.

For ambisonic sound design, the Jump Team recommends using the free ambiX plugin set with REAPER, a digital audio workstation (DAW). REAPER is ideal for ambisonic audio production because it supports up to 64 channels of audio per track (that is, up to seventh-order ambisonics). For a basic introduction, REAPER template, and export instructions for Jump Inspector and YouTube, see First-order Ambisonics in REAPER.

For more information about basic editing of ambisonics in Adobe Premiere, and exporting from it, see First-order Ambisonics in Adobe Premiere.