Audio and video can be a convenient way of presenting information to users, and content creators often use these types of media on modern websites. Implementing audio and video in a way that does not exclude part of the audience, such as people with visual or auditory impairments, requires additional effort. The Web Content Accessibility Guidelines (WCAG), the most widely recognized standard for web accessibility, describe what should be done in this respect. However, WCAG tend to be written in a very formal manner, so they may be considered difficult to understand at best and cryptic at worst, especially the parts focused on media. In this article we will take a closer look at WCAG 2.1 and figure out what exactly we should take into consideration when designing accessible audio and video.

Approach to understanding WCAG

WCAG 2.1 has a specific guideline devoted to time-based media (1.2). It consists of nine success criteria, and all of them are of interest to us. There are two main reasons why they might look perplexing:

  • They all refer to audio and video but it’s not obvious which criteria apply to a specific type of media. For instance, if we are only considering a video asset without an audio track, it’s not immediately clear what exactly needs to be provided for users in order to make it accessible.
  • They heavily refer to terms whose meaning might not be obvious. The way they are written is very academic and leaves little room for error but it also makes them hard to digest at first glance.

It should be stressed that the WCAG authors provide links to complementary resources, such as articles explaining the intent of specific criteria. They are extensive, helpful and written in a more human-oriented style. Still, it takes substantial time to distill the information we need for a specific use case, and we don’t always have the luxury of spending that much time on assessing what has to be done when working on a real project.

In order to address both issues and demystify the media-related criteria, we will first explain the definitions of the terms that might be confusing. Then, hopefully with more clarity on what the criteria mean, we will transform them into a decision tree that gives us understandable requirements for every type of audio and video that we might want to put on our websites.

Explaining the definitions

We can split the definitions into two groups: those describing the asset that we are going to use on the website and those referring to the additional elements which we should provide for users to make it accessible. We will begin with the first group.

The guidelines mention audio, video and synchronized media as types of media that might require additional accessibility-focused features. While audio and video should be self-explanatory, synchronized media sounds less obvious, even though it is in fact simple: in our case, it’s just audio synchronized with video (in other words, a video which is not mute but has an accompanying audio track). A question might arise then: why do the authors use such an unintuitive term as “synchronized media”? It’s certainly not to confuse the reader. The reason is that in some cases synchronized media might be something other than a video with an audio track, because audio and video are not the only formats that fall into this category. An example of another type of media, mentioned in the WCAG resources, is “an interactive shopping environment (...) that allows users to steer themselves around in a virtual store and shop”. However, this article is focused only on audio and video, and we will not cover other, perhaps more unusual combinations of formats.

Therefore, when we say audio or video content in synchronized media, a phrase which occurs often in the WCAG criteria, we will simply mean audio-video. The other formats that we will consider are audio-only and video-only which, as their names suggest, are not paired with any other format.

The other characteristic of the media asset is whether it’s live or prerecorded. Live means that information is transmitted to the user in real time (e.g. a video game stream or a radio broadcast). Prerecorded is cleverly defined as any media which is not live, for example an embedded video commercial.

Media format | Explanation | Example | WCAG definition
audio-only | audio track without accompanying video | prerecorded: podcast episode (accessible via audio player); live: radio broadcast (accessible via audio player) | WCAG definition of audio-only
video-only | video which is mute (does not have accompanying audio track) | prerecorded: background video in the top section of the page; live: real-time security camera stream | WCAG definition of video-only
audio content in synchronized media or video content in synchronized media | video with accompanying audio track | prerecorded: commercial embedded on the page (with audio track); live: real-time video game stream (with audio commentary) | WCAG definition of synchronized media

There is one more important term related to media formats. A few WCAG criteria mention that some extra capabilities should be provided “except when the audio or video is a media alternative for text”. A media alternative for text is media that doesn’t contain any information beyond what is already presented in text on the page, and the criteria additionally expect it to be clearly labeled as such. In that case providing additional accessibility features (e.g. captions) is not necessary.

We will now move on to the definitions of the elements that improve the accessibility of audio and video, which may need to be provided to users in order to meet the criteria. The most frequently mentioned and probably the most confusing one is an alternative for time-based media. It can apply to both audio and video, depending on the criterion and conformance level, but the idea behind it is to have a text document that presents the same information as the media. In many cases it will be a transcript, but a transcript might not always be sufficient. If audio or video contains spoken dialogue, it should be transcribed to text, but the user reading such a document might need more information in order to have the same understanding as someone listening to the audio or watching the video (e.g. sounds or visual context could be crucial). It depends on the content of the specific asset. The W3C article explains it in more detail, if it still seems confusing.

We might need to provide captions for the audio content of audio-video media. It’s important to differentiate them from subtitles: the two are not the same thing, even though the terms are often treated as synonyms. Subtitles present only the content of the spoken dialogue, while captions also include the names of the speakers and descriptions of sound effects and music.
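To make the difference between captions and subtitles more concrete, here is a minimal sketch that turns a list of cues into a WebVTT caption file, a format commonly used for captions of HTML video. The `CaptionCue` shape and the function names are hypothetical and exist only for this illustration; the sketch also assumes that cue text contains no characters that need escaping in WebVTT (such as `<` or `&`).

```typescript
// Hypothetical cue shape used only in this example.
interface CaptionCue {
  startSeconds: number;
  endSeconds: number;
  speaker?: string; // name of the person talking, if any
  text: string;     // dialogue, or a sound description such as "[door slams]"
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(value: number, width: number): string {
  let result = String(value);
  while (result.length < width) {
    result = "0" + result;
  }
  return result;
}

// Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt).
function toTimestamp(totalSeconds: number): string {
  const totalMs = Math.round(totalSeconds * 1000);
  const ms = totalMs % 1000;
  const seconds = Math.floor(totalMs / 1000) % 60;
  const minutes = Math.floor(totalMs / 60000) % 60;
  const hours = Math.floor(totalMs / 3600000);
  return pad(hours, 2) + ":" + pad(minutes, 2) + ":" + pad(seconds, 2) + "." + pad(ms, 3);
}

// Build the text of a WebVTT file. Speaker names are written as WebVTT
// voice tags (<v Name>), which is what distinguishes these captions from
// plain subtitles together with the sound descriptions.
function toWebVtt(cues: CaptionCue[]): string {
  const blocks = cues.map((cue) => {
    const payload = cue.speaker ? "<v " + cue.speaker + ">" + cue.text : cue.text;
    return toTimestamp(cue.startSeconds) + " --> " + toTimestamp(cue.endSeconds) + "\n" + payload;
  });
  return ["WEBVTT"].concat(blocks).join("\n\n") + "\n";
}

const exampleCaptions = toWebVtt([
  { startSeconds: 1, endSeconds: 3.5, speaker: "Anna", text: "Welcome to the show." },
  { startSeconds: 3.5, endSeconds: 5, text: "[upbeat music]" },
]);
```

A subtitle file for the same recording would contain only the first cue, without the speaker name; the second cue, describing the music, is what a user who cannot hear the audio would otherwise miss.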

Sign language interpretation can be crucial for people who are deaf or hard of hearing. The criterion requiring it can be met by including a sign language interpreter in the corner of the video (the interpreter is usually embedded in the asset as a part of the video production process).

Audio description is narration that describes what is happening in the video when that information is not conveyed by the original soundtrack. It enriches the audio track, usually during pauses in dialogue. If the person watching the video has a visual impairment, the audio description allows them to understand the content of the video. When the pauses in dialogue are too short to fit the additional description, the video can be paused so that the narrator has enough time to explain what is happening. This is called extended audio description.
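The choice between regular and extended audio description comes down to whether each description fits into the pause it is meant to fill. The sketch below illustrates that reasoning; the `DescriptionSlot` shape and both function names are assumptions made for this example and do not come from WCAG, which does not prescribe any algorithm.

```typescript
// Hypothetical shape: one planned description and the pause in dialogue
// during which it is supposed to be spoken.
interface DescriptionSlot {
  pauseSeconds: number;       // length of the natural pause in dialogue
  descriptionSeconds: number; // time the narrator needs for the description
}

// Regular audio description is enough only if every description fits
// into its pause; otherwise the extended variant is needed.
function needsExtendedAudioDescription(slots: DescriptionSlot[]): boolean {
  return slots.some((slot) => slot.descriptionSeconds > slot.pauseSeconds);
}

// How long the video would have to be paused at each slot
// (0 when the description already fits into the pause).
function requiredVideoPauses(slots: DescriptionSlot[]): number[] {
  return slots.map((slot) => Math.max(0, slot.descriptionSeconds - slot.pauseSeconds));
}

const plannedSlots: DescriptionSlot[] = [
  { pauseSeconds: 4, descriptionSeconds: 3 }, // fits into the pause
  { pauseSeconds: 2, descriptionSeconds: 5 }, // 3 seconds too long
];
const extendedNeeded = needsExtendedAudioDescription(plannedSlots);
const videoPauses = requiredVideoPauses(plannedSlots);
```

In this example the second description does not fit, so the video would have to be paused for three seconds at that point.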

Element | Explanation | WCAG definition
Alternative for time-based media | text document telling the same story as presented in audio/video; a transcript is an example but might not be sufficient; the exact content of the document depends on the specific asset | WCAG definition of alternative for time-based media
Captions | transcription of spoken dialogue and description of sound effects present in the audio track, appearing on screen in sync with the audio track; should not be confused with subtitles, which contain just the text of the dialogue | WCAG definition of captions
Sign language interpretation | translation of spoken language to a sign language; usually implemented by displaying a person doing the translation in the corner of the video | WCAG definition of sign language interpretation
Audio description | spoken description of the video story taking place during pauses in dialogue | WCAG definition of audio description
Extended audio description | spoken description of the video story requiring the video to be paused; this technique is used only when audio description would not be sufficient | WCAG definition of extended audio description

Creating the decision tree

Now that we have familiarised ourselves with the WCAG definitions, we can take a closer look at guideline 1.2 of WCAG. We will transform its criteria into a decision tree, which will help us determine what we should take into consideration in order to make the audio and video on our site accessible. After answering a few questions about the asset that we are going to use, we will get a list of techniques that should be applied to meet a specific WCAG conformance level (A, AA, AAA). The tree is presented in the image below and the entry point is at the bottom.

[Image: decision tree presenting audio and video requirements for different media formats]

An example should shed some light on how to use this diagram. Let’s say we want to provide an audio-only podcast episode that will be available to our users via the HTML audio player. Our path on the decision tree is as follows:

  • What do I need to make my audio/video accessible? (It’s the entry point.)
  • Prerecorded or live? (Prerecorded.)
  • Audio-video or audio-only/video-only? (Audio-only/video-only.)
  • Audio-only or video-only? (Audio-only.)
  • Output: alternative for time-based media* (A, 1.2.1)

We can see that in this case only the alternative for time-based media should be provided (as stated by criterion 1.2.1, which is required to meet level A). It would probably be a transcript of the dialogue, together with a description of additional sounds and music if they are present in the recording. The analysed example is highlighted in green on the diagram below:

[Image: decision tree presenting audio and video requirements for different media formats, with the audio-only podcast example highlighted in green]
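The same walk through the tree can also be sketched in code. The function below is a rough approximation based on the wording of the nine success criteria of guideline 1.2 in WCAG 2.1, not a transcription of the diagram, so the phrasing of the outputs may differ from it. All type and function names are hypothetical, and the exception for media that is a media alternative for text is deliberately left out.

```typescript
type Timing = "prerecorded" | "live";
type Format = "audio-only" | "video-only" | "audio-video";
type Level = "A" | "AA" | "AAA";

interface Requirement {
  criterion: string; // WCAG 2.1 success criterion number
  level: Level;      // conformance level at which it is required
  provide: string;   // what has to be provided
}

// Requirements from guideline 1.2 for a given asset. Assumption: the asset
// is not a media alternative for text (that exception is not modelled).
function requirementsFor(timing: Timing, format: Format): Requirement[] {
  if (timing === "prerecorded") {
    if (format === "audio-only") {
      return [{ criterion: "1.2.1", level: "A", provide: "alternative for time-based media" }];
    }
    if (format === "video-only") {
      return [
        { criterion: "1.2.1", level: "A", provide: "alternative for time-based media or audio track" },
        { criterion: "1.2.8", level: "AAA", provide: "alternative for time-based media" },
      ];
    }
    return [
      { criterion: "1.2.2", level: "A", provide: "captions" },
      { criterion: "1.2.3", level: "A", provide: "audio description or alternative for time-based media" },
      { criterion: "1.2.5", level: "AA", provide: "audio description" },
      { criterion: "1.2.6", level: "AAA", provide: "sign language interpretation" },
      { criterion: "1.2.7", level: "AAA", provide: "extended audio description (if pauses are too short)" },
      { criterion: "1.2.8", level: "AAA", provide: "alternative for time-based media" },
    ];
  }
  if (format === "audio-video") {
    return [{ criterion: "1.2.4", level: "AA", provide: "captions" }];
  }
  if (format === "audio-only") {
    return [{ criterion: "1.2.9", level: "AAA", provide: "alternative for time-based media" }];
  }
  return []; // live video-only: no criterion in guideline 1.2 applies
}

// Keep only the requirements needed to reach the target conformance level.
function requirementsUpTo(target: Level, requirements: Requirement[]): Requirement[] {
  const order: Level[] = ["A", "AA", "AAA"];
  return requirements.filter((r) => order.indexOf(r.level) <= order.indexOf(target));
}

const podcastRequirements = requirementsFor("prerecorded", "audio-only");
const commercialAtAA = requirementsUpTo("AA", requirementsFor("prerecorded", "audio-video"));
```

For the podcast example, `podcastRequirements` contains a single entry, the alternative for time-based media required by 1.2.1 at level A, which matches the path traced above. For a prerecorded commercial with an audio track, `commercialAtAA` keeps criteria 1.2.2, 1.2.3 and 1.2.5.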


Throughout this article, we have explained guideline 1.2 of WCAG 2.1, which is entirely focused on time-based media. We interpreted the definitions of the terms related to audio and video accessibility and then constructed a decision tree that allows us to easily identify what we should provide for our users so as not to exclude anyone. However, even though this article is an attempt to make the WCAG requirements more approachable, it should be treated as a companion tool, and the WCAG specification itself should always remain the point of reference. Let’s recap the aspects which have been simplified for the sake of approachability:

  • Synchronized media might not always mean audio-video. There are also other combinations of formats.
  • WCAG, as the name suggests, are guidelines. They should not be blindly followed. The ultimate goal should always be providing accessible sites for our users, not merely adhering to the specification.
  • There are also a few criteria outside guideline 1.2 that refer to other aspects of media accessibility, such as 1.4.2 Audio Control. Not everything is included in guideline 1.2.

With that in mind, hopefully the tables containing the definitions and the decision tree will come in handy for the reader in future projects.