Most businesses think they have hearing-accessible media because YouTube shows a CC button. They don't. Auto-generated captions routinely miss 15 to 20 percent of spoken words and garble another 10 percent — the difference between a video that informs a deaf user and one that insults them. Genuine website accessibility for hearing impaired users requires captions that a deaf viewer can actually follow, transcripts for audio content, and a handful of production standards most teams never document.
Hearing-related complaints now account for a meaningful share of Title III digital demand letters, according to the 2025 UsableNet year-in-review. Most of those complaints do not say "the video has no captions." They say "the captions exist but are unusable" — wrong speaker, missing punctuation, no sound effects marked, bad timing. That is the gap this guide breaks down.
What Hearing-Impaired Users Actually Need From Your Site
Hearing loss is a spectrum. Roughly 37 million American adults report some degree of it, per the National Institute on Deafness and Other Communication Disorders, and the population splits into groups with very different needs on the web.
- Late-deafened and hard-of-hearing users often rely on captions as a supplement — they may hear partial audio but need text to fill gaps, especially for unfamiliar names, numbers, and technical terms.
- Culturally Deaf users may have English as a second language after ASL. For them, captions need to be clear, well-timed, and ideally supplemented with sign language video for critical content.
- Users in sound-off environments — offices, trains, late-night browsing — rely on captions even without a hearing condition. Meta's own research found that 85 percent of Facebook video views happen with the sound off.
WCAG 2.2 Level AA codifies the baseline: captions for all prerecorded audio content (1.2.2), real-time captions for live audio content (1.2.4), transcripts for prerecorded audio-only content, and audio description or a text alternative for video where the visuals carry information the audio does not. Level AAA adds sign language interpretation and extended audio description, but AA is the de facto legal floor for most ADA Title III and Section 508 cases.
The Four Media Failures That Draw Complaints
After reviewing the public complaint language from accessibility demand letters and Department of Justice settlements across 2023 to 2025, the same four failures show up over and over. None of them require exotic technology to fix.
1. Auto-generated captions treated as captions
YouTube, Zoom, Vimeo, and every major platform generate automatic captions using speech recognition. These are a starting point, not a final product. Accuracy hovers around 60 to 80 percent depending on audio quality, accents, and domain vocabulary. WCAG explicitly requires captions to be accurate and synchronized. A 75-percent-accurate auto-caption track fails both tests, and courts have accepted that reasoning in multiple consent decrees.
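As a rough illustration of how an audit might quantify that gap (this is a minimal sketch, not any platform's official metric; the transcript strings are invented), word-level accuracy can be estimated by aligning an auto-generated draft against a human-verified transcript:

```python
from difflib import SequenceMatcher

def caption_accuracy(reference: str, hypothesis: str) -> float:
    """Rough word-level accuracy of a caption draft against a
    human-verified transcript (1.0 = every word matched)."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    matcher = SequenceMatcher(None, ref_words, hyp_words)
    # Count words that survive the alignment unchanged.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / max(len(ref_words), 1)

reference = "the phone rings and Dana answers on the second ring"
auto_draft = "the phone rings and Donna answer on the second ring"
print(f"{caption_accuracy(reference, auto_draft):.0%}")  # → 80%
```

Two garbled words out of ten drops this clip to 80 percent — enough to fail the accuracy bar even though it "looks captioned" at a glance.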
2. Captions that strip non-speech audio
Real captions are not just a transcript of the words. They include speaker identification when it is not visually obvious, sound effects that carry meaning ([door slams], [phone rings]), and music cues ([upbeat music]). A deaf viewer watching a captioned explainer where a ringing phone triggers the next scene needs the ring marked — otherwise the narrative jumps for no visible reason. This is the difference the WCAG spec calls "captions" versus "subtitles."
3. Audio content with no transcript
Podcasts, interview recordings, and audio testimonials count as media under WCAG. They require a synchronized caption track if they are paired with video, or a full text transcript if they are audio-only. "Listen to our founder explain the product" as an embedded MP3 with no transcript is a classic failure and one of the easiest to remediate — a one-time transcription runs a few dollars per audio minute.
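As a sketch of the remediated page (the filenames here are illustrative, not a prescribed structure), the fix is an embedded player with the transcript published on the same page:

```html
<!-- Embedded audio with a same-page transcript; paths are invented examples -->
<audio controls src="/media/founder-story.mp3"></audio>
<p><a href="/media/founder-story-transcript.html">Read the full transcript</a></p>
```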
4. Live video without real-time captioning
Webinars, product launches, and live streams often ship with only the platform's auto-caption feature active. For prerecorded content WCAG permits post-production captioning, but live content needs real-time captions at AA. The common fix is a professional CART (Communication Access Realtime Translation) provider running alongside the stream, typically 125 to 250 dollars per hour. For a regular webinar schedule, that cost pays for itself against the risk of a single demand letter.
Auto-captions are a useful draft. They are not compliance. A caption file that reads smoothly, identifies speakers, marks non-speech audio, and syncs within a quarter-second of the dialogue is what WCAG AA actually asks for — and what a deaf user can actually use.
The Quality Bar a Real Caption File Has to Hit
There is a practical specification most compliant teams follow. It maps to WCAG 1.2.2 (captions for prerecorded content) and 1.2.4 (captions for live content), and it borrows from the FCC's broadcast caption quality rules.
- Accuracy above 99 percent on words, names, and numbers. Auto-generated captions start far below this. Human review is the only way to get there reliably.
- Synchronization within 250 milliseconds of the spoken word. Captions that run several seconds behind break the connection between visual and audio.
- Speaker identification whenever the speaker is off-screen or when multiple speakers could be confused.
- Non-speech information in square brackets — sound effects, music cues, laughter, silence that carries meaning.
- Proper punctuation and sentence casing. Auto-caption output is often one long uppercase run with no periods. That is exhausting to read and technically non-compliant.
- A real caption track format — WebVTT or SRT — not burned-in text. Burned-in captions cannot be styled by users who need larger fonts or higher contrast.
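Several of these requirements are visible in a short WebVTT fragment. The timings, speaker name, and cues below are invented for illustration:

```vtt
WEBVTT

1
00:00:01.000 --> 00:00:03.200
[phone rings]

2
00:00:03.300 --> 00:00:06.100
DANA: Thanks for calling. How can I help?

3
00:00:06.200 --> 00:00:08.900
[upbeat music fades in]
```

Because the text lives in a separate track rather than burned into the frame, players can attach it through the HTML5 `<track kind="captions">` element, and users remain free to restyle font size and contrast to their needs.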
We worked with a training company whose entire library was captioned by the platform's auto feature. On audit, accuracy averaged 71 percent. After one pass of human correction, accuracy moved past 99 percent and total viewing time per user rose 22 percent — hearing users benefited just as much as deaf users.
What a Compliant Media Pipeline Looks Like
For any business publishing more than a few videos a quarter, the sustainable answer is a documented media pipeline rather than ad-hoc fixes. A working version looks like this.
- Scripts first. If a video starts from a script, that script becomes the caption draft. No transcription step required.
- Auto-generation as a draft. For unscripted content, run the file through automated transcription to produce an initial caption file.
- Human review pass. A captioner or editor corrects the draft against the audio, adds speaker tags, inserts non-speech cues, and times cues to dialogue.
- QA playback. A second person watches the video with captions on at normal speed. Catches what the editor missed.
- Transcript export. Generate a text transcript from the final caption file and publish it alongside the video on the same page — this covers WCAG 1.2.3 and helps SEO since search engines can index the text.
- Archive the caption source file in WebVTT format so future edits, translations, or platform migrations do not require re-captioning from scratch.
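The transcript-export step above can be as simple as stripping the timing metadata out of the final caption file. A minimal sketch (it ignores WebVTT NOTE blocks, styling, and non-numeric cue identifiers, and the sample file content is invented):

```python
import re

def vtt_to_transcript(vtt: str) -> str:
    """Collapse a WebVTT caption file into a plain-text transcript by
    dropping the header, timing lines, and numeric cue identifiers."""
    kept = []
    for line in vtt.splitlines():
        line = line.strip()
        if (not line or line == "WEBVTT" or "-->" in line
                or re.fullmatch(r"\d+", line)):
            continue
        kept.append(line)
    return "\n".join(kept)

sample = """WEBVTT

1
00:00:01.000 --> 00:00:03.500
[phone rings]

2
00:00:03.600 --> 00:00:06.000
DANA: Thanks for calling support."""

print(vtt_to_transcript(sample))
# [phone rings]
# DANA: Thanks for calling support.
```

Publishing that output next to the video is what closes the loop on the media-alternative requirement while handing search engines indexable text for free.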
For live events, a CART provider joins the stream 10 minutes early, tests audio, and produces captions through the broadcast. After the event, the live caption log becomes the starting transcript for the on-demand replay, which gets a second human pass before publication.
The Business Case Beyond Compliance
Hearing accessibility is unusual in that nearly every improvement also lifts performance metrics for users without hearing loss. The Meta sound-off statistic is the most quoted, but there is more. Discovery Digital Networks saw a 7.32 percent lift in video views after adding accurate captions to their YouTube library. A Verizon study found 80 percent of caption users in the US have no hearing difficulty at all. Captions make videos watchable in silent contexts, improve comprehension for non-native speakers, and contribute text content that search engines can index.
At Revenue Group, when we build or remediate sites, genuine website accessibility for hearing impaired users is handled at the media pipeline level, not as a retrofit. Every video we ship has a WebVTT caption track reviewed by a human, a published transcript, and documented compliance against WCAG 2.2 AA 1.2.2 and 1.2.3. The cost is measured — it adds roughly 4 to 8 dollars per video minute at volume — and the upside is a media library that is usable by 37 million deaf and hard-of-hearing Americans, watchable by the 85 percent who scroll with the sound off, and defensible if a demand letter ever arrives.
Is Your Video Library Actually Compliant?
We will audit your captioning, check accuracy against the WCAG AA bar, and show you exactly where auto-captions are leaving you exposed — no pitch, just the findings.
Get My Free Audit →