VoxChron
Next-Generation Accessibility Captioning
Automatic captions that go beyond words — every sound event captured.
Every sigh, laugh, door slam, and silence tells a story. VoxChron is the only AI platform that transcribes speech and captures every non-verbal sound in your audio — making it the only captioning tool that truly meets accessibility standards.
Trusted by podcasters, broadcasters, and accessibility teams
[00:01:02] Speaker 1 (hesitant, low tone): I... I don't think we should go.
[00:01:04] [deep sigh]
[00:01:05] [silence — 2.3s]
[00:01:08] [door creaks open]
[00:01:09] [wind howling — background]
[00:01:11] [FOOTSTEPS — rapid, approaching]
[00:01:12] Speaker 2 (urgent, rising pitch): We have to. There's no other way.
[00:01:14] [nervous laughter]
[00:01:16] [keys jangling]
Real output from VoxChron — every sound captured, timestamped, and labeled.
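The sample above uses a simple `[HH:MM:SS] label` layout. As a minimal sketch of how such lines map onto SRT cues, the POSIX shell function below (`events_to_srt` is a hypothetical name, not part of VoxChron) numbers each line and ends each cue when the next one starts. Holding the final cue for 2 seconds is an arbitrary assumption, because the sample carries no end times.

```sh
# Convert "[HH:MM:SS] text" lines (as in the sample above) into SRT cues.
# Assumptions: a cue ends when the next one starts, and the last cue is
# held for 2 seconds because the input carries no end times.
events_to_srt() {
  awk '
    # Format whole seconds as an SRT timestamp (HH:MM:SS,mmm).
    function ts(s) {
      return sprintf("%02d:%02d:%02d,000", int(s / 3600), int(s % 3600 / 60), s % 60)
    }
    # Collect lines that start with a bracketed timestamp.
    match($0, /^\[[0-9][0-9]:[0-9][0-9]:[0-9][0-9]] /) {
      split(substr($0, 2, 8), hms, ":")
      n++
      start[n] = hms[1] * 3600 + hms[2] * 60 + hms[3]
      text[n] = substr($0, RLENGTH + 1)
    }
    END {
      for (i = 1; i <= n; i++) {
        stop = (i < n) ? start[i + 1] : start[i] + 2
        printf "%d\n%s --> %s\n%s\n\n", i, ts(start[i]), ts(stop), text[i]
      }
    }
  '
}
```

Usage: `events_to_srt < sample.txt > sample.srt`. Real captions would use the end times the transcriber reports rather than inferring them from the next event.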
What is VoxChron?
VoxChron is an AI-powered closed captioning platform that transcribes speech and detects non-verbal sounds — including sighs, laughter, door slams, silence, and 500+ other sound events. Unlike standard transcription tools that capture only spoken words, VoxChron annotates every audible event with precise timestamps and labels, producing captions that automatically meet FCC §79.1, ADA Title III, WCAG 2.1 AA, and Section 508 accessibility requirements. It supports pre-recorded files up to 10 GB, live streams via RTMP/HLS/WebRTC, and exports to SRT, VTT, JSON, TXT, and DOCX formats. Accuracy exceeds 98% across 500+ sound categories.
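SRT and VTT, two of the export formats listed above, differ mainly in the file header and in the millisecond separator of cue timings (`,` in SRT, `.` in WebVTT). A rough sketch of converting one to the other with standard tools, assuming a well-formed SRT file; `srt_to_vtt` is a hypothetical helper name. Cue numbers are kept, since WebVTT accepts them as optional cue identifiers.

```sh
# Convert an SRT file on stdin to WebVTT on stdout.
srt_to_vtt() {
  printf 'WEBVTT\n\n'
  # Drop Windows line endings, then change "00:01:02,500" to "00:01:02.500"
  # on cue timing lines only, so commas in caption text are left alone.
  tr -d '\r' | sed '/-->/s/\([0-9][0-9]:[0-9][0-9]:[0-9][0-9]\),\([0-9][0-9][0-9]\)/\1.\2/g'
}
```

Usage: `srt_to_vtt < captions.srt > captions.vtt`. A UTF-8 byte-order mark at the start of the SRT file, if present, is not removed by this sketch.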
The Problem
Other transcription tools only capture half the story
Standard transcription gives you words. But audio is so much more than words. The laughter that breaks tension. The silence that speaks volumes. The door slamming that changes the mood. The trembling voice that reveals fear.
VoxChron captures it all. Our AI detects and annotates 500+ distinct non-verbal sounds, emotions, and environmental audio events — automatically.
"I don't think we should go."
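Labels such as the 2.3-second silence in the sample output can, in the simplest case, be derived from word-level timestamps by flagging long gaps between consecutive words. The sketch below shows only that idea; it is not VoxChron's detection method, and the input layout (`start_ms end_ms word`), the threshold argument, the plain-ASCII label format, and the `detect_silence` name are assumptions for illustration.

```sh
# Print a silence label wherever the gap between consecutive words is at
# least MIN_MS milliseconds. Input: one word per line as "start_ms end_ms word".
detect_silence() {
  awk -v min_ms="$1" '
    BEGIN { min_ms += 0 }
    NR > 1 && $1 - prev_end >= min_ms {
      t = int(prev_end / 1000)   # silence start, in whole seconds
      printf "[%02d:%02d:%02d] [silence %.1fs]\n", int(t / 3600), int(t % 3600 / 60), t % 60, ($1 - prev_end) / 1000
    }
    { prev_end = $2 }
  '
}
```

For words timed `1600 5000 go` and `7300 7900 We`, `detect_silence 2000` reports a 2.3 s gap starting at 00:00:05. Gap-based detection cannot tell a meaningful pause from a stretch filled by non-speech sound, which is why the other layers matter.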
What We Detect
Six layers of non-verbal intelligence
Every audio file contains layers of meaning. VoxChron detects them all and gives you granular control over what to include in your output.
Speech
Word-level transcription with speaker diarization. Verbatim or clean modes.
Paralanguage
Hesitations, filler words, sighs, laughter, throat clearing, stammering — the sounds people make that aren't words.
Non-Word Sounds
Coughs, sneezes, yawns, whistles, claps, snaps — human-generated sounds that carry meaning but aren't speech.
Environmental Sounds
Traffic, wind, rain, crowd noise, silence, music — ambient sound captions that set the scene for every listener.
Sound Effects & Foley
Door slams, footsteps, glass breaking, explosions, phone rings — sound effect subtitles with precise timestamps for every discrete event.
Emotional Cues
AI-detected emotional tone per utterance — happy, sad, angry, fearful, surprised, neutral — from voice alone.
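Choosing which layers appear in the output amounts to filtering detected events by layer. A minimal sketch, assuming a hypothetical one-event-per-line layout of `timestamp layer label` and reusing the layer keys from the API example further down this page (`paralanguage`, `non_word_sounds`, `environmental`, `sound_effects`, `emotions`); `filter_layers` is a hypothetical name.

```sh
# Keep only events whose layer is in a comma-separated allow-list.
# Input (hypothetical layout): "timestamp layer label..." per line.
filter_layers() {
  awk -v keep="$1" '
    BEGIN {
      n = split(keep, names, ",")
      for (i = 1; i <= n; i++) allowed[names[i]] = 1
    }
    ($2 in allowed)
  '
}
```

For example, `filter_layers paralanguage,sound_effects` keeps a `[deep sigh]` and `[FOOTSTEPS]` but drops `[wind howling]` from the environmental layer.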
How It Works
From upload to compliant captions in minutes
1. Upload your file
Drag and drop any audio or video file up to 10 GB (MP3, WAV, MP4, MOV, M4A, FLAC, OGG, WEBM, and more), or connect a live RTMP/HLS/WebRTC stream.
2. AI analyzes 6 signal layers
VoxChron processes speech, paralanguage, non-word sounds, environmental sounds, sound effects, and emotional cues simultaneously, detecting 500+ distinct sound categories.
3. Download compliant captions
Export to SRT, VTT, JSON, TXT, or DOCX with every sound timestamped and labeled. Automatically meets FCC, ADA, WCAG 2.1, and Section 508 requirements.
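Before uploading, a client can reject files that are clearly out of bounds. This sketch checks the extension against the formats named in step 1 and the size against 10 GB, read here as 10 × 1024^3 bytes (an assumption; the page does not say whether GB means 10^9 or 2^30 bytes). Because the list ends with "and more", an extension rejected here may still be accepted by the service. `check_upload` and `MAX_UPLOAD_BYTES` are hypothetical names.

```sh
# Reject files that are clearly outside the limits described in step 1.
# Assumptions: the extension list is only the formats named on this page,
# and "10 GB" is read as 10 * 1024^3 bytes.
MAX_UPLOAD_BYTES=10737418240

check_upload() {
  if [ ! -f "$1" ]; then
    echo "not a file: $1" >&2
    return 1
  fi
  # Lower-case the text after the last dot so "clip.MP3" passes.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|wav|mp4|mov|m4a|flac|ogg|webm) ;;
    *) echo "unlisted format: .$ext" >&2; return 1 ;;
  esac
  size=$(wc -c < "$1" | tr -d ' ')
  if [ "$size" -gt "$MAX_UPLOAD_BYTES" ]; then
    echo "too large: $size bytes" >&2
    return 1
  fi
}
```

Usage: `check_upload interview.wav && echo "ready to upload"`.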
Context Modes
Adapts to any content type
Tell VoxChron what you are captioning, down to the specific movie genre, and it automatically adjusts which non-verbal sounds to prioritize, how to format them, and which compliance rules to apply. Horror gets silence and jump-scare detection. Comedy gets laughter and timing. Legal gets every breath and pause. No other captioning tool does this.
Content Types
Movie Genres — Detection adapts to the genre
Each mode tunes detection weights, sound priorities, and formatting rules automatically.
Compliance
Compliance-ready out of the box
Non-verbal sound annotations formatted to meet real-world regulatory standards. No manual cleanup needed.
Broadcast-grade closed captions with sound annotations
Accessible captions with non-verbal context for hearing-impaired viewers
Verbatim transcripts with pauses, hesitations, and sounds
Web-accessible media with captions that describe all meaningful audio
The only captioning tool that truly meets accessibility standards
Every accessibility guideline — FCC, ADA, WCAG 2.1, Section 508 — requires that captions include non-verbal sounds. No other tool does this automatically. Until now.
Every other captioning tool
Non-compliant
[00:01:02]
"I don't think we should go."
[00:01:05]
— nothing captured —
[00:01:11]
"We have to go. There's no other way."
A hearing-impaired viewer misses the entire emotional scene
VoxChron
Meets requirements
[00:01:02]
(hesitant) "I... I don't think we should go."
[00:01:04]
[deep sigh] [silence — 2.3s] [door creaks] [wind]
[00:01:11]
[FOOTSTEPS — rapid, approaching]
(urgent) "We have to. There's no other way."
True accessibility — nothing is lost
Why this matters — what the standards actually require
FCC §79.1
Captions must include non-speech information like sound effects and speaker identification
ADA Title III
Effective communication requires captions that convey the full auditory experience
WCAG 2.1 AA
Success Criterion 1.2.2 (Level A): prerecorded media must be captioned, and WCAG defines captions as including the non-speech audio needed to understand the content
Section 508
Federal electronic content must conform to WCAG 2.0 Level AA, whose caption requirement covers sound effects and speaker identification
Every standard requires non-verbal sounds in captions — because true captions for the hearing impaired must convey the full auditory experience, not just words. VoxChron is the only tool that does this automatically.
Broadcasters
Filmmakers
Universities
Government
Healthcare
Legal
Ready to meet accessibility standards without manual work?
Start free (5 minutes included)
Comparison
VoxChron vs other captioning tools
Most transcription tools capture words. Only VoxChron captures everything.
| Feature | VoxChron | Descript | Rev | Otter.ai |
|---|---|---|---|---|
| Speech transcription | ✓ | ✓ | ✓ | ✓ |
| Non-verbal sound detection (500+) | ✓ | ✗ | ✗ | ✗ |
| 6 signal layers (paralanguage, environmental, emotional) | ✓ | ✗ | ✗ | ✗ |
| Speaker diarization | ✓ | ✓ | ✓ | ✓ |
| Emotional tone detection | ✓ | ✗ | ✗ | ✗ |
| Silence & pause annotation | ✓ | ✗ | ✗ | ✗ |
| Full Hearing Impaired Mode (Full CC) | ✓ | ✗ | ✗ | ✗ |
| Content-type adaptation (26 modes) | ✓ | ✗ | ✗ | ✗ |
| Adjustable detection sensitivity | ✓ | ✗ | ✗ | ✗ |
| FCC/ADA/WCAG/508 auto-compliance | ✓ | ✗ | Partial | ✗ |
| Live streaming transcription | ✓ | ✗ | ✓ | ✓ |
| Translation (75+ languages) | ✓ | ✗ | ✓ | ✗ |
| Dual subtitles (original + translated) | ✓ | ✗ | ✗ | ✗ |
| SRT/VTT/JSON/TXT/DOCX export | ✓ | ✓ | ✓ | Partial |
| REST API access | ✓ | ✓ | ✓ | ✓ |
| Pay-as-you-go (no subscription required) | ✓ | ✗ | ✓ | ✗ |
| GDPR compliant / EU data processing | ✓ | ✗ | ✗ | ✗ |
Comparison based on publicly available features as of 2026. VoxChron is the only platform purpose-built for non-verbal sound detection in captions.
Pricing
Simple, transparent pricing
Pay only for what you use. No hidden fees. Start with 5 free minutes.
| Service | Price | Turnaround |
|---|---|---|
| AI Transcription: speech-to-text with word-level timestamps | $0.25/min | 1-5 minutes |
| AI Non-Verbal Sounds: all 6 signal layers, every sound captured | $0.45/min | 1-5 minutes |
| Full Hearing Impaired Mode (included): speaker labels, music notation, and all background sounds; FCC/ADA compliant; 4 sensitivity presets from Key Moments to Full CC | Included with the Non-Verbal plan at no extra cost | 1-5 minutes |
| Translation add-on (new): translate transcripts into 75+ languages, with optional translation of non-verbal labels included | +$1.50/min | 1-5 minutes |
| Dual Subtitles add-on (new): original and translated subtitles in one file, in top/bottom, side-by-side, or inline layouts | +$0.50/min | 1-5 minutes |
| Live Streaming (new) | | |
| Live Clean Verbatim: real-time clean transcription, readable, without filler words | $0.50/min | Real-time (~300 ms) |
| Live Verbatim: real-time verbatim, including filler words (um, uh) and hesitations | $0.80/min | Real-time (~300 ms) |
| Live Non-Verbal Detection: adds [laughs], [sighs], [coughs], [pause] tags to your live stream | +$0.80/min | Real-time (~2 s buffer) |
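The per-minute rates in the table make cost estimates simple arithmetic. The sketch below works in whole cents to avoid floating point and assumes that add-ons are billed per minute on top of a base rate, with no rounding or minimum charge; the service keys (`non_verbal`, `translation`, and so on) and the function names are hypothetical labels for this example, not API values. Full Hearing Impaired Mode has no entry because it is included with the Non-Verbal plan.

```sh
# Per-minute rates from the pricing table, in cents. Keys are hypothetical labels.
rate_cents() {
  case "$1" in
    transcription) echo 25 ;;
    non_verbal) echo 45 ;;
    translation) echo 150 ;;
    dual_subtitles) echo 50 ;;
    live_clean) echo 50 ;;
    live_verbatim) echo 80 ;;
    live_non_verbal) echo 80 ;;
    *) echo "unknown service: $1" >&2; return 1 ;;
  esac
}

# estimate_cents MINUTES SERVICE... sums MINUTES * rate for each service,
# assuming add-ons stack on the base rate for the same minutes.
estimate_cents() {
  minutes=$1
  shift
  total=0
  for service in "$@"; do
    rate=$(rate_cents "$service") || return 1
    total=$((total + minutes * rate))
  done
  echo "$total"
}

# Print a cent amount as dollars, e.g. 11700 -> $117.00.
format_dollars() {
  printf '$%d.%02d\n' "$(($1 / 100))" "$(($1 % 100))"
}
```

For example, `format_dollars "$(estimate_cents 60 non_verbal translation)"` prints `$117.00`, that is, 60 × ($0.45 + $1.50).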
- ✓All signal layers
- ✓Full Hearing Impaired Mode
- ✓All export formats
- ✓API access (10 req/min)
- ✓Priority processing
- ✓Full Hearing Impaired Mode
- ✓API access (60 req/min)
- ✓Dedicated support
- ✓Full Hearing Impaired Mode
- ✓API access (300 req/min)
All paid plans include API access. USDC payments supported via Stripe.
Need more volume? Contact us for a custom plan.
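The per-plan API limits (10, 60, or 300 requests per minute) translate into a minimum spacing between calls of 60000 ms divided by the limit: 6 s, 1 s, and 0.2 s respectively. The sketch below spaces calls evenly at that interval. `throttled_run` and `min_interval_ms` are hypothetical helpers; the command you pass (for example, a script wrapping the curl request in the developer section below) receives one stdin line per call as its last argument. Even spacing is more conservative than bursting up to the limit, but it does not depend on how the service counts its one-minute window, which the page does not specify.

```sh
# Minimum spacing between calls, in ms, for a per-minute request limit.
min_interval_ms() {
  echo $((60000 / $1))
}

# throttled_run LIMIT COMMAND [ARGS...]
# Runs COMMAND once per stdin line (the line is appended as the last argument),
# sleeping between runs so calls stay at or below LIMIT per minute.
# Fractional sleep works with GNU, BusyBox and BSD sleep, not strict POSIX.
throttled_run() {
  delay=$(awk -v ms="$(min_interval_ms "$1")" 'BEGIN { printf "%.3f", ms / 1000 }')
  shift
  first=1
  while IFS= read -r item; do
    [ "$first" -eq 1 ] || sleep "$delay"
    first=0
    "$@" "$item" < /dev/null   # keep COMMAND from consuming the remaining stdin lines
  done
}
```

Usage: `throttled_run 10 ./submit_job.sh < urls.txt`, where `submit_job.sh` is a hypothetical wrapper around the API call.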
For Developers
Built for developers
Integrate non-verbal sound intelligence into your own apps. One API call to transcribe speech and detect every sound.
- ✓Simple REST API with structured JSON responses
- ✓API key authentication with rate limiting
- ✓All 6 signal layers configurable per request
- ✓Export to SRT, VTT, DOCX, or JSON
- ✓Webhook callbacks on job completion
```sh
# Transcribe with non-verbal sound detection.
# Replace vs_live_xxx with your API key and file_url with a URL to your media.
curl -X POST https://api.voxchron.com/v1/transcribe \
  -H "Authorization: Bearer vs_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "file_url": "https://...",
    "transcription_mode": "verbatim",
    "context_mode": "podcast",
    "signal_layers": {
      "paralanguage": true,
      "non_word_sounds": true,
      "environmental": true,
      "sound_effects": true,
      "emotions": true
    }
  }'
```
Real-time transcription for live streams
Connect any RTMP, HLS, or WebRTC stream and get transcripts delivered in real time with ~300 ms latency. Optionally detect non-verbal sounds live, with a buffer of about 2 seconds.
FAQ
Frequently asked questions
What exactly does VoxChron detect?
How is this different from regular transcription?
What file formats do you support?
How does live streaming transcription work?
What languages do you support?
Is VoxChron compliant with accessibility standards?
How accurate is the non-verbal detection?
What is Full Hearing Impaired Mode?
Do you offer an API?
How does pricing work?
Can I try it for free?
How does VoxChron compare to Descript, Rev, or Otter.ai?
What is render-time threshold filtering?
What content types and genres does VoxChron support?
How does VoxChron handle speaker diarization?
Can VoxChron be used for podcast transcription?
Is my data secure on VoxChron?
How fast does VoxChron process audio and video files?
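The FAQ lists render-time threshold filtering without describing the mechanism. Assuming it means that each detected event carries a confidence score and the cut-off is applied when captions are rendered rather than when the audio is analyzed, the idea reduces to the filter below. Under that assumption, switching between sensitivity presets such as Key Moments and Full CC would only require re-rendering with a different threshold, not reprocessing the audio. The `timestamp confidence label` layout, the 0 to 1 score range, and `filter_by_confidence` are all hypothetical.

```sh
# Keep events whose confidence is at or above THRESHOLD (0 to 1).
# Input (hypothetical layout): "timestamp confidence label..." per line.
filter_by_confidence() {
  awk -v min="$1" 'BEGIN { min += 0 } ($2 + 0 >= min)'
}
```

With a threshold of 0.5, an event scored 0.97 is kept and one scored 0.42 is dropped; lowering the threshold to 0.3 keeps both.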
Trusted by creators & enterprises worldwide
“VoxChron cut our captioning costs by 90% and turnaround from days to minutes. It's the only tool that captures everything — laughs, pauses, background sounds. Nothing else comes close.”
— Sarah Chen, Head of Content at StreamCore
“We needed FCC-compliant captions for 200+ hours of broadcast content. VoxChron detected non-verbal sounds we didn't even know were there. The Full CC mode is a game-changer for accessibility teams.”
— Marcus Taylor, Compliance Director at BroadcastOne
“As a podcast producer, I tried every transcription tool out there. VoxChron is the only one that catches the sighs, laughter, and awkward pauses that make conversations real. My deaf listeners finally get the full experience.”
— Elena Ruiz, Producer at The Daily Dialogue
Stop losing the sounds that matter
Upload your first audio file and see every non-verbal sound in your content — 5 minutes free, no credit card required.
Get started free
Your data stays private
Built for institutions, healthcare, legal, and enterprise teams with strict data requirements.
Read our Privacy Policy →
AES-256 Encryption
All uploads and processed files are encrypted in transit over HTTPS and at rest with AES-256, the same standard used by banks and governments.
Never Used for Training
Your audio, video, and transcript data is never used to train AI models — not ours, not third parties. Your content stays yours.
GDPR Compliant
VoxChron is fully GDPR compliant. Data is processed within the EU and retained only as long as necessary for your job.