Product Walkthrough
How VoxChron Works
From upload to export in four simple steps. Designed for broadcasters, educators, and accessibility teams who need captions that capture every sound — not just words.
The 4-step captioning workflow
Upload Your File
Drag and drop any audio or video file up to 10 GB or 90 minutes; longer files are supported via multipart upload. We support MP4, MOV, WAV, MP3, AAC, FLAC, and 30+ other formats. Files are encrypted in transit (TLS 1.3) and at rest (AES-256).
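As a rough illustration of how a longer file might be split for multipart upload, the sketch below plans the byte range for each part. The function name `plan_parts` and the 64 MiB part size are assumptions made for this example, not VoxChron's documented API or limits.

```python
from typing import List, Tuple

# Hypothetical part size; the real service may require a different value.
PART_SIZE = 64 * 1024 * 1024  # 64 MiB


def plan_parts(file_size: int, part_size: int = PART_SIZE) -> List[Tuple[int, int]]:
    """Return half-open (start, end) byte ranges that cover file_size bytes."""
    if file_size < 0 or part_size <= 0:
        raise ValueError("file_size must be >= 0 and part_size must be > 0")
    # Each part starts where the previous one ended; the last part may be shorter.
    return [
        (start, min(start + part_size, file_size))
        for start in range(0, file_size, part_size)
    ]
```

Each range could then be read from disk and sent as one part, so a failed part can be retried without resending the whole file.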
Select Your Context Mode
Choose from 26 context modes tailored to your content type — from Legal interviews to Horror films. Each mode automatically adjusts detection sensitivity, sound priorities, and formatting rules so captions match your audience's expectations.
AI Processes Your Audio
Our multi-model pipeline transcribes speech, detects 500+ non-verbal sounds (laughter, sighs, door slams, music cues), identifies speakers, and applies punctuation with broadcast-grade accuracy. Typical turnaround: 5–15% of media duration.
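To make the turnaround figure concrete, here is a small sketch that converts the stated 5–15% range into an estimated processing window. The function name `estimate_turnaround` is illustrative, and actual times will vary with content and load.

```python
from typing import Tuple


def estimate_turnaround(media_seconds: int) -> Tuple[float, float]:
    """Return the (fastest, slowest) typical processing time in seconds.

    Uses the 5-15% of media duration range quoted above.
    """
    if media_seconds < 0:
        raise ValueError("media_seconds must be non-negative")
    return (media_seconds * 5 / 100, media_seconds * 15 / 100)


low, high = estimate_turnaround(60 * 60)  # a 60-minute recording
print(f"{low / 60:.0f} to {high / 60:.0f} minutes")  # prints "3 to 9 minutes"
```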
Export & Deliver
Download captions in SRT, VTT, JSON, TXT, or DOCX. Review and edit in our in-browser editor, then push directly to YouTube, Vimeo, or your CMS. Live streams? Get captions in under 3 seconds end-to-end over WebSocket, RTMP, or HLS.
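For readers unfamiliar with SRT, the sketch below shows how timed cues, including a non-verbal sound, map onto that format. The `(start_ms, end_ms, text)` tuple layout and the function names are assumptions for illustration and do not reflect VoxChron's actual JSON schema; the square brackets around the sound follow a common captioning convention.

```python
from typing import List, Tuple

# Each cue is (start_ms, end_ms, text); this layout is illustrative only.
Cue = Tuple[int, int, str]


def srt_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues: List[Cue]) -> str:
    """Render cues as numbered SRT blocks separated by blank lines."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}")
    return "\n\n".join(blocks) + "\n"


print(to_srt([(0, 1500, "[door slams]"), (1500, 4000, "Who's there?")]))
```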
What Makes VoxChron Different
Non-verbal sound detection
Captures laughter, sighs, coughs, music, and sound effects, so captions feel human, not robotic.
26 context modes
10 content types + 16 movie genres, each with its own detection profile.
Speaker diarization
Automatically labels speakers even in overlapping dialogue.
120+ languages
Broadcast-grade transcription and translation across 120+ languages.
Live captioning
Sub-3-second latency for live streams, webinars, and broadcasts.
Broadcast compliance
Aligned with FCC 47 CFR §79.1, ADA Title III, WCAG 2.1 AA, and Section 508.