Product Walkthrough

How VoxChron Works

From upload to export in four simple steps. Designed for broadcasters, educators, and accessibility teams who need captions that capture every sound — not just words.

Upload Your File

Drag and drop any audio or video file up to 10 GB or 90 minutes; longer files go through multipart upload. We support MP4, MOV, WAV, MP3, AAC, FLAC, and 30+ other formats. Files are encrypted in transit (TLS 1.3) and at rest (AES-256).
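For teams scripting their own uploads, a multipart upload usually starts by splitting the file into byte ranges. The sketch below is a minimal illustration of that planning step only. The function name and the 64 MiB part size are assumptions made for the example, not documented VoxChron values.

```python
from typing import List, Tuple

# Hypothetical part size chosen for illustration; not a documented limit.
DEFAULT_PART_SIZE = 64 * 1024 * 1024


def split_into_parts(total_bytes: int, part_size: int = DEFAULT_PART_SIZE) -> List[Tuple[int, int]]:
    """Return (offset, length) byte ranges that together cover the whole file."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [
        (offset, min(part_size, total_bytes - offset))  # last part may be shorter
        for offset in range(0, total_bytes, part_size)
    ]


if __name__ == "__main__":
    # A 250-byte file in 100-byte parts: two full parts and one 50-byte tail.
    print(split_into_parts(250, 100))
```

Each range can then be read from disk and sent as its own request, which keeps memory use flat and lets a failed part be retried on its own.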

Select Your Context Mode

Choose from 26 context modes tailored to your content type — from Legal interviews to Horror films. Each mode automatically adjusts detection sensitivity, sound priorities, and formatting rules so captions match your audience's expectations.

AI Processes Your Audio

Our multi-model pipeline transcribes speech, detects 500+ non-verbal sounds (laughter, sighs, door slams, music cues), identifies speakers, and applies punctuation with broadcast-grade accuracy. Typical turnaround: 5–15% of media duration.
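To make the turnaround figure concrete, the arithmetic can be sketched in a few lines. The helper name below is hypothetical, and the result is only the 5–15% range quoted above applied to a given duration, not a guaranteed processing time.

```python
from typing import Tuple

# Range quoted in the text: typical turnaround is 5-15% of media duration.
LOW_PERCENT = 5
HIGH_PERCENT = 15


def estimate_turnaround_minutes(media_minutes: float) -> Tuple[float, float]:
    """Return the (fastest, slowest) typical turnaround in minutes."""
    if media_minutes < 0:
        raise ValueError("media_minutes must be non-negative")
    return (media_minutes * LOW_PERCENT / 100, media_minutes * HIGH_PERCENT / 100)


if __name__ == "__main__":
    low, high = estimate_turnaround_minutes(60)
    print(f"A 60-minute file typically takes {low:g} to {high:g} minutes")
```

For a 60-minute recording this works out to roughly 3 to 9 minutes.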

Export & Deliver

Download captions in SRT, VTT, JSON, TXT, or DOCX. Review and edit in our in-browser editor, then push directly to YouTube, Vimeo, or your CMS. Live streams? Get captions in under 3 seconds end-to-end over WebSocket, RTMP, or HLS.
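As an illustration of what an SRT export contains, the sketch below renders a list of cues, including a bracketed non-verbal sound, into standard SRT text. The Cue class and function names are hypothetical and say nothing about how VoxChron builds its files. Only the SRT layout itself (sequence number, HH:MM:SS,mmm timestamps, text, blank line) is standard.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Cue:
    """One caption cue; times are in milliseconds from the start of the media."""
    start_ms: int
    end_ms: int
    text: str


def srt_timestamp(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, e.g. 01:02:03,004."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues: Iterable[Cue]) -> str:
    """Render cues as SRT blocks separated by blank lines, numbered from 1."""
    blocks = []
    for index, cue in enumerate(cues, start=1):
        timing = f"{srt_timestamp(cue.start_ms)} --> {srt_timestamp(cue.end_ms)}"
        blocks.append(f"{index}\n{timing}\n{cue.text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([
        Cue(0, 1_500, "[door slams]"),       # non-verbal sound cue
        Cue(1_500, 4_000, "Who's there?"),   # spoken line
    ]))
```

Bracketed cues such as [door slams] are a common captioning convention for non-speech audio and travel through SRT as ordinary text.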

What Makes VoxChron Different

Non-verbal sound detection

Captures laughter, sighs, coughs, music, and sound effects, so captions feel human, not robotic.

26 context modes

10 content types + 16 movie genres, each with its own detection profile.

Speaker diarization

Automatically labels speakers even in overlapping dialogue.

120+ languages

Full transcription and translation at broadcast-grade quality across all supported languages.

Live captioning

Sub-3-second latency for live streams, webinars, and broadcasts.

Broadcast compliance

Aligned with FCC captioning rules (47 CFR §79.1), ADA Title III, WCAG 2.1 AA, and Section 508.

The 4-step captioning workflow