Help & Answers
VoxChron FAQ — AI Captioning & Non-Verbal Sound Detection
Everything you need to know about VoxChron’s AI transcription, 500+ non-verbal sound events, FCC / ADA / WCAG 2.1 compliance, live captioning and pricing. Can’t find your answer? Contact us.
Getting Started
What is VoxChron?
VoxChron is an AI-powered captioning platform that transcribes speech and also detects 500+ non-verbal sounds, including laughter, sighs, music, applause, door slams and environmental effects. It supports 120+ languages and 26 context modes, captions live streams over RTMP, SRT (Secure Reliable Transport) and WebRTC, and exports to SRT (SubRip), VTT, JSON, TXT and DOCX.
How is VoxChron different from Rev, Otter, Descript or Whisper?
Most transcription tools capture only words. VoxChron also detects over 500 non-verbal sound events and timestamps them inline (e.g. [laughs], [door slams], [pause 2.3s]), which the FCC, ADA and WCAG 2.1 require for fully accessible closed captions. It also offers context modes (Action, Fantasy, News, Education, etc.) that bias the sound model toward genre-specific events.
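If you post-process caption text yourself, the inline markers can be separated from the spoken words with a small parser. The following is a minimal sketch, assuming markers always appear in square brackets as in the examples above; the function names are illustrative and not part of any VoxChron SDK.

```python
import re
from typing import List, Optional, Tuple

# Matches bracketed markers such as [laughs], [door slams] or [pause 2.3s].
MARKER = re.compile(r"\[([^\[\]]+)\]")
# Matches the text inside a pause marker, e.g. "pause 2.3s".
PAUSE = re.compile(r"pause (\d+(?:\.\d+)?)s")


def split_caption(line: str) -> Tuple[str, List[str]]:
    """Return (spoken text, non-verbal markers) for one caption line."""
    markers = MARKER.findall(line)
    speech = MARKER.sub("", line)
    # Collapse the extra whitespace left behind by removed markers.
    return " ".join(speech.split()), markers


def pause_seconds(marker: str) -> Optional[float]:
    """Return the pause length in seconds, or None if this is not a pause marker."""
    match = PAUSE.fullmatch(marker)
    return float(match.group(1)) if match else None
```

For example, split_caption("[door slams] You're late. [laughs] Again.") separates the two markers from the sentence "You're late. Again."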
How long does it take to process a file?
Typical turnaround is 5–15% of the media duration. A 60-minute video usually completes in 3–9 minutes. Very large files (5 GB+) may take longer. You receive an in-app notification and an email when your file is ready.
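The 5–15% rule of thumb is easy to apply to your own files. This is a small illustrative helper, not an official estimator; it only restates the arithmetic above and ignores queueing or very large uploads.

```python
from typing import Tuple


def estimated_turnaround_minutes(media_minutes: float) -> Tuple[float, float]:
    """Return the (low, high) turnaround estimate at 5% and 15% of media duration."""
    low_percent, high_percent = 5, 15
    return media_minutes * low_percent / 100, media_minutes * high_percent / 100


low, high = estimated_turnaround_minutes(60)
print(f"A 60-minute file: roughly {low:g} to {high:g} minutes")
```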
Do I need to install any software?
No. VoxChron runs entirely in your browser. Upload your file, review the captions in the in-browser editor, and download the finished exports. There is also a REST API and official integrations with YouTube, Vimeo, WordPress, Brightcove and OBS.
Is there a free trial?
Yes — 30 minutes of free processing credits on signup, no credit card required. Live captioning also includes 5 free minutes so you can test RTMP/WebRTC end-to-end latency before subscribing.
Accuracy & Quality
How accurate is VoxChron?
Speech accuracy averages 95–98% for clean studio audio in supported languages. Non-verbal sound detection exceeds 90% for common events (laughter, music, applause, silence). Accuracy varies with audio quality, accents, and background noise — you can always correct output in the in-browser editor before exporting.
What non-verbal sounds can VoxChron detect?
Over 500 sound categories grouped into: human sounds (laughter, sighs, gasps, coughs, crying, sneezing), crowd reactions (applause, cheering, booing), music cues (♪ theme music, ♪ dramatic sting), environmental (doors, footsteps, vehicles, weather, phone rings, typing), and genre-specific sounds like gunshots (Action mode) or magical effects (Fantasy mode).
What is Full Hearing-Impaired (Full CC) Mode?
Full CC Mode lowers the detection threshold to 45% so every perceptible non-verbal sound is captioned — producing FCC §79.1 and ADA Title III-compliant closed captions for Deaf and hard-of-hearing viewers. Other presets are Key moments only (90%), Recommended (80%) and Enhanced detection (65%).
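One way to picture the presets is as a minimum-confidence filter over detected sound events. The sketch below assumes each event carries a confidence score between 0 and 1 and that a preset keeps every event at or above its threshold; the event dictionary shape and preset keys are hypothetical, and only the percentages come from the answer above.

```python
from typing import Dict, List

# Thresholds taken from the presets listed above; the key names are illustrative.
PRESETS: Dict[str, float] = {
    "key_moments": 0.90,
    "recommended": 0.80,
    "enhanced": 0.65,
    "full_cc": 0.45,
}


def filter_events(events: List[dict], preset: str) -> List[dict]:
    """Keep events whose confidence meets the preset's threshold (assumed semantics)."""
    threshold = PRESETS[preset]
    return [event for event in events if event["confidence"] >= threshold]


# Hypothetical detections for one scene.
events = [
    {"label": "applause", "confidence": 0.95},
    {"label": "door slams", "confidence": 0.82},
    {"label": "footsteps", "confidence": 0.70},
    {"label": "sigh", "confidence": 0.50},
    {"label": "typing", "confidence": 0.30},
]
```

With these sample events, the key moments preset keeps only the applause, while Full CC keeps everything except the lowest-confidence typing event.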
Can I edit the captions before exporting?
Yes. Every job opens in our in-browser editor where you can correct text, adjust timing, rename speakers, merge or split segments, and add or remove sound events. Changes are saved continuously and exports re-render instantly.
Pricing & Billing
How does pricing work?
Pricing is based on processing minutes consumed. You can buy credit packs or subscribe to monthly plans. Live captioning is billed per hour of streamed content. Volume discounts apply at 1,000 / 10,000 / 100,000 minute tiers. See our pricing page for current rates.
Do unused minutes expire?
Credit pack minutes are valid for 12 months from purchase. Monthly subscription minutes reset each billing cycle.
Can I cancel anytime?
Yes. Subscriptions cancel at the end of the current billing period. No lock-in contracts. See our Refund Policy for refund eligibility.
Do you offer enterprise pricing?
Yes. Contact our sales team for volume pricing, SLAs, dedicated infrastructure, custom context modes and private-cloud / on-premise deployment options.
Files & Formats
What file formats do you support?
Video: MP4, MOV, MKV, AVI, WEBM, FLV, WMV. Audio: MP3, WAV, AAC, FLAC, OGG, M4A, WMA. Plus 30+ other codecs via FFmpeg. Broadcast formats such as MXF and ProRes are supported on enterprise plans.
What is the maximum file size?
10 GB per file. That is roughly 90 minutes of high-bitrate video (about 15 Mbps); audio-only files such as 6+ hour podcast recordings fit comfortably within the limit. Files over 5 GB use multipart upload automatically.
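To check whether a recording will fit under the 10 GB limit, you can convert file size and bitrate into duration. This is a back-of-the-envelope sketch that assumes a constant total bitrate and decimal units (1 GB = 8,000 megabits); the example bitrates are illustrative, not VoxChron specifications.

```python
def max_duration_minutes(size_gb: float, bitrate_mbps: float) -> float:
    """Minutes of media that fit in size_gb at a constant bitrate in Mbps."""
    megabits = size_gb * 8_000  # 1 GB = 1,000 MB = 8,000 megabits (decimal units)
    return megabits / bitrate_mbps / 60


video_minutes = max_duration_minutes(10, 15)     # high-bitrate video at about 15 Mbps
audio_minutes = max_duration_minutes(10, 0.128)  # 128 kbps audio
print(f"Video: about {video_minutes:.0f} minutes")
print(f"Audio: about {audio_minutes / 60:.0f} hours")
```

At about 15 Mbps the limit works out to just under 90 minutes, while typical compressed audio bitrates allow far longer recordings.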
What export formats are available?
SRT (SubRip), VTT (WebVTT), JSON with word-level timestamps, plain TXT, and DOCX with speaker labels and timestamps. Broadcast-ready SCC / STL / EBU-TT exports are available on enterprise plans.
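Word-level timestamps make it possible to build your own cue layout. The sketch below groups words into SubRip cues; it assumes each word is a dictionary with "text", "start" and "end" keys (times in seconds), which is a hypothetical shape and not the documented VoxChron JSON schema. The SRT timestamp format (HH:MM:SS,mmm) itself is standard.

```python
from typing import List


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, total_ms = divmod(total_ms, 3_600_000)
    minutes, total_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(total_ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words: List[dict], max_words: int = 7) -> str:
    """Group words into numbered SRT cues of at most max_words words each."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        index = i // max_words + 1
        start = srt_timestamp(chunk[0]["start"])
        end = srt_timestamp(chunk[-1]["end"])
        text = " ".join(word["text"] for word in chunk)
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)
```

A fixed word count per cue keeps the example short; a production layout would more likely break on pauses, punctuation or line length.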
How many languages does VoxChron support?
VoxChron transcribes 120+ languages and dialects. Non-verbal sound detection is language-agnostic — the same 500+ sound categories apply across all languages. Automatic language detection is on by default.
Live Captioning
What is the latency for live captions?
End-to-end latency is under 3 seconds from spoken word to on-screen caption. This includes audio capture, transcription, non-verbal sound detection, formatting and display over WebSocket.
Does VoxChron support RTMP, SRT and HLS live streams?
Yes. VoxChron ingests RTMP, SRT, WebRTC and HLS/DASH streams, and outputs captions over WebSocket, CEA-608/708 (for broadcast), and an overlay URL for OBS or vMix browser sources.
Which streaming platforms does VoxChron integrate with?
Protocols: RTMP, SRT, WebRTC and custom HLS/DASH endpoints. Production tools: OBS Studio, vMix and Wirecast. Meeting and streaming platforms: Zoom, Microsoft Teams, Google Meet, YouTube Live, Twitch and Facebook Live.
Can I choose the content type for live streams?
Yes. When you enable Non-Verbal Sound detection on a live stream, you can choose a content type — General, Podcast, News, Accessibility, Live Broadcast, Vlog / Social Media, Education, Training Video or Marketing Video — and the model biases its sound detection and formatting accordingly.
Security & Compliance
Is my data secure?
All uploads use HTTPS / TLS 1.3. Files are encrypted at rest with AES-256 on our Hetzner object storage (EU datacenters). Source files are deleted automatically after processing. Only exported captions are retained. See our Security page for full details.
Is VoxChron GDPR compliant?
Yes. VoxChron is operated by a UK-registered company and processes data in accordance with UK GDPR and EU GDPR. We offer a signed Data Processing Agreement (DPA) on request. See our Privacy Policy and DPA.
Does VoxChron meet FCC, ADA, WCAG and Section 508 standards?
VoxChron helps meet captioning requirements under FCC §79.1 (Television Closed Captioning), ADA Title III, WCAG 2.1 AA (Success Criteria 1.2.2 & 1.2.4) and Section 508. Full CC Mode and non-verbal sound detection are specifically designed to satisfy the "non-speech information" requirement that other transcription tools miss. Final compliance determinations depend on specific use cases and should be reviewed by qualified counsel.
Where is my data stored?
Uploads are stored on encrypted Hetzner Object Storage in the EU (Nuremberg, Germany) by default. Enterprise customers can request Falkenstein, Helsinki or private-cloud deployment with regional hosting.
API & Integration
Does VoxChron have a REST API?
Yes. The REST API supports file uploads (including presigned multipart), job status polling, webhook callbacks, live-stream session creation, and export downloads. Rate limits are generous on paid plans. See our API Docs for full reference.
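Job status polling usually follows the same pattern regardless of the HTTP client. The sketch below shows that pattern only: fetch_status is a placeholder you would implement against the endpoints in the API Docs, and the "status" field with its "completed" and "failed" values is an assumption modeled on the webhook event names, not a documented response format.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_status until the job finishes or the timeout elapses.

    fetch_status is a caller-supplied function (hypothetical) that returns the
    job as a dict with a "status" key.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_status()
        if job["status"] in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval_s)
```

Where webhooks are available, they avoid polling altogether; a loop like this is mainly useful for scripts and environments that cannot receive callbacks.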
Can I integrate VoxChron with my CMS or video platform?
Yes. Official integrations exist for YouTube, Vimeo, WordPress, Brightcove, Kaltura and JW Player. Custom integrations can be built with the REST API and webhooks (job.completed, job.failed, stream.started, stream.stopped).
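A webhook receiver typically parses the request body and dispatches on the event name. The following sketch uses the four event names listed above; the payload shape (an "event" field plus "job_id" or "stream_id") is hypothetical, so check the API Docs for the actual fields before relying on it.

```python
import json

# Event names as listed in the answer above.
KNOWN_EVENTS = {"job.completed", "job.failed", "stream.started", "stream.stopped"}


def handle_webhook(body: bytes) -> str:
    """Return a short description of the action to take for one webhook delivery.

    Assumes a JSON body such as {"event": "job.completed", "job_id": "abc"};
    this shape is an illustration, not the documented payload.
    """
    payload = json.loads(body)
    event = payload.get("event")
    if event not in KNOWN_EVENTS:
        return "ignored"
    if event == "job.completed":
        return f"download exports for job {payload.get('job_id')}"
    if event == "job.failed":
        return f"alert on failed job {payload.get('job_id')}"
    # Remaining events are stream.started and stream.stopped.
    return f"log {event} for stream {payload.get('stream_id')}"
```

Returning a description keeps the example testable; a real handler would call your own download, alerting or logging code at each branch.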
Can VoxChron translate captions into other languages?
Yes. After transcription you can request translation of the caption track into any of 50+ target languages, with timestamps preserved. Non-verbal sound markers (e.g. [laughs], [♪ music]) are localized or kept universal as you prefer.
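The choice between localized and universal markers can be pictured as a lookup applied only to bracketed text. This is an illustrative sketch of that idea, not VoxChron's translation pipeline; the Spanish marker table and the function name are assumptions made for the example.

```python
import re
from typing import Dict

# Hypothetical marker table for Spanish; the wording VoxChron uses may differ.
MARKERS_ES: Dict[str, str] = {"laughs": "risas", "applause": "aplausos"}

MARKER = re.compile(r"\[([^\[\]]+)\]")


def localize_markers(text: str, table: Dict[str, str], keep_universal: bool = False) -> str:
    """Translate bracketed markers via table, or leave them untouched."""
    if keep_universal:
        return text

    def swap(match: "re.Match[str]") -> str:
        label = match.group(1)
        # Markers missing from the table are kept as they are.
        return f"[{table.get(label, label)}]"

    return MARKER.sub(swap, text)
```

Only the text inside the brackets changes, so cue timing is unaffected by the marker choice.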