WikiPlus

Video & Audio


100% private processing

All operations happen on your device using WebAssembly. Nothing is uploaded — perfect for sensitive documents.


The WikiPlus video transcriber generates subtitles for a video file in the browser using the Whisper speech-recognition model. Load an MP4, WebM, or MOV file directly, wait while the model decodes the audio track and generates time-stamped text, then export the result as a plain transcript, an SRT file, or a VTT file ready to drop into a video editor or YouTube's subtitle manager. The whole pipeline (decoding, transcription, and formatting) runs locally in WebAssembly, so raw footage is never uploaded.

Every tool on this page runs entirely inside your browser. Nothing is uploaded to our servers, nothing is cached for later, and no account is required. Files are processed on your own device using WebAssembly modules and the open-source libraries that power each utility, so confidential documents stay confidential. Once the page has loaded, most tools will still finish their job even if you disconnect from the internet. Pick the utility you need below and start working straight away.

Frequently asked questions

Which video formats does the transcriber accept?
MP4, WebM, MOV, and MKV files with H.264, H.265, VP9, or AV1 video tracks. The audio track is decoded by the browser's own media pipeline and converted to 16 kHz mono PCM before Whisper processes it, so even unusual audio codecs usually work as long as your browser can decode them.
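The conversion to 16 kHz mono amounts to a downmix followed by a sample-rate conversion. A browser implementation might delegate this to the Web Audio API (for instance an `OfflineAudioContext` created at 16000 Hz); the Python sketch below is only an assumption-based illustration of the arithmetic, with hypothetical helper names. It averages the channels and resamples by plain linear interpolation, which applies no anti-aliasing filter, so a production resampler would low-pass filter before downsampling from 44.1 or 48 kHz.

```python
def downmix_to_mono(channels: list[list[float]]) -> list[float]:
    """Average all channels sample by sample (samples as floats in [-1, 1])."""
    n = len(channels)
    return [sum(frame) / n for frame in zip(*channels)]


def resample_linear(samples: list[float], src_rate: int, dst_rate: int = 16_000) -> list[float]:
    """Resample with linear interpolation between neighbouring input samples.

    Illustrative only: no low-pass filter is applied, so content above the
    new Nyquist frequency (8 kHz at 16 kHz) can alias into the output.
    """
    if src_rate == dst_rate or not samples:
        return list(samples)
    out_len = int(len(samples) * dst_rate / src_rate)
    ratio = src_rate / dst_rate  # input samples advanced per output sample
    out = []
    for i in range(out_len):
        pos = i * ratio
        j = int(pos)
        frac = pos - j
        # Clamp at the end so we never read past the last input sample.
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1 - frac) + nxt * frac)
    return out
```

One second of 48 kHz stereo audio (48,000 samples per channel) becomes 16,000 mono samples, a threefold reduction in sample count on top of halving the channel count.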
How accurate is the Whisper transcription?
By default the tool uses Whisper's medium model, which achieves a word error rate of 5 to 8 percent on clear speech in supported languages. Background noise, heavy accents, or overlapping speakers raise the error rate, so always review the output before publishing.
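Word error rate is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words: WER = (S + D + I) / N, counting substitutions, deletions, and insertions. A 5 percent WER therefore means roughly 5 wrong, missing, or extra words per 100 spoken words. The sketch below (the function name is hypothetical) applies that definition after only lowercasing and splitting on whitespace; benchmark evaluations usually apply more extensive text normalization, so its figures will not match published ones exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion: reference word missing
                cur[j - 1] + 1,          # insertion: extra hypothesis word
                prev[j - 1] + (r != h),  # substitution (cost 1) or match (cost 0)
            )
        prev = cur
    return prev[-1] / len(ref)
```

For example, comparing "the cat sat on the mat" with "the cat sat on a mat" gives one substitution over six reference words, a WER of about 16.7 percent.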
Can I transcribe live recordings?
The tool works on uploaded files only. For live transcription, record first — a phone voice memo is enough — then drop the file into the page. The model needs the complete audio stream to produce accurate timestamps.