What is Audio Transcriber?
Audio Transcriber converts your audio files into clean text with timestamps. Drop in an MP3, WAV, M4A, or OGG file. It works with podcasts, voice memos, Zoom calls, and interviews. The tool adds punctuation and detects speaker changes. Output comes as plain text or as SRT or VTT subtitles. Journalists find quotes faster. Podcasters create episode notes in minutes. Students turn lectures into searchable notes. Researchers code interview data. There are no per-minute fees. Files stay in your browser. Private recordings never reach any external server. A speech AI model runs locally. Choose the tiny model for speed or the large model for near-human results. It works in English, Spanish, German, French, Polish, and more than 90 other languages.
When should I use this tool?
- Transcribe a podcast episode into show notes for listeners
- Turn voice memos from meetings into written action items
- Create searchable text archives from old radio interview recordings
- Convert a voice-recorded diary into a written daily log
How do I transcribe an audio file to text?
- 1. Drag your MP3 or WAV audio file onto the upload area.
- 2. Confirm the detected language or choose one manually.
- 3. Click Transcribe to load the speech model in your browser.
- 4. Wait for the in-browser AI to process the entire audio track.
- 5. Review, copy, or download the transcript in your preferred format.
Frequently asked questions
Which audio formats can I upload to the transcriber?
The transcriber accepts any audio format that the browser's native media decoder can parse, which in practice covers the most common formats without conversion. MP3 is supported in all major browsers, including Chrome, Edge, Firefox, and Safari. AAC and M4A are supported in Chrome, Edge, and Safari; Firefox supports AAC on most platforms, depending on the operating system's media codec availability. WAV is universally supported as an uncompressed PCM format with no licensing concerns; AIFF is also uncompressed PCM, though support outside Safari is less consistent. OGG Vorbis and OGG Opus are supported in Chrome, Edge, and Firefox, while Safari support depends on the version. WebM audio is supported in Chrome and Firefox. FLAC is supported in Chrome, Edge, and Firefox but has inconsistent Safari support on older macOS versions. For speech extracted from video (MP4 or WebM video files used as audio sources), the tool accepts those containers directly and reads the audio track. The formats that consistently fail across browsers are Windows Media Audio (WMA), Apple's ALAC in bare CAF containers, and some less common formats such as Ogg FLAC. If your file does not load, convert it to WAV or MP3 first using a free tool like Audacity or CloudConvert. The file size limit is set by browser memory: files under 500 MB process reliably on most systems. Very long recordings above 2 hours may require splitting before upload. Practical tip: for podcast production workflows, export a WAV master from your DAW before uploading. WAV sidesteps codec compatibility issues and gives the transcription engine the highest-quality input.
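The limits described above can be turned into a quick pre-upload check. The following is a minimal sketch, not the tool's own logic: the function name `preflight`, the extension list, and the choice to read "500 MB" as 500 × 1024 × 1024 bytes are all assumptions made for illustration.

```python
from pathlib import Path

# Extensions the answer above lists as failing consistently (WMA, bare CAF).
# This set is illustrative, not an exhaustive compatibility table.
LIKELY_UNSUPPORTED = {".wma", ".caf"}
MAX_RELIABLE_BYTES = 500 * 1024 * 1024  # assumption: "500 MB" read as MiB
MAX_DURATION_SECONDS = 2 * 60 * 60      # "above 2 hours" from the text


def preflight(filename: str, size_bytes: int, duration_seconds: float) -> list[str]:
    """Return human-readable warnings; an empty list means no known issue."""
    warnings = []
    ext = Path(filename).suffix.lower()
    if ext in LIKELY_UNSUPPORTED:
        warnings.append(f"{ext} usually fails to decode; convert to WAV or MP3 first")
    if size_bytes >= MAX_RELIABLE_BYTES:
        warnings.append("file is 500 MB or larger; browser memory may run out")
    if duration_seconds > MAX_DURATION_SECONDS:
        warnings.append("recording is longer than 2 hours; consider splitting it")
    return warnings
```

Calling `preflight("interview.wma", 20_000_000, 900)` would return a single warning about the file extension, since the size and duration are within the stated limits.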
Does the audio leave my computer during transcription?
Whether audio leaves your computer depends on which transcription mode the tool uses. WikiPlus offers two paths. The first is the browser-native path using the Web Speech API's SpeechRecognition interface, which is available in Chrome, Edge, and Safari. Despite running through the browser, this path is not necessarily on-device: in Chrome and Edge on desktop, SpeechRecognition sends audio to Google's or Microsoft's cloud speech service, respectively, for processing. This is a browser-level behavior built into the platform, not a decision the tool makes. If you use Chrome, your audio is transmitted to Google's servers for recognition. Safari on macOS and iOS uses Apple's on-device speech recognition for short clips when the device is configured for offline recognition, but longer recordings may also route through Apple's cloud. The second path is fully on-device and uses the Whisper model compiled to WebAssembly. When the tool loads the Whisper WASM build, all transcription runs inside your browser with zero network requests for the audio data. The model weights are downloaded once and cached locally. In this mode, no audio byte leaves your machine. For sensitive recordings (legal consultations, medical notes, confidential business meetings, personal diaries), always verify that the on-device Whisper mode is active before uploading. The tool displays which engine is in use in the status panel. Practical tip: check the Network tab in browser Developer Tools while transcribing a test clip. If no requests appear after the initial page load, apart from the one-time model download, you are using the on-device path and no audio is transmitted.
Why does my long file take so long to process?
Transcription time scales roughly linearly with audio duration, but the exact pace depends heavily on which processing mode is active and the hardware it runs on. When using the Web Speech API route through Chrome or Edge, the browser streams audio to a remote recognition service in chunks. Network latency, server load, and streaming overhead mean that a 60-minute recording takes approximately 5 to 15 minutes of real time. When using the on-device Whisper WASM path, processing time is determined entirely by your CPU. Whisper's base model running in WebAssembly on a modern laptop CPU at 2 to 3 GHz typically processes audio at 0.3 to 0.8 times real-time speed, meaning a 10-minute recording takes roughly 12.5 to 33 minutes of CPU time. The large model offers higher accuracy but processes at 0.1 to 0.2 times real-time speed, so a 60-minute recording may take 5 to 10 hours. GPU acceleration is not available through the WASM path in current browsers. The tool processes audio in segments and updates the transcript progressively, so you see results as they emerge rather than waiting for the entire file to complete. For very long recordings (interviews, lectures, meetings over an hour), splitting the file into 15 to 20-minute segments using the Audio Trimmer tool before uploading keeps each segment's processing time short and makes the output easier to review incrementally. Practical tip: use the base or small Whisper model for initial drafts and switch to the medium model only for final cleanup passes where accuracy on technical terminology matters.
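The arithmetic behind those estimates is simply audio duration divided by the speed factor. Here is a minimal sketch for reproducing it with your own numbers; the function name `estimate_minutes` is hypothetical, and the speed factors are the rough ranges quoted above rather than measured values.

```python
def estimate_minutes(audio_minutes: float, speed_low: float, speed_high: float) -> tuple[float, float]:
    """Return (fastest, slowest) processing time in minutes.

    speed_low and speed_high are multiples of real-time speed, e.g. 0.3 and
    0.8 for the base model figures quoted in the text (illustrative ranges).
    """
    if not 0 < speed_low <= speed_high:
        raise ValueError("speeds must satisfy 0 < speed_low <= speed_high")
    # A faster speed factor means less processing time, so the high speed
    # gives the best case and the low speed gives the worst case.
    return audio_minutes / speed_high, audio_minutes / speed_low


# Base model range applied to a 10-minute recording.
fastest, slowest = estimate_minutes(10, 0.3, 0.8)
print(f"about {fastest:.1f} to {slowest:.1f} minutes")
```

The same call with `estimate_minutes(60, 0.1, 0.2)` gives the large-model range for an hour of audio, in minutes.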
How accurate is the resulting transcript?
Accuracy depends on audio quality, speaker clarity, background noise, and the model selected. For clean recordings of a single speaker with minimal background noise (podcasts, voiceovers, recorded lectures, video call recordings), Whisper base achieves word error rates of 5 to 10 percent on standard English. That translates to approximately 1 mistake per 10 to 20 words before manual correction. Whisper medium reduces error rates to 3 to 6 percent on the same content. The Web Speech API via Chrome achieves accuracy similar to Whisper small on English when connection quality is good. Several factors degrade accuracy. Heavy background noise such as traffic, crowds, or HVAC is the biggest one. Multiple overlapping speakers are difficult for any single-pass transcription system. Strong regional accents, non-native speaker pronunciation, and domain-specific technical vocabulary are recognized less reliably than standard broadcast English. Medical, legal, and engineering terminology that does not appear frequently in training data produces more errors. Non-English languages vary widely by model support. Whisper handles Spanish, French, German, Portuguese, and Japanese with near-English accuracy. Less common languages have meaningfully higher error rates. The transcript output requires human review and correction before professional use: transcription tools are a time-saving first pass, not a final deliverable. Practical tip: after downloading the transcript, run a single pass looking specifically for proper nouns, product names, and technical terms. These are the categories most likely to be misrecognized and the most important to correct for professional documents.
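Word error rate is conventionally the number of word-level substitutions, insertions, and deletions needed to turn the transcript into a correct reference, divided by the number of words in the reference. The sketch below computes it with a standard edit-distance table so you can measure a short sample yourself. The function name `word_error_rate` is hypothetical, and the simple lowercasing and whitespace splitting is an assumption; real evaluations usually normalize punctuation as well.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

To use it, correct a short stretch of the transcript by hand, pass the corrected text as `reference` and the raw output as `hypothesis`, and compare the result with the ranges quoted above.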
The content of this page is available under CC BY 4.0.