ASR · Telugu · wav2vec2
Swecha Gonthuka ASR
A Telugu automatic speech recognition model trained on the community-collected Swecha Gonthuka dataset. Converts spoken Telugu audio to text using a fine-tuned wav2vec2 architecture.
Architecture
| Property | Value |
|---|---|
| Base model | wav2vec2 |
| Task | Speech-to-text (ASR) |
| Language | Telugu (te) |
| Input format | 16 kHz mono WAV |
| Output | Telugu text (Unicode) |
| Evaluation metric | CER (Character Error Rate) |
The model is fine-tuned from a pretrained wav2vec2 checkpoint. Before scoring, text is normalized so that only Telugu-script characters and spaces remain. See the research page for training details.
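The scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's evaluation script: it assumes "Telugu script" means the Unicode Telugu block (U+0C00 to U+0C7F), counts characters as Unicode code points, and reports CER as edit distance divided by reference length. The function names `normalize`, `edit_distance`, and `cer` are hypothetical.

```python
import re

# Assumption: "Telugu script" = Unicode Telugu block U+0C00-U+0C7F.
# Everything else (punctuation, Latin letters, digits) becomes a space.
_NON_TELUGU = re.compile(r"[^\u0C00-\u0C7F ]+")


def normalize(text: str) -> str:
    """Keep only Telugu-block characters, with single spaces between words."""
    text = _NON_TELUGU.sub(" ", text)
    return " ".join(text.split())


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over Unicode code points (two-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent, computed after normalization."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference is empty after normalization")
    return 100.0 * edit_distance(ref, hyp) / len(ref)
```

Because normalization runs on both strings, a hypothesis that differs from the reference only in punctuation scores 0% under this sketch.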
Capabilities
- Transcribes spoken Telugu audio to Unicode text.
- Handles dialectal variation present in the training data.
- Operates on standard 16 kHz mono WAV recordings.
- Optimized for Telugu-only speech; not designed for code-switched input.
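Since the model expects 16 kHz mono WAV input, it can help to validate a recording before transcribing it. The following is a minimal sketch using only Python's standard `wave` module; the helper name `check_wav` is hypothetical and not part of the model's API.

```python
import wave

EXPECTED_RATE = 16_000   # Hz, per the input format stated for this model
EXPECTED_CHANNELS = 1    # mono


def check_wav(path: str) -> list[str]:
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"file has {channels} channels, expected {EXPECTED_CHANNELS}")
    return problems
```

A file that fails the check would need resampling or downmixing with an audio tool of your choice before use.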
Evaluation Results
| Dataset | Test Samples | CER (%) |
|---|---|---|
| FLEURS (te_in) | 304 | 6.32 |
| OpenSLR66 | 420 | 9.00 |
| Common Voice 22 (te) | 58 | 11.92 |
Evaluation used only samples containing no English words. See the research page for evaluation design rationale.
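One way such a filter could work is sketched below. The source does not say how English words were detected, so this assumes the simplest rule: any run of ASCII letters in the reference transcript marks the sample as containing English. English words transliterated into Telugu script would not be caught. The names `has_english` and `telugu_only`, the `"text"` key, and the sample data are all hypothetical.

```python
import re

# Assumption: an "English word" is any run of ASCII letters in the transcript.
_LATIN_WORD = re.compile(r"[A-Za-z]+")


def has_english(transcript: str) -> bool:
    """True if the transcript contains at least one Latin-script word."""
    return _LATIN_WORD.search(transcript) is not None


def telugu_only(samples: list[dict]) -> list[dict]:
    """Keep samples whose reference text contains no Latin-script words."""
    return [s for s in samples if not has_english(s["text"])]


# Illustrative, made-up samples (not taken from any evaluation set).
samples = [
    {"id": 1, "text": "నమస్కారం"},
    {"id": 2, "text": "ఇది computer"},
]
kept = telugu_only(samples)
```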
Use Cases
- Transcription: converting recorded Telugu speech to text for archiving or content workflows.
- Accessibility: captions and voice interfaces for Telugu-speaking users.
- Agriculture: voice-enabled information access for farmers in Telugu.
- Healthcare: digitizing spoken patient records at rural medical camps.
- Education: real-time captioning for Telugu-medium technical instruction.
- Documentation: preserving spoken Telugu across dialects and regions.
Dictation Application
A browser-based Telugu dictation tool built on this model. Inference runs entirely on-device via WebGPU (Chrome/Edge) or WASM (Firefox/Safari), so no audio leaves the browser.
- Push-to-talk: hold to record, release to transcribe.
- Continuous mode: Silero VAD detects speech segments automatically.
- Works offline: the model is cached in the browser after first load (~190 MB fp16 on WebGPU, ~380 MB fp32 on WASM).
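The two cache sizes quoted above are consistent with one set of weights stored at two precisions. As a rough check, dividing each size by the bytes per parameter gives the same estimate. This sketch assumes that weights dominate the file size and that 1 MB means 10^6 bytes; the helper `approx_params_millions` is hypothetical, and the quoted sizes are themselves approximate.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2, "fp32": 4}


def approx_params_millions(size_mb: float, dtype: str) -> float:
    """Estimate parameter count (in millions) from a weight file size in MB."""
    return size_mb / BYTES_PER_PARAM[dtype]


webgpu_estimate = approx_params_millions(190, "fp16")  # 95.0
wasm_estimate = approx_params_millions(380, "fp32")    # 95.0
```

Both paths point to roughly 95 million parameters, so the fp32 download is about twice the size of the fp16 one for the same model.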
Model Weights
Weights, usage instructions, and API documentation are available on Hugging Face.