ASR · Telugu · wav2vec2

Swecha Gonthuka ASR

A Telugu automatic speech recognition (ASR) model trained on the community-collected Swecha Gonthuka dataset. It converts spoken Telugu audio to text using a fine-tuned wav2vec2 model.
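Fine-tuned wav2vec2 ASR models usually place a CTC (connectionist temporal classification) head on the encoder. This head emits one token distribution per audio frame of roughly 20 ms. This page does not state the decoding method, so the following is only a sketch that assumes a CTC head with greedy (argmax) decoding. The toy vocabulary, the blank index, and the `|` word delimiter are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence


def greedy_ctc_decode(frame_ids: Sequence[int], vocab: List[str],
                      blank_id: int = 0, word_delimiter: str = "|") -> str:
    """Turn per-frame argmax token ids into text using the standard CTC rules.

    Assumptions (not stated on this page): index `blank_id` is the CTC blank
    and `word_delimiter` marks word boundaries, as in common wav2vec2
    character vocabularies.
    """
    # 1. Collapse runs of the same id: [a, a, blank, a] -> [a, blank, a].
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 2. Remove blanks; a blank between two equal ids keeps both of them.
    tokens = [vocab[t] for t in collapsed if t != blank_id]
    # 3. Turn word delimiters into spaces.
    return "".join(" " if t == word_delimiter else t for t in tokens).strip()


# Hypothetical toy vocabulary: one Telugu code point per entry.
VOCAB = ["<pad>", "|", "న", "మ", "స", "్", "త", "ే"]
frames = [2, 2, 0, 3, 0, 4, 5, 6, 6, 7, 0]   # per-frame argmax ids
print(greedy_ctc_decode(frames, VOCAB))      # నమస్తే
```

Greedy decoding is the simplest option. A beam search with a Telugu language model is a common alternative, but this page does not say whether one is used.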

Architecture

Base model: wav2vec2
Task: Speech-to-text (ASR)
Language: Telugu (te)
Input format: 16 kHz mono WAV
Output: Telugu text (Unicode)
Evaluation metric: CER (Character Error Rate)

The model is fine-tuned from a pretrained wav2vec2 checkpoint. Before scoring, transcripts are normalized so that only Telugu-script characters and spaces remain. See the research page for training details.
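The normalization and metric can be sketched as follows. This sketch assumes that "Telugu script" means the Telugu Unicode block (U+0C00 to U+0C7F), that spaces count as characters, and that CER is pooled over the test set as total character edits divided by total reference characters. The exact rules behind the reported numbers may differ, for example in how digits, punctuation, or per-utterance averaging are handled. The function names are hypothetical.

```python
import re
from typing import Iterable, Tuple

# Assumption: "Telugu script" = the Telugu Unicode block U+0C00..U+0C7F.
NON_TELUGU = re.compile(r"[^\u0C00-\u0C7F ]")


def normalize_telugu(text: str) -> str:
    """Keep only Telugu-block characters and single spaces."""
    text = re.sub(r"\s+", " ", text)   # tabs/newlines -> space
    text = NON_TELUGU.sub("", text)    # drop Latin letters, ASCII digits, punctuation
    return re.sub(r" +", " ", text).strip()


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_cer(pairs: Iterable[Tuple[str, str]]) -> float:
    """CER in percent: total edits / total reference characters (spaces count)."""
    edits = chars = 0
    for ref, hyp in pairs:
        ref, hyp = normalize_telugu(ref), normalize_telugu(hyp)
        edits += edit_distance(ref, hyp)
        chars += len(ref)
    if chars == 0:
        raise ValueError("no reference characters after normalization")
    return 100.0 * edits / chars
```

Pooling edits across the whole set weights long utterances more heavily than averaging per-utterance CERs. This is one reason CER figures from different evaluations are not always directly comparable.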

Capabilities

  • Transcribes spoken Telugu audio to Unicode text.
  • Handles dialectal variation present in the training data.
  • Operates on standard 16 kHz mono WAV recordings.
  • Optimized for Telugu-only speech; not designed for code-switched input.
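Before transcribing a file, you may want to confirm that it matches the expected 16 kHz mono WAV input. The sketch below uses only Python's standard `wave` module. The 16-bit PCM sample width is an assumption because this page does not state a bit depth. The helper names are hypothetical.

```python
import array
import sys
import wave
from typing import List

EXPECTED_RATE = 16_000   # 16 kHz, per the model card
EXPECTED_CHANNELS = 1    # mono, per the model card
ASSUMED_SAMPWIDTH = 2    # assumption: 16-bit PCM (bit depth is not stated)


def check_wav(path: str) -> List[str]:
    """Return a list of format problems; an empty list means the file fits."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != EXPECTED_RATE:
            problems.append(f"sample rate is {wf.getframerate()} Hz, expected {EXPECTED_RATE} Hz")
        if wf.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wf.getnchannels()} channels, expected mono")
        if wf.getsampwidth() != ASSUMED_SAMPWIDTH:
            problems.append(f"{8 * wf.getsampwidth()}-bit samples, expected 16-bit PCM")
    return problems


def load_pcm16_mono(path: str) -> List[float]:
    """Read a conforming WAV file as floats in [-1.0, 1.0)."""
    problems = check_wav(path)
    if problems:
        raise ValueError("; ".join(problems))
    with wave.open(path, "rb") as wf:
        raw = wf.readframes(wf.getnframes())
    samples = array.array("h")   # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":   # WAV sample data is little-endian
        samples.byteswap()
    return [s / 32768.0 for s in samples]
```

Files that fail the check can usually be converted with ffmpeg, for example `ffmpeg -i input.mp3 -ar 16000 -ac 1 output.wav`, which resamples to 16 kHz, downmixes to mono, and writes 16-bit PCM by default.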

Evaluation Results

Dataset                Test samples   CER (%)
FLEURS (te_in)         304            6.32
OpenSLR66              420            9.00
Common Voice 22 (te)   58             11.92

Evaluation used only test samples whose transcripts contain no English words. See the research page for the rationale behind this evaluation design.
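This page does not say how "contains English words" was detected. The following minimal sketch assumes that any run of ASCII Latin letters in the reference transcript counts as an English word. The function names and the `(sample_id, transcript)` pair layout are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: an "English word" is any run of ASCII Latin letters.
LATIN_WORD = re.compile(r"[A-Za-z]+")


def has_english(transcript: str) -> bool:
    """True if the transcript contains at least one Latin-script word."""
    return LATIN_WORD.search(transcript) is not None


def telugu_only(samples: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Keep (sample_id, transcript) pairs with no Latin-script words."""
    return [(sid, text) for sid, text in samples if not has_english(text)]
```

A script-based check like this cannot catch English loanwords already transliterated into Telugu script (for example ఆఫీసు for "office"), so the filter actually used for the table above may be stricter or looser.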

Use Cases

Transcription
Converting recorded Telugu speech to text for archiving or content workflows.
Accessibility
Captions and voice interfaces for Telugu-speaking users.
Agriculture
Voice-enabled information access for farmers in Telugu.
Healthcare
Digitizing spoken patient records at rural medical camps.
Education
Real-time captioning for Telugu-medium technical instruction.
Documentation
Preserving spoken Telugu across dialects and regions.

Dictation Application

A browser-based Telugu dictation tool built on this model. Inference runs entirely on-device via WebGPU (Chrome/Edge) or WASM (Firefox/Safari) — no audio leaves the browser.

Push-to-talk
Hold to record, release to transcribe.
Continuous mode
Silero VAD (voice activity detection) detects speech segments automatically.
Works offline
Model cached in browser after first load (~190 MB fp16 on WebGPU, ~380 MB fp32 on WASM).
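Silero VAD produces a speech probability for each short audio window. Turning those probabilities into segments for continuous mode requires some thresholding. The sketch below shows one common approach: hysteresis thresholding with a minimum silence gap and a minimum segment length. The 512-sample window (32 ms at 16 kHz), the thresholds, and the frame counts are illustrative assumptions, not the dictation app's actual settings. This is not Silero's own post-processing code.

```python
from typing import List, Optional, Sequence, Tuple

FRAME_SAMPLES = 512  # assumed VAD window: 512 samples = 32 ms at 16 kHz


def segments_from_probs(probs: Sequence[float], on: float = 0.5, off: float = 0.35,
                        min_silence_frames: int = 3,
                        min_speech_frames: int = 2) -> List[Tuple[int, int]]:
    """Group per-frame speech probabilities into (start_sample, end_sample) spans.

    A segment opens when p >= `on` and closes only after `min_silence_frames`
    consecutive frames with p < `off` (hysteresis). All values are illustrative.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None   # frame index where the open segment began
    silence = 0                   # consecutive frames below `off`

    def close(end_frame: int) -> None:
        # Keep the segment only if it is long enough; convert frames to samples.
        if end_frame - start >= min_speech_frames:
            segments.append((start * FRAME_SAMPLES, end_frame * FRAME_SAMPLES))

    for i, p in enumerate(probs):
        if start is None:
            if p >= on:                  # speech onset needs the higher threshold
                start, silence = i, 0
        elif p < off:                    # candidate silence inside a segment
            silence += 1
            if silence >= min_silence_frames:
                close(i - silence + 1)   # end at the first silent frame
                start, silence = None, 0
        else:                            # p >= off keeps the segment alive
            silence = 0

    if start is not None:                # audio ended mid-segment
        close(len(probs) - silence)
    return segments


probs = [0.1, 0.9, 0.8, 0.6, 0.2, 0.1, 0.1, 0.1, 0.7, 0.9]
print(segments_from_probs(probs))  # [(512, 2048), (4096, 5120)]
```

Using two thresholds keeps brief dips in probability in the middle of a word from splitting a segment. The minimum-length rule drops isolated clicks before they reach the ASR model.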

Model Weights

Weights, usage instructions, and API documentation on Hugging Face.


Try the Dictation App

Live browser demo — no install needed. Runs on WebGPU or WASM.
