Research: Swecha Gonthuka Telugu ASR

An overview of the research behind the Swecha Gonthuka Telugu speech recognition system. A full paper covering methodology, findings, and evaluation design is in preparation.

Swecha Gonthuka ASR Model | Swecha Gonthuka Dataset

Overview

Telugu is spoken by over 80 million people but remains underrepresented in open ASR research. Existing multilingual models often treat Telugu as a low-resource tail language, which leads to weak recognition of native speakers and of dialectal variation.

Swecha Gonthuka starts from a community-collected, Telugu-first dataset rather than adapting a generic multilingual model. The aim is a recogniser that performs well across the full range of speakers in the training distribution, not only on standardised studio speech.

Research Paper

Releasing Soon

A formal write-up is in preparation. It will cover:

Methodology
Model Training
Evaluation Design
Findings
Limitations
Future Directions

Use the Model

Model weights and usage documentation are available on Hugging Face.
