60-80 ms: typical streaming latency
Hindi + English: code-mixed speech support
Proprietary: private model and implementation

Access

Evaluate in the playground. Contact RinggAI for production access.

This Space provides product information for Ringg Parrot STT V1. The model weights, training code, and internal implementation are not open sourced.

  • Playground access is available at ringg.ai.
  • Model weights are not available for download from this Space.
  • Production and commercial access requires RinggAI approval.
Contact sales@ringg.ai

SDK and Integration

Integrate with voice-agent and real-time audio pipelines.

The Ringg SDK helps developers connect Ringg STT to their application workflows. Ringg Parrot STT V1 integrates with the Pipecat toolkit and uses its built-in VAD (voice activity detection) events.

  • Python SDK is available through the ringglabs package on PyPI.
  • Built for low-latency streaming speech recognition.
  • Supports modern voice-agent orchestration patterns.
View ringglabs on PyPI
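
Streaming recognizers generally consume audio as a sequence of short, fixed-size frames. This page does not document the ringglabs API, so the sketch below makes no SDK calls; it only shows how raw audio might be framed before being handed to a streaming client. The function name `iter_pcm_frames`, the 20 ms frame duration, and the 16-bit mono PCM format are assumptions chosen for illustration.

```python
from typing import Iterator


def iter_pcm_frames(pcm: bytes, sample_rate: int = 16000,
                    frame_ms: int = 20, sample_width: int = 2) -> Iterator[bytes]:
    """Yield fixed-duration frames from mono PCM audio.

    Hypothetical helper: frame_ms and sample_width are illustrative
    defaults, not values documented for Ringg Parrot STT V1.
    """
    frame_bytes = sample_rate * frame_ms // 1000 * sample_width
    if frame_bytes <= 0:
        raise ValueError("frame size must be positive")
    for start in range(0, len(pcm), frame_bytes):
        frame = pcm[start:start + frame_bytes]
        # Pad a trailing partial frame with silence so all frames are equal length.
        yield frame.ljust(frame_bytes, b"\x00")


# One second of 16 kHz, 16-bit mono silence.
one_second = bytes(16000 * 2)
frames = list(iter_pcm_frames(one_second))
print(len(frames), "frames of", len(frames[0]), "bytes")
```

Each yielded frame would then be passed to whatever send method the chosen client exposes; consult the ringglabs package documentation for the actual interface.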

Benchmarks

WER comparison across ASR benchmark datasets.

WER stands for Word Error Rate. Lower values indicate better transcription accuracy. The lowest WER in each row is highlighted.
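
As a reference for how the metric is defined, the sketch below computes WER as the word-level edit distance (substitutions, deletions, and insertions) divided by the number of reference words. It also shows why a normalized score can be much lower than a raw one. The `normalize` function is an assumption for illustration: it lowercases and strips ASCII punctuation only, and the normalization rules actually used for the tables on this page are not stated.

```python
import re
import string


def normalize(text: str) -> str:
    """Illustrative normalization: lowercase, drop ASCII punctuation, collapse spaces."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


reference = "Please book a cab for tomorrow."
hypothesis = "please book the cab for tomorrow"

raw = wer(reference, hypothesis)
norm = wer(normalize(reference), normalize(hypothesis))
print(f"raw WER: {raw * 100:.2f}%  normalized WER: {norm * 100:.2f}%")
```

In this example the raw score counts the casing and punctuation mismatches as errors, while the normalized score counts only the one wrong word.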

Original WER

Lower is better
Dataset          Ringg   ElevenLabs   Deepgram   Sarvam
indictts         11.58   16.06        13.65      15.37
commonvoice      14.30   16.59        20.04      18.21
fleurs           15.20   11.99        17.14      16.00
kathbath         11.78   13.24        15.93      17.53
kathbath_noisy   13.09   13.14        17.44      16.19
mucs             14.55   11.69        21.97      16.72
Overall WER      13.79   13.00        19.23      16.72

Normalized WER

Lower is better
Dataset          Ringg   ElevenLabs   Deepgram   Sarvam
indictts          3.94    8.52         6.93       7.84
commonvoice       6.37   13.02        14.88      13.06
fleurs            9.73    7.67        11.35       9.54
kathbath          7.15   10.15        11.38      10.41
kathbath_noisy    8.37   10.01        12.98      11.78
mucs              6.28    6.75        12.07       7.58
Overall WER       7.27    8.94        12.36       9.76

Features

  • Hindi-English code-mixed speech recognition.
  • Real-time streaming transcription.
  • File-based transcription for common audio formats.
  • Low-latency inference for voice products.

Supported Inputs

  • Hindi, English, and code-mixed speech.
  • Clear audio with minimal background noise.
  • A sample rate of 16 kHz or higher is recommended.
  • WAV, MP3, FLAC, M4A, OGG, and OPUS.
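
The input guidance above can be checked locally before a file is submitted. The following is a minimal sketch using only the Python standard library. It tests the file extension against the listed formats and reads the sample rate of a WAV file. The helper names are hypothetical, and the sample-rate check covers WAV only, since the standard `wave` module does not decode the other formats.

```python
import io
import wave
from pathlib import Path

# Formats and recommended rate taken from the list above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".m4a", ".ogg", ".opus"}
RECOMMENDED_MIN_RATE_HZ = 16000


def has_supported_extension(filename: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def wav_sample_rate(source) -> int:
    """Return the sample rate of a WAV file (path string or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def meets_recommended_rate(source) -> bool:
    """Return True if a WAV file is sampled at 16 kHz or higher."""
    return wav_sample_rate(source) >= RECOMMENDED_MIN_RATE_HZ


def make_silent_wav(rate: int, seconds: float = 0.1) -> io.BytesIO:
    """Build a short in-memory 16-bit mono WAV, used here only for demonstration."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(bytes(int(rate * seconds) * 2))
    buf.seek(0)
    return buf


print(has_supported_extension("call.opus"))
print(meets_recommended_rate(make_silent_wav(8000)))
```

An extension check says nothing about the codec inside the container, so a file that passes may still need re-encoding.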

Use Cases

  • Voice assistants and AI agents.
  • Contact center transcription.
  • Meeting and conversation intelligence.
  • Voice search, subtitling, and accessibility workflows.

Limitations

  • Accuracy may vary with noisy or low-quality audio.
  • Overlapping speakers and dialect variation can affect quality.
  • Very long files or unsupported encodings may require preprocessing.
  • The hosted demo may differ from production deployment settings.
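
One common form of preprocessing for very long files is to split the recording into shorter, slightly overlapping windows and transcribe each one separately. The sketch below only computes the window boundaries. The function name `segment_bounds`, the 30-second window, and the 1-second overlap are illustrative assumptions; this page does not state a maximum file length.

```python
from typing import List, Tuple


def segment_bounds(duration_s: float, window_s: float = 30.0,
                   overlap_s: float = 1.0) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds that cover a recording.

    Hypothetical helper: consecutive windows overlap by overlap_s so that
    a word cut at one boundary appears whole in the neighbouring window.
    """
    if window_s <= overlap_s:
        raise ValueError("window_s must be greater than overlap_s")
    step = window_s - overlap_s
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + window_s, duration_s)
        bounds.append((start, end))
        if end >= duration_s:
            break  # the final window already reaches the end of the file
        start += step
    return bounds


# A 70-second recording split into 30-second windows with 1 second of overlap.
print(segment_bounds(70.0))
```

The overlapping transcripts then need to be merged, for example by dropping duplicated words at the seams, which this sketch does not attempt.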

Benchmark Dataset

Released benchmark data and ASR transcriptions.

RinggAI has released the ASR Benchmarking Open-Source Dataset, which includes benchmark audio/data and transcriptions generated by Ringg, ElevenLabs, Deepgram, and Sarvam.

Privacy and Data Notice

Review deployment terms before using sensitive data.

Audio handling may depend on the selected deployment, integration, and commercial terms. Review RinggAI privacy terms and deployment documentation before using the service with sensitive, regulated, or personally identifiable data.