Revolutionizing Speech Recognition: Rev’s Reverb ASR Model

ully
2 min read · Oct 6, 2024

Rev has just released an open-source speech recognition model dubbed the “Whisper terminator,” setting a new benchmark in speech recognition and speaker diarization.

The model is named Reverb ASR. Beyond its strong performance, Rev has published the model weights on the Hugging Face Hub.

Reverb ASR: A Super Model Trained on 200K Hours of Data

Reverb ASR is no ordinary model. It was trained on 200,000 hours of human-transcribed data, and Rev claims it achieves the industry’s lowest word error rate (WER).
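For readers unfamiliar with the metric, WER is the number of word substitutions, deletions, and insertions needed to turn the model's output into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that calculation, not Rev's evaluation code: the function name `word_error_rate` is hypothetical, and it assumes plain whitespace tokenization, whereas real benchmarks usually normalize casing and punctuation first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr

    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sat on mat" against the reference "the cat sat on the mat" gives one deletion over six reference words, a WER of about 0.167. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.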

What’s more exciting is that the model supports customizable verbatim (word-for-word) transcription. Users can adjust how closely the transcript follows the spoken words, and therefore its style, according to their needs.

Speaker Diarization: Enhanced with 26K Hours of Labeled Data

Rev’s team didn’t stop at speech recognition. They also made significant strides in speaker diarization, the task of determining who spoke when.

Using 26,000 hours of labeled data, they fine-tuned the pyannote model and released two versions of the speaker diarization model:

  • v1: Based on the pyannote 3.0 architecture, trained for 17 epochs.
