# Whisper
In Whisper, OpenAI uses large-scale weak supervision to train a speech recognition system. It is designed not only for speech recognition but also for translation (transcribing speech into text in its original language or translating it into English), voice activity detection, and language identification.
Earlier versions of Whisper used the following training data:

- 680,000 hours of labeled audio data,
- with 117,000 hours covering 96 languages,
- and 125,000 hours of translation data.
The latest Whisper large-v3-turbo was trained on 680,000 hours of audio and the corresponding transcripts collected from the internet, where:

- 438,000 hours represent English-language audio and matched English transcripts,
- 126,000 hours (~18%) represent non-English audio and English transcripts,
- 117,000 hours (~17%) represent non-English audio and the corresponding transcripts.

Non-English data now covers 98 different languages, so in total, Whisper works with 99 languages.
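As a quick sanity check of those percentages, the snippet below recomputes the non-English shares against the stated 680,000-hour total (the three components actually sum to 681,000 hours, which matches the rounded total):

```python
english = 438_000       # English audio with English transcripts
translation = 126_000   # non-English audio with English transcripts
multilingual = 117_000  # non-English audio with matching transcripts

total = 680_000  # total stated in the source (components sum to 681,000)

print(f"translation share:  {translation / total * 100:.1f}%")   # ~18%
print(f"multilingual share: {multilingual / total * 100:.1f}%")  # ~17%
```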
Released in September 2022, Whisper has gained even more popularity in 2024. But why did this happen?
## Story of Whisper
First of all, we'll give a brief timeline of Whisper releases:

- September 2022: Whisper original series
- December 2022: Whisper large-v2, an improved large model
- November 2023: Whisper large-v3, a better and upgraded version of large-v2
- September 2024: Whisper large-v3-turbo, a model optimized for inference speed
The turbo model is a pruned version of large-v3: it has fewer decoder layers (4 instead of 32), which is what makes it faster and more efficient. Whisper is now completely open-source and can be run in your browser via Hugging Face. As OpenAI claims, the open-source version of Whisper and the API version are the same model, but the API offers an optimized serving process.
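To get a feel for what pruning the decoder from 32 layers to 4 buys, here is a rough, back-of-the-envelope weight count. It assumes the published width of the Whisper large models (d_model = 1280), counts only the main weight matrices of a decoder layer (self-attention, cross-attention, and the MLP), and ignores biases, layer norms, and embeddings, so the numbers are illustrative rather than exact:

```python
D_MODEL = 1280  # hidden width of the Whisper large models

def decoder_layer_params(d: int) -> int:
    """Approximate weight count of one decoder layer (biases/norms ignored)."""
    self_attn = 4 * d * d    # Q, K, V, and output projections
    cross_attn = 4 * d * d   # the same four projections, attending to the encoder
    mlp = 2 * d * (4 * d)    # up- and down-projection with a 4x hidden size
    return self_attn + cross_attn + mlp

large_v3_decoder = 32 * decoder_layer_params(D_MODEL)  # 32 decoder layers
turbo_decoder = 4 * decoder_layer_params(D_MODEL)      # pruned to 4 layers

print(f"large-v3 decoder ~ {large_v3_decoder / 1e6:.0f}M weights")
print(f"turbo decoder    ~ {turbo_decoder / 1e6:.0f}M weights")
print(f"reduction factor:  {large_v3_decoder // turbo_decoder}x")
```

Because decoding runs once per generated token, shrinking the decoder by roughly this factor is exactly where the turbo model's inference speedup comes from.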
But why is Whisper more popular now than in 2022?
We see several reasons that might have caused this trend. First, the rise of OpenAI as a major player in AI draws more attention to all of its developments, including Whisper. Second, being fully open-source makes Whisper accessible to everyone and draws in more users. And third (perhaps the most obvious reason), the updated versions of Whisper demonstrate better performance and capabilities than the first versions of the model.
## How does Whisper work?

The encoder-decoder Transformer architecture has been proven to scale well, which is why OpenAI chose it to build Whisper. This capability is crucial for a large-scale weak supervision approach. Let's explore all the parts of the model architecture.