I’m excited to announce the initial release of Mozilla’s open source speech recognition model, which has an accuracy approaching that of humans listening to the same recordings. We are also releasing the world’s second largest publicly available voice dataset, to which nearly 20,000 people around the globe have contributed.
This is why we started DeepSpeech as an open source project. Together with a community of like-minded developers, companies and researchers, we have applied sophisticated machine learning techniques and a variety of innovations to build a speech-to-text engine that has a word error rate of just 6.5% on LibriSpeech’s test-clean dataset.
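Word error rate, the metric quoted above, is the standard way to score a speech recognizer: the word-level edit distance (insertions, deletions and substitutions) between the recognizer’s transcript and a reference transcript, divided by the number of words in the reference. A minimal sketch of the computation (the sentence pair is made up for illustration):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution ("the" -> "a") out of six reference words.
print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 0.1666...
```

A 6.5% word error rate means that, on average, about one word in fifteen of the engine’s output differs from what was actually said.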
In our initial release today, we have included pre-built packages for Python and NodeJS, along with a command-line binary, so developers can experiment with speech recognition right away.
Our aim is to make it easy for people to donate their voices to a publicly available database, and in doing so build a voice dataset that everyone can use to train new voice-enabled applications.
Today, we’ve released the first tranche of donated voices: nearly 400,000 recordings, representing 500 hours of speech. Anyone can download this data.
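For a sense of scale, spreading 500 hours of speech over roughly 400,000 recordings works out to an average clip length of about 4.5 seconds, which matches the short, sentence-length prompts contributors read:

```python
# Average clip length implied by the release figures above.
hours = 500
recordings = 400_000
avg_seconds = hours * 3600 / recordings
print(avg_seconds)  # 4.5
```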
While we’ve started with English, we are working hard to ensure that Common Voice will support voice donations in multiple languages beginning in the first half of 2018.
Finally, because we know firsthand how hard it can be to find publicly available voice datasets, alongside the Common Voice data we have also compiled links to download all the other large voice collections we know about.