10 Best Free Linux Speech Recognition Tools – Open Source Software

Speech is an increasingly popular method of interacting with electronic devices such as computers, phones, tablets, and televisions. Speech recognition is probabilistic, and speech engines are never 100% accurate. But technological advances mean that speech recognition engines now understand speech more accurately. The better the accuracy, the more likely customers are to engage with this method of control. And, according to a study by Stanford University, the University of Washington, and Chinese search giant Baidu, speaking to a smartphone is three times quicker than typing a search query into a screen interface.
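Accuracy for a speech engine is commonly reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the engine's output into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration of that calculation, not code taken from any of the tools listed in this article. The function name `word_error_rate` and the sample phrases are made up for the example, and real evaluations usually normalise punctuation and numbers more carefully than the simple lowercasing assumed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical in-car command and an imperfect transcription of it.
    print(word_error_rate("open the sunroof", "open a sunroof"))
```

A WER of 0.0 means the transcript matched the reference exactly. Because insertions count as errors, the value can exceed 1.0 when the engine outputs many spurious words.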

The speech recognition market is projected to be worth about $10 billion a year within the next four years. Witness the rise of intelligent personal assistants, such as Apple's Siri, Microsoft's Cortana, and Mycroft for Linux. The assistants use voice queries and a natural language user interface to attempt to answer questions, make recommendations, and perform actions without the need for keyboard input. Dedicated products that have shipped in large quantities, such as Amazon Echo, Google Home, and Apple HomePod, are testament to the popularity of speech as a way to control devices. Speech recognition is also used in smart watches, household appliances, and in-car assistants. In-car applications have lots of mileage (excuse the pun). They include navigation, asking for weather forecasts, finding out the traffic situation ahead, and controlling elements of the car, such as the sunroof, windows, and music player.

The key challenge in developing speech recognition software, whether it’s used in a computer or another device, is that human speech is extremely complex. The software has to cope with varied speech patterns and individuals’ accents, and speech is a dynamic process without clearly distinguished parts. Fortunately, technical advances have made it easier to create speech recognition tools. Powerful tools like machine learning and artificial intelligence, coupled with improved speech algorithms, have altered the way these tools are developed. You no longer need phoneme dictionaries. Instead, speech engines can employ deep learning techniques to cope with the complexities of human speech.
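To give a flavour of how an end-to-end engine can skip the phoneme dictionary, here is a minimal sketch of greedy decoding for Connectionist Temporal Classification (CTC), the kind of output layer used by DeepSpeech-style models. The network is assumed to emit, for every short audio frame, a probability for each character plus a special "blank" symbol. The decoder picks the most likely symbol per frame, merges consecutive repeats, and drops the blanks. The alphabet, the `BLANK` marker, and the hand-written frame probabilities are all invented for illustration. Production engines typically use beam search with a language model rather than this greedy shortcut.

```python
from itertools import groupby

BLANK = "_"  # assumed marker for the CTC blank symbol


def ctc_greedy_decode(frame_probs, alphabet):
    """Turn per-frame symbol probabilities into text with greedy CTC decoding.

    frame_probs: one list of probabilities per audio frame, ordered like alphabet.
    alphabet: output symbols, including BLANK.
    """
    # 1. Pick the most probable symbol for every frame.
    best = [alphabet[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    # 2. Merge runs of the same symbol into a single symbol.
    collapsed = [symbol for symbol, _ in groupby(best)]
    # 3. Remove blanks; a blank between two identical letters keeps both.
    return "".join(symbol for symbol in collapsed if symbol != BLANK)


if __name__ == "__main__":
    alphabet = [BLANK, "c", "a", "t"]
    # Hypothetical network output for seven frames: c c _ a a _ t
    frames = [
        [0.1, 0.7, 0.1, 0.1],
        [0.2, 0.6, 0.1, 0.1],
        [0.8, 0.1, 0.05, 0.05],
        [0.1, 0.1, 0.7, 0.1],
        [0.1, 0.1, 0.6, 0.2],
        [0.9, 0.03, 0.03, 0.04],
        [0.1, 0.1, 0.1, 0.7],
    ]
    print(ctc_greedy_decode(frames, alphabet))
```

The blank symbol is what lets the model spell words with doubled letters: "l", blank, "l" decodes to "ll", whereas "l", "l" without a blank collapses to a single "l".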

There aren’t that many speech recognition toolkits available, and some of them are proprietary software. Fortunately, there are some very exciting open source speech recognition toolkits available. These toolkits are meant to be the foundation on which to build a speech recognition engine.

This article highlights the best open source speech recognition software for Linux.

Before examining our recommendations, Jasper is worthy of a special mention. It’s an excellent open source platform for developing always-on, voice-controlled applications. You may be wondering why HTK doesn’t appear below. For sure, HTK is a popular speech recognition toolkit. But HTK is not eligible to feature in the recommended solutions. Not because its copyright is owned by Microsoft, but simply because it’s proprietary software.

Speech Recognition Tools

Let’s explore the 10 free speech recognition tools at hand. For each title we have compiled its own portal page with a full description and an in-depth analysis of its features.

Open Source Speech Recognition Tools
DeepSpeech: TensorFlow implementation of Baidu's DeepSpeech architecture
wav2letter++: End-to-End speech recognition system
deepspeech.pytorch: Implementation of DeepSpeech2 using Baidu Warp-CTC
Kaldi: C++ toolkit designed for speech recognition researchers
Julius: Two-pass large vocabulary continuous speech recognition engine
ESPnet: End-to-End speech processing toolkit
OpenSeq2Seq: TensorFlow-based toolkit for sequence-to-sequence models
CMUSphinx: Speech recognition system for mobile and server applications
Eesen: End-to-End Speech Recognition
Simon: Flexible speech recognition software

6 comments

    1. This clause is particularly damning:

      2.2 The Licensed Software either in whole or in part can not be distributed or sub-licensed to any third party in any form.

  1. Sadly my machine doesn’t have sufficient RAM on my graphics card to experiment with DeepSpeech. Any recommendations for a good GPU that works well with DeepSpeech?

  2. Thanks for the comprehensive info regarding the open source tools. From the perspective of a visually impaired person, what I would like to know is which of these would be most suitable (now or in near future) for dictating to get text that could go into documents, e-mail, etc. Is that Simon?
