DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques. It’s a TensorFlow implementation of Baidu’s DeepSpeech architecture.
This open-source platform is designed for advanced decoding with flexible knowledge integration.
The software is in an early stage of development.
The core of the system is a bidirectional recurrent neural network (BRNN) trained to ingest speech spectrograms and generate English text transcriptions. A pre-trained English model is available for use.
The software requires Python 2.7 and Git Large File Storage (Git LFS), a Git extension for versioning large files.
- 3 different ways to use the software:
  - Python package.
  - Command-line client.
  - Node.JS package.
- Works with signed 16-bit PCM data.
- Takes a word lattice as input, performs feature extraction specified by developers, generates factor graphs based on descriptive rules, and performs learning and inference automatically.
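Because the engine works with signed 16-bit PCM data, it can help to check an audio file before passing it on. The following is a minimal sketch using only the Python standard library. The helper name `read_pcm16` is hypothetical and is not part of DeepSpeech; it only shows one way to confirm the sample width and unpack the samples.

```python
import struct
import wave


def read_pcm16(wav_file):
    """Return (sample_rate, samples) from a 16-bit PCM WAV path or file object.

    Hypothetical helper for illustration; not part of the DeepSpeech API.
    For multi-channel audio the returned samples are interleaved.
    """
    reader = wave.open(wav_file, "rb")
    try:
        width = reader.getsampwidth()  # bytes per sample
        if width != 2:
            raise ValueError("expected 16-bit samples, got %d-bit" % (8 * width))
        sample_rate = reader.getframerate()
        frames = reader.readframes(reader.getnframes())
    finally:
        reader.close()
    # WAV stores 16-bit samples as little-endian signed integers ("<h").
    samples = list(struct.unpack("<%dh" % (len(frames) // 2), frames))
    return sample_rate, samples
```

Files with a different sample width raise a `ValueError` here, so they would need converting with an audio tool first.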
Supported operating systems:
- OS X 10.10, 10.11, 10.12 and 10.13
- Linux x86 64 bit with a modern CPU (needs at least AVX/FMA)
- Linux x86 64 bit with a modern CPU + NVIDIA GPU (Compute Capability at least 3)
- Raspbian Jessie on Raspberry Pi 3
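On Linux, one way to check the AVX/FMA requirement is to inspect the CPU flags reported in `/proc/cpuinfo`. The sketch below assumes the usual x86 layout of that file, with a `flags` line listing space-separated feature names. The function name `has_cpu_flags` is hypothetical and not part of DeepSpeech.

```python
import os


def has_cpu_flags(cpuinfo_text, required=("avx", "fma")):
    """Return True if every flag in `required` appears on the first 'flags' line.

    Hypothetical helper for illustration; assumes x86 Linux /proc/cpuinfo text.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = set(value.split())  # exact names, so "avx2" does not count as "avx"
            return all(flag in flags for flag in required)
    return False  # no 'flags' line found


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path) as handle:
            print(has_cpu_flags(handle.read()))
```

This does not cover the GPU Compute Capability requirement, which has to be checked against the NVIDIA documentation for the card in question.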