ESPnet is an end-to-end speech processing toolkit that focuses mainly on end-to-end speech recognition and end-to-end text-to-speech.
ESPnet uses Chainer and PyTorch as its main deep learning engines, and follows Kaldi-style data processing, feature extraction/format, and recipes to provide a complete setup for speech recognition and other speech processing experiments.
ESPnet is free and open source software.
Key Features
- Hybrid CTC/attention based end-to-end ASR:
  - Fast/accurate training with CTC/attention multitask training.
  - CTC/attention joint decoding to boost monotonic alignment decoding.
  - Encoder: VGG-like CNN + BiRNN (LSTM/GRU), sub-sampling BiRNN (LSTM/GRU) or Transformer.
  - Attention: Dot product, location-aware attention, variants of multihead.
  - Incorporate RNNLM/LSTMLM trained only with text data.
  - Batch GPU decoding.
- Transducer based end-to-end ASR:
  - Available: RNN-Transducer, Transformer-Transducer, Transformer/RNN-Transducer.
  - Support attention extension and VGG-Transformer (encoder).
- Tacotron2 based end-to-end TTS.
- Transformer based end-to-end TTS.
- Feed-forward Transformer (a.k.a. FastSpeech) based end-to-end TTS.
- Transformer based end-to-end ST.
- Transformer based end-to-end MT.
- Flexible network architecture thanks to Chainer and PyTorch.
- Kaldi style complete recipe:
  - Supports a number of ASR recipes (WSJ, Switchboard, CHiME-4/5, Librispeech, TED, CSJ, AMI, HKUST, Voxforge, REVERB, etc).
  - Supports a number of TTS recipes in a similar manner to the ASR recipes (LJSpeech, LibriTTS, M-AILABS, etc).
  - Supports a number of ST recipes (Fisher-CallHome Spanish, Libri-trans, IWSLT’18, How2, Must-C, Mboshi-French, etc).
  - Supports a number of MT recipes (IWSLT’16, the above ST recipes, etc).
  - Supports a speech separation and recognition recipe (WSJ-2mix).
- State-of-the-art performance in several ASR benchmarks (comparable/superior to hybrid DNN/HMM and CTC).
- State-of-the-art performance in several ST benchmarks (comparable/superior to cascaded ASR and MT).
- Flexible front-end processing thanks to kaldiio and HDF5 support.
- Tensorboard based monitoring.
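
The hybrid CTC/attention items in the list above can be illustrated with a small sketch. It is not ESPnet's implementation: the names `multitask_loss`, `joint_score`, `best_hypothesis`, `mtl_alpha`, `ctc_weight` and `lm_weight` are hypothetical, the scores are made-up numbers, and complete hypotheses are rescored here for simplicity, whereas a real decoder would typically combine the scores during beam search. It only shows the general idea of interpolating a CTC term and an attention term, with an optional language model term.

```python
from typing import List, NamedTuple


class Hypothesis(NamedTuple):
    """A candidate transcription with per-model log-probabilities (illustrative)."""
    text: str
    ctc_logp: float
    att_logp: float
    lm_logp: float = 0.0


def multitask_loss(ctc_loss: float, att_loss: float, mtl_alpha: float = 0.3) -> float:
    """Interpolate the CTC and attention losses for multitask training."""
    if not 0.0 <= mtl_alpha <= 1.0:
        raise ValueError("mtl_alpha must lie in [0, 1]")
    return mtl_alpha * ctc_loss + (1.0 - mtl_alpha) * att_loss


def joint_score(hyp: Hypothesis, ctc_weight: float = 0.3, lm_weight: float = 0.0) -> float:
    """Combine CTC, attention and (optionally) language model log-probabilities."""
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must lie in [0, 1]")
    return (
        ctc_weight * hyp.ctc_logp
        + (1.0 - ctc_weight) * hyp.att_logp
        + lm_weight * hyp.lm_logp
    )


def best_hypothesis(
    hyps: List[Hypothesis], ctc_weight: float = 0.3, lm_weight: float = 0.0
) -> Hypothesis:
    """Pick the hypothesis with the highest joint score."""
    return max(hyps, key=lambda h: joint_score(h, ctc_weight, lm_weight))


if __name__ == "__main__":
    # Made-up scores: the second hypothesis is favoured by attention alone,
    # but CTC assigns it a low score (e.g. because its alignment is not monotonic).
    candidates = [
        Hypothesis("a b c", ctc_logp=-2.0, att_logp=-1.0),
        Hypothesis("a c b", ctc_logp=-9.0, att_logp=-0.5),
    ]
    print(multitask_loss(ctc_loss=2.0, att_loss=4.0, mtl_alpha=0.5))
    print(best_hypothesis(candidates, ctc_weight=0.0).text)
    print(best_hypothesis(candidates, ctc_weight=0.5).text)
```

With the attention score alone (`ctc_weight=0.0`) the second candidate wins; once the CTC score is mixed in, the first candidate is preferred, which is the intuition behind using CTC to encourage monotonic alignments during decoding.
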
Website: espnet.github.io/espnet
Support: GitHub Code Repository
Developer: Tomoki Hayashi, Hirofumi Inaguma, Naoyuki Kamo, Shigeki Karita, and many contributors
License: Apache License 2.0
ESPnet is written in Python.
Related Software
| Speech Recognition Tools | |
|---|---|
| Whisper | Automatic speech recognition system trained on 680,000 hours of data |
| Flashlight | Fast, flexible machine learning library written entirely in C++. |
| Coqui STT | Deep-learning toolkit for training and deploying speech-to-text models |
| Kaldi | C++ toolkit designed for speech recognition researchers. |
| SpeechBrain | All-in-one conversational AI toolkit based on PyTorch |
| Handy | Offline speech-to-text application |
| ESPnet | End-to-End speech processing toolkit |
| deepspeech.pytorch | Implementation of DeepSpeech2 using Baidu Warp-CTC. |
| Whispering | Transcription application with global speech-to-text functionality |
| Julius | Two-pass large vocabulary continuous speech recognition engine |
| CMUSphinx | Speech recognition system for mobile and server applications |
| Simon | Flexible speech recognition software |
| hyprwhspr | Native speech-to-text designed for Arch / Omarchy |
| ostt | Open Speech-to-Text |
| DeepSpeech | TensorFlow implementation of Baidu's DeepSpeech architecture. |
| OpenSeq2Seq | TensorFlow-based toolkit for sequence-to-sequence models |
| Eesen | End-to-End Speech Recognition |

