Voice Recognition

Eesen – End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding

The Eesen framework greatly simplifies the conventional pipeline for building state-of-the-art automatic speech recognition (ASR) systems. Acoustic models in Eesen are deep bidirectional RNNs trained with the CTC objective function.

Eesen contains four key components to enable end-to-end ASR; the last two are alternative decoding approaches:

  • Acoustic Model — Bi-directional RNNs with LSTM units.
  • Training — Connectionist temporal classification (CTC) as the training objective.
  • WFST Decoding — A principled decoding approach based on Weighted Finite-State Transducers (WFSTs), or
  • RNN-LM Decoding — Decoding based on (character) RNN language models, when using TensorFlow.

Acoustic modeling in Eesen involves learning a single recurrent neural network (RNN) predicting context-independent targets (phonemes or characters). To remove the need for pre-generated frame labels, the connectionist temporal classification (CTC) objective function is adopted to infer the alignments between speech and label sequences. A distinctive feature of Eesen is a generalized decoding approach based on weighted finite-state transducers (WFSTs), which enables the efficient incorporation of lexicons and language models into CTC decoding. Experiments show that compared with the standard hybrid DNN systems, Eesen achieves comparable word error rates (WERs), while at the same time speeding up decoding significantly.
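To make the role of CTC more concrete, here is a minimal sketch in plain Python. It is not Eesen's implementation. The blank index, function names, and toy data layout are assumptions chosen for illustration. `collapse` shows how a frame-level path maps to a label sequence (merge repeats, then drop blanks), and `ctc_probability` uses the standard CTC forward recursion to sum the probability of every path that collapses to a given target, which is why no pre-generated frame labels are needed.

```python
from itertools import product

BLANK = 0  # assumed index of the CTC blank symbol (illustrative choice)


def collapse(path):
    """Map a frame-level path to labels: merge repeats, then drop blanks."""
    labels = []
    prev = None
    for symbol in path:
        if symbol != prev and symbol != BLANK:
            labels.append(symbol)
        prev = symbol
    return labels


def ctc_probability(probs, target):
    """Forward recursion: total probability of all paths collapsing to target.

    probs[t][k] is the per-frame probability of symbol k at frame t.
    """
    # Interleave blanks with the labels: [a, b] -> [blank, a, blank, b, blank]
    ext = [BLANK]
    for label in target:
        ext += [label, BLANK]
    size = len(ext)

    alpha = [0.0] * size
    alpha[0] = probs[0][ext[0]]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # A blank may be skipped only between two different labels.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


def brute_force_probability(probs, target):
    """Reference check: enumerate every path and sum those matching target."""
    num_symbols = len(probs[0])
    total = 0.0
    for path in product(range(num_symbols), repeat=len(probs)):
        if collapse(path) == list(target):
            p = 1.0
            for t, k in enumerate(path):
                p *= probs[t][k]
            total += p
    return total
```

On small inputs, `ctc_probability` and `brute_force_probability` should agree. Training with CTC maximizes this summed probability (in practice its logarithm) for the reference label sequence.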

Key Features

  • The WFST-based decoding approach can incorporate lexicons and language models into CTC decoding in an effective and efficient way.
  • The RNN-LM decoding approach does not require a fixed lexicon.
  • GPU implementation of LSTM model training and CTC learning, now also using TensorFlow.
  • Multiple utterances are processed in parallel for training speed-up.
  • Fully-fledged example setups to demonstrate end-to-end system building, with both phonemes and characters as labels, following Kaldi recipes and conventions.
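Processing multiple utterances in parallel generally means packing sequences of different lengths into one batch. The sketch below shows one common way to do that, by padding to the longest utterance and keeping the true lengths. It is an assumption for illustration, not a description of how Eesen batches data. The function name and the list-of-lists layout are hypothetical.

```python
def pad_batch(utterances, pad_value=0.0):
    """Pad variable-length utterances to a common length.

    utterances: non-empty list of utterances, each a non-empty list of
    feature frames (lists of floats with the same dimension).
    Returns (batch, lengths), where batch[i][t] is frame t of utterance i
    and lengths[i] is the number of real (unpadded) frames.
    """
    lengths = [len(u) for u in utterances]
    max_len = max(lengths)
    dim = len(utterances[0][0])
    batch = []
    for u in utterances:
        # Append all-pad frames so every utterance has max_len frames.
        padding = [[pad_value] * dim for _ in range(max_len - len(u))]
        batch.append([list(frame) for frame in u] + padding)
    return batch, lengths
```

The returned `lengths` let a training loop ignore padded frames when computing the loss, so padding does not change the result for any individual utterance.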

Website: github.com/srvk/eesen
Support:
Developer: Yajie Miao and contributors
License: Apache License 2.0

Eesen is written in C++.


Related Software

Speech Recognition Tools
  • Whisper: Automatic speech recognition system trained on 680,000 hours of data
  • Flashlight: Fast, flexible machine learning library written entirely in C++
  • Coqui STT: Deep-learning toolkit for training and deploying speech-to-text models
  • Kaldi: C++ toolkit designed for speech recognition researchers
  • SpeechBrain: All-in-one conversational AI toolkit based on PyTorch
  • Handy: Offline speech-to-text application
  • ESPnet: End-to-end speech processing toolkit
  • deepspeech.pytorch: Implementation of DeepSpeech2 using Baidu Warp-CTC
  • Whispering: Transcription application with global speech-to-text functionality
  • Julius: Two-pass large vocabulary continuous speech recognition engine
  • CMUSphinx: Speech recognition system for mobile and server applications
  • Simon: Flexible speech recognition software
  • hyprwhspr: Native speech-to-text designed for Arch / Omarchy
  • ostt: Open Speech-to-Text
  • DeepSpeech: TensorFlow implementation of Baidu's DeepSpeech architecture
  • OpenSeq2Seq: TensorFlow-based toolkit for sequence-to-sequence models
  • Eesen: End-to-End Speech Recognition

Read our verdict in the software roundup.

