eSpeak is a Text to Speech engine for good quality English, and potentially other languages. Compact size and clear pronunciation. It speaks text files and also operates as a "talker" within the KDE TTS system as an alternative to Festival etc.
Blather is a speech recognizer that will run commands when a user speaks preset sentences.
LiSpeak is a simple voice command system for Ubuntu and other Linux distros. LiSpeak has been designed for the everyday user. It runs only as an app indicator, displaying information only when you want it.
The Cainteoir Text-to-Speech program consists of a graphical user interface that supports reading, editing and recording documents.
CVoiceControl is a speech recognition system that enables a user to connect spoken commands to unix commands. It automagically detects speech input from a microphone, performs recognition on this input and - in case of successful recognition - executes the associated unix command. CVoiceControl is a KDE and X Windows independent version of its predecessor KVoiceControl.
(ftp only) Ears is a partially completed speech synthesis program.
Festival offers a general framework for building speech synthesis systems as well as including examples of various modules. As a whole it offers full text to speech through a number of APIs: from shell level, through a Scheme command interpreter, as a C++ library, from Java, and an Emacs interface. Festival is multi-lingual (currently English (British and American), and Spanish). The system is written in C++ and uses the Edinburgh Speech Tools Library for low level architecture and has a Scheme (SIOD) based command interpreter for control.
Flinger is a program for synthesizing singing voice from a MIDI file input. It is based on the Festival Speech Synthesis System, the C++ library contained in earlier versions of the TclMIDI package, various modules, and a singing-voice synthesizer.
Gespeaker is a GTK+ frontend for espeak. It plays text in many languages, with settings for voice, pitch, volume, speed and word gap. The spoken text can also be recorded to a WAV file.
gvoice is a plugin for GKrellM that allows the user to have voice alerts, via IBM's ViaVoice technology.
ISIP is a speech recognition engine. The toolkit includes a front-end, a decoder, and a training module. It's a functional toolkit.
The Jovie text-to-speech system is a plugin-based service that allows any KDE (or non-KDE) application to speak using the D-Bus interface. This software aims to become the standard subsystem for all KDE applications to provide speech output.
Julius Speech Recognition Engine
Julius is a high-performance, two-pass large vocabulary continuous speech recognition (LVCSR) decoder for speech-related researchers and developers. Based on word 3-gram and context-dependent HMM, it can perform almost real-time decoding on most current PCs in a 20k-word dictation task.
KMouth is a KDE program which enables persons that cannot speak to let their computer speak, e.g. mute people or people who have lost their voice. KMouth itself does not contain a speech synthesizer. Instead it assumes that a speech synthesizer is available in the system. Currently it uses either any shell command specified by the user or the text-to-speech interface of KTTSD.
KTTS -- KDE Text-to-Speech -- is a subsystem within the KDE desktop for conversion of text to audible speech. KTTS is currently under development and aims to become the standard subsystem for all KDE applications to provide speech output.
LLiaPhon is a text-to-speech application; it translates text into phonetic descriptions, which can be used by a speech synthesizer to produce audio. It is designed for French texts.
The aim of the MBROLA project, initiated by the TCTS Lab of the Faculte Polytechnique de Mons (Belgium), is to obtain a set of speech synthesizers for as many languages as possible, free to use for non-commercial applications. Central to the MBROLA project is MBROLA, a speech synthesizer based on the concatenation of diphones. It takes a list of phonemes as input, together with prosodic information (duration of phonemes and a piecewise linear description of pitch), and produces 16-bit linear speech samples at the sampling frequency of the diphone database used. It is therefore NOT a Text-To-Speech (TTS) synthesizer, since it does not accept raw text as input.
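To make the input described above more concrete, here is a minimal Python sketch that writes a phoneme list in the line-oriented layout commonly used for MBROLA input files: one phoneme per line, its duration in milliseconds, then optional pairs of (position in percent, pitch in Hz) that give the piecewise linear pitch curve. The function name `to_pho` and the phoneme symbols are illustrative assumptions; the valid symbols depend on the diphone database in use.

```python
def to_pho(segments):
    """Format segments as phoneme lines.

    segments: list of (phoneme, duration_ms, pitch_points), where
    pitch_points is a list of (position_percent, pitch_hz) pairs.
    """
    lines = []
    for phoneme, duration_ms, pitch_points in segments:
        fields = [phoneme, str(duration_ms)]
        # Each pitch point adds two fields: where it sits in the
        # phoneme (0-100 %) and the target pitch in Hz.
        for position_percent, pitch_hz in pitch_points:
            fields += [str(position_percent), str(pitch_hz)]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


# Hypothetical example: a silence followed by one vowel whose pitch
# falls from 120 Hz to 110 Hz. The symbols "_" and "a" are assumptions.
example = [
    ("_", 100, []),
    ("a", 120, [(0, 120), (100, 110)]),
]
print(to_pho(example), end="")
```

A front end such as a text-to-phoneme converter would be responsible for producing the segment list; the synthesizer itself only consumes the resulting phoneme lines.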
(commercial) Nuance is a speech recognition and natural language understanding server that delivers the recognition accuracy, scalability and robustness required in telecommunications, enterprise, and Internet applications.
Open Mind Speech
Open Mind Speech is part of the Open Mind Initiative and aims to develop free (GPL) speech recognition tools and applications, as well as collect speech data from "e-citizens" using the Internet.
Orca Screen Reader
Orca Screen Reader (Orca) is a free, open source scriptable screen reader which provides access to applications and toolkits. It provides alternative access to the desktop by using speech synthesis, braille, and magnification.
pidgin-festival is a plugin for pidgin that interfaces with the popular program festival. It allows for instant messages to be spoken by festival so you can hear it through your speakers.
Praat is a program for speech analysis and synthesis written by Paul Boersma and David Weenink at the Department of Phonetics of the University of Amsterdam. The program is constantly being improved and a new build is published almost every week.
Screader is a screen reader that uses a software or hardware speech synthesizer.
Simon is open source speech recognition software which aims to be flexible and highly customizable. You can open programs, URLs, type configurable text snippets, simulate shortcuts, control the mouse and keyboard and more.
Sirius is an open end-to-end standalone speech and vision based intelligent personal assistant (IPA) service similar to Apple's Siri, Google's Google.
Speech Dispatcher aims to provide a device-independent layer for speech synthesis through a simple, stable and well-documented interface. It takes care of most of the tasks that speech-enabled applications need to solve. What a very high-level GUI library is to graphics, Speech Dispatcher is to speech synthesis.
SpeechLion is a speech recognition application for desktop command and control. It is based on the Sphinx-4 recognizer, and it allows the user to control the Linux desktop using simple spoken commands.
Sphinx is a speaker-independent large vocabulary continuous speech recognizer under a Berkeley-style license. It is also a collection of open source tools and resources that allows researchers and developers to build speech recognition systems.
Tao is for sound synthesis using physical models. It provides a virtual acoustic material constructed from masses and springs which can be used as the basis for building quite complex virtual musical instruments.
TkFestival is a frontend to Festival, a speech synthesis program. TkFestival is written in tcl/tk and uses expectk to communicate with the festival binary.
VoiceApp works by reading audio directly from your soundcard (on Linux, the file /dev/dsp) and transforming the data via FFT. The result is drawn on the screen in matrixish colors.
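As a rough illustration of the transform step, the Python sketch below computes the magnitude spectrum of a block of samples and reports the strongest frequency. It is not VoiceApp's code: it uses a plain discrete Fourier transform from the standard library rather than a fast FFT (the spectrum is the same, only slower to compute), and it analyzes a synthetic 1000 Hz tone instead of reading /dev/dsp. The function names and the sample rate are assumptions chosen for the example.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Return magnitudes of the first half of the DFT bins."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2):
        acc = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n)
            for t in range(n)
        )
        magnitudes.append(abs(acc))
    return magnitudes


def dominant_frequency(samples, sample_rate):
    """Frequency in Hz of the strongest bin, ignoring the DC bin."""
    magnitudes = dft_magnitudes(samples)
    peak_bin = max(range(1, len(magnitudes)), key=magnitudes.__getitem__)
    return peak_bin * sample_rate / len(samples)


# Synthetic input standing in for soundcard data (assumed values).
SAMPLE_RATE = 8000
N = 256
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(N)]
print(dominant_frequency(tone, SAMPLE_RATE))  # 1000.0
```

A visualizer would run this on each successive block of samples and map the bin magnitudes to colors, one column per block.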
VoxForge collects transcribed user-submitted speech audio files (collectively called a "speech corpus") to create acoustic models for use with speech recognition engines such as HTK, Julius, CAVS (formerly ISIP), and Sphinx. The current focus is on collecting audio to create acoustic models for command and control applications on a PC, and for voice over IP telephony speech recognition applications, i.e. IVR (Interactive Voice Response).
XVoice enables speech-to-text translation for many X applications. XVoice accepts continuous speech input from IBM's ViaVoice SDK for Linux, and then re-targets the resulting text at many X applications.