Machine Learning in Linux: Tortoise TTS - text-to-speech program - Page 2 of 2

In Operation

Let’s pick a voice to use. Here’s a list of the available pre-trained voices.

Here’s example output using the emma voice. We’ve chosen the high_quality preset, which significantly increases processing time.

$ python tortoise/do_tts.py --output_path /home/sde/results --preset high_quality --voice emma --text "Tortoise is a text to speech program built with the following priorities: Strong multi-voice capabilities. Highly realistic prosody and intonation."

To give you a flavour of some of the other voices, check out these example outputs.

$ python tortoise/do_tts.py --output_path /home/sde/results --preset high_quality --voice pat --text "We hope you enjoy our reviews. We cover both software and hardware from a Linux perspective. We love receiving your thoughts on our site, so please share in the comments section below."

$ python tortoise/do_tts.py --output_path /home/sde/results --preset high_quality --voice tim_reynolds --text "Thanks to everyone that has donated to our site. We really appreciate your support."
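The three commands above differ only in the voice and the text, so they can be generated from a small table. Here’s a minimal sketch, assuming the script path and flags exactly as they appear in the commands above. The `JOBS` list and the `build_do_tts_command` helper are our own illustrative names, not part of Tortoise, and the texts are shortened.

```python
import shlex

# Voice/text pairs taken from the commands above (texts shortened here).
JOBS = [
    ("emma", "Tortoise is a text to speech program."),
    ("pat", "We hope you enjoy our reviews."),
    ("tim_reynolds", "Thanks to everyone that has donated to our site."),
]


def build_do_tts_command(voice, text,
                         output_path="/home/sde/results",
                         preset="high_quality"):
    """Return the argv list for one do_tts.py run (flags as used in this article)."""
    return [
        "python", "tortoise/do_tts.py",
        "--output_path", output_path,
        "--preset", preset,
        "--voice", voice,
        "--text", text,
    ]


for voice, text in JOBS:
    # shlex.join quotes the text so each line can be pasted into a shell.
    print(shlex.join(build_do_tts_command(voice, text)))
```

To run a job rather than print it, the same list can be passed to `subprocess.run(cmd, check=True)` from the tortoise-tts directory.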

There are also a couple of scripts that let you use text files. Here’s example output with random voices:

$ python tortoise/read.py --textfile /home/sde/linux-intro --voice random

Tortoise lives up to its name in processing speed. The above clip, generated with no preset specified, took just over 18 minutes. But there are other presets available.

Here’s the same text read using the ultra_fast preset. Processing takes a mere 104 seconds.

$ python tortoise/read.py --preset ultra_fast --textfile /home/sde/linux-intro --voice random

With the high_quality preset, processing took over 25 minutes for the same text.

$ python tortoise/read.py --preset high_quality --textfile /home/sde/linux-intro --voice random
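If you want to repeat this kind of comparison on your own hardware, a small timing wrapper is enough. The sketch below is one way to do it, assuming the read.py flags shown above; `time_command`, `format_duration` and `read_command` are hypothetical helper names of ours. Nothing is run against Tortoise until you uncomment the loop at the end.

```python
import subprocess
import time


def time_command(argv):
    """Run argv to completion and return the wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def format_duration(seconds):
    """Format a duration in seconds as minutes and seconds."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m {secs:02d}s"


def read_command(preset, textfile="/home/sde/linux-intro", voice="random"):
    """Return the argv list for read.py with the flags used in this article."""
    return ["python", "tortoise/read.py", "--preset", preset,
            "--textfile", textfile, "--voice", voice]


# The ultra_fast run reported above, expressed in minutes and seconds.
print(format_duration(104))

# To compare presets, run this from the tortoise-tts directory:
# for preset in ("ultra_fast", "high_quality"):
#     print(preset, format_duration(time_command(read_command(preset))))
```

Bear in mind that `--voice random` picks a different voice on each run, so timings are only roughly comparable between runs.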

Summary

Tortoise is an awesome text-to-speech program. Unless you use the ultra_fast preset, it is extremely slow at generating samples, but the output quality is extremely good. For the best results, you’ll need to train your own voices.

Tortoise has its own API which lets you use it programmatically.
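As a rough illustration of programmatic use, here’s a sketch based on our reading of the project’s README and bundled scripts. Treat the names as assumptions to check against your installed version: `TextToSpeech`, `tts_with_preset` and `load_voice` are what we understand the package to expose, the 24000 Hz sample rate is what the project’s own scripts appear to use, and the `speak` wrapper and its defaults are ours.

```python
def speak(text, voice="emma", preset="ultra_fast", out_path="generated.wav"):
    """Synthesise text with a built-in voice and save it as a WAV file.

    Assumes the tortoise-tts package and torchaudio are installed. The
    imports are deferred so this file still loads where they are not.
    """
    import torchaudio
    from tortoise.api import TextToSpeech
    from tortoise.utils.audio import load_voice

    tts = TextToSpeech()  # loads the models, fetching them if not already cached
    voice_samples, conditioning_latents = load_voice(voice)
    audio = tts.tts_with_preset(
        text,
        voice_samples=voice_samples,
        conditioning_latents=conditioning_latents,
        preset=preset,
    )
    # Assumption: 24 kHz output, matching the project's own scripts.
    torchaudio.save(out_path, audio.squeeze(0).cpu(), 24000)
    return out_path
```

Calling `speak("Hello from Tortoise.")` should then write generated.wav to the current directory, subject to the same processing times discussed above.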

It’s possible to use Tortoise without a dedicated NVIDIA GPU, but expect processing to be really slow. As we explained above, our test machine with its NVIDIA RTX 3060 Ti graphics card took 104 seconds to generate 70 seconds of audio with the ultra_fast preset. For illustration purposes, we repeated the process with a machine without a dedicated graphics card. The machine has a respectable CPU (an Intel i7-1360P with 12 cores, 16 threads). Processing took 28 minutes 58 seconds to generate the audio file with that CPU.
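To put those figures in perspective, it helps to express them as a real-time factor: processing time divided by the length of the audio produced. The sketch below uses only the numbers quoted above and assumes the CPU run used the same text and preset as the GPU run. The variable and function names are ours.

```python
AUDIO_SECONDS = 70            # length of the generated clip
GPU_SECONDS = 104             # RTX 3060 Ti, ultra_fast preset
CPU_SECONDS = 28 * 60 + 58    # i7-1360P, 28 minutes 58 seconds


def real_time_factor(processing_seconds, audio_seconds):
    """Seconds of processing needed per second of generated audio."""
    return processing_seconds / audio_seconds


gpu_rtf = real_time_factor(GPU_SECONDS, AUDIO_SECONDS)
cpu_rtf = real_time_factor(CPU_SECONDS, AUDIO_SECONDS)

print(f"GPU: {gpu_rtf:.2f}x real time")
print(f"CPU: {cpu_rtf:.2f}x real time")
print(f"CPU run took {CPU_SECONDS / GPU_SECONDS:.1f}x as long as the GPU run")
```

On these numbers the GPU needs about 1.5 seconds of processing per second of audio, the CPU about 24.8 seconds, and the CPU run takes roughly 16.7 times as long.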

The upshot is that you really do need a dedicated NVIDIA graphics card if you want to run Tortoise.
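Before committing to a long run, it’s worth checking whether PyTorch (which Tortoise is built on) can actually see a CUDA device. Here’s a minimal sketch. `cuda_status` is a hypothetical helper of ours, and it only reports what PyTorch detects. It says nothing about whether the card has enough memory.

```python
import importlib.util


def cuda_status():
    """Return 'torch not installed', 'cuda available', or 'cpu only'."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch
    return "cuda available" if torch.cuda.is_available() else "cpu only"


print(cuda_status())
```

If this reports "cpu only", expect timings closer to our CPU-only test than to the GPU figures.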

Website: github.com/neonbjb/tortoise-tts
Support:
Developer: James Betker
License: Apache License v2.0

Tortoise is written in Python. Learn Python with our recommended free books and free tutorials.

For other useful open source apps that use machine learning/deep learning, we’ve compiled this roundup.

Related Software

Speech Tools
Piper	Fast, local neural text to speech system
Tortoise	Multi-voice text-to-speech system trained with an emphasis on quality
Coqui TTS	Offers pretrained models in more than 1,100 different languages
Bark	Transformer-based text-to-audio model.
Dia	1.6B parameter text to speech model
Festival	General multi-lingual speech synthesis system
Praat	Software for speech analysis and synthesis
Speech Note	Speech to Text, Text to Speech and Machine Translation
Mimic 3	Lightweight Text to Speech engine
Orca	Scriptable screen reader
MeloTTS	High-quality multi-lingual text-to-speech library
Parler-TTS	Lightweight text-to-speech (TTS) model
Flite	Small, fast run time text to speech synthesis engine
RHVoice	Gives the visually impaired a synthesis voice with their screen reader
eSpeak NG	Continuation of the eSpeak project
eSpeak	Speech synthesizer using a formant synthesis method
Orpheus-TTS-FastAPI	High-performance self-hosted text-to-speech server
Gespeaker	GTK-based frontend for eSpeak
VoiceGen	Simple text-to-speech application
Glate	Google Translator and Text To Speech Service
