Let’s pick a voice to use. Here’s a list of the available pre-trained voices.
Here’s example output using the emma voice. We’ve chosen the high_quality preset, which significantly increases processing time.
$ python tortoise/do_tts.py --output_path /home/sde/results --preset high_quality --voice emma --text "Tortoise is a text to speech program built with the following priorities: Strong multi-voice capabilities. Highly realistic prosody and intonation."
To give you a flavour of some of the other voices, check out these example outputs.
$ python tortoise/do_tts.py --output_path /home/sde/results --preset high_quality --voice pat --text "We hope you enjoy our reviews. We cover both software and hardware from a Linux perspective. We love receiving your thoughts on our site, so please share in the comments section below."
$ python tortoise/do_tts.py --output_path /home/sde/results --preset high_quality --voice tim_reynolds --text "Thanks to everyone that has donated to our site. We really appreciate your support."
There are also a couple of scripts that let you use text files. Here’s example output with random voices:
$ python tortoise/read.py --textfile /home/sde/linux-intro --voice random
Tortoise lives up to its name in processing speed. The above clip took just over 18 minutes to generate. But there are other presets available.
Here’s the same text read using the ultra_fast preset. Processing takes a mere 104 seconds.
$ python tortoise/read.py --preset ultra_fast --textfile /home/sde/linux-intro --voice random
With the high_quality preset, processing took over 25 minutes for the same text.
$ python tortoise/read.py --preset high_quality --textfile /home/sde/linux-intro --voice random
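If you want to repeat this comparison on your own hardware, a small wrapper can build the read.py command for each preset and time it. This is only a rough sketch: the helper names (build_read_command, time_command, compare_presets) are our own invention and not part of Tortoise, and it assumes you run it from the Tortoise source directory, using only the flags shown in the commands above.

```python
import subprocess
import time


def build_read_command(preset, textfile, voice="random"):
    """Build the argument list for tortoise/read.py (hypothetical helper)."""
    return [
        "python", "tortoise/read.py",
        "--preset", preset,
        "--textfile", textfile,
        "--voice", voice,
    ]


def time_command(cmd):
    """Run a command and return the wall-clock time it took, in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def compare_presets(textfile, presets=("ultra_fast", "high_quality")):
    """Time read.py once per preset and return {preset: seconds}."""
    return {
        preset: time_command(build_read_command(preset, textfile))
        for preset in presets
    }
```

Calling compare_presets("/home/sde/linux-intro") would run both presets back to back, so expect it to take a while with high_quality.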
Tortoise is an awesome text-to-speech program. It is extremely slow at generating samples unless you use the ultra_fast preset, but the quality of the output is excellent. For the best results, you’ll need to train your own voices.
Tortoise has its own API which lets you use it programmatically.
It’s possible to use Tortoise without a dedicated NVIDIA GPU, but expect processing to be really slow. As we explained above, our test machine with its NVIDIA RTX 3060 Ti graphics card took 104 seconds to generate 70 seconds of audio with the ultra_fast preset. For illustration purposes, we repeated the process on a machine without a dedicated graphics card. That machine has a respectable CPU (an Intel i7-1360P with 12 cores, 16 threads), yet generating the same audio file took 28 minutes 58 seconds, almost 17 times longer.
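To put those timings in perspective, here’s a quick back-of-the-envelope calculation of the real-time factor (processing time divided by audio length) for each machine. The figures are the ones quoted above; the function and variable names are simply our own choices for this illustration.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Seconds of processing needed per second of generated audio."""
    return processing_seconds / audio_seconds


AUDIO_SECONDS = 70               # length of the generated clip
GPU_SECONDS = 104                # RTX 3060 Ti, ultra_fast preset
CPU_SECONDS = 28 * 60 + 58       # i7-1360P, 28 min 58 s = 1738 s

gpu_rtf = real_time_factor(GPU_SECONDS, AUDIO_SECONDS)
cpu_rtf = real_time_factor(CPU_SECONDS, AUDIO_SECONDS)
slowdown = CPU_SECONDS / GPU_SECONDS

print(f"GPU real-time factor: {gpu_rtf:.2f}")   # about 1.49
print(f"CPU real-time factor: {cpu_rtf:.2f}")   # about 24.83
print(f"CPU is {slowdown:.1f}x slower")         # about 16.7x
```

In other words, even with the fastest preset the GPU needs roughly a second and a half per second of speech, while the CPU needs nearly 25 seconds.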
The upshot is that you really do need a dedicated NVIDIA graphics card if you want to run Tortoise at a usable speed.
Developer: James Betker
License: Apache License v2.0
For other useful open source apps that use machine learning/deep learning, we’ve compiled this roundup.