
Machine Learning in Linux: Dia – 1.6B parameter text to speech model

Our Machine Learning in Linux series focuses on apps that make it easy to experiment with machine learning. All the apps covered in the series can be self-hosted.

Dia is a 1.6B parameter text-to-speech model capable of generating ultra-realistic dialogue in one pass. It’s free and open source software.

Installation

The easiest way to install Dia is with Pinokio, which describes itself as a browser that lets you install, run, and manage server applications on your local machine. It isn’t a browser in the traditional sense; the applications it manages are AI software.

Pinokio makes installation a single-click affair.

Installing Dia

In Operation

Dia generates highly realistic dialogue directly from a transcript. You can condition the output on audio, which enables control over emotion and tone. The model can also produce non-verbal sounds such as laughter, coughing, and throat clearing.

Features include:

  • Generate dialogue via the [S1] and [S2] speaker tags.
  • Generate non-verbal sounds such as (laughs), (coughs), etc.
    • The following non-verbal tags are recognized, but might result in unexpected output:
      (laughs), (clears throat), (sighs), (gasps), (coughs), (singing), (sings), (mumbles), (beep), (groans), (sniffs), (claps), (screams), (inhales), (exhales), (applause), (burps), (humming), (sneezes), (chuckle), (whistles)
  • Voice cloning.
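
To make the tag conventions above concrete, here is a minimal sketch of how a transcript could be assembled before pasting it into Dia. It does not call Dia itself. The helper names build_transcript and unknown_nonverbal_tags are hypothetical, invented for this illustration, and the set of tags is simply copied from the list above.

```python
import re

# Non-verbal tags copied from the feature list above.
NONVERBAL_TAGS = {
    "laughs", "clears throat", "sighs", "gasps", "coughs", "singing", "sings",
    "mumbles", "beep", "groans", "sniffs", "claps", "screams", "inhales",
    "exhales", "applause", "burps", "humming", "sneezes", "chuckle", "whistles",
}


def build_transcript(turns):
    """Join (speaker_number, text) pairs into one [S1]/[S2]-tagged string.

    Hypothetical helper for illustration; not part of Dia.
    """
    parts = []
    for speaker, text in turns:
        if speaker not in (1, 2):
            raise ValueError(f"only speakers 1 and 2 are supported, got {speaker}")
        parts.append(f"[S{speaker}] {text.strip()}")
    return " ".join(parts)


def unknown_nonverbal_tags(transcript):
    """Return parenthesised tags that are not in the list above."""
    found = re.findall(r"\(([^()]+)\)", transcript)
    return [tag for tag in found if tag.lower() not in NONVERBAL_TAGS]


script = build_transcript([
    (1, "Have you tried Dia yet? (laughs)"),
    (2, "Not yet. (yawns) Is it hard to install?"),
])
print(script)
print(unknown_nonverbal_tags(script))  # (yawns) is not in the list above
```

Checking tags up front is only a convenience: a tag outside the list is not an error, it just may not produce the sound you expect.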

Enter some text and you can quickly convert that to extremely realistic speech.

Dia in action

Here’s example output.

Here’s output with the same text prompt and settings. You get different voices every time you run the model.

The WAV files each took about 40 seconds to generate on an NVIDIA GeForce RTX 3060 Ti graphics card.

Summary

Dia can generate remarkably realistic dialogue with very little effort.

It’s definitely worth installing and gets our recommendation. Currently only English generation is supported, but there are plans to support other languages.

At the moment only GPU support is available, but there are plans to support CPU.
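
Since a GPU is required, it can be worth checking that one is visible before installing. The sketch below assumes an NVIDIA card with the driver's nvidia-smi tool on the PATH, which matches the card we tested with but is not something the Dia project mandates. The function name detect_nvidia_gpus is hypothetical.

```python
import shutil
import subprocess


def detect_nvidia_gpus():
    """Return a list of 'name, total memory' strings, or [] if none are found.

    Assumes NVIDIA hardware and the nvidia-smi tool; other GPUs are not detected.
    """
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        )
    except (subprocess.SubprocessError, OSError):
        # The tool exists but failed (driver not loaded, timeout, etc.).
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = detect_nvidia_gpus()
print(gpus if gpus else "No NVIDIA GPU detected via nvidia-smi")
```

An empty result only means nvidia-smi could not report a card; it says nothing about whether a given card has enough memory for the model.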

Website: github.com/nari-labs/dia
Developer: Nari Labs
License: Apache License 2.0

Dia is written in Python. Learn Python with our recommended free books and free tutorials.

For other useful open source apps that use machine learning/deep learning, we’ve compiled this roundup.


1 Comment
Aleksandar
8 hours ago

Those sound clips sound pretty authentic. Nice.