(Comments)

Original link: https://news.ycombinator.com/item?id=43754124

Dia, a new open-weights, 1.6B-parameter dialogue generation model, was announced on Hacker News by its developers, Toby and Jay. Unlike traditional TTS models that synthesize speech turn by turn and stitch the clips together, Dia generates an entire conversation in a single pass. This approach aims to produce faster, more natural-sounding dialogue, and it supports audio prompts to keep voice and emotional style consistent. Inspired by NotebookLM's podcast feature, the developers built the model from scratch, drawing heavily on SoundStorm and Parakeet. They plan to release a technical report to share what they learned and to encourage further research. The project received positive feedback; users asked about the model's stability, particularly accent consistency and its handling of specialized vocabulary such as medical terminology. One user pointed out a naming conflict with GNOME Dia, an existing open-source diagramming application; the developers acknowledged this and said they would clarify the distinction. They also encouraged open-source contributions.


Original text
Hacker News
Show HN: Dia, an open-weights TTS model for generating realistic dialogue (github.com/nari-labs)
21 points by toebee 23 minutes ago | 5 comments

Why does it say "join waitlist" if it's already available?

Also, you don't need to explicitly create and activate a venv if you're using uv - it deals with that nonsense itself. Just `uv sync`.
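For reference, a minimal setup sketch along the lines the commenter suggests (the `app.py` entry point is an assumption for illustration; check the Dia README for the actual run command):

```shell
# uv creates and manages the project virtualenv itself, so there is no
# manual `python -m venv` / `source .venv/bin/activate` step.
git clone https://github.com/nari-labs/dia.git
cd dia
uv sync        # resolve dependencies into a project-local .venv
uv run app.py  # hypothetical entry point; runs inside that venv automatically
```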



Impressive project! We'd love to use something like this over at Delfa (https://delfa.ai). How does this hold up from the perspective of stability? I've spoken to various folks working on voice models, and one thing that has consistently kept Eleven Labs ahead of the pack, in my experience, is that their models mostly avoid (albeit are not immune to) accent shifts and distortions when confronted with unfamiliar medical terminology.

A high quality, affordable TTS model that can consistently nail medical terminology while maintaining an American accent has been frustratingly elusive.



Hey HN! We’re Toby and Jay, creators of Dia. Dia is a 1.6B-parameter open-weights model that generates dialogue directly from a transcript.

Unlike TTS models that generate each speaker turn and stitch them together, Dia generates the entire conversation in a single pass. This makes it faster, more natural, and easier to use for dialogue generation.
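To make the contrast concrete, here is a toy sketch (not Dia's code) of the per-turn stitch-together pipeline that single-pass generation replaces: each speaker turn is synthesized separately and then concatenated with a fixed silence gap, which is where unnatural pacing tends to creep in.

```python
def stitch_turns(turn_clips, sr=44100, gap_s=0.25):
    """Concatenate separately generated per-turn audio clips (lists of
    float samples), inserting a fixed silence gap between turns — the
    traditional TTS dialogue pipeline, not Dia's single-pass approach."""
    gap = [0.0] * int(sr * gap_s)
    out = []
    for i, clip in enumerate(turn_clips):
        out.extend(clip)
        if i < len(turn_clips) - 1:
            out.extend(gap)  # fixed pause: turn-taking rhythm is not learned
    return out

# Two fake one-second "turns" at 44.1 kHz.
turn_a = [0.0] * 44100
turn_b = [0.1] * 44100
audio = stitch_turns([turn_a, turn_b])
print(len(audio))  # 44100 + 11025 + 44100 = 99225 samples
```

A single-pass model instead generates the whole conversation jointly, so pacing and prosody across speaker turns come from the model rather than from a hard-coded gap.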

It also supports audio prompts — you can condition the output on a specific voice/emotion and it will continue in that style.

Demo page comparing it to ElevenLabs and Sesame-1B: https://yummy-fir-7a4.notion.site/dia

We started this project after falling in love with NotebookLM’s podcast feature. But over time, the voices and content started to feel repetitive. We tried to replicate the podcast-feel with APIs but it did not sound like human conversations.

So we decided to train a model ourselves. We had no prior experience with speech models and had to learn everything from scratch — from large-scale training to audio tokenization. It took us a bit over 3 months.

Our work is heavily inspired by SoundStorm and Parakeet. We plan to release a lightweight technical report to share what we learned and accelerate research.

We’d love to hear what you think! We are a tiny team, so open source contributions are extra-welcomed. Please feel free to check out the code, and share any thoughts or suggestions with us.



Just in case: another open-source project uses the same name: https://wiki.gnome.org/Apps/Dia/

https://gitlab.gnome.org/GNOME/dia



Thanks for the heads-up! We weren’t aware of the GNOME Dia project. Since we focus on speech AI, we’ll make sure to clarify that distinction.