Developers can now generate talking-head avatar videos from a still image and either an audio clip or Higgs TTS text input.

Higgs TTS 3 is built for voice chat: it speaks, not just reads. It turns model responses into expressive conversational speech across 100 languages, with zero-shot voice cloning and inline control over emotion, style, prosody, pauses, and sound effects.
A real-time foundation model that brings human-like digital presence to customer conversations, virtual assistants, training, and interactive experiences.

Today, we are publicly releasing Higgs STT 3, a state-of-the-art Speech-to-Text (STT / ASR) foundation model. It supports 94 languages with sophisticated language detection and advanced sentiment and semantic understanding, and it outperforms whisper-v3-large by a large margin on key languages.

Today, we are proud to launch Higgs TTS 2.5, the latest iteration of Boson AI’s audio model, designed to bring high-fidelity generation into production environments. Building on Higgs TTS 2, this release combines improved efficiency with the stability required for real-world deployment.

Announcing Higgs TTS 2, our latest advancement in audio generation technology with enhanced multi-speaker and dialog capabilities. Now open source.