
A benchmark for post-interruption recovery in voice agents — whether what the model says after you cut in resumes the workflow correctly. 428 interruption samples, 6 types, 10 enterprise domains, closed- and open-weight models.

Developers can now generate talking-head avatar videos from a still image and either an audio clip or Higgs TTS text input.

Higgs TTS 3 is built for voice chat: it speaks, not just reads. It turns model responses into expressive conversational speech across 100 languages, with zero-shot voice cloning and inline control over emotion, style, prosody, pauses, and sound effects.

A benchmark for conversational proactivity in LLMs — noticing and acting on what the user implied but never said. 198 curated dialogues, 624 trigger points, 16 models, and a leaderboard where Recovery proves dramatically hard.
A real-time foundation model that brings human-like digital presence to customer conversations, virtual assistants, training, and interactive experiences.

Today, we are publicly releasing Higgs STT 3, a state-of-the-art Speech-to-Text (STT / ASR) foundation model. It supports 94 languages with sophisticated language detection, advanced sentiment and semantic understanding, and outperforms whisper-v3-large by a large margin on key languages.