Meet Higgs Avatar v1: real-time avatars for voice agents
A real-time foundation model that brings human-like digital presence to customer conversations, virtual assistants, training, and interactive experiences
A great voice agent solves half the conversation. The other half is the face. The thousand small signals a person reads without thinking. A glance. A nod. A flicker of recognition. They’re what tell you the person on the other side is actually with you.
Today we’re introducing Higgs Avatar v1, Boson’s real-time avatar foundation model. From a single still image, it generates a live, expressive face that speaks, listens, and reacts. Frame by frame, in step with the voice.
What you’re seeing above is fully AI-generated and unscripted. No animation pipeline. No pre-rendered loop. Every frame is rendered live — voice, dialogue, lip-sync, head motion, expression. The full pipeline runs on a single H100.
How it works
Streams every frame live
We took a pre-trained video model and adapted it to generate one frame at a time. Each frame streams alongside the audio as the conversation happens.
Starts from one still image
A single photo is all it needs. Co-designed with Higgs Audio so lip-sync, expression, and head motion stay locked to the voice.
Renders faster than real-time
Generating one frame takes about 16 ms. The threshold for keeping up with a live conversation is 62.5 ms per frame (16 frames per second). The face never falls behind the voice.
Runs eight conversations per GPU
A single H100 hosts eight conversations at once, which keeps the per-conversation cost low enough for production.
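The figures above imply a simple per-frame budget. As a back-of-the-envelope sketch (the 16 fps frame rate is inferred from the 62.5 ms threshold, not stated explicitly):

```python
# Real-time budget implied by the numbers in the post.
# Assumption: the 62.5 ms threshold corresponds to a 16 fps stream,
# since 1000 ms / 16 frames = 62.5 ms per frame.

FPS = 16
frame_budget_ms = 1000 / FPS   # 62.5 ms to stay in step with the voice
generation_ms = 16             # reported per-frame generation time

# Real-time holds as long as generation fits inside the budget.
assert generation_ms < frame_budget_ms

headroom = frame_budget_ms / generation_ms
print(f"per-frame budget: {frame_budget_ms} ms")
print(f"headroom: {headroom:.1f}x real-time")  # ~3.9x
```

That roughly 3.9x headroom per stream is what makes multi-tenancy plausible: a single GPU has spare capacity to serve several conversations at once, which is consistent with the eight-per-H100 figure, though the post does not describe the scheduling or batching strategy.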
A new layer for the Boson stack
Boson builds real-time voice agents for production business workflows, from customer support and sales to education, telehealth, and beyond.
Higgs Avatar v1 adds a new layer to that stack. Higgs Audio handles speech understanding and generation. Higgs Avatar brings the face. Both are foundation models we trained ourselves, and they were designed to run together from the start.
We build our own models because production conversational AI cannot be assembled from external parts. Latency, turn-taking, speech understanding, speech generation, emotional alignment between voice and face, animation, and workflow orchestration all have to work as one end-to-end behavior.
We’ll share more from across the stack soon.
Getting started
Higgs Avatar v1 is in private preview. It will ship with Boson Presence, our upcoming voice chat experience. Join the waitlist to be the first to use it.
For enterprise integration, custom models, or API access, email us at contact@boson.ai.
Acknowledgments
Model: Junming Chen
Data: Junming Chen, Mu Li
Voice Agent: Weisu Yin
Release: Ke Bai, Alex Chen, Lindsey Allen, Mu Li