Boson AI

Jun. 29, 2026

IHBench: What Voice Agents Say After You Interrupt

A benchmark for post-interruption recovery in voice agents: after you cut in, does what the model says next resume the workflow correctly? 428 interruption samples across 6 interruption types and 10 enterprise domains, evaluated on both closed- and open-weight models.

May 28, 2026

ProactBench: Beyond what the user asked for

A benchmark for conversational proactivity in LLMs: noticing and acting on what the user implied but never said. 198 curated dialogues, 624 trigger points, 16 models, and a leaderboard on which Recovery proves especially hard.

Aug. 5, 2024

Introducing RPBench-Auto

Since the release of Higgs Llama 2, we have received a great deal of positive feedback from the community. While we are amazed by the creativity with which the community has used our model, we also recognize the importance of an automated benchmark for evaluating the roleplaying capabilities of large language models (LLMs).

Jul. 15, 2024

Announcing Higgs Llama 2

At Boson AI, we are working on intelligent agents that can serve as human companions and helpers. Today we are excited to share Higgs Llama 2 70B, a new model that significantly improves upon its predecessor. It narrows the gap to the very best proprietary models on benchmarks relevant for dialog interaction and understanding.

Jun. 5, 2024

Announcing the Higgs Family of LLMs

Since founding Boson AI in 2023, we have dedicated ourselves to empowering enterprises with AI technologies, with a mission to transform how stories are told, knowledge is learned, and insights are gathered. We have helped customers build intelligent agents that interact with their users by playing various roles, including game characters, language tutors, insurance agents, and financial advisors.