Neil Zeghidour on Voice AI’s ‘Her’ Moment
Publish Date: 2026-05-09 12:01:00
Source Domain: www.startuphub.ai
-
“Her” Moment and Natural Conversational AI: Neil Zeghidour, CEO of Gradium AI, opened the discussion with the long-anticipated goal of building conversational AI that behaves like Samantha from the film “Her.” Despite recent advances, truly seamless, natural, and empathetic human-AI interaction remains a work in progress.
-
Gradium AI’s Mission and Technology: Zeghidour described Gradium AI’s mission as making natural voice the primary interface for AI. The company trains voice models for speech-to-text, text-to-speech, and speech-to-speech translation, and incorporates more comprehensive systems such as “Moshi” to improve the quality of voice interaction.
-
Challenges of Current Voice AI Systems: Zeghidour discussed the limitations of current voice AI systems at length, focusing on latency and scalability. Cascaded systems, which chain separate models for speech-to-text, language processing, and text-to-speech, add delay at every stage. The accumulated delay disrupts conversational flow and makes interactions feel unnatural. He also emphasized that true conversational intelligence requires models capable of complex reasoning and of maintaining context.
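The way delays accumulate in a cascaded pipeline can be illustrated with a minimal Python sketch. Every number and name below is a hypothetical placeholder chosen for illustration; none of it is a measurement from the talk or from any Gradium AI system. The only point it makes is that stages running one after another add their latencies together.

```python
# Hypothetical per-stage latencies in milliseconds (illustrative, not measured).
CASCADED_STAGES_MS = {
    "speech_to_text": 300,
    "language_model": 500,
    "text_to_speech": 200,
}


def cascaded_latency_ms(stages):
    """Return the total delay when stages run strictly one after another."""
    return sum(stages.values())


def exceeds_budget(latency_ms, budget_ms):
    """Return True when the response delay is larger than the chosen budget."""
    return latency_ms > budget_ms


if __name__ == "__main__":
    total = cascaded_latency_ms(CASCADED_STAGES_MS)
    # The budget is also an assumed value, picked only to show the comparison.
    print(total, exceeds_budget(total, budget_ms=800))
```

Under these assumed values the three stages together take longer than any single stage would, which is the effect the cascaded design has on conversational flow.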
-
Path Forward: End-to-End Models and On-Device Inference: Zeghidour proposed that the future of voice AI lies in end-to-end models that cut latency by processing speech directly rather than passing it through a chain of separate models. Gradium AI’s “Phonon” model exemplifies this approach, offering faster processing and personalization on less demanding hardware, which broadens its applicability and addresses privacy concerns.
-
Applications and Natural User Experiences: Zeghidour concluded with practical demonstrations of how advanced voice AI can create more natural and engaging user experiences, such as travel agent chatbots that converse in a convincingly human-like way, narrowing the gap to the “Her” moment in voice AI.