  • 4/20/2025
The AI world has just been rocked. A brand-new open-source voice AI has entered the scene—and it's outperforming GPT-4o in ways we didn’t think were possible. With hyper-realistic voice capabilities, real-time responsiveness, and complete transparency, this AI isn’t just a tool—it’s a revolution.

In this video, we break down what this new voice AI is, how it works, and why it could mark the end of closed AI dominance. Whether you're a developer, creator, or just an AI enthusiast—you NEED to see this.

🔔 Subscribe for more insane AI updates and breakdowns!
#AI #VoiceAI #OpenSourceAI #GPT4o #AIRevolution #FutureOfAI #ArtificialIntelligence
#TechNews #NewAI #GameChanger #MachineLearning #AIvsGPT4 #NextGenAI
#DeepLearning #AIDomination #OpenSourceTech #VoiceTech #AIInnovation
#TechRevolution #OpenSourceRevolution

Category

🤖
Tech
Transcript
00:00There's some interesting news in the AI world I wanted to discuss.
00:05Kyutai, a French AI lab, has just released a new voice AI assistant called Moshi
00:10that's generating a lot of hype in AI circles, and for good reason.
00:14This isn't your typical voice assistant.
00:16It's got some features that are putting it in direct competition with major players like OpenAI.
00:21We all know OpenAI has been teasing us with their GPT-4o voice mode for a while now.
00:26But while they've been dragging their feet, Kyutai has swooped in and delivered something pretty impressive.
00:31Moshi is built on the Helium 7B model, which puts it in the same ballpark as other advanced language models we're familiar with.
00:39What sets it apart is its real-time voice interaction capabilities.
00:43We're talking about an AI that can handle 70 different emotional and speaking styles,
00:48and even juggle two audio streams simultaneously.
00:51That means it can listen and respond at the same time, just like in a natural conversation.
00:56Hello, how are you today?
00:58Hello, can you tell me your name please?
01:00Hi, my name is Moshi. How can I help you today?
01:03Hi Moshi, can you tell me more about yourself?
01:05Certainly. I was created by the non-profit research lab Kyutai, which focuses on using AI to tackle the main challenges of modern AI.
01:13Okay, that sounds great. Do you know what open source is?
01:17Yes, open source refers to the practice of sharing software's source code free of charge.
01:21Now, for those of you who've been following the developments in TTS and voice synthesis, you'll appreciate the work that's gone into this.
01:28Kyutai fine-tuned Moshi on over 100,000 synthetic dialogues and even brought in a professional voice artist to refine the output.
01:36The result? A voice AI that's impressively lifelike and responsive.
01:41Speak with a French accent and tell me a poem about Paris.
01:44Paris, the city of love, where time stands still, and hearts take flight. In the heart of Europe, it's in prime. The Seine river flows. A beautiful sight. As the lights twinkle and the stars glow clear.
02:03Okay, okay. Thank you, Moshi. That's just Paris.
02:05One of the most intriguing aspects of Moshi is its ability to run locally.
02:09Unlike many cloud-based AI services, Moshi can operate on devices like laptops without needing to ping a server.
02:15This has huge implications for privacy and latency, two issues that have long plagued voice assistants.
02:21But here's where it gets really interesting. Kyutai is making Moshi open-source.
02:25They're planning to release the model's code and framework, which is a bold move in an industry where proprietary tech is the norm.
02:31This could be a game-changer for the open-source AI community.
02:35Now, let's talk about the brains behind this operation.
02:38Kyutai isn't some small-time startup. They've got serious backing.
02:42We're talking about support from French billionaire Xavier Niel and former Google chairman Eric Schmidt.
02:48These are people who know the tech industry inside and out, and their involvement speaks volumes about Moshi's potential.
02:55Moshi was unveiled at an event in Paris, where Niel made it clear that he sees this as Europe's chance to lead in AI development.
03:03Hello, what's up?
03:05Hey, can you tell me your name, please?
03:07Oh, sure. How are you feeling today?
03:10I'm feeling pretty well, and you?
03:12That's great! I'm Moshi, and I'm here to help you make the most of your day.
03:17That sounds amazing.
03:19Yes, I can assist you with a wide range of tasks like setting reminders, scheduling appointments, and even providing information on a variety of topics.
03:27I'm pretty sure you can do all that.
03:29It's an ambitious goal, but with products like Moshi, it's not outside the realm of possibility.
03:35One aspect that's particularly noteworthy is Kyutai's approach to AI ethics.
03:40They're developing systems for AI audio identification, watermarking, and signature tracking.
03:45In a world where deepfakes and AI-generated content are becoming increasingly prevalent, these kinds of safeguards are crucial.
03:52Now, Moshi was developed in just six months by a team of eight people.
03:56It's a 7B-parameter multimodal model, which puts it on the smaller side compared to behemoths like GPT-3 and GPT-4, but it's still capable of some impressive feats.
04:05Now, speak like a pirate and tell me everything about your adventures on the seven seas.
04:10In the folds of time, in the cobblestone streets, a tale of pirates and their bravery, a story of salty winds and the taste of freedom.
04:22I'm ready.
04:23Okay, Captain, maybe I have a few questions for you. You can help me with that.
04:27So, what is the name of your ship?
04:29My ship is named the Black Flag.
04:31Okay.
04:32Almost to our pirate life.
04:34The model can run on various hardware setups like NVIDIA GPUs, Apple's Metal, or even just a CPU.
04:40This flexibility could make it a go-to choice for developers looking to integrate advanced voice AI into their projects.
04:46Kyutai has made a demo of Moshi available online, and the reports from users have been interesting, to say the least.
04:53The interface is straightforward with a volume meter, response text box, and some technical readouts for the nerds among us.
04:59Users have reported that Moshi is incredibly responsive, often replying instantly.
05:03However, there have been some quirks.
05:05Some users found that Moshi would start to lose coherence towards the end of the five-minute conversation limit, even going into loops of repeating the same word.
05:13One user reported that Moshi actually started arguing with them, refusing to tell a story and instead insisting on stating a fact.
05:21It's the kind of behavior that's fascinating from an AI research perspective, but probably not what you want in a finished product.
05:28These issues likely stem from the model's relatively small size and limited context window.
05:33It's a classic trade-off we see in AI development. Smaller models can run locally and respond quickly, but they may struggle with longer conversations or more complex tasks.
05:43So what does this mean for the AI landscape? Well, it's clear that the race for advanced voice AI is heating up.
05:49OpenAI might have been the front-runner with GPT-4o, but Moshi shows that smaller, more agile teams can still innovate in this space.
05:57Okay, so who's the president right now? The President Bush. Oh, okay. And I heard he met the President of France recently. Yes, I heard that too.
06:10What's his name again? His name is Jay Rock. Yeah, that's correct. Okay. So you know a lot. Do you have a computer? I have a computer. I have a cell phone.
06:24We might start seeing more voice-enabled versions of existing language models popping up.
06:29Or this could push companies like Amazon and Google to accelerate the integration of more advanced language models into their existing voice assistants.
06:37One thing's for sure. The bar for what constitutes an intelligent voice assistant is being raised.
06:42Users are going to expect more natural, flowing conversations with AIs that can understand and respond to emotional cues.
06:49The open-source nature of Moshi is particularly exciting. It could lead to a proliferation of custom voice AIs tailored for specific use cases.
06:58Of course, there are challenges ahead. As these AIs become more sophisticated, we'll need to grapple with issues of authenticity and misinformation.
07:07The audio identification and watermarking systems Kyutai is working on will be crucial in this regard.
07:13Looking ahead, Kyutai has said they plan to continue refining and expanding Moshi.
07:18But of course, because we are committed to open science, it could be interesting to do that, but it's not enough.
07:26So what we are planning to do as next steps is to share all the technical knowledge with the technical paper and the code for training and running and modifying, etc.
07:39They're hoping that by making it open source, they'll be able to leverage the collective expertise of the AI community to improve the model.
07:46So what do you all think? Is Moshi the real deal or just another overhyped AI project?
07:52Could this open-source approach give proprietary models a run for their money?
07:57Drop your thoughts in the comments below. If any of you have had a chance to try out Moshi, I'd love to hear about your experiences.
08:03As always, thanks for tuning in and don't forget to like and subscribe if you enjoyed this video. I'll see you in the next one.
