After months of anticipation, OpenAI has finally rolled out the advanced voice mode for GPT-4o, a move that leaves Google playing catch-up. The new feature, which brings custom instructions, memory, five new voices, and improved accents, marks a significant leap forward in AI-generated speech.
The advanced voice mode is now available to all Plus and Team users; free users will have to wait. Plus subscribers pay $20 per month, while Team subscribers pay $30 per user and get higher usage limits. OpenAI plans to roll out access gradually, with all Plus users expected to have it by the end of autumn.
In its announcement, the company wrote, "Advanced voice mode is live today! (Rollout will be completed this week.) We hope the wait was worth it." This was followed by a sly "wronged little heart" emoji, playing up the feeling of having been kept waiting and adding a touch of humor to the announcement.
The new voice mode includes a range of improvements, such as the ability to say "I'm sorry I'm late" in over 50 languages. In a demo video, GPT-4o demonstrates its fluency in Chinese, apologizing to its user: "Grandma, I'm sorry I'm late. I didn't mean to keep you waiting so long…" The phrase showcases the model's linguistic range while adding a personal touch that makes the interaction feel more human.
Alongside the voice rollout, OpenAI has released a new multilingual evaluation dataset called MMMLU (Multilingual Massive Multitask Language Understanding), a professionally translated version of the MMLU test set covering 14 languages and a wide range of topics, from general knowledge to advanced professional disciplines. The dataset will be useful for measuring, and further refining, the model's understanding across languages and domains.
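For readers who want to inspect the benchmark themselves, here is a minimal sketch of loading one language split. It assumes the dataset is published on the Hugging Face Hub as openai/MMMLU with per-language configs such as FR_FR and columns named Question, A–D, Answer, and Subject; check the dataset card for the exact names.

```python
# Minimal sketch: load and inspect one language split of MMMLU.
# Assumed: dataset id "openai/MMMLU", config "FR_FR", and column names
# Question / A / B / C / D / Answer / Subject (verify on the dataset card).
from datasets import load_dataset

mmmlu_fr = load_dataset("openai/MMMLU", "FR_FR", split="test")

sample = mmmlu_fr[0]
print(sample["Question"])            # the translated MMLU question
for key in ("A", "B", "C", "D"):     # the four answer options
    print(f"  {key}. {sample[key]}")
print("Answer:", sample["Answer"], "| Subject:", sample["Subject"])
```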
For Plus and Team users, the advanced voice mode comes with several new features, including custom instructions and memory: users can tell the model how to respond and have it recall past conversations, making the experience more personal. In one demonstration, OpenAI project manager Charlotte gave the model personal information such as her name and address, then asked about fun outdoor activities for the weekend; the model drew on those details to offer suggestions tailored to her preferences.
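The in-app feature requires no code, but developers can approximate the same pattern over the API. The sketch below, which assumes the official openai Python SDK and uses a made-up user profile, injects remembered facts as a system message on each request:

```python
# Rough sketch: emulating "custom instructions" plus "memory" via the API.
# The profile contents, prompt wording, and model choice are illustrative
# assumptions, not OpenAI's implementation of the in-app feature.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# "Memory": facts the assistant should carry across conversations.
user_profile = {
    "name": "Charlotte",
    "location": "San Francisco",
    "interests": ["hiking", "outdoor activities"],
}

# "Custom instructions": injected as a system message on every request.
system_prompt = (
    "You are a helpful assistant. Known facts about the user: "
    f"{user_profile}. Use them to personalize your answers."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What are some fun outdoor activities for the weekend?"},
    ],
)
print(response.choices[0].message.content)
```

Persisting user_profile between sessions (in a database, say) is what turns this from a one-off prompt into something resembling memory.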
OpenAI has also introduced five new voices, named Arbor, Maple, Sol, Spruce, and Vale, available in both standard and advanced voice modes. The voices were created with professional voice actors from around the world, each with a warm and engaging tone, and the company says they are designed to be accessible and relatable, making conversation with the model feel more natural and enjoyable.
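The new voices are, per the announcement, part of ChatGPT itself rather than the developer API, which exposes its own voice list. For completeness, here is a minimal sketch of generating speech through OpenAI's text-to-speech endpoint using one of the API's built-in voices (the model and voice names here are the API's, not the five new ChatGPT voices):

```python
# Minimal sketch: synthesize speech with OpenAI's text-to-speech endpoint.
# Note: "nova" is one of the API's built-in voices; the five new ChatGPT
# voices (Arbor, Maple, Sol, Spruce, Vale) are app voices, not API options.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

speech = client.audio.speech.create(
    model="tts-1",
    voice="nova",
    input="Grandma, I'm sorry I'm late. I didn't mean to keep you waiting so long.",
)

with open("apology.mp3", "wb") as f:
    f.write(speech.content)  # raw audio bytes returned by the endpoint
```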
However, the new voice mode is not yet available in the European Union, the UK, Switzerland, Iceland, Norway, or Liechtenstein. OpenAI is still working on expanding its reach globally.
The launch of the advanced voice mode has generated significant interest and excitement within the tech community. Greg Brockman, OpenAI's president, even took to Twitter to promote the feature, noting, "The launch of advanced voice mode makes you realize how unnatural typing on a computer can be."
In summary, OpenAI’s advanced voice mode for GPT-4o represents a significant milestone in AI technology, offering users a more human-like and engaging interaction experience. As the company continues to expand access to this feature, it is likely to further solidify its position as a leader in the AI field.