News Title: "Google's New AI Service Gemini Live Aids English Speakers in Interviews and Speeches"
Keywords: Gemini Live, AI voice chat, simulated interview scenarios
At today's Pixel 9 series launch event, Google officially unveiled its latest AI voice chat service, Gemini Live. The service is currently English-only, rolls out starting today, and is initially available to Gemini Advanced subscribers. Gemini Live aims to deliver a more natural and fluid conversational experience, using an enhanced speech engine capable of more coherent, emotionally expressive, and lifelike multi-turn conversations.
Gemini Live is positioned against OpenAI's ChatGPT Advanced Voice mode. Users can interrupt the chatbot while it is speaking to ask follow-up questions, and the bot adapts to their speech patterns in real time. In addition, users can choose the chatbot's voice from 10 new natural-sounding options and speak at their own pace during a conversation.
Google also demonstrated a simulated scenario in which a user converses with a hiring manager, with Gemini Live offering public-speaking tips and suggestions for improvement. A Google spokesperson said Gemini Live uses the Gemini Advanced model, which has been specially tuned to be more conversational, and that the model's large context window comes into play when users hold longer conversations with Gemini Live.
Notably, Gemini Live does not yet support multimodal input, which means it lacks the capability Google showed at its I/O conference: using the phone's camera to capture photos and video in order to understand and respond to the user's surroundings. Google says multimodal input will arrive later this year, but has not disclosed further details.
With the release of Gemini Live, Google continues to push deeper into AI voice chat, aiming to offer users a smarter and more personalized conversational experience. The new service should further drive innovation and development in AI voice interaction.
Source: https://www.ithome.com/0/788/290.htm