Grok-2, the AI chatbot developed by xAI, has received a significant speed boost, delighting users with faster response times. The improvement is the direct result of a three-day marathon coding session by xAI developers Igor Babuschkin, Lianmin Zheng, and Saeed Maleki, who rewrote the chatbot's inference stack using SGLang. The outcome? Grok-2 mini, the chatbot's lightweight variant, now runs at twice its original speed.

xAI recently brought Grok-2 to market, offering the service on the X platform for a monthly fee of $8. The improvement users have noticed is no illusion: both Grok-2 and its streamlined counterpart, Grok-2 mini, now analyze information and generate replies markedly faster.

In an update to the Lmsys Chatbot Arena, an independent third-party platform that benchmarks AI model performance, Grok-2's main model earned a score of 1293 based on 6686 votes. This places Grok-2 second in the global rankings, tied with Google's Gemini-1.5 Pro and trailing only the latest version of OpenAI's ChatGPT-4o; notably, it surpassed the GPT-4o version released in May 2024.

Grok-2 mini also benefited from the optimization, climbing to fifth place with a score of 1268 in the Arena rankings, just behind GPT-4o mini and Claude 3.5 Sonnet. The team's dedication was acknowledged by none other than company chief Elon Musk, who sent a congratulatory message.

According to Babuschkin's reply on X, the primary advantage of Grok-2 mini over the full Grok-2 model is its speed. He assured users that xAI plans to make Grok-2 mini even faster, an appealing prospect for those seeking high performance with low computational overhead, and also hinted at upcoming API improvements.

The key to this remarkable acceleration lies in SGLang, an open-source (Apache 2.0 licensed) system designed to execute complex language model programs efficiently. Developed by researchers from the University of California, Berkeley, the University of California, San Diego, and Carnegie Mellon University, SGLang enhances interaction with large language models (LLMs) by unifying the backend runtime system and frontend language, thereby improving speed and control.

SGLang currently supports models such as Llama, Mistral, and LLaVA, and can also drive API-based models, including OpenAI's GPT-4. Its ability to optimize execution through automatic caching and parallel processing makes it a powerful tool for developers working with large-scale language models. The recent release of SGLang Runtime v0.2, a general serving engine for LLMs and VLMs, has shown higher throughput and lower latency than vLLM and TensorRT-LLM, particularly in scenarios involving Llama-series models.
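The automatic caching mentioned above can be sketched with a toy example. This is purely an illustration of the prefix-reuse idea behind SGLang's caching (its actual implementation, RadixAttention, works on KV caches inside the model); the class and method names here are invented for the sketch:

```python
# Toy sketch of automatic prefix caching: when many requests share a
# common prompt prefix, the expensive per-token work for that prefix
# is done once and reused. Not SGLang's real implementation.

class PrefixCache:
    def __init__(self):
        self.cache = {}        # maps token-prefix tuples to "encoded" state
        self.encode_calls = 0  # counts the expensive steps actually performed

    def encode_token(self, state, token):
        self.encode_calls += 1  # stands in for costly model computation
        return state + (token,)

    def encode(self, tokens):
        state, reused = (), 0
        # Find the longest prefix of this request that is already cached.
        for i in range(len(tokens), 0, -1):
            key = tuple(tokens[:i])
            if key in self.cache:
                state, reused = self.cache[key], i
                break
        # Encode only the uncached suffix, caching each new prefix.
        for token in tokens[reused:]:
            state = self.encode_token(state, token)
            self.cache[tuple(state)] = state
        return state

cache = PrefixCache()
cache.encode(["You", "are", "helpful.", "Question", "1"])  # 5 calls
cache.encode(["You", "are", "helpful.", "Question", "2"])  # reuses 4, 1 new call
print(cache.encode_calls)  # → 6
```

The second request pays only for the tokens that differ, which is exactly why serving many prompts with shared system instructions gets dramatically cheaper under this scheme.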

In conclusion, the dedication and innovative work of xAI’s development team have led to a significant leap forward for Grok-2 and Grok-2 mini. By harnessing the power of SGLang, these chatbots are not only providing faster service but also demonstrating the potential for further advancements in AI technology, pushing the boundaries of speed and efficiency in the realm of conversational AI.

[Source] https://www.jiqizhixin.com/articles/2024-08-26-7
