AI chip startup Groq, founded by members of the original Google TPU team, has recently launched an inference acceleration solution built on its in-house chip. According to the company, the solution generates nearly 500 tokens per second, runs inference 10 times faster than NVIDIA GPUs, and cuts cost to one-tenth, making it easy to deploy any large model.
This breakthrough opens up enormous possibilities for the AI field. The solution reportedly already supports three models: Mixtral 8x7B SMoE and Llama 2 in 7B and 70B sizes, and the company offers a demo for hands-on trial. This move will undoubtedly further advance the development and adoption of AI technology.
Source: Quantum Bit (量子位)
English version below:
**News Title:** "Groq Sets New AI Chip Speed Record: 500 Tokens Per Second at One-Tenth the Cost"
**Keywords:** Groq, inference chip, 500 tokens per second, inference acceleration, cost-performance
**News Content:** ### Groq Unveils Fastest Inference Chip for Large Models: Generates 500 Tokens Per Second
AI chip startup Groq, founded by members of the original team behind Google's TPU, has recently introduced an inference acceleration solution based on its self-developed chip. The company reports that the solution achieves a generation speed of nearly 500 tokens per second, with inference 10 times faster than NVIDIA GPUs at one-tenth the cost, making it straightforward to deploy large models.
This breakthrough by Groq opens up immense possibilities for the AI field. The solution reportedly supports three models so far: Mixtral 8x7B SMoE and Llama 2 in its 7B and 70B variants, and the company also offers a demo for hands-on trial. This move will undoubtedly further promote the development and application of AI technology.
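As a rough illustration of what the reported 500 tokens-per-second figure implies for response latency, the sketch below converts a steady decode rate into generation time. Only the 500 tok/s number comes from the article; the example response lengths are arbitrary placeholders.

```python
# Back-of-envelope latency estimate from the throughput figure cited in the article.
REPORTED_TOKENS_PER_SEC = 500  # Groq's reported generation speed

def generation_time_seconds(num_tokens: int,
                            tokens_per_sec: float = REPORTED_TOKENS_PER_SEC) -> float:
    """Seconds needed to generate num_tokens at a steady decode rate."""
    return num_tokens / tokens_per_sec

# At 500 tok/s, a 1,000-token answer streams out in about 2 seconds,
# versus roughly 20 seconds at a hypothetical 50 tok/s baseline.
print(generation_time_seconds(1000))                      # -> 2.0
print(generation_time_seconds(1000, tokens_per_sec=50))   # -> 20.0
```

This is only a first-order estimate: it ignores prompt-processing (prefill) time and network overhead, which also contribute to end-to-end latency.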
**Source:** Quantum Bit
**Source URL:** https://mp.weixin.qq.com/s/tMDJP234MksYeUu_RUPzBA