Recently, an open-source project called StreamingLLM has drawn wide attention in the AI community. An upgraded version can handle multi-turn dialogue totaling 4 million tokens without sacrificing generation quality or inference speed, delivering a 22.2x inference speedup. Less than three months after launch, the project has reached 5.7k stars on GitHub.
StreamingLLM is implemented in native PyTorch, which leaves room for optimization against the low-cost, low-latency, high-throughput demands of multi-turn dialogue inference. To push large-model inference performance further, the Colossal-AI team has released SwiftInfer, a TensorRT-based implementation that addresses these needs and improves inference performance by a further 46%.
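StreamingLLM's ability to stream millions of tokens with bounded memory comes from its "attention sink" cache policy: the KV cache keeps the first few tokens of the conversation (the sinks) plus a sliding window of the most recent tokens, evicting everything in between. The sketch below illustrates that eviction rule; the function name and cache sizes are illustrative choices, not values taken from the StreamingLLM or SwiftInfer codebases.

```python
def evict_kv_cache(cache, num_sinks=4, window=1020):
    """Return the token positions to keep in a bounded KV cache.

    cache: list of cached token positions, oldest first.
    num_sinks: initial "attention sink" tokens that are always kept.
    window: number of most recent tokens to keep.
    (Illustrative sizes; not the projects' actual configuration.)
    """
    if len(cache) <= num_sinks + window:
        return cache  # still within budget, keep everything
    # Keep the initial sink tokens plus the most recent window;
    # tokens in the middle are evicted.
    return cache[:num_sinks] + cache[-window:]

# Example: a 2000-token history shrinks to 4 sinks + 1020 recent tokens.
history = list(range(2000))
kept = evict_kv_cache(history)
assert len(kept) == 1024
assert kept[:4] == [0, 1, 2, 3]   # sink tokens survive
assert kept[-1] == 1999           # newest token survives
```

Because the cache size stays constant regardless of conversation length, per-token inference cost does not grow over a long multi-turn dialogue, which is what makes the 4-million-token streaming scenario feasible.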
The release of SwiftInfer marks another step forward in StreamingLLM's inference performance. For AI researchers and developers this is welcome news: with SwiftInfer they can run large-model inference more efficiently and help advance the technology.
English title: Open-Source StreamingLLM Upgrade Boosts Inference Performance
Keywords: open source, StreamingLLM, inference performance
Source: https://mp.weixin.qq.com/s/fiYSESKcOgZIDe8dpLdAdQ