Munich, Germany – In a significant development for the field of large language models (LLMs), the original creators of xLSTM have unveiled a revamped architecture boasting a roughly 50% speed increase over the popular Mamba model. The enhanced xLSTM, now scaled up to 7 billion parameters, comes with fully open-sourced weights and code, promising to accelerate research and development in the rapidly evolving AI landscape.

The breakthrough addresses a critical challenge in LLM deployment: inference speed. While Transformer-based models have dominated the field, the cost of their self-attention grows quadratically with input sequence length. This bottleneck has fueled renewed interest in alternative architectures, particularly recurrent neural networks like LSTMs, whose compute scales linearly with sequence length.
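To make the contrast concrete, here is a toy cost model in Python; the formulas and the example hidden size of 4096 are illustrative assumptions, not figures from xLSTM or Mamba.

```python
# Toy cost model contrasting attention's quadratic scaling with a recurrent
# cell's linear scaling; numbers are illustrative, not measurements.

def attention_cost(seq_len: int, d_model: int) -> float:
    # Self-attention forms a seq_len x seq_len score matrix, so the work
    # grows quadratically with sequence length.
    return seq_len * seq_len * d_model

def recurrent_cost(seq_len: int, d_model: int) -> float:
    # A recurrent cell (LSTM/xLSTM-style) does a fixed amount of work per
    # token, so the total work grows linearly with sequence length.
    return seq_len * d_model * d_model

for n in (1_024, 8_192, 65_536):
    print(f"{n:>6} tokens  attention ~{attention_cost(n, 4096):.2e}  "
          f"recurrent ~{recurrent_cost(n, 4096):.2e}")
```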

Sepp Hochreiter, co-inventor of the original LSTM, introduced xLSTM last year as a potential successor to Transformers. xLSTM aimed to extend LSTM capabilities to billions of parameters while offering a constant memory footprint at inference time and linear computational scaling. However, limitations in scaling and a lack of comprehensive performance evaluations prompted further investigation.

Now, Hochreiter and his team at NXAI and Johannes Kepler University Linz (JKU) have addressed these concerns with a significantly optimized xLSTM architecture. The new xLSTM 7B model was trained on the DCLM dataset using 128 H100 GPUs, processing 2.3 trillion tokens at a context length of 8,192 tokens.
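For reference, the reported setup can be summarized as a simple configuration sketch; the field names below are illustrative and are not taken from the NXAI training code.

```python
# Hypothetical summary of the reported pretraining setup; field names are
# illustrative and do not correspond to the actual NXAI configuration format.
xlstm_7b_pretraining = {
    "model_parameters": 7_000_000_000,      # 7B-parameter xLSTM
    "dataset": "DCLM",
    "tokens_processed": 2_300_000_000_000,  # 2.3 trillion tokens
    "context_length": 8_192,
    "hardware": "128x NVIDIA H100",
}
```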

"Our goal was to create an architecture that not only performs well but is also efficient and stable to train," explained a lead researcher on the project. "The improvements we've made to the original xLSTM ensure both training efficiency and stability, while maintaining state-of-the-art task performance."

The core innovation lies in the mLSTM cell, a matrix-memory LSTM variant with exponential gating whose recurrence avoids the sequential hidden-state mixing of classic LSTMs, allowing the model to scale effectively. The team's work demonstrates that xLSTM can now compete with, and even outperform, other leading architectures in terms of speed and efficiency.
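As a rough illustration of what such a cell computes, below is a simplified NumPy sketch of a single mLSTM recurrence step, following the equations published in the original xLSTM paper; the weight names are placeholders, and the optimized xLSTM 7B kernels additionally use chunkwise parallelism and numerical stabilization that are omitted here.

```python
import numpy as np

def mlstm_step(x, C, n, params):
    """One simplified mLSTM step: matrix memory C, normalizer state n.

    Follows the mLSTM equations from the original xLSTM paper; this is a
    plain reference recurrence, not the optimized xLSTM 7B kernel.
    """
    Wq, Wk, Wv, wi, wf, Wo = (params[k] for k in ("Wq", "Wk", "Wv", "wi", "wf", "Wo"))
    d = Wk.shape[0]

    q = Wq @ x
    k = (Wk @ x) / np.sqrt(d)
    v = Wv @ x

    i = np.exp(wi @ x)                     # exponential input gate
    f = 1.0 / (1.0 + np.exp(-(wf @ x)))    # sigmoid forget gate
    o = 1.0 / (1.0 + np.exp(-(Wo @ x)))    # sigmoid output gate

    C = f * C + i * np.outer(v, k)         # matrix memory update
    n = f * n + i * k                      # normalizer update

    h_tilde = (C @ q) / max(abs(n @ q), 1.0)
    return o * h_tilde, C, n

# Tiny toy run with random placeholder weights.
rng = np.random.default_rng(0)
d = 4
params = {"Wq": rng.normal(size=(d, d)), "Wk": rng.normal(size=(d, d)),
          "Wv": rng.normal(size=(d, d)), "wi": rng.normal(size=d),
          "wf": rng.normal(size=d), "Wo": rng.normal(size=(d, d))}
C, n = np.zeros((d, d)), np.zeros(d)
for x in rng.normal(size=(3, d)):          # three toy time steps
    h, C, n = mlstm_step(x, C, n, params)
```

The exponential input gate lets new key-value information be written strongly into the matrix memory, while the normalizer state keeps the read-out bounded.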

The release of the open-source weights and code is expected to have a significant impact on the AI community. Researchers and developers can now leverage the optimized xLSTM architecture to build faster and more efficient LLMs for a wide range of applications.
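For readers who want to experiment once the weights are published, the sketch below shows how such a checkpoint might be loaded with the Hugging Face transformers library; the repository id NX-AI/xLSTM-7b and the trust_remote_code flag are assumptions, so check the official release notes for the actual usage.

```python
# Hypothetical loading sketch: the repository id and the need for
# trust_remote_code are assumptions; consult the official xLSTM release
# for the exact instructions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NX-AI/xLSTM-7b"  # assumed repository name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Recurrent architectures scale linearly with sequence length because"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```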

The implications of this development are far-reaching:

  • Faster Inference: The 50% speed improvement over Mamba translates to faster response times for applications powered by LLMs, enhancing user experience and enabling real-time applications.
  • Reduced Computational Costs: The efficient architecture reduces the computational resources required for inference, making LLMs more accessible and cost-effective to deploy.
  • Innovation Catalyst: The open-source release empowers researchers to explore and build upon the xLSTM architecture, potentially leading to further breakthroughs in LLM technology.

The resurgence of LSTM-based architectures, exemplified by the enhanced xLSTM, signals a potential shift in the LLM landscape. As the demand for faster, more efficient models continues to grow, innovations like this are crucial for unlocking the full potential of AI. The future of LLMs may very well lie in the clever fusion of established principles with cutting-edge techniques, paving the way for a new generation of AI-powered applications.

References:

  • (Link to the original research paper or blog post announcing the xLSTM update – To be added when available)
  • (Link to the open-source repository containing the weights and code – To be added when available)

