In the rapidly evolving field of artificial intelligence, large language models (LLMs) have become increasingly prevalent, powering applications such as language translation, content generation, and question answering. However, one significant challenge LLMs face is the accuracy and reliability of their outputs, particularly in long-text scenarios. To address this issue, Tsinghua University has developed LongCite, an open-source model designed to improve the precision of LLM answers and reduce hallucinations, i.e., generated information that is not supported by the original text.
LongCite: An Overview
LongCite is a project launched by Tsinghua University with the aim of improving the credibility and verifiability of LLMs in long-text question-answering tasks. By generating fine-grained, sentence-level citations, LongCite lets users verify the accuracy of the model's answers. Its core components are the LongBench-Cite evaluation benchmark, the CoF (Coarse to Fine) automated data-construction pipeline, the LongCite-45k dataset, and the LongCite-8B and LongCite-9B models trained on that dataset.
Key Features of LongCite
1. Generation of Fine-Grained Citations
LongCite enables LLMs to generate precise sentence-level citations when answering long-text questions, allowing users to directly trace back to the specific information in the original text.
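To make this concrete, here is a minimal parsing sketch. It assumes a hypothetical output format in which each statement is wrapped in <statement>…</statement> tags and cites a range of sentence indices via <cite>[start-end]</cite>; LongCite's actual output format may differ.

```python
import re

# Matches one statement and its cited sentence-index range, e.g.
# <statement>...<cite>[12-13]</cite></statement>  (format assumed for this sketch)
STATEMENT_RE = re.compile(
    r"<statement>(.*?)<cite>\[(\d+)-(\d+)\]</cite></statement>", re.DOTALL
)

def parse_answer(answer: str) -> list[dict]:
    """Extract (statement, cited sentence range) pairs from a model answer."""
    parsed = []
    for text, start, end in STATEMENT_RE.findall(answer):
        parsed.append({
            "statement": text.strip(),
            "cited_sentences": range(int(start), int(end) + 1),
        })
    return parsed

example = (
    "<statement>LongCite was released by Tsinghua University."
    "<cite>[12-13]</cite></statement>"
)
for item in parse_answer(example):
    print(item["statement"], list(item["cited_sentences"]))
```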
2. Increased Answer Faithfulness
LongCite helps ensure that the model’s answers are more faithful to the original text, reducing the occurrence of hallucinations.
3. Enhanced Verifiability
Users can verify the authenticity and accuracy of the model’s answers based on the fine-grained citations provided, thereby increasing the credibility of the model’s outputs.
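Building on the parser sketched above, verification can be as simple as pairing each statement with the sentences it cites so a reviewer can compare them side by side. The helper below is an illustrative sketch, not part of LongCite's API.

```python
def cited_evidence(
    parsed: list[dict], source_sentences: list[str]
) -> list[tuple[str, list[str]]]:
    """Pair each parsed statement with the source sentences it cites,
    so a human reviewer can check them side by side. `parsed` follows the
    shape produced by the parse_answer sketch above."""
    pairs = []
    for item in parsed:
        evidence = [
            source_sentences[i]
            for i in item["cited_sentences"]
            if 0 <= i < len(source_sentences)  # ignore out-of-range indices
        ]
        pairs.append((item["statement"], evidence))
    return pairs
```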
4. Automated Data Construction
LongCite uses the CoF (Coarse to Fine) pipeline to automatically generate high-quality long-text question-answering data with fine-grained citations, providing abundant labeled data for model training.
5. Evaluation Benchmark
LongCite introduces the LongBench-Cite benchmark to measure a model's ability to generate citations in long-text question answering, evaluating both the correctness of answers and the quality of the citations.
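One common way to quantify citation quality is to compare the sentences a model cites against a gold set of supporting sentences. The sketch below computes precision, recall, and F1 in that spirit; it is an illustration, not LongBench-Cite's exact scoring procedure.

```python
def citation_prf(cited: set[int], gold: set[int]) -> tuple[float, float, float]:
    """Illustrative citation precision/recall/F1 for one statement: compare
    the sentence indices the model cited against a gold set of supporting
    sentences. (A sketch, not LongBench-Cite's actual scoring.)"""
    if not cited or not gold:
        return 0.0, 0.0, 0.0
    tp = len(cited & gold)                       # correctly cited sentences
    precision = tp / len(cited)                  # how many citations were right
    recall = tp / len(gold)                      # how much support was found
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

print(citation_prf({3, 4, 5}, {4, 5, 6}))  # ~(0.67, 0.67, 0.67)
```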
Technical Principles of LongCite
1. Long-Text Processing Capability
LongCite builds on large language models with ultra-long context windows, such as GLM-4-9B-1M and Gemini 1.5, enabling them to process and understand texts running to tens of thousands of words.
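For sentence-level citations to be meaningful, the source text needs stable sentence indices that citations can point to. Below is a minimal preprocessing sketch, assuming a naive regex-based sentence splitter and a hypothetical <C{i}> marker format.

```python
import re

def number_sentences(text: str) -> tuple[str, list[str]]:
    """Naively split a long document into sentences and prefix each with a
    stable index marker (<C{i}>) so citations can reference it. Both the
    splitter and the marker format are assumptions made for this sketch."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    numbered = " ".join(f"<C{i}>{s}" for i, s in enumerate(sentences))
    return numbered, sentences
```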
2. Fine-Grained Citation Generation
LongCite trains models to generate precise sentence-level citations, allowing each answer to be traced back to the specific sentence in the original text, thereby enhancing the verifiability of the answers.
3. Automated Data Construction Process (CoF)
Using the self-instruct method, LongCite automatically generates question-answer pairs from long texts. It then retrieves the text chunks relevant to each answer and produces coarse, chunk-level citations. From these chunk-level citations, it extracts the specific sentences that support each statement, yielding fine-grained, sentence-level citations; instances with too few citations are filtered out before training.
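The sketch below illustrates this coarse-to-fine narrowing, substituting simple keyword overlap for the LLM prompting and retrieval the real pipeline uses; all function names here are hypothetical.

```python
import re

def split_chunks(doc: str, size: int = 3) -> list[list[str]]:
    """Split a document into chunks of `size` sentences (naive splitting)."""
    sents = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
    return [sents[i:i + size] for i in range(0, len(sents), size)]

def overlap(a: str, b: str) -> int:
    """Crude lexical-overlap score, standing in for a real retriever."""
    return len(set(a.lower().split()) & set(b.lower().split()))

def cof_citations(doc: str, statement: str, top_k: int = 2) -> list[str]:
    """Coarse-to-fine sketch: first rank chunks by relevance to a statement
    (coarse), then keep only the best-matching sentences inside the selected
    chunks (fine). Real CoF uses an LLM at each stage; this keyword version
    only illustrates the two-stage narrowing."""
    chunks = split_chunks(doc)
    # Coarse stage: pick the chunks most related to the statement.
    coarse = sorted(
        chunks, key=lambda c: overlap(" ".join(c), statement), reverse=True
    )[:top_k]
    # Fine stage: within those chunks, keep sentences that actually overlap.
    return [s for c in coarse for s in c if overlap(s, statement) > 0]
```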
4. Supervised Fine-Tuning (SFT)
The CoF process generates high-quality datasets with fine-grained citations to fine-tune large language models, improving their performance in long-text question-answering tasks.
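A minimal SFT sketch using the Hugging Face transformers Trainer is shown below, assuming the CoF output is stored as JSONL records with prompt and answer fields; the base model, hyperparameters, and file name are illustrative assumptions, not LongCite's actual training setup.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "THUDM/glm-4-9b"  # assumed base model for this sketch
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # ensure padding works

# Assume JSONL records produced by the CoF pipeline, each holding a
# long-context "prompt" and a citation-bearing "answer" (field names assumed).
dataset = load_dataset("json", data_files="longcite_sft.jsonl")["train"]

def tokenize(example):
    text = example["prompt"] + example["answer"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=32768)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="longcite-sft",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        learning_rate=1e-5,
        bf16=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```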
LongCite’s Application Scenarios
LongCite has a wide range of application scenarios, including:
- Academic research
- Legal consultation
- Financial analysis
- Medical consultation
- News reporting
Conclusion
LongCite represents a significant advancement in the field of LLMs, addressing the challenges of accuracy and reliability in long-text scenarios. With its fine-grained citation generation, automated data construction process, and evaluation benchmark, LongCite is poised to enhance the performance of LLMs and enable more reliable and verifiable outputs in a variety of application domains.