### TPAMI 2024 | ProCo: A Breakthrough in Long-tailed Contrastive Learning

In artificial intelligence, contrastive learning, a self-supervised learning method, has proven highly effective for learning visual feature representations. By contrasting the model's similarity scores for positive and negative sample pairs, it helps the model learn more discriminative features. However, contrastive learning struggles on long-tailed visual recognition tasks: on long-tailed datasets the class distribution is highly imbalanced, so most contrastive pairs are drawn from head classes and tail-class samples are rarely covered, which limits the performance gains contrastive learning can deliver.

Recently, Chaoqun Du, a Ph.D. student in the Department of Automation at Tsinghua University, and his advisor, Associate Professor Gao Huang, proposed a new method called ProCo to address this problem in long-tailed contrastive learning. ProCo builds a probabilistic model of each class's feature distribution and estimates its parameters, which makes it possible to derive a closed-form expression for the expected contrastive loss. The key innovation is that, by optimizing this rigorously derived analytical objective, ProCo effectively performs contrastive learning over an unlimited number of contrastive pairs, without being constrained by the batch size or the size of a memory bank, thereby overcoming the performance bottleneck of contrastive learning on long-tailed datasets.
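The "closed-form expected loss" idea can be illustrated with a deliberately simplified sketch. Purely for illustration — this is not the distribution family or derivation used in the ProCo paper, and all names below are made up — assume each class's features follow an isotropic Gaussian. The Gaussian moment-generating function then turns the expectation over infinitely many contrastive samples into a closed-form logit:

```python
import numpy as np

def expected_logits(f, means, variances, priors, tau=0.1):
    # For z ~ N(mu_c, sigma_c^2 I), the Gaussian moment-generating function gives
    #   E[exp(z . f / tau)] = exp(mu_c . f / tau + sigma_c^2 ||f||^2 / (2 tau^2)),
    # so the expectation over infinitely many sampled pairs has a closed form.
    return np.log(priors) + means @ f / tau + variances * (f @ f) / (2 * tau**2)

def expected_contrastive_loss(f, label, means, variances, priors, tau=0.1):
    # Cross-entropy against the closed-form logits: an "infinite pairs" objective
    # that never materializes individual negative samples.
    logits = expected_logits(f, means, variances, priors, tau)
    m = logits.max()  # subtract the max for numerical stability
    return -logits[label] + m + np.log(np.exp(logits - m).sum())

# Toy usage: 3 classes in 5 dimensions with a long-tailed class prior.
rng = np.random.default_rng(0)
means = rng.normal(size=(3, 5))
means /= np.linalg.norm(means, axis=1, keepdims=True)
variances = np.array([0.1, 0.1, 0.1])
priors = np.array([0.7, 0.2, 0.1])     # head, mid, tail class frequencies
f = means[2]                            # a feature sitting on the tail-class mean
loss = expected_contrastive_loss(f, 2, means, variances, priors)
```

Note that the loss is computed from per-class statistics alone, so its cost is independent of how many contrastive pairs the expectation nominally covers.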

At its core, ProCo models the feature distribution of each class and then, through parameter estimation, generates contrastive pairs that cover every class, especially tail classes with few samples. This not only improves contrastive learning on long-tailed visual recognition, but also extends its applicability to scenarios such as long-tailed semi-supervised learning, long-tailed object detection, and balanced datasets.

This work has been accepted by TPAMI (IEEE Transactions on Pattern Analysis and Machine Intelligence), a top journal in the field, and the code has been open-sourced. ProCo not only advances the state of long-tailed visual recognition, but also demonstrates the potential of contrastive learning on imbalanced datasets, which makes it significant for future research in artificial intelligence and machine learning.

### About ProCo and the AIxiv Column

AIxiv is a column run by 机器之心 (Jiqizhixin) dedicated to publishing academic and technical content; it has covered more than 2,000 papers from leading universities and corporate research labs worldwide. By promoting academic exchange and dissemination, it gives AI researchers a channel for sharing their results. Researchers who would like their work featured are welcome to submit or request coverage via liyazhou@jiqizhixin.com or zhaoyunfeng@jiqizhixin.com.

### Research Background and Motivation

The success of contrastive learning, in particular the wide adoption of methods such as SimCLR and MoCo, has demonstrated its importance for visual representation learning. A core challenge, however, is generating enough contrastive pairs for the model to learn discriminative features from negative samples. On long-tailed datasets, the imbalanced class distribution makes generating sufficient pairs especially difficult, limiting the performance of contrastive learning.
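For reference, the contrastive objective used by methods such as SimCLR and MoCo is typically the InfoNCE loss, whose appetite for many negative pairs is exactly the bottleneck described here. A minimal NumPy sketch (function and variable names are illustrative):

```python
import numpy as np

def info_nce_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE: pull the positive close to the anchor, push negatives away.

    anchor, positive: L2-normalized feature vectors, shape (d,)
    negatives: L2-normalized negative features, shape (n, d)
    """
    pos_sim = anchor @ positive / temperature   # scalar similarity to positive
    neg_sim = negatives @ anchor / temperature  # (n,) similarities to negatives
    logits = np.concatenate([[pos_sim], neg_sim])
    # Loss = -log of the softmax probability assigned to the positive pair.
    return -pos_sim + np.log(np.exp(logits).sum())

# Usage: one anchor contrasted against 16 random negatives.
rng = np.random.default_rng(0)
unit = lambda v: v / np.linalg.norm(v)
a = unit(rng.normal(size=8))
negs = np.stack([unit(rng.normal(size=8)) for _ in range(16)])
loss = info_nce_loss(a, a, negs)
```

The sum inside the logarithm runs over the sampled negatives, so on a long-tailed dataset the negatives in any finite batch are dominated by head classes — the motivation for replacing this sampled sum with a per-class expectation.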

ProCo was proposed precisely to address this challenge. By building a probabilistic model, ProCo models the data distribution and estimates its parameters effectively, generating an unlimited number of contrastive pairs and optimizing contrastive learning for long-tailed visual recognition. This improves both the efficiency and the effectiveness of contrastive learning, and offers a new perspective and new tools for future research.


[Source] https://www.jiqizhixin.com/articles/2024-07-25-2
