4 months ago

Class-Balanced Loss Based on Effective Number of Samples

Yin Cui; Menglin Jia; Tsung-Yi Lin; Yang Song; Serge Belongie

Abstract

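With the rapid increase of large-scale, real-world datasets, it becomes critical to address the problem of long-tailed data distributions (i.e., a few classes account for most of the data, while most classes are under-represented). Existing solutions typically adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. In this work, we argue that as the number of samples increases, the additional benefit of a newly added data point diminishes. We introduce a novel theoretical framework that measures data overlap by associating each sample with a small neighboring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by a simple formula, $(1-\beta^{n})/(1-\beta)$, where $n$ is the number of samples and $\beta \in [0,1)$ is a hyperparameter. We design a re-weighting scheme that uses the effective number of samples for each class to re-balance the loss, yielding a class-balanced loss. We conduct comprehensive experiments on artificially induced long-tailed CIFAR datasets and on large-scale datasets including ImageNet and iNaturalist. The results show that, when trained with the proposed class-balanced loss, the network achieves significant performance gains on long-tailed datasets.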
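The formula in the abstract can be made concrete with a short sketch. The Python below computes the effective number $(1-\beta^{n})/(1-\beta)$ for each class and turns it into per-class loss weights. The function names, the example class counts, and the choice to weight each class by the inverse of its effective number and rescale the weights to sum to the number of classes are assumptions for illustration, not details stated in the abstract.

```python
import math

def effective_number(n, beta):
    """Effective number of samples: (1 - beta**n) / (1 - beta), for beta in [0, 1)."""
    if not 0.0 <= beta < 1.0:
        raise ValueError("beta must lie in [0, 1)")
    return (1.0 - beta ** n) / (1.0 - beta)

def class_balanced_weights(counts, beta):
    """Per-class weights proportional to 1 / effective_number (assumed scheme).

    Every count is assumed to be at least 1. The weights are rescaled so that
    they sum to the number of classes (an assumed normalization).
    """
    raw = [1.0 / effective_number(n, beta) for n in counts]
    scale = len(counts) / sum(raw)
    return [w * scale for w in raw]

def weighted_cross_entropy(probs, label, weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    return -weights[label] * math.log(probs[label])

# Hypothetical long-tailed class counts, head class first.
counts = [5000, 500, 50]
weights = class_balanced_weights(counts, beta=0.999)

# Loss for one sample of the rarest class, given hypothetical predicted probabilities.
loss = weighted_cross_entropy([0.7, 0.2, 0.1], label=2, weights=weights)
```

With $\beta = 0$ every class has an effective number of 1, so all weights are equal. As $\beta$ approaches 1, the effective number approaches $n$ and the weights approach inverse class frequency.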

Code Repositories

feidfoe/AdjustBnd4Imbalance

pytorch

Mentioned in GitHub

MindCode-4/code-11/tree/main/Class-balanced-loss-pytorch-master

mindspore

bazinga699/ncl

pytorch

Mentioned in GitHub

frgfm/Holocron

pytorch

Mentioned in GitHub

tiagoCuervo/JapaNet

Mentioned in GitHub

richardaecn/class-balanced-loss

Official

Mentioned in GitHub

vandit15/Class-balanced-loss-pytorch

pytorch

Mentioned in GitHub

MindSpore-scientific/code-3/tree/main/Class-balanced-loss-pytorch-master

mindspore

MindCode-4/code-6/tree/main/Class-balanced-loss-pytorch-master

mindspore

statsu1990/yoto_class_balanced_loss

pytorch

Mentioned in GitHub

lijian16/fcc

pytorch

Mentioned in GitHub

Benchmarks

Benchmark	Method	Metric
image-classification-on-inaturalist-2018	ResNet-152	Top-1 Accuracy: 69.05%
image-classification-on-inaturalist-2018	ResNet-101	Top-1 Accuracy: 67.98%
image-classification-on-inaturalist-2018	ResNet-50	Top-1 Accuracy: 64.16%
long-tail-learning-on-cifar-10-lt-r-10	Class-balanced Focal Loss	Error Rate: 12.90
long-tail-learning-on-cifar-10-lt-r-10	Class-balanced Reweighting	Error Rate: 13.46
long-tail-learning-on-cifar-100-lt-r-100	Cross-Entropy (CE)	Error Rate: 61.68
long-tail-learning-on-coco-mlt	CB Loss (ResNet-50)	Average mAP: 49.06
long-tail-learning-on-egtea	CB Loss	Average Precision: 63.39; Average Recall: 63.26
long-tail-learning-on-voc-mlt	CB Focal (ResNet-50)	Average mAP: 75.24
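Several benchmark rows refer to a class-balanced focal loss. As a rough sketch of how such a combination might look, the Python below multiplies a softmax focal loss term, $-(1-p_t)^{\gamma}\log p_t$, by a per-class weight derived from the effective number. The softmax form, the function names, the default `gamma=2.0`, the unnormalized weight, and the example counts are assumptions for illustration; the benchmarked implementations may differ (for example, by using a sigmoid-based focal loss).

```python
import math

def effective_number(n, beta):
    """Effective number of samples: (1 - beta**n) / (1 - beta)."""
    return (1.0 - beta ** n) / (1.0 - beta)

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def class_balanced_focal_loss(logits, label, counts, beta=0.999, gamma=2.0):
    """Focal loss for one sample, scaled by 1 / effective_number of its class.

    `counts` holds the hypothetical number of training samples per class.
    The weight is left unnormalized here to keep the sketch short.
    """
    p_t = softmax(logits)[label]
    focal = -((1.0 - p_t) ** gamma) * math.log(p_t)
    weight = 1.0 / effective_number(counts[label], beta)
    return weight * focal

# Same prediction confidence, but the rare class (10 samples) is weighted more heavily.
head_loss = class_balanced_focal_loss([0.0, 0.0], label=0, counts=[1000, 10])
tail_loss = class_balanced_focal_loss([0.0, 0.0], label=1, counts=[1000, 10])
```

Setting `gamma=0.0` and `beta=0.0` in this sketch reduces it to plain softmax cross-entropy, which is a convenient sanity check.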
