A Decade's Battle on Dataset Bias: Are We There Yet?


arXiv

AI Security Portal bot

The information in this literature database is collected automatically.

Source

https://arxiv.org/abs/2403.08632

PDF

https://arxiv.org/pdf/2403.08632

Bibliographic Information

Authors: Zhuang Liu, Kaiming He
Published: 2024-03-14
Updated: 2025-03-03
Affiliation: Meta AI Research, FAIR
Country: United States of America
Conference:

Labels Estimated by AI

Debiasing training data

Deep learning

Data curation

※ These labels were added automatically by AI and may therefore be inaccurate.
For details, see "About the Literature Database".

Abstract

We revisit the "dataset classification" experiment suggested by Torralba & Efros (2011) a decade ago, in the new era with large-scale, diverse, and hopefully less biased datasets as well as more capable neural network architectures. Surprisingly, we observe that modern neural networks can achieve excellent accuracy in classifying which dataset an image is from: e.g., we report 84.7% accuracy on held-out validation data for the three-way classification problem consisting of the YFCC, CC, and DataComp datasets. Our further experiments show that such a dataset classifier could learn semantic features that are generalizable and transferable, which cannot be explained by memorization. We hope our discovery will inspire the community to rethink issues involving dataset bias.
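The dataset classification experiment described in the abstract can be sketched as follows. Pool samples from several datasets, label each sample with the index of its source dataset, train on one split, and measure accuracy on held-out samples against the 1/k chance level. For the three-way YFCC/CC/DataComp setting, chance is 1/3 ≈ 33.3%, so the reported 84.7% is far above it. This is a minimal sketch built on loud assumptions. The paper trains deep neural networks on images. Here, a nearest-centroid classifier on synthetic feature vectors stands in for that model. The function names, the Gaussian "bias" shifts, and the feature dimension are all hypothetical, and the dataset names are reused only as labels.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Example = Tuple[Vector, int]  # (features, index of source dataset)


def build_dataset_classification_task(
    datasets: Dict[str, Sequence[Vector]],
    val_fraction: float = 0.2,
    seed: int = 0,
) -> Tuple[List[Example], List[Example], List[str]]:
    """Label each sample with the index of the dataset it came from and
    split every dataset into disjoint train / held-out validation parts."""
    rng = random.Random(seed)
    names = sorted(datasets)  # label i corresponds to names[i]
    train: List[Example] = []
    val: List[Example] = []
    for label, name in enumerate(names):
        samples = list(datasets[name])
        rng.shuffle(samples)
        n_val = int(len(samples) * val_fraction)
        val.extend((x, label) for x in samples[:n_val])
        train.extend((x, label) for x in samples[n_val:])
    return train, val, names


def fit_nearest_centroid(train: List[Example], num_classes: int) -> List[List[float]]:
    """Hypothetical stand-in classifier: the mean feature vector per dataset.
    Assumes every class has at least one training sample."""
    dim = len(train[0][0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for x, y in train:
        counts[y] += 1
        for i, v in enumerate(x):
            sums[y][i] += v
    return [[s / counts[c] for s in sums[c]] for c in range(num_classes)]


def predict(centroids: List[List[float]], x: Vector) -> int:
    """Return the label of the closest centroid (squared Euclidean distance)."""
    def sq_dist(k: int) -> float:
        return sum((a - b) ** 2 for a, b in zip(centroids[k], x))
    return min(range(len(centroids)), key=sq_dist)


def accuracy(centroids: List[List[float]], data: List[Example]) -> float:
    """Fraction of held-out samples whose source dataset is predicted correctly."""
    correct = sum(predict(centroids, x) == y for x, y in data)
    return correct / len(data)


if __name__ == "__main__":
    rng = random.Random(42)

    def fake_features(shift: float, n: int = 300, dim: int = 16) -> List[Vector]:
        # Synthetic stand-in for image features; `shift` plays the role of dataset bias.
        return [tuple(rng.gauss(shift, 1.0) for _ in range(dim)) for _ in range(n)]

    pools = {"YFCC": fake_features(0.0), "CC": fake_features(0.5), "DataComp": fake_features(1.0)}
    train, val, names = build_dataset_classification_task(pools)
    centroids = fit_nearest_centroid(train, len(names))
    print(f"chance: {1 / len(names):.3f}  held-out accuracy: {accuracy(centroids, val):.3f}")
```

With these made-up shifts, held-out accuracy lands well above the 1/3 chance level. This mimics in miniature the idea that source datasets remain separable, but it says nothing about real image data or about the deep models the paper evaluates.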

External Datasets

YFCC

DataComp

WIT

LAION

ImageNet