University of Tokyo, Kashiwa, Chiba 277-8561, Japan
RIKEN Center for Advanced Intelligence Project, Chuo-ku, Tokyo 103-0027, Japan
Neural Comput. 2021 Jan;33(1):244-268. doi: 10.1162/neco_a_01337. Epub 2020 Oct 20.
Recent advances in weakly supervised classification allow us to train a classifier from only positive and unlabeled (PU) data. However, existing PU classification methods typically require an accurate estimate of the class-prior probability, which is a critical bottleneck, particularly for high-dimensional data. This problem has commonly been addressed by applying principal component analysis in advance, but such unsupervised dimension reduction can collapse the underlying class structure. In this letter, we propose a novel method for learning representations from PU data based on the information-maximization principle. Our method does not require class-prior estimation and thus can be used as a preprocessing step for PU classification. Through experiments, we demonstrate that our method, combined with deep neural networks, substantially improves the accuracy of PU class-prior estimation, leading to state-of-the-art PU classification performance.
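To illustrate why the class prior matters for the existing PU classification methods mentioned above, the following is a minimal sketch of a standard non-negative PU risk estimate. It is not the representation learning method proposed in this letter. The function names `sigmoid_loss` and `nn_pu_risk`, the choice of the sigmoid loss, and the toy scores are assumptions made for illustration only.

```python
import numpy as np


def sigmoid_loss(margin):
    """Sigmoid loss l(z) = 1 / (1 + exp(z)), evaluated at the margin z = y * g(x)."""
    return 1.0 / (1.0 + np.exp(margin))


def nn_pu_risk(scores_pos, scores_unl, class_prior):
    """Non-negative PU risk estimate computed from classifier scores g(x).

    scores_pos  : scores g(x) on the labeled positive samples
    scores_unl  : scores g(x) on the unlabeled samples
    class_prior : assumed probability of the positive class; it must be
                  supplied from outside, which is the bottleneck discussed above
    """
    scores_pos = np.asarray(scores_pos, dtype=float)
    scores_unl = np.asarray(scores_unl, dtype=float)

    risk_pos_as_pos = np.mean(sigmoid_loss(scores_pos))   # positives treated as +1
    risk_pos_as_neg = np.mean(sigmoid_loss(-scores_pos))  # positives treated as -1
    risk_unl_as_neg = np.mean(sigmoid_loss(-scores_unl))  # unlabeled treated as -1

    # Estimated risk on the (unobserved) negative class; clipped at zero
    # because a true risk cannot be negative.
    negative_part = risk_unl_as_neg - class_prior * risk_pos_as_neg
    return class_prior * risk_pos_as_pos + max(0.0, negative_part)


if __name__ == "__main__":
    # Hypothetical scores: the same classifier output is judged differently
    # depending on which class prior is plugged in.
    pos = [1.5, 0.8, 2.1]
    unl = [1.0, -0.5, -1.2, 0.3]
    for prior in (0.2, 0.5, 0.8):
        print(prior, nn_pu_risk(pos, unl, prior))
```

Because `class_prior` enters both terms of the estimate, a poor value changes the objective being minimized, which is why a more accurate class-prior estimate can translate into better PU classification.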