Fop Michael, Mattei Pierre-Alexandre, Bouveyron Charles, Murphy Thomas Brendan
School of Mathematics & Statistics, University College Dublin, Dublin, Ireland.
Université Côte d'Azur, Inria, CNRS, Laboratoire J.A. Dieudonné, Maasai team, Nice, France.
Adv Data Anal Classif. 2022;16(1):55-92. doi: 10.1007/s11634-021-00474-3. Epub 2022 Mar 1.
In supervised classification problems, the test set may contain data points belonging to classes not observed in the learning phase. Moreover, the units in the test data may also be measured on additional variables recorded after the learning sample was collected. In this situation, the classifier built in the learning phase must adapt to handle both the potential unknown classes and the extra dimensions. We introduce a model-based discriminant approach, Dimension-Adaptive Mixture Discriminant Analysis (D-AMDA), which can detect unobserved classes and adapt to the increasing dimensionality. Model estimation is carried out via a full inductive approach based on an EM algorithm. The method is then embedded in a more general framework for adaptive variable selection and classification suitable for high-dimensional data. A simulation study and an artificial experiment on the classification of adulterated honey samples are used to validate the ability of the proposed framework to deal with complex situations.
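To make the idea of inductive estimation with an unobserved class concrete, the following is a minimal illustrative sketch and not the D-AMDA estimator itself. It assumes Gaussian classes with full covariance matrices, exactly one hidden class, and the same variables in the learning and test sets, so it ignores the extra dimensions and the selection of the number of hidden classes that the abstract describes. The function names (`fit_known_classes`, `inductive_em_one_new_class`) and the initialisation heuristic are hypothetical choices made for this example. The parameters of the known classes are estimated on the labelled learning data and then held fixed, while an EM algorithm run on the unlabelled test data updates the mixing proportions and the parameters of the single extra component.

```python
import numpy as np


def gaussian_logpdf(X, mean, cov):
    """Log-density of a multivariate Gaussian evaluated at each row of X."""
    d = X.shape[1]
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, (X - mean).T)          # whitened residuals, shape (d, n)
    maha = np.sum(z ** 2, axis=0)                 # squared Mahalanobis distances
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def fit_known_classes(X, y):
    """Learning phase: one (mean, covariance) pair per observed class."""
    d = X.shape[1]
    return [
        (X[y == c].mean(axis=0),
         np.cov(X[y == c], rowvar=False) + 1e-6 * np.eye(d))
        for c in np.unique(y)
    ]


def inductive_em_one_new_class(X_test, known, n_iter=50):
    """EM on unlabelled test data with known-class parameters held fixed.

    Only the mixing proportions and the mean/covariance of one extra
    component (the candidate unobserved class) are updated.
    Returns hard labels (index len(known) = new class) and the proportions.
    """
    n, d = X_test.shape
    K = len(known) + 1

    # Densities under the known classes never change, so compute them once.
    logp_known = np.column_stack(
        [gaussian_logpdf(X_test, m, S) for m, S in known]
    )

    # Heuristic initialisation (an assumption of this sketch): start the new
    # component at the test points that the known classes explain worst.
    worst = np.argsort(logp_known.max(axis=1))[: max(n // 10, d + 1)]
    new_mean = X_test[worst].mean(axis=0)
    new_cov = np.cov(X_test, rowvar=False) + 1e-6 * np.eye(d)
    props = np.full(K, 1.0 / K)

    for _ in range(n_iter):
        # E-step: posterior class probabilities via log-sum-exp.
        logp = np.column_stack(
            [logp_known, gaussian_logpdf(X_test, new_mean, new_cov)]
        ) + np.log(props)
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: all proportions, but only the new component's parameters.
        props = resp.mean(axis=0)
        r = resp[:, -1]
        nk = r.sum()
        new_mean = r @ X_test / nk
        diff = X_test - new_mean
        new_cov = (r[:, None] * diff).T @ diff / nk + 1e-6 * np.eye(d)

    return resp.argmax(axis=1), props
```

In this sketch, a test point is flagged as belonging to the unobserved class when its label equals `len(known)`. A fuller treatment would also compare models with different numbers of hidden classes, for example with an information criterion, which is omitted here.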