Nicora Giovanna, Bellazzi Riccardo
Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.
AMIA Annu Symp Proc. 2021 Jan 25;2020:925-932. eCollection 2020.
Machine learning research applied to medicine is growing, yet few of the proposed approaches are actually deployed in clinical settings. One reason is that current methods may fail to generalize to new, unseen instances that differ from the training population, and thus produce unreliable classifications. Measures of classification reliability could help decide whether to trust a prediction on a new case. Here, we propose a new reliability measure based on the similarity of a new instance to the training set. In particular, we evaluate whether the new instance would be selected as informative by an instance selection method when compared with the available training set. We show that this method distinguishes reliable examples, for which the classifier's prediction can be trusted, from unreliable ones, both on simulated data and in a real-world task: distinguishing tumor from normal cells in Acute Myeloid Leukemia patients.
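The abstract does not name the instance selection method, so the following is only a minimal sketch of the general idea, not the authors' implementation. It swaps in a Hart-style condensed nearest neighbour rule as a stand-in: a new instance, labelled with the classifier's prediction, "would be selected" if its nearest training neighbour carries a different label. The function name `would_be_selected`, the toy data, and the reading that a selected instance is the less reliable one are all assumptions made for illustration.

```python
import numpy as np


def would_be_selected(X_train, y_train, x_new, label_new):
    """Hypothetical helper: Hart-style condensed nearest neighbour test.

    Returns True when the nearest training instance has a label different
    from label_new, i.e. when a 1-NN rule built on the training set would
    misclassify the new instance and the condensing rule would keep it.
    This is a stand-in criterion, not the method used in the paper.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # Euclidean distance from the new instance to every training instance.
    dists = np.linalg.norm(X_train - np.asarray(x_new, dtype=float), axis=1)
    nearest = int(np.argmin(dists))
    return bool(y_train[nearest] != label_new)


# Toy training set (made-up values): two well-separated classes.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])

# Prediction 0 for a point inside the class-0 cluster: not selected.
agree = would_be_selected(X_train, y_train, [0.05, 0.1], 0)
# Prediction 0 for a point inside the class-1 cluster: selected.
disagree = would_be_selected(X_train, y_train, [0.95, 1.0], 0)
print(agree, disagree)
```

Under the assumption stated above, the second case is the one that would be flagged for caution, since the prediction conflicts with the local structure of the training set. A criterion that also accounts for how far the new instance lies from any training instance would be closer to the "differs from the training population" concern, but that is beyond what the abstract specifies.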