IBIME, Instituto de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas (ITACA), Universitat Politècnica de València, València, Spain.
J Biomed Inform. 2011 Aug;44(4):677-87. doi: 10.1016/j.jbi.2011.02.009. Epub 2011 Mar 4.
In the last decade, machine learning (ML) techniques have been used to develop classifiers for automatic brain tumour diagnosis. However, the development of these ML models relies on a single training set, and learning stops once this set has been processed. Training these classifiers requires a representative amount of data, but the gathering, preprocessing, and validation of samples are expensive and time-consuming. Therefore, a classical, non-incremental approach to ML must wait until all the required data have been collected. In contrast, an incremental learning approach may allow us to build an initial classifier with a smaller number of samples and update it incrementally as new data are collected. In this study, an incremental learning algorithm for Gaussian Discriminant Analysis (iGDA), based on the Graybill and Deal weighted combination of estimators, is introduced. Each time a new set of data becomes available, a new estimate is computed and combined with the previous estimate. iGDA does not require access to the previously used data and is able to include new classes that were not in the original analysis, thus allowing the models to be customized to the distribution of data at a particular clinical center. The behaviour of the iGDA algorithm was evaluated on five benchmark databases in terms of stability-plasticity, class inclusion, and order effect. Finally, the iGDA algorithm was applied to automatic brain tumour classification with magnetic resonance spectroscopy and compared with two state-of-the-art incremental algorithms. The empirical results show that the algorithm is able to learn incrementally, improving the performance of the models when new information is available and converging over time. Furthermore, the algorithm shows a negligible instance and concept order effect, avoiding the bias that such effects could introduce.
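The update step described above, in which a new estimate is combined with the previous one without revisiting the old data, can be illustrated with the classical Graybill and Deal estimator of a common mean. The following is a minimal sketch for a single scalar feature only, not the multivariate iGDA procedure of the paper. The function names `summarize` and `graybill_deal` are hypothetical, and the sketch assumes each batch is summarized by its sample mean, unbiased sample variance, and size, with every batch containing at least two distinct values.

```python
from statistics import mean, variance


def summarize(batch):
    """Return (sample mean, unbiased sample variance, size) for one batch.

    Assumption: these three numbers are all that is kept once the raw
    samples of a batch have been discarded.
    """
    return mean(batch), variance(batch), len(batch)


def graybill_deal(est_old, est_new):
    """Combine two independent estimates of a common mean.

    Each estimate is weighted by n / s^2, the inverse of the estimated
    variance of its sample mean, so the less noisy estimate dominates.
    """
    m_old, v_old, n_old = est_old
    m_new, v_new, n_new = est_new
    w_old = n_old / v_old
    w_new = n_new / v_new
    return (w_old * m_old + w_new * m_new) / (w_old + w_new)


# Hypothetical data: a tight first batch and a noisier second batch.
old = summarize([4.9, 5.1, 5.0, 5.2, 4.8])
new = summarize([5.8, 4.6, 6.1, 4.3])
combined = graybill_deal(old, new)
print(combined)
```

Because the weights depend only on the stored summaries, the raw samples of the first batch are not needed when the second batch arrives. In this toy example the combined mean stays close to the first batch's mean, since that batch has the much smaller variance.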