Department of Signal Processing, Tampere University of Technology, Tampere, Finland.
PLoS One. 2013 Aug 30;8(8):e72932. doi: 10.1371/journal.pone.0072932. eCollection 2013.
We describe a supervised prediction method for diagnosing acute myeloid leukemia (AML) from patient samples based on flow cytometry measurements. We use a data-driven approach with machine learning methods to train a computational model that takes flow cytometry measurements from a single patient as input and outputs a confidence score for the patient being AML-positive. Our solution is based on an [Formula: see text] regularized logistic regression model that aggregates AML test statistics calculated from individual test tubes with different cell populations and fluorescent markers. The model construction is entirely data-driven, and no prior biological knowledge is used. The described solution achieved 100% classification accuracy in the DREAM6/FlowCAP2 Molecular Classification of Acute Myeloid Leukaemia Challenge against a gold standard consisting of 20 AML-positive and 160 healthy patients. Here we perform a more extensive validation of the prediction model's performance, and we further improve and simplify our original method, showing that statistically equal results can be obtained by using simple average marker intensities as features in the logistic regression model. In addition to the logistic regression model, we present other classification models and compare their performance quantitatively. The key benefit of our prediction method compared to other solutions with similar performance is that our model uses only a small fraction of the flow cytometry measurements, which makes our solution highly economical.
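The simplified variant described above (average marker intensities fed into a regularized logistic regression) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract elides the regularizer's formula, so the L1 penalty used here is an assumption, as are the synthetic data, the proximal gradient solver, and every name and constant (`N_MARKERS`, `N_CELLS`, `make_sample`, `fit_l1_logreg`, `lam`, `lr`, `epochs`).

```python
import math
import random

N_MARKERS = 4   # hypothetical number of fluorescence channels
N_CELLS = 200   # hypothetical number of cells (events) per sample

def make_sample(is_aml, rng):
    """Synthetic stand-in for one patient: N_CELLS rows of marker intensities."""
    shift = 2.0 if is_aml else 0.0  # assumption: only marker 0 differs in AML
    return [[rng.gauss(shift if m == 0 else 0.0, 1.0) for m in range(N_MARKERS)]
            for _ in range(N_CELLS)]

def mean_intensities(events):
    """Feature vector: the average intensity of each marker over all cells."""
    return [sum(col) / len(events) for col in zip(*events)]

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    """Proximal operator of the L1 penalty: shrink v toward zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def predict_score(w, b, x):
    """Confidence score in (0, 1) that feature vector x is AML-positive."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit_l1_logreg(X, y, lam=0.01, lr=0.5, epochs=1000):
    """Proximal gradient descent on mean log-loss plus lam * ||w||_1.

    The intercept b is left unpenalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_score(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the smooth loss, then shrinkage for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

rng = random.Random(0)
labels = [1] * 20 + [0] * 20  # 1 = AML-positive, 0 = healthy (synthetic)
X = [mean_intensities(make_sample(bool(y), rng)) for y in labels]
w, b = fit_l1_logreg(X, labels)

score_aml = predict_score(w, b, mean_intensities(make_sample(True, rng)))
score_healthy = predict_score(w, b, mean_intensities(make_sample(False, rng)))
print("weights:", w, "intercept:", b)
print("AML score:", score_aml, "healthy score:", score_healthy)
```

In this toy setting the penalty tends to shrink the weights of uninformative markers toward zero, which mirrors the abstract's point that only a small fraction of the measurements is needed. Real flow cytometry data would require proper preprocessing and cross-validated selection of the regularization strength.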