Department of Automation, Xiamen University, Xiamen 361005, Fujian, China.
Department of Basic Science, Mississippi State University, Mississippi 39762, USA.
Analyst. 2017 Oct 7;142(19):3588-3597. doi: 10.1039/c7an00944e. Epub 2017 Aug 30.
The application of machine learning to cancer diagnostics has shown great promise and is important in clinical settings. Here we apply machine learning methods to transcriptomic data derived from tumor-educated platelets (TEPs) from individuals with different types of cancer. We aim to define a reliability measure for diagnostic purposes, which could help facilitate personalized treatment. To this end, we present a novel classification method called MFRB (Multiple Fitting Regression and Bayes decision), which integrates multiple fitting regression (MFR) with Bayes decision theory. MFR is first used to map the multidimensional features of the transcriptomic data onto a one-dimensional feature. The probability density function of each class in the mapped space is then fitted with a Gaussian probability density function. Finally, Bayes decision theory is used to build a probabilistic classifier from the estimated probability density functions. The output of MFRB can be used to determine which class a sample belongs to, as well as to assign a reliability measure for a given class. The proposed method is evaluated against the classical support vector machine (SVM) and probabilistic SVM (PSVM) on simulated and real TEP datasets. Our results indicate that MFRB outperforms both SVM and PSVM, mainly due to its strong generalization ability on limited, imbalanced, and noisy data.
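The three stages described in the abstract (projection to one dimension, a Gaussian density per class, Bayes decision) can be illustrated with a minimal sketch. The abstract does not specify how MFR is computed, so the code below substitutes an ordinary least-squares regression onto the class label as a stand-in. That substitution, the function names `fit_projection_and_gaussians` and `posterior`, and the use of class frequencies as priors are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def fit_projection_and_gaussians(X, y):
    """Fit a 1-D projection and one Gaussian per class in the projected space.

    ASSUMPTION: least-squares regression onto the numeric class label stands in
    for the MFR step, which the abstract does not detail.
    """
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append an intercept column
    w, *_ = np.linalg.lstsq(Xb, y.astype(float), rcond=None)
    z = Xb @ w  # one-dimensional mapped feature
    params = {}
    for c in np.unique(y):
        zc = z[y == c]
        # (mean, standard deviation, prior); the prior is the class frequency
        params[int(c)] = (zc.mean(), zc.std(ddof=1) + 1e-9, len(zc) / len(z))
    return w, params


def posterior(X, w, params):
    """Return (classes, posterior matrix) using Bayes' rule on the 1-D feature."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    z = Xb @ w
    classes = sorted(params)
    # log(prior * Gaussian pdf), computed in log space to avoid underflow
    log_joint = np.column_stack([
        np.log(params[c][2])
        - np.log(params[c][1] * np.sqrt(2.0 * np.pi))
        - 0.5 * ((z - params[c][0]) / params[c][1]) ** 2
        for c in classes
    ])
    log_joint -= log_joint.max(axis=1, keepdims=True)
    post = np.exp(log_joint)
    post /= post.sum(axis=1, keepdims=True)
    return np.array(classes), post
```

In this sketch, the predicted class is the column with the largest posterior, and the posterior value itself can be read as one possible reliability measure for that class. Whether this matches the reliability measure defined in the paper is not stated in the abstract.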