Division of Clinical Epidemiology, First Hospital of the Jilin University, Changchun, Jilin, China; Center for Clinical and Translational Science, The Rockefeller University, New York, New York, United States of America.
PLoS One. 2013 Nov 19;8(11):e78302. doi: 10.1371/journal.pone.0078302. eCollection 2013.
As microarray technology has matured and become widespread, selecting a small number of relevant genes for accurate sample classification has become an active research topic in biostatistics and bioinformatics. However, most existing algorithms cannot handle more than two classes, even though multi-class problems are arguably common in practice. Here, we propose an extension of an existing regularization algorithm, Threshold Gradient Descent Regularization (TGDR), designed specifically for multi-class classification of microarray data. When several microarray experiments address the same or similar objectives, one option is a meta-analysis version of TGDR (Meta-TGDR), which treats the classification task as a combination of classifiers that share the same structure/model while allowing the parameters to vary across studies. However, the original Meta-TGDR extension offered no way to make predictions on independent samples. Here, we propose an explicit method to estimate the overall coefficients of the biomarkers selected by Meta-TGDR. This extension broadens applicability and makes it possible to compare the predictive performance of Meta-TGDR and TGDR on an independent testing set.
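The abstract does not spell out the update rule, so the following is only a minimal sketch of how threshold gradient descent can be applied to a multi-class (multinomial logistic) likelihood. It assumes the generic TGDR recipe: at each step, compute the gradient of the log-likelihood and update only those coefficients whose gradient magnitude is at least a fraction `tau` of the largest one. The function names, the parameters `tau`, `step`, and `n_iter`, the omission of intercepts, and the rule that a gene counts as selected if any class gives it a nonzero coefficient are all hypothetical choices for illustration, not the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multiclass_tgdr(X, y, n_classes, tau=0.9, step=0.01, n_iter=500):
    """Hypothetical threshold gradient descent for multinomial logistic regression.

    X: list of samples, each a list of p gene-expression values.
    y: list of integer class labels in range(n_classes).
    Returns beta[k][j], the coefficient of gene j for class k (no intercepts).
    """
    n, p = len(X), len(X[0])
    beta = [[0.0] * p for _ in range(n_classes)]
    for _ in range(n_iter):
        # Gradient of the average multinomial log-likelihood w.r.t. each beta[k][j].
        grad = [[0.0] * p for _ in range(n_classes)]
        for xi, yi in zip(X, y):
            scores = [sum(b * x for b, x in zip(beta[k], xi)) for k in range(n_classes)]
            probs = softmax(scores)
            for k in range(n_classes):
                resid = (1.0 if yi == k else 0.0) - probs[k]
                for j in range(p):
                    grad[k][j] += resid * xi[j] / n
        # Threshold step: only coefficients with a near-maximal gradient move.
        g_max = max(abs(g) for row in grad for g in row)
        if g_max == 0.0:
            break
        for k in range(n_classes):
            for j in range(p):
                if abs(grad[k][j]) >= tau * g_max:
                    beta[k][j] += step * grad[k][j]
    return beta


def selected_genes(beta, tol=1e-12):
    """A gene counts as selected if any class has a nonzero coefficient for it."""
    p = len(beta[0])
    return [j for j in range(p) if any(abs(row[j]) > tol for row in beta)]
```

On a toy data set where genes 0, 1, and 2 each mark one of three classes and gene 3 is small noise, `selected_genes(multiclass_tgdr(X, y, n_classes=3))` keeps only the three informative genes. A single multi-class fit can share genes across classes, which is consistent with the abstract's observation that fewer genes are selected than by separate binary TGDRs.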
Using real-world applications, we demonstrate that the proposed multi-TGDR framework works well and that the number of genes it selects is smaller than the total selected by all the individual binary TGDRs. Additionally, Meta-TGDR and TGDR applied to the batch-effect-adjusted pooled data gave approximately the same results. Adding a bagging procedure to each application ensured stability and good predictive performance.
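The abstract does not describe how the bagged classifiers were combined, so the following is a minimal sketch of generic bagging (bootstrap aggregation) under stated assumptions: each bag is fit on a bootstrap resample of the samples, and predictions are combined by majority vote. To keep the example self-contained, the base learner is a simple nearest-centroid classifier used purely as a stand-in for TGDR; the function names and the `n_bags` and `seed` parameters are hypothetical.

```python
import random
from collections import Counter


def nearest_centroid_fit(X, y):
    """Stand-in base learner (NOT TGDR): predict the class with the closest centroid."""
    counts = Counter(y)
    sums = {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    centroids = {k: [s / counts[k] for s in acc] for k, acc in sums.items()}

    def predict(x):
        # Squared Euclidean distance to each class centroid seen in training.
        return min(centroids, key=lambda k: sum((a - b) ** 2 for a, b in zip(x, centroids[k])))

    return predict


def bagging_fit(X, y, fit, n_bags=25, seed=0):
    """Fit one model per bootstrap resample (n rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Combine the bagged models by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]
```

Averaging over resamples damps the variance of any single fit, which is the usual reason bagging improves stability. With a real TGDR base learner, one could also record how often each gene is selected across bags, although whether the authors did so is not stated in the abstract.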
Compared with Meta-TGDR, TGDR is less computationally intensive and does not require samples from every class in each study. On the batch-effect-adjusted data, it achieves approximately the same predictive performance as Meta-TGDR. Thus, TGDR is highly recommended.