Phan John H, Young Andrew N, Wang May D
Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, 313 Ferst Drive, Atlanta, GA 30332, USA.
ScientificWorldJournal. 2012;2012:989637. doi: 10.1100/2012/989637. Epub 2012 Dec 18.
Combining multiple microarray datasets increases sample size and improves reproducibility, both in identifying informative genes and in subsequent clinical prediction. Although microarrays have accelerated genomic data collection, small sample size remains a major obstacle to identifying informative genetic biomarkers. As a result, feature selection methods often produce false discoveries, which lead to poorly performing predictive models. We develop a meta-analysis-based feature selection method that captures the knowledge in each individual dataset and combines the per-dataset results using a simple rank average. In a comprehensive study that measures robustness with respect to clinical application (breast, renal, and pancreatic cancer), microarray platform heterogeneity, and classifier (logistic regression, diagonal LDA, and linear SVM), we compare the rank average meta-analysis method with five other meta-analysis methods. Results indicate that rank average meta-analysis consistently performs well relative to these five methods.
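The rank average combination step can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that each dataset has already produced one informativeness score per gene, with higher meaning more informative. The abstract does not say which per-dataset statistic is used, so the scores below (and the names `rank_features`, `rank_average`, and `GENE_*`) are hypothetical. Each gene is ranked within each dataset, its ranks are averaged across datasets, and the genes with the smallest mean rank are kept.

```python
from typing import Dict, List


def rank_features(scores: Dict[str, float]) -> Dict[str, float]:
    """Rank genes within one dataset (rank 1 = highest score).

    Tied scores receive the average of the ranks they span.
    """
    ordered = sorted(scores, key=lambda g: scores[g], reverse=True)
    ranks: Dict[str, float] = {}
    i = 0
    while i < len(ordered):
        # Extend j over a run of genes that share the same score.
        j = i
        while j + 1 < len(ordered) and scores[ordered[j + 1]] == scores[ordered[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based mean rank of the tie group
        for k in range(i, j + 1):
            ranks[ordered[k]] = avg_rank
        i = j + 1
    return ranks


def rank_average(per_dataset_scores: List[Dict[str, float]], top_k: int) -> List[str]:
    """Return the top_k genes by mean within-dataset rank (smaller is better).

    Assumes every dataset scores the same set of genes.
    """
    rank_tables = [rank_features(s) for s in per_dataset_scores]
    genes = rank_tables[0].keys()
    mean_rank = {g: sum(t[g] for t in rank_tables) / len(rank_tables) for g in genes}
    # Sort by mean rank, breaking ties alphabetically for a deterministic result.
    return sorted(mean_rank, key=lambda g: (mean_rank[g], g))[:top_k]


# Hypothetical per-dataset gene scores (e.g. from three independent studies).
datasets = [
    {"GENE_A": 4.1, "GENE_B": 2.0, "GENE_C": 3.5, "GENE_D": 0.4},
    {"GENE_A": 1.9, "GENE_B": 3.2, "GENE_C": 2.8, "GENE_D": 0.1},
    {"GENE_A": 3.0, "GENE_B": 0.5, "GENE_C": 2.9, "GENE_D": 1.0},
]
print(rank_average(datasets, top_k=2))  # → ['GENE_A', 'GENE_C']
```

In this toy example, GENE_C is never ranked first, but it is ranked second in every dataset. Its mean rank of 2 therefore beats GENE_B, which is ranked first in one dataset but scores poorly elsewhere. Averaging ranks rather than raw scores also avoids comparing statistics whose scales may differ across platforms.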