Yang Ze-Hui, Zheng Rui, Gao Yuan, Zhang Qiang
Department of Respiratory Medicine, Shengjing Hospital of China Medical University, Shenyang, China.
Clin Respir J. 2016 Sep;10(5):631-46. doi: 10.1111/crj.12271. Epub 2015 Mar 3.
With the widespread application of high-throughput technology, numerous meta-analysis methods have been proposed for differential expression profiling across multiple studies.
We aimed to identify the differentially expressed (DE) genes that best contribute to lung adenocarcinoma (ADC) clustering, comparing seven popular meta-analysis methods.
Seven microarray expression profiles of ADC and normal controls were extracted from the ArrayExpress database. Bioconductor was used for preliminary data preprocessing. DE genes across the multiple studies were then identified with each meta-analysis method. Hierarchical clustering was applied to each resulting gene set to classify the microarray samples, and classification efficiency was compared in terms of accuracy, sensitivity and specificity.
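The three comparison criteria follow directly from the confusion counts between the true sample labels and the group each sample was assigned to. The sketch below is illustrative only: the function name, the label strings and the toy labels are assumptions, not the authors' code or data, and it presumes each cluster has already been mapped to "ADC" or "normal".

```python
def classification_metrics(truth, predicted, positive="ADC"):
    """Return (accuracy, sensitivity, specificity) for two-class labels.

    `truth` and `predicted` are equal-length sequences of labels;
    `positive` names the case class (hypothetical label string).
    """
    pairs = list(zip(truth, predicted))
    tp = sum(t == positive and p == positive for t, p in pairs)  # cases called cases
    tn = sum(t != positive and p != positive for t, p in pairs)  # controls called controls
    fp = sum(t != positive and p == positive for t, p in pairs)  # controls called cases
    fn = sum(t == positive and p != positive for t, p in pairs)  # cases called controls
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # fraction of true cases recovered
    specificity = tn / (tn + fp)  # fraction of true controls recovered
    return accuracy, sensitivity, specificity


# Toy labels (made up for illustration): 4 ADC samples and 2 normal controls.
truth = ["ADC", "ADC", "ADC", "ADC", "normal", "normal"]
predicted = ["ADC", "ADC", "ADC", "normal", "normal", "ADC"]
accuracy, sensitivity, specificity = classification_metrics(truth, predicted)
```

In this toy case three of four ADC samples and one of two controls are placed correctly, so sensitivity is 0.75 and specificity is 0.5.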
Across the seven datasets, 573 ADC cases and 222 normal controls were collected. After filtering out unexpressed and noninformative genes, 3688 genes remained for further analysis. The classification efficiency analysis showed that DE genes identified by the sum of ranks method separated ADC from normal controls best, with accuracy, sensitivity and specificity of 0.953, 0.969 and 0.932, respectively. The gene set with the highest classification accuracy mainly participated in the regulation of response to external stimulus (P = 7.97E-04), cyclic nucleotide-mediated signaling (P = 0.01), regulation of cell morphogenesis (P = 0.01) and regulation of cell proliferation (P = 0.01).
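For readers unfamiliar with the sum of ranks idea, a minimal sketch follows. It assumes the common formulation in which genes are ranked by p-value within each study and the ranks are summed across studies, so a small total indicates consistent evidence of differential expression. The abstract does not specify the implementation used, so the function names, the tie handling and the toy p-values are all hypothetical, and the permutation step normally used to assess significance is omitted.

```python
def rank_within_study(p_values):
    """Rank genes by p-value in one study (1 = smallest); ties share the average rank."""
    order = sorted(p_values, key=p_values.get)
    ranks = {}
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of genes tied with the gene at position i.
        while j + 1 < len(order) and p_values[order[j + 1]] == p_values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for gene in order[i:j + 1]:
            ranks[gene] = average_rank
        i = j + 1
    return ranks


def sum_of_ranks(studies):
    """Sum within-study ranks over the genes measured in every study."""
    shared = set.intersection(*(set(study) for study in studies))
    totals = {gene: 0.0 for gene in shared}
    for study in studies:
        restricted = {gene: study[gene] for gene in shared}
        for gene, rank in rank_within_study(restricted).items():
            totals[gene] += rank
    return totals


# Made-up p-values for three hypothetical studies (not data from the paper).
studies = [
    {"geneA": 0.001, "geneB": 0.20, "geneC": 0.04},
    {"geneA": 0.010, "geneB": 0.50, "geneC": 0.03},
    {"geneA": 0.002, "geneB": 0.30, "geneC": 0.30},
]
totals = sum_of_ranks(studies)
ranked_genes = sorted(totals, key=totals.get)  # most consistently DE first
```

Here geneA ranks first in every study and so has the smallest total, while geneB and geneC tie in the third study and share the average rank 2.5 there.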
Evaluating the classification efficiency of DE genes identified by different meta-analysis methods provides a new perspective for choosing a suitable method in a given application. Meta-analysis methods differ in their abilities, so several criteria should be weighed together when selecting a method for a particular study.