Department of Computer Science, Wayne State University, Detroit, MI 48202, USA.
Department of Computer Science and Engineering, University of Nevada, Reno, NV 89557, USA.
Bioinformatics. 2020 Jan 15;36(2):487-495. doi: 10.1093/bioinformatics/btz561.
Recent advances in biomedical research have made massive amounts of transcriptomic data from different sources available in public repositories. Because of the heterogeneity among individual experiments, identifying reproducible biomarkers for a given disease from multiple independent studies has become a major challenge. The widely used meta-analysis approaches, such as Fisher's method, Stouffer's method, minP and maxP, have at least two major limitations: (i) they are sensitive to outliers, and (ii) they perform only one statistical test for each individual study, and hence do not fully utilize the potential sample size to gain statistical power.
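For orientation, the four classical p-value combination methods named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the framework proposed in this work. It assumes independent studies that each report one one-sided p-value for a gene, and the function names and the example p-values are hypothetical.

```python
import math
from statistics import NormalDist


def fisher(pvals):
    """Fisher's method: -2 * sum(ln p) follows chi-square with 2k degrees of freedom."""
    k = len(pvals)
    half = -sum(math.log(p) for p in pvals)  # half of the chi-square statistic
    # Closed-form chi-square survival function, valid for even degrees of freedom:
    # sf(x; 2k) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def stouffer(pvals):
    """Stouffer's method: sum the z-scores and rescale by sqrt(k)."""
    nd = NormalDist()
    z = sum(-nd.inv_cdf(p) for p in pvals) / math.sqrt(len(pvals))
    return nd.cdf(-z)


def min_p(pvals):
    """minP: significance of the smallest p-value among k independent studies."""
    return 1.0 - (1.0 - min(pvals)) ** len(pvals)


def max_p(pvals):
    """maxP: significance of the largest p-value among k independent studies."""
    return max(pvals) ** len(pvals)


if __name__ == "__main__":
    # Hypothetical gene: one extreme study, four unremarkable ones.
    pvals = [1e-12, 0.6, 0.7, 0.8, 0.9]
    for name, combine in [("Fisher", fisher), ("Stouffer", stouffer),
                          ("minP", min_p), ("maxP", max_p)]:
        print(f"{name:8s} {combine(pvals):.3g}")
```

In the hypothetical example, a single extreme study is enough to drive the Fisher and minP results to very small values even though the other four studies show no evidence, which is one way to picture limitation (i). Each function also consumes exactly one p-value per study, which corresponds to limitation (ii).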
Here, we propose a gene-level meta-analysis framework that overcomes these limitations and identifies a gene signature that is reliable and reproducible across multiple independent studies of a given disease. The approach provides a comprehensive global signature that can be used to understand the underlying biological phenomena, and a smaller test signature that can be used to classify future samples of a given disease. We demonstrate the utility of the framework by constructing disease signatures for influenza and Alzheimer's disease using nine datasets comprising 1108 individuals. These signatures are then validated on 12 independent datasets comprising 912 individuals. The results indicate that the proposed approach performs better than the majority of the existing meta-analysis approaches in terms of both sensitivity and specificity. The proposed signatures could be further used in diagnosis, prognosis and identification of therapeutic targets.
Supplementary data are available at Bioinformatics online.