Lee Seunghak, Kong Soonho, Xing Eric P
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Bioinformatics. 2016 Jun 15;32(12):i164-i173. doi: 10.1093/bioinformatics/btw270.
Detecting associations between genotypes and phenotypes remains a challenge because sample sizes are often insufficient and the mechanisms underlying the associations are complex. Fortunately, it is becoming more feasible to obtain gene expression data in addition to genotypes and phenotypes, which gives us new opportunities to detect true genotype-phenotype associations while unveiling the mechanisms behind them.
In this article, we propose a novel method, NETAM, that accurately detects associations between SNPs and phenotypes, as well as the gene traits involved in such associations. We take a network-driven approach: NETAM first constructs an association network, where nodes represent SNPs, gene traits or phenotypes, and edges represent the strength of association between two nodes. NETAM then assigns a score to each path from an SNP to a phenotype and identifies significant paths based on these scores. In our simulation study, we show that, by taking advantage of gene expression data, NETAM finds significantly more phenotype-associated SNPs than traditional genotype-phenotype association analysis under false positive control. Furthermore, we applied NETAM to late-onset Alzheimer's disease data and identified 477 significant path associations, among which we analyzed paths related to the beta-amyloid, estrogen, and nicotine pathways. We also provide hypothetical biological pathways to explain our findings.
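The path-based idea described above can be illustrated with a minimal Python sketch. It is not NETAM's actual algorithm: the abstract does not specify how paths are scored or how significance is assessed, so the product-of-edge-weights score, the fixed threshold, and all names (`enumerate_paths`, `top_paths`, the SNP and gene identifiers, the weights) are assumptions made purely for illustration.

```python
def enumerate_paths(snp_gene, gene_pheno):
    """Yield (snp, gene, phenotype, score) for every SNP -> gene -> phenotype path.

    Both arguments map an edge (source, target) to an association strength.
    ASSUMPTION: a path is scored as the product of its two edge weights;
    the abstract does not state the scoring function NETAM uses.
    """
    for (snp, gene), w_sg in snp_gene.items():
        for (gene2, pheno), w_gp in gene_pheno.items():
            if gene2 == gene:
                yield snp, gene, pheno, w_sg * w_gp


def top_paths(snp_gene, gene_pheno, threshold):
    """Return paths whose score reaches the threshold, strongest first.

    ASSUMPTION: a fixed cutoff stands in for a proper significance test
    with false positive control.
    """
    kept = [p for p in enumerate_paths(snp_gene, gene_pheno) if p[3] >= threshold]
    return sorted(kept, key=lambda p: p[3], reverse=True)


# Hypothetical toy network: identifiers and weights are made up.
snp_gene = {
    ("rs1", "GENE_A"): 0.9,
    ("rs2", "GENE_A"): 0.2,
    ("rs2", "GENE_B"): 0.7,
}
gene_pheno = {
    ("GENE_A", "disease"): 0.8,
    ("GENE_B", "disease"): 0.5,
}

significant = top_paths(snp_gene, gene_pheno, threshold=0.3)
for snp, gene, pheno, score in significant:
    print(f"{snp} -> {gene} -> {pheno}: {score:.2f}")
```

In this toy network the weak rs2 -> GENE_A edge keeps that path below the cutoff, while rs2 still reaches the phenotype through GENE_B. The sketch shows how an intermediate gene trait can both support an SNP-phenotype association and indicate a candidate mechanism for it.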
Software is available at http://www.sailing.cs.cmu.edu/