A method combining a random forest-based technique with the modeling of linkage disequilibrium through latent variables, to run multilocus genome-wide association studies.

Affiliation

LS2N, UMR CNRS 6004, Université de Nantes, 2 rue de la Houssinière, BP 92208, Nantes Cedex, 44322, France.

Publication

BMC Bioinformatics. 2018 Mar 27;19(1):106. doi: 10.1186/s12859-018-2054-0.

Abstract

BACKGROUND

Genome-wide association studies (GWASs) have been widely used to discover the genetic basis of complex phenotypes. However, standard single-SNP GWASs suffer from a lack of power. In particular, they do not directly account for linkage disequilibrium, that is, the dependences between SNPs (single nucleotide polymorphisms).
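To make the notion of dependence between SNPs concrete, linkage disequilibrium between two SNPs is commonly summarized by a squared correlation. The sketch below is only an illustration and is not part of the published method: it assumes genotypes coded as minor-allele counts (0, 1, 2), and the function name `genotype_r2` is hypothetical.

```python
def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation between two genotype vectors coded 0/1/2.

    Illustrative helper (hypothetical name); a common genotype-level
    proxy for linkage disequilibrium between two SNPs.
    """
    n = len(snp_a)
    if n == 0 or n != len(snp_b):
        raise ValueError("genotype vectors must be non-empty and of equal length")
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    # Unnormalized covariance and variances; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - mean_a) ** 2 for a in snp_a)
    var_b = sum((b - mean_b) ** 2 for b in snp_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic SNP carries no information about dependence
    return (cov * cov) / (var_a * var_b)


# Toy genotypes for four individuals (made-up data).
print(genotype_r2([0, 1, 2, 1], [0, 1, 2, 1]))  # perfectly dependent SNPs
print(genotype_r2([0, 0, 2, 2], [0, 2, 0, 2]))  # uncorrelated SNPs
```

A single-SNP test examines each such vector in isolation, so a value near 1 between neighbouring SNPs is exactly the information it leaves unused.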

RESULTS

We present a comparative study of two multilocus GWAS strategies within the random forest-based framework. The first method, T-Trees, was designed by Botta and collaborators (Botta et al., PLoS ONE 9(4):e93379, 2014). We designed the second, an innovative hybrid method that combines T-Trees with the modeling of linkage disequilibrium. Linkage disequilibrium is modeled through a collection of tree-shaped Bayesian networks with latent variables, following our earlier work (Mourad et al., BMC Bioinformatics 12(1):16, 2011). We compared the two methods on both simulated and real data. For dominant and additive genetic models, in either of the simulated conditions, the hybrid approach consistently performs slightly better than T-Trees. We assessed predictive power through the standard ROC technique on 14 real datasets. For 10 of the 14 datasets analyzed, the already high predictive power observed for T-Trees (0.910-0.946) can be increased further by up to 0.030. We also assessed whether the distributions of SNP scores obtained from T-Trees and the hybrid approach differed. Finally, we thoroughly analyzed the intersections of the top 100 SNPs output by any two, or all three, of the following methods: T-Trees, the hybrid approach, and the single-SNP method.
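Two of the evaluation steps described above, ROC-based predictive power and the intersection of top-ranked SNPs, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names `roc_auc` and `top_k`, the 0/1 case-control coding, and the toy SNP identifiers and scores are all hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case (label 1) receives
    a higher score than a randomly chosen control (label 0); ties count 0.5.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def top_k(scores_by_snp, k):
    """Return the set of the k SNP identifiers with the highest scores."""
    ranked = sorted(scores_by_snp, key=scores_by_snp.get, reverse=True)
    return set(ranked[:k])


# Toy predictions for four individuals (made-up data).
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))

# Toy importance scores from two methods (made-up SNP identifiers).
method_a = {"snp1": 0.9, "snp2": 0.1, "snp3": 0.5, "snp4": 0.3}
method_b = {"snp1": 0.7, "snp2": 0.6, "snp3": 0.2, "snp4": 0.1}
shared = top_k(method_a, 2) & top_k(method_b, 2)
print(sorted(shared))
```

The pairwise formulation is quadratic in the number of individuals; it is used here because it makes the meaning of the area under the curve explicit, whereas a rank-based implementation would be preferable on real cohorts.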

CONCLUSIONS

Making T-Trees more sophisticated through finer linkage disequilibrium modeling is shown to be beneficial. The distributions of SNP scores generated by T-Trees and the hybrid approach are shown to be statistically different, which suggests that the methods are complementary. In particular, for 12 of the 14 real datasets, the tail of the distribution of highest SNP scores shows larger values for the hybrid approach. The hybrid approach thus pinpoints more interesting SNPs than T-Trees, which can be provided as a short list of prioritized SNPs for further analysis by biologists. Finally, among the 211 top-100 SNPs jointly detected by the single-SNP method, T-Trees and the hybrid approach over the 14 datasets, we identified 72 and 38 SNPs present in the top 25 and top 10, respectively, of each method.
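The abstract does not name the statistical test used to compare the score distributions. As one plausible choice, assumed here purely for illustration, the two-sample Kolmogorov-Smirnov statistic measures the largest gap between two empirical distribution functions. The function name `ks_statistic` and the toy scores are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical
    cumulative distribution functions of the two samples (0 = identical
    distributions, 1 = completely separated).
    """
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest_gap = 0.0
    # The ECDFs only change at observed values, so checking those suffices.
    for value in sorted(set(a) | set(b)):
        ecdf_a = bisect_right(a, value) / len(a)
        ecdf_b = bisect_right(b, value) / len(b)
        largest_gap = max(largest_gap, abs(ecdf_a - ecdf_b))
    return largest_gap


# Toy SNP scores from two methods (made-up data).
print(ks_statistic([1, 2, 3], [1, 2, 3]))  # identical samples
print(ks_statistic([1, 2, 3], [4, 5, 6]))  # fully separated samples
```

A statistic of this kind only indicates that two distributions differ; the comparison of the highest-score tails reported above would need a separate look at the upper quantiles.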

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3665/5870262/00493e860ba4/12859_2018_2054_Fig1_HTML.jpg
