Exploring gene-gene interaction in family-based data with an unsupervised machine learning method: EPISFA.

Affiliations

Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing, China.

Department of Epidemiology and Biostatistics, School of Public Health, Capital Medical University, Beijing, China.

Publication information

Genet Epidemiol. 2020 Nov;44(8):811-824. doi: 10.1002/gepi.22342. Epub 2020 Sep 1.

Abstract

Gene-gene interaction (G × G) is thought to help fill the gap between the estimated heritability of complex diseases and the limited proportion of it explained by identified single-nucleotide polymorphisms. Current tools for exploring G × G were mostly developed for case-control designs, with less consideration given to their application in families. Family-based studies are robust against bias arising from population stratification in genetic studies and are helpful in understanding G × G. We proposed two new algorithms based on unsupervised machine learning to screen for G × G: epistasis sparse factor analysis (EPISFA) and epistasis sparse factor analysis for linkage disequilibrium (EPISFA-LD). Extensive simulations were performed to compare EPISFA/EPISFA-LD with a classical family-based algorithm, FAM-MDR (family-based multifactor dimensionality reduction). The results showed that EPISFA/EPISFA-LD offers both high power and high computational efficiency, can be applied in family designs, and is applicable to high-dimensional datasets. Finally, we applied EPISFA/EPISFA-LD to a real dataset drawn from the Fangshan/family-based Ischemic Stroke Study in China. EPISFA/EPISFA-LD discovered five G × G pairs: three were verified by other algorithms (FAM-MDR and logistic regression), and the remaining two were identified only by EPISFA/EPISFA-LD. The results from EPISFA might offer new insights into the genetic etiology of complex diseases. EPISFA/EPISFA-LD was implemented in R. All relevant source code and simulated data can be freely downloaded from https://github.com/doublexism/episfa.
