KLFDAPC: a supervised machine learning approach for spatial genetic structure analysis.

Affiliations

Centre for Biological Diversity, Sir Harold Mitchell Building, University of St Andrews, Fife, KY16 9TF, UK.

Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine & Department of Quantitative and Computational Biology, University of Southern California, USA.

Publication information

Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac202.

Abstract

Geographic patterns of human genetic variation provide important insights into human evolution and disease. Two commonly used tools for detecting and describing these patterns are principal component analysis (PCA) and the supervised linear discriminant analysis of principal components (DAPC). However, the genetic features produced by both approaches can fail to correctly characterize population structure in complex scenarios involving admixture. In this study, we introduce Kernel Local Fisher Discriminant Analysis of Principal Components (KLFDAPC), a supervised non-linear approach for inferring individual geographic genetic structure that can address the limitations of these approaches by preserving the multimodal space of samples. We tested the power of KLFDAPC to infer population structure and to predict individual geographic origin using neural networks. Simulation results showed that KLFDAPC has higher discriminatory power than PCA and DAPC. Applying our method to empirical European and East Asian genome-wide genetic datasets showed that the first two reduced features of KLFDAPC correctly recapitulated the geography of individuals and significantly improved the accuracy of predicting individual geographic origin compared with PCA and DAPC. Therefore, KLFDAPC can be useful for geographic ancestry inference, the design of genome scans, and correction for spatial stratification in GWAS that link genes to adaptation or disease susceptibility.
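
For orientation, the linear baseline named in the abstract (DAPC: PCA followed by a supervised linear discriminant analysis) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation and not KLFDAPC itself, which replaces the linear discriminant step with a kernel local Fisher discriminant analysis. The synthetic genotype matrix, the population labels, the number of retained components, and the helper names `pca_scores` and `lda_axes` are all assumptions made for the example.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project centered data onto its leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def lda_axes(Z, labels, n_axes):
    """Fisher discriminant axes: leading eigenvectors of inv(Sw) @ Sb."""
    d = Z.shape[1]
    mean_all = Z.mean(axis=0)
    Sw = np.zeros((d, d))  # within-population scatter
    Sb = np.zeros((d, d))  # between-population scatter
    for c in np.unique(labels):
        Zc = Z[labels == c]
        mc = Zc.mean(axis=0)
        Sw += (Zc - mc).T @ (Zc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Zc) * (diff @ diff.T)
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1]
    return evecs.real[:, order[:n_axes]]

# Hypothetical toy data: 3 populations, 30 diploid individuals each, 200 SNPs.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 30)
allele_freq = rng.uniform(0.1, 0.9, size=(3, 200))
X = rng.binomial(2, allele_freq[labels]).astype(float)  # genotypes coded 0/1/2

Z = pca_scores(X, n_components=5)   # unsupervised step (PCA)
W = lda_axes(Z, labels, n_axes=2)   # supervised step (linear discriminants)
D = Z @ W                           # two reduced features per individual

# Nearest-centroid assignment in the reduced space, as a rough separation check.
centroids = np.array([D[labels == c].mean(axis=0) for c in range(3)])
dists = ((D[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
accuracy = (dists.argmin(axis=1) == labels).mean()
```

In this toy setting the populations are well separated, so the linear axes suffice; the abstract's argument is that such linear features can break down under admixture, which is the case the kernel and local weighting in KLFDAPC are meant to handle.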

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe9a/9294434/da0e92877f90/bbac202f1.jpg
