

KLFDAPC: a supervised machine learning approach for spatial genetic structure analysis.

Affiliations

Centre for Biological Diversity, Sir Harold Mitchell Building, University of St Andrews, Fife, KY16 9TF, UK.

Center for Genetic Epidemiology, Department of Population and Public Health Sciences, Keck School of Medicine & Department of Quantitative and Computational Biology, University of Southern California, USA.

Publication information

Brief Bioinform. 2022 Jul 18;23(4). doi: 10.1093/bib/bbac202.

Abstract

Geographic patterns of human genetic variation provide important insights into human evolution and disease. Commonly used tools to detect and describe these patterns are principal component analysis (PCA) and the supervised discriminant analysis of principal components (DAPC), which applies linear discriminant analysis to principal components. However, the genetic features produced by both approaches may fail to characterize population structure correctly in complex scenarios involving admixture. In this study, we introduce Kernel Local Fisher Discriminant Analysis of Principal Components (KLFDAPC), a supervised non-linear approach for inferring individual geographic genetic structure that can overcome these limitations by preserving the multimodal structure of the sample space. We tested the power of KLFDAPC to infer population structure and to predict individual geographic origin using neural networks. Simulation results showed that KLFDAPC has higher discriminatory power than PCA and DAPC. When applied to empirical European and East Asian genome-wide genetic datasets, the first two reduced features of KLFDAPC correctly recapitulated the geography of individuals and significantly improved the accuracy of predicting individual geographic origin compared with PCA and DAPC. KLFDAPC can therefore be useful for geographic ancestry inference, the design of genome scans, and correction for spatial stratification in GWAS that link genes to adaptation or disease susceptibility.
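The two-stage idea described above (reduce genotypes with PCA, then apply a supervised discriminant step to the principal components) can be illustrated with a minimal NumPy/SciPy sketch. This is not the authors' implementation: it assumes the standard kernel form of local Fisher discriminant analysis (locally scaled affinities, within- and between-class graph Laplacians, and a regularized generalized eigenproblem solved with `scipy.linalg.eigh`). The RBF kernel with a median-distance bandwidth, `knn=7` local scaling, `n_pc=10`, and the regularization constant `reg` are all illustrative assumptions, and the helper names are hypothetical. The neural-network step for predicting geographic origin is not shown.

```python
import numpy as np
from scipy.linalg import eigh


def pca_scores(genotypes, n_pc):
    """Scores of samples (rows) on the top n_pc principal components."""
    centred = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_pc] * s[:n_pc]


def sq_dists(x):
    """Pairwise squared Euclidean distances between rows of x."""
    sq = np.sum(x ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * x @ x.T, 0.0)


def local_affinity(d2, knn=7):
    """Locally scaled affinity A_ij = exp(-d_ij^2 / (sigma_i * sigma_j)).

    sigma_i is the distance from sample i to its knn-th nearest neighbour
    (column 0 of each sorted row is the sample itself, at distance 0).
    """
    d = np.sqrt(d2)
    sigma = np.sort(d, axis=1)[:, min(knn, d.shape[0] - 1)]
    sigma = np.maximum(sigma, 1e-12)  # guard against duplicate samples
    return np.exp(-d2 / np.outer(sigma, sigma))


def klfda(kernel, labels, affinity, n_dims, reg=1e-3):
    """Kernel local Fisher discriminant analysis (assumed standard formulation)."""
    n = len(labels)
    w_within = np.zeros((n, n))
    w_between = np.full((n, n), 1.0 / n)  # pairs from different classes: 1/n
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        block = np.ix_(idx, idx)
        w_within[block] = affinity[block] / idx.size
        w_between[block] = affinity[block] * (1.0 / n - 1.0 / idx.size)
    # Graph Laplacians L = D - W turn pairwise weights into scatter matrices.
    lap_within = np.diag(w_within.sum(axis=1)) - w_within
    lap_between = np.diag(w_between.sum(axis=1)) - w_between
    s_b = kernel @ lap_between @ kernel
    s_w = kernel @ lap_within @ kernel
    s_b = (s_b + s_b.T) / 2
    # Ridge term (scaled to the mean diagonal) keeps s_w positive definite.
    s_w = (s_w + s_w.T) / 2 + reg * np.trace(s_w) / n * np.eye(n)
    _, vecs = eigh(s_b, s_w)  # generalized problem, eigenvalues ascending
    alpha = vecs[:, ::-1][:, :n_dims]  # directions with largest ratio first
    return kernel @ alpha  # sample coordinates in the reduced space


def klfdapc(genotypes, labels, n_pc=10, n_dims=2, knn=7, reg=1e-3):
    """PCA on genotypes, then kernel LFDA on the PC scores (illustrative)."""
    pcs = pca_scores(genotypes, n_pc)
    d2 = sq_dists(pcs)
    gamma = 1.0 / np.median(d2[d2 > 0])  # median heuristic for RBF bandwidth
    kernel = np.exp(-gamma * d2)
    return klfda(kernel, labels, local_affinity(d2, knn), n_dims, reg)
```

As a usage example, simulated genotypes (0/1/2 allele counts, samples in rows) for a few labelled populations can be passed as `klfdapc(genotypes, labels)` to obtain two reduced features per individual. The kernel replaces DAPC's linear projection, and the local affinities keep nearby samples within a class together without forcing each class into a single cluster.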


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fe9a/9294434/da0e92877f90/bbac202f1.jpg
