

pcadapt: an R package to perform genome scans for selection based on principal component analysis.

Author information

Luu Keurcien, Bazin Eric, Blum Michael G B

Affiliations

Laboratoire TIMC-IMAG, UMR 5525, CNRS, Université Grenoble Alpes, Grenoble, France.

Laboratoire d'Ecologie Alpine UMR 5553, CNRS, Université Grenoble Alpes, Grenoble, France.

Publication information

Mol Ecol Resour. 2017 Jan;17(1):67-77. doi: 10.1111/1755-0998.12592. Epub 2016 Sep 7.

Abstract

The R package pcadapt performs genome scans to detect genes under selection based on population genomic data. It assumes that candidate markers are outliers with respect to how they are related to population structure. Because population structure is ascertained with principal component analysis, the package is fast and works with large-scale data. It can handle missing data and pooled sequencing data. In contrast to population-based approaches, the package handles admixed individuals and does not require grouping individuals into populations. Since its first release, pcadapt has evolved in terms of both statistical approach and software implementation. We present results obtained with robust Mahalanobis distance, which is a new statistic for genome scans available in the 2.0 and later versions of the package. When hierarchical population structure occurs, Mahalanobis distance is more powerful than the communality statistic that was implemented in the first version of the package. Using simulated data, we compare pcadapt to other computer programs for genome scans (BayeScan, hapflk, OutFLANK, sNMF). We find that the proportion of false discoveries is around a nominal false discovery rate set at 10%, with the exception of BayeScan, which generates 40% false discoveries. We also find that the power of BayeScan is severely impacted by the presence of admixed individuals, whereas pcadapt is not impacted. Last, we find that pcadapt and hapflk are the most powerful in scenarios of population divergence and range expansion. Because pcadapt handles next-generation sequencing data, it is a valuable tool for data analysis in molecular ecology.
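The abstract names two ingredients: a Mahalanobis distance measuring how atypically each marker relates to the principal components, and outlier calling at a 10% nominal false discovery rate. The Python sketch below illustrates only those final steps and rests on several assumptions. Per-SNP z-scores on K = 2 principal components are taken as already computed. The classical sample mean and covariance stand in for the robust estimator the abstract refers to. Distances are referred to a chi-square distribution with 2 degrees of freedom, whose upper tail is exp(-x/2), with no further calibration. The Benjamini-Hochberg procedure is assumed as the FDR method. All function names are hypothetical and are not part of the pcadapt R package.

```python
import math
from typing import List, Sequence, Tuple


def mahalanobis_sq_2d(z: Sequence[Tuple[float, float]]) -> List[float]:
    """Squared Mahalanobis distance of each SNP's (z1, z2) from the column means.

    Classical sample mean and covariance are used here; the robust estimator
    described in the abstract is NOT reproduced in this sketch.
    """
    n = len(z)
    m1 = sum(a for a, _ in z) / n
    m2 = sum(b for _, b in z) / n
    # Sample covariance entries (denominator n - 1)
    s11 = sum((a - m1) ** 2 for a, _ in z) / (n - 1)
    s22 = sum((b - m2) ** 2 for _, b in z) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in z) / (n - 1)
    det = s11 * s22 - s12 * s12
    # Closed-form inverse of the 2x2 covariance [[s11, s12], [s12, s22]]
    i11, i12, i22 = s22 / det, -s12 / det, s11 / det
    return [
        i11 * (a - m1) ** 2 + 2 * i12 * (a - m1) * (b - m2) + i22 * (b - m2) ** 2
        for a, b in z
    ]


def chi2_df2_sf(x: float) -> float:
    """Upper-tail probability of a chi-square with 2 degrees of freedom."""
    return math.exp(-x / 2)


def benjamini_hochberg(pvals: Sequence[float], alpha: float = 0.1) -> List[int]:
    """Indices rejected by the Benjamini-Hochberg step-up procedure at level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k_max = rank  # step-up: keep the largest rank that passes
    return sorted(order[:k_max])


def outlier_snps(z: Sequence[Tuple[float, float]], alpha: float = 0.1) -> List[int]:
    """Hypothetical helper: distances -> chi-square p-values -> BH selection."""
    pvals = [chi2_df2_sf(d2) for d2 in mahalanobis_sq_2d(z)]
    return benjamini_hochberg(pvals, alpha)
```

For instance, with twenty background SNPs at (±1, 0) and (0, ±1) plus one SNP at (8, 8), `outlier_snps(z, alpha=0.1)` selects only the last SNP. Replacing the classical covariance with a robust one matters in practice because, as this toy case shows, a single extreme marker already inflates the covariance along its own direction and shrinks its distance.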

