Detecting selection in low-coverage high-throughput sequencing data using principal component analysis.

Affiliation

Department of Biology, The Bioinformatics Centre, University of Copenhagen, Copenhagen, Denmark.

Publication information

BMC Bioinformatics. 2021 Sep 29;22(1):470. doi: 10.1186/s12859-021-04375-2.

DOI:10.1186/s12859-021-04375-2
PMID:34587903
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8480091/
Abstract

BACKGROUND

Identifying selection signatures between populations is often an important part of a population genetic study. With high-throughput DNA sequencing, larger samples of populations with similar ancestries have become increasingly common. This has created a need for methods that can identify signals of selection in populations with a continuous cline of genetic differentiation. Individuals from continuous populations are inherently difficult to group into meaningful units, which is why existing methods rely on principal component analysis to infer selection signals. These existing methods require called genotypes as input, which is problematic for studies based on low-coverage sequencing data.

MATERIALS AND METHODS

We extended two principal component analysis-based selection statistics to genotype likelihood data and applied them to low-coverage sequencing data from the 1000 Genomes Project, for populations of European and East Asian ancestry, to detect signals of selection in samples with continuous population structure.
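
To make the idea of a principal component analysis-based selection statistic concrete, here is a minimal sketch in Python. It is not the PCAngsd implementation and it is not taken from the abstract: it assumes called genotypes (0/1/2 allele counts) at polymorphic sites, and it uses one common form of such a scan in which the squared, rescaled SNP loading on a principal component is compared with a chi-square distribution with one degree of freedom. The function names `pc_selection_scan` and `chi2_1df_pvalue` are hypothetical.

```python
import math
import numpy as np


def pc_selection_scan(genotypes, n_pcs=2):
    """Per-site, per-PC scan statistic from called genotypes (illustrative only).

    genotypes: (M sites x N individuals) array of 0/1/2 allele counts.
    Assumes every site is polymorphic in the sample.
    Returns an (M x n_pcs) array; large values mark sites that load
    unusually strongly on a principal component.
    """
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=1) / 2.0  # sample allele frequency per site
    # Center each site and scale by its binomial standard deviation.
    scale = np.sqrt(2.0 * freq * (1.0 - freq))
    y = (g - 2.0 * freq[:, None]) / scale[:, None]
    # Left singular vectors hold the SNP loadings (unit norm over sites).
    u, _, _ = np.linalg.svd(y, full_matrices=False)
    n_sites = y.shape[0]
    # Rescale so the squared loadings average to 1 across sites.
    return n_sites * u[:, :n_pcs] ** 2


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a chi-square(1) variable, via the normal tail."""
    return math.erfc(math.sqrt(stat / 2.0))
```

In a sketch like this, each column of the result would be scanned for outliers, converting statistics to p-values with `chi2_1df_pvalue`. Applying the same idea to low-coverage data requires replacing the called genotypes with quantities derived from genotype likelihoods, which is the extension the authors describe.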

RESULTS

Here, we present two selection statistics, which we have implemented in the PCAngsd framework. These methods account for genotype uncertainty, making it possible to conduct selection scans in continuous populations from low and/or variable coverage sequencing data. To illustrate their use, we applied the methods to low-coverage sequencing data from human populations of East Asian and European ancestry. We show that the implemented selection statistics can control the false positive rate, and that they identify the same signatures of selection from low-coverage sequencing data as state-of-the-art software does using high-quality called genotypes.
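
One generic way to account for genotype uncertainty is to replace a hard genotype call with an expected allele count (dosage) computed from genotype likelihoods. The sketch below illustrates that idea only. It assumes a Hardy-Weinberg prior built from a single allele frequency per site, which is a simplification chosen for illustration; the abstract does not specify how PCAngsd forms its prior. The function name `expected_dosage` is hypothetical.

```python
import numpy as np


def expected_dosage(genotype_likelihoods, allele_freq):
    """Posterior expected allele count per individual at one site.

    genotype_likelihoods: (N x 3) array with P(reads | g) for g = 0, 1, 2
        copies of the counted allele.
    allele_freq: assumed frequency of the counted allele (illustrative prior).
    """
    gl = np.asarray(genotype_likelihoods, dtype=float)
    f = float(allele_freq)
    # Hardy-Weinberg prior over the three genotypes.
    prior = np.array([(1.0 - f) ** 2, 2.0 * f * (1.0 - f), f ** 2])
    posterior = gl * prior
    posterior /= posterior.sum(axis=1, keepdims=True)
    # Expected number of copies under the posterior.
    return posterior @ np.array([0.0, 1.0, 2.0])
```

When the reads are uninformative (equal likelihoods for all three genotypes), the dosage falls back to twice the allele frequency, so poorly covered individuals contribute little spurious signal instead of a possibly wrong hard call.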

CONCLUSION

We show that selection scans of low-coverage sequencing data from populations with similar ancestry perform on par with scans of high-quality genotype data. Moreover, we demonstrate that PCAngsd outperforms selection statistics obtained from genotypes called from low-coverage sequencing data, without the need for ad hoc filtering.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1931/8480091/270a70009597/12859_2021_4375_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1931/8480091/b9f3977089e3/12859_2021_4375_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1931/8480091/f10f6690ff65/12859_2021_4375_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1931/8480091/4d9c4a2653a7/12859_2021_4375_Fig3_HTML.jpg
