Simulation of African and non-African low and high coverage whole genome sequence data to assess variant calling approaches.

Affiliations

Faculty of Health Sciences, Division of Human Genetics, Department of Pathology, University of Cape Town, Cape Town, South Africa.

Department of Statistical Sciences, University of Cape Town, Cape Town, South Africa.

Publication information

Brief Bioinform. 2021 Jul 20;22(4). doi: 10.1093/bib/bbaa366.

DOI:10.1093/bib/bbaa366
PMID:33341897
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8294538/

Abstract

Current variant calling (VC) approaches have been designed to leverage populations of long-range haplotypes and were benchmarked using populations of European descent, whereas most genetic diversity is found in non-European populations, such as those of Africa. When applied to these genetically diverse populations, VC tools may produce false positive and false negative results, which can lead to misleading conclusions about the prioritization of mutations and the clinical relevance and actionability of genes. The key question is which tool or pipeline achieves high sensitivity and precision when analysing African data with either low or high sequence coverage, given the high genetic diversity and heterogeneity of these data. Here, a total of 100 synthetic Whole Genome Sequencing (WGS) samples, mimicking the genetic profiles of African and European subjects at different coverage levels (high/low), were generated to assess the performance of nine different VC tools on these contrasting datasets. Performance was assessed in terms of false positive and false negative call rates by comparing the simulated golden (ground-truth) variants with the variants identified by each VC tool. Combining our results on sensitivity and positive predictive value (PPV), VarDict [PPV = 0.999 and Matthews correlation coefficient (MCC) = 0.832] and BCFtools (PPV = 0.999 and MCC = 0.813) perform best on African population data at high and low coverage. Overall, current VC tools produce higher false positive and false negative rates when analysing African data than when analysing European data. This highlights the need to develop VC approaches with high sensitivity and precision tailored to populations characterized by high genetic variation and low linkage disequilibrium.
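
The metrics named in the abstract (sensitivity, PPV and MCC) can all be derived from the overlap between a simulated truth set and a caller's output. The following is a minimal sketch of that comparison, not the authors' pipeline. It assumes that variants are represented as `(chrom, pos, ref, alt)` tuples and that true negatives are counted against a hypothetical total number of evaluated sites, `n_sites`; the abstract does not say how the study defines true negatives. The function names and toy data are illustrative only.

```python
import math


def confusion_counts(truth, called, n_sites):
    """Count TP, FP, FN, TN from two sets of variant keys.

    truth   -- set of simulated ("golden") variants
    called  -- set of variants reported by a caller
    n_sites -- assumed number of evaluated sites (needed only for TN)
    """
    tp = len(truth & called)   # called and truly present
    fp = len(called - truth)   # called but not in the truth set
    fn = len(truth - called)   # in the truth set but missed
    tn = n_sites - tp - fp - fn
    return tp, fp, fn, tn


def calling_metrics(tp, fp, fn, tn):
    """Return sensitivity, PPV and Matthews correlation coefficient."""
    nan = float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else nan
    ppv = tp / (tp + fp) if (tp + fp) else nan
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else nan
    return {"sensitivity": sensitivity, "ppv": ppv, "mcc": mcc}


# Toy data: four simulated variants; the caller finds three and adds one false call.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr1", 300, "G", "A"),
    ("chr1", 400, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr1", 300, "G", "A"),
    ("chr1", 500, "A", "T"),
}

counts = confusion_counts(truth, called, n_sites=1000)
result = calling_metrics(*counts)
print(counts, result)
```

Because true negatives vastly outnumber variants in whole-genome data, MCC in this setting depends strongly on how `n_sites` is chosen, which is why sensitivity and PPV are usually reported alongside it.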
