Positional bias in variant calls against draft reference assemblies.

Author information

Briskine Roman V, Shimizu Kentaro K

Affiliations

Department of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse 190, Zurich, CH-8057, Switzerland.

Functional Genomics Center Zurich, Winterthurerstrasse 190, Zurich, CH-8057, Switzerland.

Publication information

BMC Genomics. 2017 Mar 28;18(1):263. doi: 10.1186/s12864-017-3637-2.

DOI:10.1186/s12864-017-3637-2

PMID:28351369

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5368935/

Abstract

BACKGROUND

Whole genome resequencing projects may implement variant calling using draft reference genomes assembled de novo from short-read libraries. Despite the lower quality of such assemblies, they have allowed researchers to extend a wide range of population genetic and genome-wide association analyses to non-model species. Because variant calling pipelines are complex and involve many software packages, it is important to understand the inherent biases and limitations at each step of the analysis.

RESULTS

In this article, we report a positional bias present in variant calling performed against draft reference assemblies constructed from de Bruijn or string overlap graphs. We assessed how frequently variants appeared at each position counted from the ends of a contig or scaffold sequence, and discovered an unexpectedly high number of variants at positions related to the length of either the k-mers or the reads used for the assembly. We detected the bias both in publicly available draft assemblies from the Assemblathon 2 competition and in assemblies we generated from our simulated short-read data. Simulations confirmed that the bias-causing variants are predominantly false positives induced by reads from spatially distant repeated sequences. The bias is particularly strong in contig assemblies. Scaffolding does not eliminate the bias but tends to mitigate it because of changes in the variants' relative positions and alterations in read alignments. The bias can be effectively reduced by filtering out variants that reside in repetitive elements.
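The positional tally described above can be illustrated with a minimal sketch. It assumes variants are given as (contig, 1-based position) pairs, as in VCF, and that contig lengths are known. The function name `end_distance_counts` and the choice to measure distance to the nearer end are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def end_distance_counts(variants, contig_lengths):
    """Tally variants by their 1-based distance from the nearer contig end.

    variants: iterable of (contig_name, pos), pos is 1-based (VCF-style).
    contig_lengths: dict mapping contig_name -> contig length in bases.
    Both names are hypothetical and used only for this illustration.
    """
    counts = Counter()
    for contig, pos in variants:
        length = contig_lengths[contig]
        # Distance 1 means the first or the last base of the contig.
        dist = min(pos, length - pos + 1)
        counts[dist] += 1
    return counts


# Toy data: three variants sit 31 bases from a contig end.
toy_variants = [("c1", 31), ("c1", 970), ("c1", 500), ("c2", 31)]
toy_lengths = {"c1": 1000, "c2": 200}
toy_counts = end_distance_counts(toy_variants, toy_lengths)
```

In a tally of this kind, a spike at a distance close to the assembly k-mer size or to the read length would correspond to the bias reported here. Counting each end separately, rather than folding both ends together, may be preferable when the two ends are expected to behave differently.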

CONCLUSIONS

Draft genome sequences generated by several popular assemblers appear to be susceptible to the positional bias, which potentially affects many resequencing projects in non-model species. The bias is inherent to the assembly algorithms and arises from their particular handling of repeated sequences. We recommend reducing the bias by filtering, especially if a higher-quality genome assembly cannot be achieved. Our findings can help other researchers improve the quality of their variant data sets and reduce artefactual findings in downstream analyses.
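One way to apply the recommended filtering is sketched below. It assumes repeat annotations are available as BED-like intervals (0-based, half-open) and variants as (contig, 1-based position) pairs. The helper names `build_repeat_index`, `in_repeat`, and `filter_variants` are hypothetical, and the abstract does not specify how the authors identified or filtered repetitive elements.

```python
from bisect import bisect_right


def build_repeat_index(repeats):
    """Merge repeat intervals per contig and return sorted start/end lists.

    repeats: iterable of (contig, start, end), 0-based half-open (BED-like).
    """
    by_contig = {}
    for contig, start, end in repeats:
        by_contig.setdefault(contig, []).append((start, end))
    index = {}
    for contig, intervals in by_contig.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                # Overlapping or adjacent interval: extend the previous one.
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[contig] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_repeat(index, contig, pos):
    """Return True if the 1-based position falls inside a repeat interval."""
    if contig not in index:
        return False
    starts, ends = index[contig]
    zero_based = pos - 1
    # Find the last interval that starts at or before the position.
    i = bisect_right(starts, zero_based) - 1
    return i >= 0 and zero_based < ends[i]


def filter_variants(variants, repeats):
    """Keep only the variants that lie outside all repeat intervals."""
    index = build_repeat_index(repeats)
    return [(c, p) for c, p in variants if not in_repeat(index, c, p)]


toy_repeats = [("c1", 20, 40), ("c1", 35, 50)]
toy_variants = [("c1", 31), ("c1", 50), ("c1", 51), ("c2", 31)]
kept = filter_variants(toy_variants, toy_repeats)
```

Merging the intervals first keeps each lookup to a single binary search, which matters when a draft assembly carries many annotated repeats.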
