

Comparative analysis of de novo assemblers for variation discovery in personal genomes.

Affiliations

Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA.

Center for Individualized Medicine Bioinformatics Program, Mayo Clinic, USA.

Publication information

Brief Bioinform. 2018 Sep 28;19(5):893-904. doi: 10.1093/bib/bbx037.

PMID: 28407084
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6169673/
Abstract

Current variant discovery approaches often rely on an initial read mapping to the reference sequence. Their effectiveness is limited by gaps, potential misassemblies, duplicated regions with high sequence similarity and regions of high sequence divergence in the reference. Mapping-based approaches are also less sensitive to large INDELs and complex variants, and they provide little phase information in personal genomes. A few de novo assemblers have been developed to identify variants through direct variant calling from the assembly graph, micro-assembly and whole-genome assembly, but mainly for whole-genome sequencing (WGS) data. We developed SGVar, a de novo assembly workflow for haplotype-based variant discovery from whole-exome sequencing (WES) data. Using simulated human exome data, we compared SGVar with five variation-aware de novo assemblers and with BWA-MEM combined with three haplotype-based or local de novo assembly-based callers. SGVar outperforms the other assemblers in sensitivity and tolerance of sequencing errors. We recapitulated these findings on whole-genome and exome data from a trio of Utah residents with Northern and Western European ancestry (CEU), showing that SGVar had high sensitivity both in the highly divergent human leukocyte antigen (HLA) region and in non-HLA regions of chromosome 6. In particular, SGVar is robust to sequencing error, k-mer selection, divergence level and coverage depth. Unlike mapping-based approaches, SGVar can resolve long-range phase and identify large INDELs from WES data, and does so even more prominently from WGS data. We conclude that SGVar represents an ideal platform for WES-based variant discovery in highly divergent regions and across the whole genome.
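The abstract contrasts mapping-based calling with assembly-based discovery. As a minimal sketch of the kind of signal assembly-based methods can exploit, assuming exact (error-free) toy sequences: k-mers present in a sample haplotype but absent from the reference flag a candidate variant without any read-mapping step. The helper names `kmers` and `novel_kmers`, the sequences and the choice k = 7 are hypothetical illustrations; this is not SGVar's algorithm, which the abstract does not describe in detail.

```python
"""Toy illustration of a k-mer signal for reference-free variant detection.

Hypothetical helpers for illustration only; not SGVar's implementation.
"""


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def novel_kmers(reference: str, haplotype: str, k: int) -> set[str]:
    """Return k-mers present in the haplotype but absent from the reference.

    In this toy setting the haplotype equals the reference except at one
    variant, so every novel k-mer overlaps that variant.
    """
    return kmers(haplotype, k) - kmers(reference, k)


if __name__ == "__main__":
    ref = "ACGTTGCATGTCGCATGATGCATGAGAGCT"  # hypothetical 30-bp reference
    # Single-base substitution at index 15 (T -> A).
    snv = ref[:15] + "A" + ref[16:]
    # Hypothetical 12-bp insertion before index 15.
    ins = ref[:15] + "TTTTCCCCGGGG" + ref[15:]

    k = 7
    print(len(novel_kmers(ref, snv, k)))  # prints 7
    print(len(novel_kmers(ref, ins, k)))  # prints 17
```

With these toy sequences the SNV produces one novel 7-mer for each window covering the changed base, while the insertion produces more, because every window overlapping the inserted bases or its junctions can be new. In real reads, sequencing errors also create spurious novel k-mers, which is one reason tolerance to sequencing error and the choice of k matter for assembly-based callers.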

Figures (PMC image files):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f84/6169673/44f013be3b4f/bbx037f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f84/6169673/03d4232c344d/bbx037f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f84/6169673/239b55288df8/bbx037f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f84/6169673/1a421ca84c66/bbx037f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4f84/6169673/95156f871d20/bbx037f5.jpg

Similar articles

1
Comparative analysis of de novo assemblers for variation discovery in personal genomes.
Brief Bioinform. 2018 Sep 28;19(5):893-904. doi: 10.1093/bib/bbx037.
2
An analytical workflow for accurate variant discovery in highly divergent regions.
BMC Genomics. 2016 Sep 2;17(1):703. doi: 10.1186/s12864-016-3045-z.
3
Impact of post-alignment processing in variant discovery from whole exome data.
BMC Bioinformatics. 2016 Oct 3;17(1):403. doi: 10.1186/s12859-016-1279-z.
4
Robust identification of deletions in exome and genome sequence data based on clustering of Mendelian errors.
Hum Mutat. 2018 Jun;39(6):870-881. doi: 10.1002/humu.23419. Epub 2018 Mar 22.
5
Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications.
Nat Genet. 2014 Aug;46(8):912-918. doi: 10.1038/ng.3036. Epub 2014 Jul 13.
6
Reducing INDEL calling errors in whole genome and exome sequencing data.
Genome Med. 2014 Oct 28;6(10):89. doi: 10.1186/s13073-014-0089-z. eCollection 2014.
7
State-of-the-art genome inference in the human MHC.
Int J Biochem Cell Biol. 2021 Feb;131:105882. doi: 10.1016/j.biocel.2020.105882. Epub 2020 Nov 12.
8
Deep whole-genome sequencing of 90 Han Chinese genomes.
Gigascience. 2017 Sep 1;6(9):1-7. doi: 10.1093/gigascience/gix067.
9
Systematic dissection of biases in whole-exome and whole-genome sequencing reveals major determinants of coding sequence coverage.
Sci Rep. 2020 Feb 6;10(1):2057. doi: 10.1038/s41598-020-59026-y.
10
JWES: a new pipeline for whole genome/exome sequence data processing, management, and gene-variant discovery, annotation, prediction, and genotyping.
FEBS Open Bio. 2021 Sep;11(9):2441-2452. doi: 10.1002/2211-5463.13261. Epub 2021 Aug 11.

Cited by

1
Impact and characterization of serial structural variations across humans and great apes.
Nat Commun. 2024 Sep 13;15(1):8007. doi: 10.1038/s41467-024-52027-9.
2
A survey of k-mer methods and applications in bioinformatics.
Comput Struct Biotechnol J. 2024 May 21;23:2289-2303. doi: 10.1016/j.csbj.2024.05.025. eCollection 2024 Dec.
3
Genomic variant benchmark: if you cannot measure it, you cannot improve it.
