A thesaurus of genetic variation for interrogation of repetitive genomic regions.

Author information

Kerzendorfer Claudia, Konopka Tomasz, Nijman Sebastian M B

Affiliations

Research Center for Molecular Medicine of the Austrian Academy of Sciences (CeMM), Vienna, Austria.

Publication information

Nucleic Acids Res. 2015 May 26;43(10):e68. doi: 10.1093/nar/gkv178. Epub 2015 Mar 27.

DOI: 10.1093/nar/gkv178

PMID: 25820428

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4446415/

Abstract

Detecting genetic variation is one of the main applications of high-throughput sequencing, but is still challenging wherever aligning short reads poses ambiguities. Current state-of-the-art variant calling approaches avoid such regions, arguing that it is necessary to sacrifice detection sensitivity to limit false discovery. We developed a method that links candidate variant positions within repetitive genomic regions into clusters. The technique relies on a resource, a thesaurus of genetic variation, that enumerates genomic regions with similar sequence. The resource is computationally intensive to generate, but once compiled can be applied efficiently to annotate and prioritize variants in repetitive regions. We show that thesaurus annotation can reduce the rate of false variant calls due to mappability by up to three orders of magnitude. We apply the technique to whole genome datasets and establish that called variants in low mappability regions annotated using the thesaurus can be experimentally validated. We then extend the analysis to a large panel of exomes to show that the annotation technique opens possibilities to study variation in hitherto hidden and under-studied parts of the genome.
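
The linking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the thesaurus can be represented as a list of region pairs with similar sequence, that paired regions have equal length (so positions map by a simple offset), and that clusters are the connected components of candidate variants joined by such links. The names `ThesaurusLink`, `linked_positions`, and `cluster_variants`, and all coordinates, are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class ThesaurusLink(NamedTuple):
    """Hypothetical thesaurus entry: two equal-length regions with similar sequence."""
    chrom_a: str
    start_a: int  # 1-based, inclusive
    chrom_b: str
    start_b: int
    length: int


def linked_positions(chrom, pos, links):
    """Yield positions that the thesaurus marks as similar to (chrom, pos).

    Assumption: regions pair up base by base, so a plain offset is enough.
    """
    for ln in links:
        if chrom == ln.chrom_a and ln.start_a <= pos < ln.start_a + ln.length:
            yield ln.chrom_b, ln.start_b + (pos - ln.start_a)
        if chrom == ln.chrom_b and ln.start_b <= pos < ln.start_b + ln.length:
            yield ln.chrom_a, ln.start_a + (pos - ln.start_b)


def cluster_variants(variants, links):
    """Group candidate variant positions connected through thesaurus links."""
    parent = {v: v for v in variants}  # union-find forest over variants

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for v in variants:
        for other in linked_positions(v[0], v[1], links):
            if other in parent:  # only link to positions that are also candidates
                parent[find(v)] = find(other)

    clusters = defaultdict(list)
    for v in variants:
        clusters[find(v)].append(v)
    return sorted(sorted(c) for c in clusters.values())


# Toy data: one similar-region pair and three candidate variants (made up).
links = [ThesaurusLink("chr1", 1000, "chr5", 2000, 100)]
variants = [("chr1", 1010), ("chr5", 2010), ("chr2", 500)]
clusters = cluster_variants(variants, links)
print(clusters)
```

In this toy case the candidates on chr1 and chr5 fall into one cluster because they occupy corresponding positions in a linked region pair, while the chr2 candidate stays on its own. A downstream filter could then treat a cluster as a single ambiguous call rather than as several independent variants.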

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5ab2/4446415/fe9b25354cc6/gkv178fig1.jpg

Similar articles

A thesaurus of genetic variation for interrogation of repetitive genomic regions.

Nucleic Acids Res. 2015 May 26;43(10):e68. doi: 10.1093/nar/gkv178. Epub 2015 Mar 27.

Comparison of genetic variants in matched samples using thesaurus annotation.

Bioinformatics. 2016 Mar 1;32(5):657-63. doi: 10.1093/bioinformatics/btv654. Epub 2015 Nov 5.

Structural variation analysis with strobe reads.

Bioinformatics. 2010 May 15;26(10):1291-8. doi: 10.1093/bioinformatics/btq153. Epub 2010 Apr 8.

Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR.

Nat Protoc. 2015 Oct;10(10):1556-66. doi: 10.1038/nprot.2015.105. Epub 2015 Sep 17.

India Allele Finder: a web-based annotation tool for identifying common alleles in next-generation sequencing data of Indian origin.

BMC Res Notes. 2017 Jun 27;10(1):233. doi: 10.1186/s13104-017-2556-2.

Jannovar: a java library for exome annotation.

Hum Mutat. 2014 May;35(5):548-55. doi: 10.1002/humu.22531. Epub 2014 Apr 9.

Accurately annotate compound effects of genetic variants using a context-sensitive framework.

Nucleic Acids Res. 2017 Jun 2;45(10):e82. doi: 10.1093/nar/gkx041.

Recurrent miscalling of missense variation from short-read genome sequence data.

BMC Genomics. 2019 Jul 16;20(Suppl 8):546. doi: 10.1186/s12864-019-5863-2.

Utilizing mapping targets of sequences underrepresented in the reference assembly to reduce false positive alignments.

Nucleic Acids Res. 2015 Nov 16;43(20):e133. doi: 10.1093/nar/gkv671. Epub 2015 Jul 10.

[Analysis, identification and correction of some errors of model refseqs appeared in NCBI Human Gene Database by in silico cloning and experimental verification of novel human genes].

Yi Chuan Xue Bao. 2004 May;31(5):431-43.

Cited by

A multilocus approach for accurate variant calling in low-copy repeats using whole-genome sequencing.

Bioinformatics. 2023 Jun 30;39(39 Suppl 1):i279-i287. doi: 10.1093/bioinformatics/btad268.

A pan-cancer landscape of somatic mutations in non-unique regions of the human genome.

Nat Biotechnol. 2021 Dec;39(12):1589-1596. doi: 10.1038/s41587-021-00971-y. Epub 2021 Jul 19.

Comparison of genetic variants in matched samples using thesaurus annotation.

Bioinformatics. 2016 Mar 1;32(5):657-63. doi: 10.1093/bioinformatics/btv654. Epub 2015 Nov 5.

References

Validation and assessment of variant calling pipelines for next-generation sequencing.

Hum Genomics. 2014 Jul 30;8(1):14. doi: 10.1186/1479-7364-8-14.

Integrating mapping-, assembly- and haplotype-based approaches for calling variants in clinical sequencing applications.

Nat Genet. 2014 Aug;46(8):912-918. doi: 10.1038/ng.3036. Epub 2014 Jul 13.

Whole-genome haplotyping using long reads and statistical methods.

Nat Biotechnol. 2014 Mar;32(3):261-266. doi: 10.1038/nbt.2833. Epub 2014 Feb 23.

Comprehensive analysis to improve the validation rate for single nucleotide variants detected by next-generation sequencing.

PLoS One. 2014 Jan 29;9(1):e86664. doi: 10.1371/journal.pone.0086664. eCollection 2014.

Whole-genome haplotype reconstruction using proximity-ligation and shotgun sequencing.

Nat Biotechnol. 2013 Dec;31(12):1111-8. doi: 10.1038/nbt.2728. Epub 2013 Nov 3.

Modeling precision treatment of breast cancer.

Genome Biol. 2013;14(10):R110. doi: 10.1186/gb-2013-14-10-r110.

A reversible gene trap collection empowers haploid genetics in human cells.

Nat Methods. 2013 Oct;10(10):965-71. doi: 10.1038/nmeth.2609. Epub 2013 Aug 25.

Development and validation of a clinical cancer genomic profiling test based on massively parallel DNA sequencing.

Nat Biotechnol. 2013 Nov;31(11):1023-31. doi: 10.1038/nbt.2696. Epub 2013 Oct 20.

Mutational landscape and significance across 12 major cancer types.

Nature. 2013 Oct 17;502(7471):333-339. doi: 10.1038/nature12634.

Clinical whole-exome sequencing for the diagnosis of mendelian disorders.

N Engl J Med. 2013 Oct 17;369(16):1502-11. doi: 10.1056/NEJMoa1306555. Epub 2013 Oct 2.
