VarSCAT: A computational tool for sequence context annotations of genomic variants.

Affiliations

Turku Bioscience Centre, University of Turku and Åbo Akademi University, Turku, Finland.

InFLAMES Research Flagship Center, University of Turku, Turku, Finland.

Publication information

PLoS Comput Biol. 2023 Aug 11;19(8):e1010727. doi: 10.1371/journal.pcbi.1010727. eCollection 2023 Aug.

DOI:10.1371/journal.pcbi.1010727
PMID:37566612
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10446208/
Abstract

The sequence contexts of genomic variants play important roles in understanding biological significances of variants and potential sequencing related variant calling issues. However, methods for assessing the diverse sequence contexts of genomic variants such as tandem repeats and unambiguous annotations have been limited. Herein, we describe the Variant Sequence Context Annotation Tool (VarSCAT) for annotating the sequence contexts of genomic variants, including breakpoint ambiguities, flanking bases of variants, wildtype/mutated DNA sequences, variant nomenclatures, distances between adjacent variants, tandem repeat regions, and custom annotation with user customizable options. Our analyses demonstrate that VarSCAT is more versatile and customizable than the currently available methods or strategies for annotating variants in short tandem repeat (STR) regions or insertions and deletions (indels) with breakpoint ambiguity. Variant sequence context annotations of high-confidence human variant sets with VarSCAT revealed that more than 75% of all human individual germline and clinically relevant indels have breakpoint ambiguities. Moreover, we illustrate that more than 80% of human individual germline small variants in STR regions are indels and that the sizes of these indels correlated with STR motif sizes. VarSCAT is available from https://github.com/elolab/VarSCAT.
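The abstract's central finding concerns breakpoint ambiguity: an indel inside a repetitive context, such as an (AG)n repeat, can be written at several equivalent positions that all produce the same mutated sequence. The Python sketch below illustrates that idea only. It is not VarSCAT's implementation; the function names `deletion_ambiguity` and `insertion_ambiguity`, the 0-based coordinates, and the use of a plain string as the reference are assumptions made for illustration.

```python
# Hypothetical sketch (not VarSCAT's code): find the range of equivalent
# placements of a simple indel in a reference sequence.
# Assumptions: 0-based coordinates, reference held in a plain Python string.


def deletion_ambiguity(ref: str, start: int, length: int) -> tuple[int, int]:
    """Return (leftmost, rightmost) start positions at which deleting `length`
    bases yields the same sequence as deleting ref[start:start + length]."""
    left = start
    # Shifting one base left is equivalent when the base just before the
    # deleted block equals the block's last base.
    while left > 0 and ref[left - 1] == ref[left + length - 1]:
        left -= 1
    right = start
    # Shifting one base right is equivalent when the block's first base
    # equals the base just after the block.
    while right + length < len(ref) and ref[right] == ref[right + length]:
        right += 1
    return left, right


def insertion_ambiguity(ref: str, pos: int, ins: str) -> tuple[int, int]:
    """Return (leftmost, rightmost) insertion points equivalent to inserting
    `ins` immediately before ref[pos] (pos == len(ref) means appending)."""
    if not ins:
        raise ValueError("inserted sequence must be non-empty")
    left, seq = pos, ins
    # Moving left by one base works when the preceding reference base equals
    # the last inserted base; the inserted sequence rotates accordingly.
    while left > 0 and ref[left - 1] == seq[-1]:
        seq = seq[-1] + seq[:-1]
        left -= 1
    right, seq = pos, ins
    # Moving right works when the next reference base equals the first
    # inserted base.
    while right < len(ref) and ref[right] == seq[0]:
        seq = seq[1:] + seq[0]
        right += 1
    return left, right


if __name__ == "__main__":
    reference = "ACAGAGAGT"
    # Deleting "AG" at position 2 lies in an (AG)n repeat, so it is ambiguous.
    print(deletion_ambiguity(reference, 2, 2))      # (2, 6)
    # Inserting "AG" before position 2 can slide to any point up to 8.
    print(insertion_ambiguity(reference, 2, "AG"))  # (2, 8)
    # Deleting the single "C" has only one placement.
    print(deletion_ambiguity(reference, 1, 1))      # (1, 1)
```

A reported range whose two ends differ marks the indel as ambiguous. This sketch handles only one simple deletion or insertion at a time and ignores VCF parsing, multiallelic records, and complex indels, all of which a real annotation tool would have to address.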

Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1cb5/10446208/23bd2873db3a/pcbi.1010727.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1cb5/10446208/3002350475e9/pcbi.1010727.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1cb5/10446208/1008b0f4d8bf/pcbi.1010727.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1cb5/10446208/337b69a4110d/pcbi.1010727.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1cb5/10446208/c7fa86425be5/pcbi.1010727.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1cb5/10446208/f413ea9242ea/pcbi.1010727.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1cb5/10446208/846ff053eafc/pcbi.1010727.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1cb5/10446208/b2fbfd7d3993/pcbi.1010727.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1cb5/10446208/e24a6ec5822f/pcbi.1010727.g009.jpg
