UPS-indel: a Universal Positioning System for Indels.

Authors

Hasan Mohammad Shabbir, Wu Xiaowei, Watson Layne T, Zhang Liqing

Affiliations

Department of Computer Science, Virginia Tech, Blacksburg, VA, 24061, USA.

Department of Statistics, Virginia Tech, Blacksburg, VA, 24061, USA.

Publication information

Sci Rep. 2017 Oct 26;7(1):14106. doi: 10.1038/s41598-017-14400-1.

DOI:10.1038/s41598-017-14400-1
PMID:29074871
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5658412/
Abstract

Storing biologically equivalent indels as distinct entries in databases causes data redundancy, and misleads downstream analysis. It is thus desirable to have a unified system for identifying and representing equivalent indels. Moreover, a unified system is also desirable to compare the indel calling results produced by different tools. This paper describes UPS-indel, a utility tool that creates a universal positioning system for indels so that equivalent indels can be uniquely determined by their coordinates in the new system, which also can be used to compare different indel calling results. UPS-indel identifies 15% redundant indels in dbSNP, 29% in COSMIC coding, and 13% in COSMIC noncoding datasets across all human chromosomes, higher than previously reported. Comparing the performance of UPS-indel with existing variant normalization tools vt normalize, BCFtools, and GATK LeftAlignAndTrimVariants shows that UPS-indel is able to identify 456,352 more redundant indels in dbSNP; 2,118 more in COSMIC coding, and 553 more in COSMIC noncoding indel dataset in addition to the ones reported jointly by these tools. Moreover, comparing UPS-indel to state-of-the-art approaches for indel call set comparison demonstrates its clear superiority in finding common indels among call sets. UPS-indel is theoretically proven to find all equivalent indels, and thus exhaustive.
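To make the notion of "equivalent indels" concrete, the following is a minimal Python sketch, not the UPS-indel algorithm itself. It assumes a plain reference string, 0-based coordinates, and deletions only. The function names `apply_deletion` and `deletion_range` are hypothetical and chosen for illustration. The idea is that two deletions of the same length are equivalent when they yield the same altered sequence, and that the leftmost and rightmost equivalent start positions can serve as a shared key for all of them.

```python
def apply_deletion(ref: str, start: int, length: int) -> str:
    """Return the sequence obtained by deleting ref[start:start+length]."""
    return ref[:start] + ref[start + length:]


def deletion_range(ref: str, start: int, length: int) -> tuple[int, int]:
    """Return (leftmost, rightmost) start positions of equivalent deletions.

    Illustrative assumption: shifting a deletion one base to the left leaves
    the altered sequence unchanged exactly when the base entering the deleted
    window equals the base leaving it; the same holds for a right shift.
    """
    lo = start
    while lo > 0 and ref[lo - 1] == ref[lo + length - 1]:
        lo -= 1
    hi = start
    while hi + length < len(ref) and ref[hi] == ref[hi + length]:
        hi += 1
    return lo, hi


if __name__ == "__main__":
    ref = "GACACAT"
    # Deleting "AC" at position 1 and "CA" at position 4 give the same result.
    print(apply_deletion(ref, 1, 2) == apply_deletion(ref, 4, 2))
    # Both deletions map to the same (leftmost, rightmost) key.
    print(deletion_range(ref, 1, 2), deletion_range(ref, 4, 2))
```

Under these assumptions, any two deletions of equal length that share the same range are redundant database entries for one biological event. Insertions would need the inserted sequence to be rotated as the position shifts, which this sketch leaves out.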


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e296/5658412/62d8d4b45c8c/41598_2017_14400_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e296/5658412/24c167e55069/41598_2017_14400_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e296/5658412/8af687e62a6d/41598_2017_14400_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e296/5658412/161f0c83737f/41598_2017_14400_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e296/5658412/d2e4b5ee2ea1/41598_2017_14400_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e296/5658412/112c75f37a6a/41598_2017_14400_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e296/5658412/127c6e65921a/41598_2017_14400_Fig7_HTML.jpg
