UPS-indel: a Universal Positioning System for Indels.

Authors

Hasan Mohammad Shabbir, Wu Xiaowei, Watson Layne T, Zhang Liqing

Affiliations

Department of Computer Science, Virginia Tech, Blacksburg, VA, 24061, USA.

Department of Statistics, Virginia Tech, Blacksburg, VA, 24061, USA.

Publication information

Sci Rep. 2017 Oct 26;7(1):14106. doi: 10.1038/s41598-017-14400-1.

Abstract

Storing biologically equivalent indels as distinct entries in databases causes data redundancy and misleads downstream analysis. A unified system for identifying and representing equivalent indels is therefore desirable, and such a system is also needed to compare the indel calling results produced by different tools. This paper describes UPS-indel, a utility that creates a universal positioning system for indels so that equivalent indels are uniquely determined by their coordinates in the new system; the same coordinates can also be used to compare different indel calling results. UPS-indel identifies 15% redundant indels in dbSNP, 29% in COSMIC coding, and 13% in COSMIC noncoding datasets across all human chromosomes, higher than previously reported. A comparison with the existing variant normalization tools vt normalize, BCFtools, and GATK LeftAlignAndTrimVariants shows that UPS-indel identifies 456,352 more redundant indels in dbSNP, 2,118 more in COSMIC coding, and 553 more in the COSMIC noncoding indel dataset, in addition to the ones reported jointly by these tools. Moreover, a comparison with state-of-the-art approaches for indel call set comparison demonstrates its clear superiority in finding common indels among call sets. UPS-indel is theoretically proven to find all equivalent indels and is thus exhaustive.
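To make the notion of equivalent indels concrete, the following minimal Python sketch treats two indels as equivalent when applying each to the same reference yields the same sequence. It is not the UPS-indel algorithm, which the abstract does not describe. The helper names (`apply_indel`, `are_equivalent`, `equivalent_deletion_positions`), the 0-based coordinates, and the toy reference string are all assumptions made for illustration.

```python
def apply_indel(ref: str, pos: int, deleted: str = "", inserted: str = "") -> str:
    """Apply an indel at 0-based position `pos`: remove `deleted`, then add `inserted`.

    Hypothetical helper for illustration only; not part of UPS-indel.
    """
    if ref[pos:pos + len(deleted)] != deleted:
        raise ValueError("deleted sequence does not match the reference at pos")
    return ref[:pos] + inserted + ref[pos + len(deleted):]


def are_equivalent(ref: str, indel_a: tuple, indel_b: tuple) -> bool:
    """Two indels are treated as equivalent if they produce the same sequence."""
    return apply_indel(ref, *indel_a) == apply_indel(ref, *indel_b)


def equivalent_deletion_positions(ref: str, pos: int, length: int) -> list:
    """Brute-force list of start positions whose `length`-base deletion
    gives the same result as deleting `length` bases at `pos`."""
    target = ref[:pos] + ref[pos + length:]
    return [
        p for p in range(len(ref) - length + 1)
        if ref[:p] + ref[p + length:] == target
    ]


# Toy reference with a short tandem repeat (assumed example data).
reference = "GCACACAT"

# Deleting "CA" at position 1 and "AC" at position 4 give the same sequence,
# so a database storing both would hold a redundant entry.
same = are_equivalent(reference, (1, "CA"), (4, "AC"))

# Every start position that represents this same 2-base deletion.
positions = equivalent_deletion_positions(reference, 1, 2)
```

In this toy case the deletion can be written at any of five start positions inside the repeat. A positioning system of the kind the abstract describes would assign all of them a single coordinate, whereas this brute-force scan serves only to show why such a coordinate is needed.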

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e296/5658412/62d8d4b45c8c/41598_2017_14400_Fig1_HTML.jpg
