INDELseek: detection of complex insertions and deletions from next-generation sequencing data.

Author information

Au Chun Hang, Leung Anskar Y H, Kwong Ava, Chan Tsun Leung, Ma Edmond S K

Affiliations

Division of Molecular Pathology, Department of Pathology, Hong Kong Sanatorium & Hospital, Happy Valley, Hong Kong SAR.

Department of Medicine, The University of Hong Kong, Pok Fu Lam, Hong Kong SAR.

Publication information

BMC Genomics. 2017 Jan 5;18(1):16. doi: 10.1186/s12864-016-3449-9.

Abstract

BACKGROUND

Large-scale human genomics studies have shown that complex insertions and deletions (indels) in next-generation sequencing (NGS) data are prone to escape detection by currently available variant callers. As a result, somatic and germline complex indels in key disease driver genes can be missed in NGS-based genomics studies.

RESULTS

INDELseek is an open-source complex indel caller designed for NGS data from random fragments and PCR amplicons. Its key differentiating feature is that each NGS read alignment is examined as a whole, rather than as a "pileup" of each reference position across multiple alignments. In benchmarking against the reference material NA12878 genome (n = 160, derived from high-confidence variant calls), GATK, SAMtools and INDELseek showed complex indel detection sensitivities of 0%, 0% and 100%, respectively. INDELseek also detected all known germline (BRCA1 and BRCA2) and somatic (CALR and JAK2) complex indels in human clinical samples (n = 8). Further experiments validated all 10 KIT complex indels detected in a discovery cohort of clinical samples. In silico semi-simulation showed sensitivities of 93.7-96.2% based on 8671 unique complex indels in >5000 genes from dbSNP and COSMIC. We also demonstrated the importance of complex indel detection for accurately annotating BRCA1, BRCA2 and TP53 mutations with gained or rescued protein-truncating effects.
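To illustrate what examining a read alignment "as a whole" can mean in practice, the following is a minimal Python sketch and not INDELseek's actual code. It walks the CIGAR string of a single alignment, collects its insertion and deletion operations, and merges those that lie close together on the reference into one combined deletion-insertion event. The function names, the `Event` type and the `max_gap` threshold are all hypothetical choices made for this illustration; the abstract does not describe the tool's parameters. The sketch also ignores base mismatches inside matched segments, which a real caller would need the reference sequence to detect.

```python
import re
from typing import List, NamedTuple, Tuple

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


class Event(NamedTuple):
    """One indel on a single read (hypothetical helper type)."""
    ref_start: int   # 0-based reference start of the affected span
    ref_end: int     # exclusive reference end
    read_start: int  # 0-based read start of the affected span
    read_end: int    # exclusive read end


def indel_events(cigar: str, ref_pos: int) -> List[Event]:
    """Return insertion/deletion events of one alignment in reference order."""
    events = []
    read_pos = 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":          # consumes reference and read
            ref_pos += n
            read_pos += n
        elif op == "I":          # consumes read only
            events.append(Event(ref_pos, ref_pos, read_pos, read_pos + n))
            read_pos += n
        elif op == "D":          # consumes reference only
            events.append(Event(ref_pos, ref_pos + n, read_pos, read_pos))
            ref_pos += n
        elif op == "S":          # soft clip: consumes read only
            read_pos += n
        elif op == "N":          # skipped region: consumes reference only
            ref_pos += n
        # H and P consume neither sequence
    return events


def complex_indels(cigar: str, read_seq: str, ref_start: int,
                   max_gap: int = 5) -> List[Tuple[int, int, str]]:
    """Merge nearby indels on one read into (ref_start, ref_end, inserted_seq).

    max_gap is an assumed threshold: events separated by at most this many
    reference bases are treated as one complex indel.
    """
    clusters = []  # list of (merged Event, number of events merged)
    for ev in indel_events(cigar, ref_start):
        if clusters and ev.ref_start - clusters[-1][0].ref_end <= max_gap:
            last, count = clusters[-1]
            merged = Event(last.ref_start, ev.ref_end,
                           last.read_start, ev.read_end)
            clusters[-1] = (merged, count + 1)
        else:
            clusters.append((ev, 1))
    # Report only clusters built from two or more events.
    return [(c.ref_start, c.ref_end, read_seq[c.read_start:c.read_end])
            for c, count in clusters if count >= 2]


read = "A" * 10 + "CGT" + "TTTT" + "G" * 10
print(complex_indels("10M2D3M4I10M", read, ref_start=100))
```

In this example the 2-base deletion and the 4-base insertion are separated by only 3 matched bases, so they are reported together as a single event that replaces reference positions 110-114 with the 7 read bases `CGTTTTT`. A position-by-position pileup would see the same read as two independent simple indels, which is the kind of fragmentation the per-alignment view avoids.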

CONCLUSIONS

INDELseek is an accurate and versatile tool for complex indel detection in NGS data. It complements other variant callers in NGS-based genomics studies targeting a wide spectrum of genetic variations.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1dc6/5217656/dd95aaa234e5/12864_2016_3449_Fig1_HTML.jpg
