Identification of indels in next-generation sequencing data.

Authors

Ratan Aakrosh, Olson Thomas L, Loughran Thomas P, Miller Webb

Affiliations

Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, 506, Wartik Laboratory, University Park, PA, 16802, USA.

Department of Public Health Sciences and Center for Public Health Genomics, University of Virginia, Charlottesville, VA, 22908, USA.

Publication information

BMC Bioinformatics. 2015 Feb 13;16(1):42. doi: 10.1186/s12859-015-0483-6.

DOI:10.1186/s12859-015-0483-6

PMID:25879703

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4339746/

Abstract

BACKGROUND

The discovery and mapping of genomic variants is an essential step in most analyses that use sequencing reads. A number of mature software packages and associated pipelines can identify single nucleotide polymorphisms (SNPs) with a high degree of concordance. The same cannot be said for tools that identify other types of variants. Indels are the second most frequent class of variants in the human genome, after SNPs. Reliable detection of indels remains a challenging problem, especially for variants longer than a few bases.

RESULTS

We have developed a set of algorithms and heuristics, collectively called indelMINER, to identify indels from whole-genome resequencing datasets using paired-end reads. indelMINER uses a split-read approach to identify the precise breakpoints of indels smaller than a user-specified threshold, and supplements it with a paired-end approach to identify larger variants that the split-read approach frequently misses. We use simulated and real datasets to show that an implementation of the algorithm performs favorably when compared to several existing tools.
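
The two signals described above can be illustrated with a minimal sketch. This is not indelMINER's implementation: the function names, the exact-match requirement, and the 3-standard-deviation cutoff are all assumptions made for illustration. The first function explains a read as a matching prefix, a gap of deleted reference bases, and a matching suffix, which gives a base-level breakpoint. The second flags a read pair whose mapped span deviates from the expected library insert size, which is how larger events can be detected without a read crossing the breakpoint.

```python
# Toy reference and a read that skips 4 reference bases (hypothetical data).
REF = "GATTACAGGCCTTAACGTGCA"
READ = "TTACAGTAACGT"


def find_deletion_by_split(ref, read, start, max_size):
    """Split-read idea: match a prefix of `read` at ref[start:], then look for
    the remaining suffix after a gap of 1..max_size deleted reference bases.
    Returns (breakpoint, deletion_size) with a 0-based breakpoint, or None.
    Only exact matches are considered; real tools tolerate mismatches."""
    k = 0
    while k < len(read) and start + k < len(ref) and ref[start + k] == read[k]:
        k += 1
    if k == 0 or k == len(read):
        return None  # no anchored prefix, or the read matches without a gap
    suffix = read[k:]
    for size in range(1, max_size + 1):
        pos = start + k + size
        if ref[pos:pos + len(suffix)] == suffix:
            return (start + k, size)
    return None


def classify_pair(observed_span, lib_mean, lib_sd, k=3.0):
    """Paired-end idea: a mapped span much larger than the library insert
    size suggests a deletion in the sample; a much smaller one suggests an
    insertion. The cutoff of k standard deviations is an assumption."""
    if observed_span > lib_mean + k * lib_sd:
        return "deletion"
    if observed_span < lib_mean - k * lib_sd:
        return "insertion"
    return None


if __name__ == "__main__":
    print(find_deletion_by_split(REF, READ, start=2, max_size=10))
    print(classify_pair(900, lib_mean=300, lib_sd=10))
```

In this sketch `max_size` plays the role of the user-specified threshold: a deletion longer than `max_size` is not found by the split-read search and would have to be picked up from the paired-end signal instead.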

CONCLUSIONS

indelMINER can be used effectively to identify indels in whole-genome resequencing projects. The output is provided in the VCF format along with additional information about the variant, including information about its presence or absence in another sample. The source code and documentation for indelMINER can be freely downloaded from www.bx.psu.edu/miller_lab/indelMINER.tar.gz .
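
As a rough illustration of consuming such output, the sketch below classifies a VCF data line as an insertion or deletion from the lengths of its REF and ALT alleles. The example records are made up, the function name is hypothetical, and the tool-specific additional fields mentioned above are not modeled because their names are not given here.

```python
def classify_indel(vcf_line):
    """Return (chrom, pos, kind, size) for a simple indel record, else None.
    Uses the standard VCF column order: CHROM POS ID REF ALT QUAL FILTER INFO.
    Only the first ALT allele is examined; symbolic alleles such as <DEL>
    and same-length substitutions are skipped."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[3]
    alt = fields[4].split(",")[0]
    if alt.startswith("<") or len(ref) == len(alt):
        return None
    kind = "INS" if len(alt) > len(ref) else "DEL"
    return (chrom, pos, kind, abs(len(alt) - len(ref)))


if __name__ == "__main__":
    # Hypothetical records, not actual indelMINER output.
    for line in ("chr1\t100\t.\tACGT\tA\t.\tPASS\t.",
                 "chr2\t500\t.\tG\tGTT\t.\tPASS\t."):
        print(classify_indel(line))
```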

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/44d6/4339746/e8981321f7b6/12859_2015_483_Fig1_HTML.jpg
