Identification of indels in next-generation sequencing data.

Author information

Ratan Aakrosh, Olson Thomas L, Loughran Thomas P, Miller Webb

Affiliations

Center for Comparative Genomics and Bioinformatics, Pennsylvania State University, 506, Wartik Laboratory, University Park, PA, 16802, USA.

Department of Public Health Sciences and Center for Public Health Genomics, University of Virginia, Charlottesville, VA, 22908, USA.

Publication information

BMC Bioinformatics. 2015 Feb 13;16(1):42. doi: 10.1186/s12859-015-0483-6.

Abstract

BACKGROUND

The discovery and mapping of genomic variants is an essential step in most analyses that use sequencing reads. A number of mature software packages and associated pipelines can identify single nucleotide polymorphisms (SNPs) with a high degree of concordance. However, the same cannot be said for tools that identify other types of variants. Indels are the second most frequent class of variants in the human genome, after SNPs. Reliable detection of indels remains a challenging problem, especially for variants longer than a few bases.

RESULTS

We have developed a set of algorithms and heuristics, collectively called indelMINER, to identify indels from whole-genome resequencing datasets using paired-end reads. indelMINER uses a split-read approach to identify the precise breakpoints of indels smaller than a user-specified threshold, and supplements it with a paired-end approach to identify larger variants that the split-read approach frequently misses. We use simulated and real datasets to show that an implementation of the algorithm performs favorably when compared to several existing tools.
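
To make the paired-end idea concrete, the following is a minimal sketch of the general technique, not indelMINER's actual implementation. It assumes that a read pair whose mapped insert size lies far above the library's typical insert size suggests a deletion, and one far below it suggests an insertion. The function name `classify_pairs`, the `num_sd` cutoff of three standard deviations, and the example values are all hypothetical choices made for illustration.

```python
from statistics import mean, stdev


def classify_pairs(insert_sizes, library_sizes, num_sd=3.0):
    """Label read pairs by comparing mapped insert sizes with the library.

    insert_sizes  -- mapped insert sizes of the pairs to classify
    library_sizes -- insert sizes sampled from normally mapped pairs
    num_sd        -- hypothetical cutoff, in standard deviations
    """
    mu = mean(library_sizes)
    sd = stdev(library_sizes)
    labels = []
    for size in insert_sizes:
        if size > mu + num_sd * sd:
            # Mates map farther apart on the reference than expected,
            # which is consistent with a deletion in the sample.
            labels.append("deletion")
        elif size < mu - num_sd * sd:
            # Mates map closer together than expected, which is
            # consistent with an insertion in the sample.
            labels.append("insertion")
        else:
            labels.append("concordant")
    return labels


# Illustrative values only (not data from the paper).
library = [290, 300, 310, 295, 305, 300, 298, 302]
print(classify_pairs([300, 900, 150], library))
```

A real caller would additionally cluster discordant pairs that support the same event and require a minimum number of supporting pairs before reporting a variant. This signal alone does not give base-level breakpoints, which is why the abstract pairs it with a split-read step.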

CONCLUSIONS

indelMINER can be used effectively to identify indels in whole-genome resequencing projects. The output is provided in VCF format along with additional information about each variant, including its presence or absence in another sample. The source code and documentation for indelMINER can be freely downloaded from www.bx.psu.edu/miller_lab/indelMINER.tar.gz.
