Fast and SNP-tolerant detection of complex variants and splicing in short reads.

Affiliation

Department of Bioinformatics, Genentech, Inc., 1 DNA Way, South San Francisco, CA, USA.

Publication information

Bioinformatics. 2010 Apr 1;26(7):873-81. doi: 10.1093/bioinformatics/btq057. Epub 2010 Feb 10.

DOI:10.1093/bioinformatics/btq057

PMID:20147302

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2844994/

Abstract

MOTIVATION

Next-generation sequencing captures sequence differences in reads relative to a reference genome or transcriptome, including splicing events and complex variants involving multiple mismatches and long indels. We present computational methods for fast detection of complex variants and splicing in short reads, based on a successively constrained search process of merging and filtering position lists from a genomic index. Our methods are implemented in GSNAP (Genomic Short-read Nucleotide Alignment Program), which can align both single- and paired-end reads as short as 14 nt and of arbitrarily long length. It can detect short- and long-distance splicing, including interchromosomal splicing, in individual reads, using probabilistic models or a database of known splice sites. Our program also permits SNP-tolerant alignment to a reference space of all possible combinations of major and minor alleles, and can align reads from bisulfite-treated DNA for the study of methylation state.
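The idea of merging and filtering position lists from a genomic index can be illustrated with a toy sketch. This is not GSNAP's implementation: the function names `build_index` and `candidate_starts`, the fixed k-mer size, and the simple vote threshold `min_support` are all assumptions made for illustration. Each k-mer in the read looks up its genomic positions, each hit votes for the alignment start it implies, and starts with too few votes are filtered out, so a read with a mismatch can still be placed.

```python
from collections import Counter, defaultdict

def build_index(genome, k):
    """Map every k-mer in the genome to the list of its start positions."""
    index = defaultdict(list)
    for pos in range(len(genome) - k + 1):
        index[genome[pos:pos + k]].append(pos)
    return index

def candidate_starts(read, index, k, min_support):
    """Merge the position lists of the read's non-overlapping k-mers.

    A hit at genome position p for the k-mer at read offset i votes for
    the alignment start p - i. Starts with fewer than min_support votes
    are filtered out (hypothetical threshold, not a GSNAP parameter).
    """
    votes = Counter()
    for offset in range(0, len(read) - k + 1, k):
        for pos in index.get(read[offset:offset + k], ()):
            votes[pos - offset] += 1
    return sorted(start for start, n in votes.items() if n >= min_support)

# Toy data: a 32-nt "genome" and a 16-nt read taken from position 8.
genome = "ACGTACGTTAGCCGATAGGCTTAACGGATCCA"
index = build_index(genome, k=4)
read = genome[8:24]
# Introduce one mismatch at read offset 5 (inside the second k-mer).
mutated = read[:5] + "T" + read[6:]

exact_hits = candidate_starts(read, index, k=4, min_support=4)
tolerant_hits = candidate_starts(mutated, index, k=4, min_support=3)
strict_hits = candidate_starts(mutated, index, k=4, min_support=4)
```

In this sketch, lowering `min_support` from 4 to 3 is what lets the mismatched read still map to its true start; requiring all four k-mers to agree rejects it.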

RESULTS

In comparison testing, GSNAP has speeds comparable to existing programs, especially in reads of ≥70 nt, and is fastest in detecting complex variants with four or more mismatches or insertions of 1-9 nt and deletions of 1-30 nt. Although SNP tolerance does not increase alignment yield substantially, it affects alignment results in 7-8% of transcriptional reads, typically by revealing alternate genomic mappings for a read. Simulations of bisulfite-converted DNA show a decrease in identifying genomic positions uniquely in 6% of 36 nt reads and 3% of 70 nt reads.
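Why bisulfite conversion reduces unique mappability can be seen in a small, hypothetical example. The sketch below assumes every cytosine is unmethylated and converts C to T on the forward strand only; the helper names and the toy sequences are illustrative and are not taken from the paper or from GSNAP.

```python
def bisulfite_convert(seq):
    """In silico conversion: every C reads as T (assumes no methylation)."""
    return seq.replace("C", "T")

def match_positions(read, genome):
    """Return all start positions where the read matches the genome exactly."""
    n = len(read)
    return [i for i in range(len(genome) - n + 1) if genome[i:i + n] == read]

# Toy sequences: the read occurs once in the unconverted genome.
genome = "GACCAGTTGATTAG"
read = "GACCAG"

before = match_positions(read, genome)
after = match_positions(bisulfite_convert(read), bisulfite_convert(genome))
```

Here `before` holds a single position, while `after` holds two, because collapsing C into T makes `GACCAG` and `GATTAG` indistinguishable. Longer reads are less likely to collide this way, which is consistent with the smaller loss of uniqueness reported for 70 nt reads than for 36 nt reads.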

AVAILABILITY

Source code in C and utility programs in Perl are freely available for download as part of the GMAP package at http://share.gene.com/gmap.

Similar articles

GMAP and GSNAP for Genomic Sequence Alignment: Enhancements to Speed, Accuracy, and Functionality.

Methods Mol Biol. 2016;1418:283-334. doi: 10.1007/978-1-4939-3578-9_15.

Fast and SNP-aware short read alignment with SALT.

BMC Bioinformatics. 2021 Aug 25;22(Suppl 9):172. doi: 10.1186/s12859-021-04088-6.

Fast and accurate short read alignment with Burrows-Wheeler transform.

Bioinformatics. 2009 Jul 15;25(14):1754-60. doi: 10.1093/bioinformatics/btp324. Epub 2009 May 18.

Read-Split-Run: an improved bioinformatics pipeline for identification of genome-wide non-canonical spliced regions using RNA-Seq data.

BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):503. doi: 10.1186/s12864-016-2896-7.

SRmapper: a fast and sensitive genome-hashing alignment tool.

Bioinformatics. 2013 Feb 1;29(3):316-21. doi: 10.1093/bioinformatics/bts712. Epub 2012 Dec 24.

TAP: a targeted clinical genomics pipeline for detecting transcript variants using RNA-seq data.

BMC Med Genomics. 2018 Sep 10;11(1):79. doi: 10.1186/s12920-018-0402-6.

Accurate estimation of short read mapping quality for next-generation genome sequencing.

Bioinformatics. 2012 Sep 15;28(18):i349-i355. doi: 10.1093/bioinformatics/bts408.

ABMapper: a suffix array-based tool for multi-location searching and splice-junction mapping.

Bioinformatics. 2011 Feb 1;27(3):421-2. doi: 10.1093/bioinformatics/btq656. Epub 2010 Dec 17.

MapNext: a software tool for spliced and unspliced alignments and SNP detection of short sequence reads.

BMC Genomics. 2009 Dec 3;10 Suppl 3(Suppl 3):S13. doi: 10.1186/1471-2164-10-S3-S13.

Cited by

Interferon-gamma 1b-induced gene expression alters neutrophil function in patients with chronic granulomatous disease.

PLoS One. 2025 Sep 8;20(9):e0331657. doi: 10.1371/journal.pone.0331657. eCollection 2025.

Knock Out of miRNA-30a-5p and Reconstitution of the Actin Network Dynamics Partly Restores the Impaired Terminal Erythroid Differentiation during Blood Pharming.

Stem Cell Rev Rep. 2025 Aug 26. doi: 10.1007/s12015-025-10957-x.

Rapid emergence of non-autonomous elements may stop P-element invasions in the absence of a piRNA-based host defence.

PLoS Genet. 2025 Aug 20;21(8):e1011649. doi: 10.1371/journal.pgen.1011649. eCollection 2025 Aug.

Chikungunya masquerading as dengue infection in Sri Lanka uncovered by metagenomics.

PLoS One. 2025 Jul 7;20(7):e0326995. doi: 10.1371/journal.pone.0326995. eCollection 2025.

Deploying Metagenomics to Characterize Microbial Pathogens During Outbreak of Acute Febrile Illness Among Children in Tanzania.

Pathogens. 2025 Jun 19;14(6):601. doi: 10.3390/pathogens14060601.

APOE protects against severe infection with Mycobacterium tuberculosis by restraining production of neutrophil extracellular traps.

PLoS Pathog. 2025 Jun 16;21(6):e1013267. doi: 10.1371/journal.ppat.1013267. eCollection 2025 Jun.

Genetic Inactivation of the Serotonin Transporter Dysregulates Expression of Neurotransmission Genes and Genome-Wide DNA Methylation Levels in the Medial Prefrontal Cortex of Male Rats During Postnatal Development.

Dev Neurobiol. 2025 Jul;85(3):e22973. doi: 10.1002/dneu.22973.

A haplotype-resolved reference genome for Eucalyptus grandis.

G3 (Bethesda). 2025 Jul 9;15(7). doi: 10.1093/g3journal/jkaf112.

Wastewater Metavirome Diversity: Exploring Replicate Inconsistencies and Bioinformatic Tool Disparities.

Int J Environ Res Public Health. 2025 Apr 30;22(5):707. doi: 10.3390/ijerph22050707.

Trajectories from single-cells to PAX5-driven leukemia reveal PAX5-MYC interplay in vivo.

Leukemia. 2025 May 20. doi: 10.1038/s41375-025-02626-2.
