Accurate estimation of short read mapping quality for next-generation genome sequencing.

Affiliation

Department of Electrical Engineering & Computer Science, Case Western Reserve University, Cleveland, OH 44106, USA.

Publication

Bioinformatics. 2012 Sep 15;28(18):i349-i355. doi: 10.1093/bioinformatics/bts408.


DOI:10.1093/bioinformatics/bts408
PMID:22962451
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3436835/
Abstract

MOTIVATION: Several software tools specialize in the alignment of short next-generation sequencing reads to a reference sequence. Some of these tools report a mapping quality score for each alignment; in principle, this quality score tells researchers the likelihood that the alignment is correct. However, the reported mapping quality often correlates weakly with actual accuracy, and the qualities of many mappings are underestimated, encouraging researchers to discard correct mappings. Further, these low-quality mappings tend to correlate with variations in the genome (both single-nucleotide and structural), and such mappings are important for accurately identifying genomic variants.

APPROACH: We develop a machine learning tool, LoQuM (LOgistic regression tool for calibrating the Quality of short read mappings), to assign reliable mapping quality scores to mappings of Illumina reads returned by any alignment tool. LoQuM uses statistics on the read (base quality scores reported by the sequencer) and on the alignment (number of matches, mismatches and deletions; the mapping quality score returned by the alignment tool, if available; and the number of mappings) as classification features, and uses simulated reads to learn a logistic regression model that relates these features to actual mapping quality.

RESULTS: We test the predictions of LoQuM on an independent dataset generated by the ART short read simulation software and observe that LoQuM can "resurrect" many mappings that the alignment tools assign zero quality scores and that researchers would therefore be likely to discard. We also observe that recalibrating mapping quality scores greatly enhances the precision of single nucleotide polymorphism (SNP) calls.

AVAILABILITY: LoQuM is available as open source at http://compbio.case.edu/loqum/.

CONTACT: matthew.ruffalo@case.edu.
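
The APPROACH paragraph describes a logistic regression over read- and alignment-level features, trained on simulated reads whose true origin is known, whose output is turned into a mapping quality. The sketch below illustrates that general idea only and is not LoQuM's code. Everything specific in it is an assumption for illustration: the `MappingFeatures` record and its scalings, plain batch gradient descent, the Phred conversion MAPQ = -10 log10(1 - p) (the SAM convention for mapping quality) with a hypothetical cap of 60, and the six invented training records.

```python
import math
from dataclasses import dataclass


@dataclass
class MappingFeatures:
    """One alignment's features, loosely following the abstract (hypothetical encoding)."""
    mean_base_quality: float  # mean Phred base quality reported by the sequencer
    matches: int
    mismatches: int
    deletions: int
    aligner_mapq: float       # MAPQ reported by the aligner; 0 if none is reported
    num_mappings: int         # number of reference locations the read maps to (>= 1)

    def vector(self) -> list[float]:
        # Leading 1.0 is the intercept; the divisors are ad hoc scalings (assumption)
        # that keep plain gradient descent well conditioned.
        return [1.0,
                self.mean_base_quality / 40.0,
                self.matches / 100.0,
                self.mismatches / 10.0,
                self.deletions / 10.0,
                self.aligner_mapq / 60.0,
                math.log(self.num_mappings)]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: list[float], x: list[float]) -> float:
    """Probability that the mapping is correct under weights w."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def train_logistic(data: list[tuple[MappingFeatures, int]],
                   lr: float = 0.5, epochs: int = 2000) -> list[float]:
    """Batch gradient descent on the log-loss; label 1 = mapped to the true origin."""
    xs = [f.vector() for f, _ in data]
    ys = [y for _, y in data]
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = predict(w, x) - y  # derivative of log-loss w.r.t. the linear score
            for j, xj in enumerate(x):
                grad[j] += err * xj
        w = [wi - lr * g / len(xs) for wi, g in zip(w, grad)]
    return w


def calibrated_mapq(w: list[float], f: MappingFeatures, cap: float = 60.0) -> float:
    """Phred-scale the predicted error probability: MAPQ = -10 * log10(1 - p)."""
    p_wrong = 1.0 - predict(w, f.vector())
    if p_wrong <= 0.0:
        return cap
    return min(cap, -10.0 * math.log10(p_wrong))


# Toy "simulated" records: (features, 1 if the read was placed at its true origin).
# These values are invented for illustration, not data from the paper.
training = [
    (MappingFeatures(35, 100, 0, 0, 60, 1), 1),
    (MappingFeatures(33, 98, 2, 0, 0, 1), 1),   # correct, yet the aligner reported MAPQ 0
    (MappingFeatures(30, 97, 3, 0, 40, 1), 1),
    (MappingFeatures(34, 100, 0, 0, 0, 3), 0),
    (MappingFeatures(20, 94, 6, 0, 0, 5), 0),
    (MappingFeatures(25, 95, 4, 1, 3, 2), 0),
]

weights = train_logistic(training)
q_good = calibrated_mapq(weights, MappingFeatures(34, 99, 1, 0, 0, 1))
q_bad = calibrated_mapq(weights, MappingFeatures(22, 95, 5, 0, 0, 4))
print(f"unique read with aligner MAPQ 0 -> {q_good:.1f}; multi-mapped read -> {q_bad:.1f}")
```

In this toy run, a uniquely placed read that the aligner scored 0 receives a higher recalibrated score than a multi-mapped read with more mismatches, which is the kind of "resurrection" the RESULTS paragraph describes. A real calibration would be trained on a large set of simulated reads (the abstract uses ART for evaluation) rather than six hand-made records.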

Similar articles

[1]
Accurate estimation of short read mapping quality for next-generation genome sequencing.

Bioinformatics. 2012-9-15

[2]
Re-alignment of the unmapped reads with base quality score.

BMC Bioinformatics. 2015

[3]
Comparative analysis of algorithms for next-generation sequencing read alignment.

Bioinformatics. 2011-8-19

[4]
Ψ-RA: a parallel sparse index for genomic read alignment.

BMC Genomics. 2011-7-27

[5]
Fast and accurate read alignment for resequencing.

Bioinformatics. 2012-7-18

[6]
SRmapper: a fast and sensitive genome-hashing alignment tool.

Bioinformatics. 2012-12-24

[7]
SInC: an accurate and fast error-model based simulator for SNPs, Indels and CNVs coupled with a read generator for short-read sequence data.

BMC Bioinformatics. 2014-2-5

[8]
Long read alignment based on maximal exact match seeds.

Bioinformatics. 2012-9-15

[9]
MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping.

PLoS One. 2014-3-5

[10]
Performance evaluation method for read mapping tool in clinical panel sequencing.

Genes Genomics. 2018

Cited by

[1]
SigAlign: an alignment algorithm guided by explicit similarity criteria.

Nucleic Acids Res. 2024-8-27

[2]
Short-read aligner performance in germline variant identification.

Bioinformatics. 2023-8-1

[3]
Full-Length Transcriptome Sequencing of the Scleractinian Coral Reveals the Gene Expression Profile of Coral-Zooxanthellae Holobiont.

Biology (Basel). 2021-12-5

[4]
Genome-Wide Association Studies in Indian Buffalo Revealed Genomic Regions for Lactation and Fertility.

Front Genet. 2021-9-20

[5]
Recalibration of mapping quality scores in Illumina short-read alignments improves SNP detection results in low-coverage sequencing data.

PeerJ. 2020-12-7

[6]
Comparison of single-nucleotide variants identified by Illumina and Oxford Nanopore technologies in the context of a potential outbreak of Shiga toxin-producing Escherichia coli.

Gigascience. 2019-8-1

[7]
Joint Estimates of Heterozygosity and Runs of Homozygosity for Modern and Ancient Samples.

Genetics. 2019-5-14

[8]
Identification of Genes Involved in Lipid Biosynthesis through de novo Transcriptome Assembly from Cocos nucifera Developing Endosperm.

Plant Cell Physiol. 2019-5-1

[9]
A tandem simulation framework for predicting mapping quality.

Genome Biol. 2017-8-10

[10]
Epigenomic profiling of primary gastric adenocarcinoma reveals super-enhancer heterogeneity.

Nat Commun. 2016-9-28

References

[1]
ART: a next-generation sequencing read simulator.

Bioinformatics. 2011-12-23

[2]
Comparative analysis of algorithms for next-generation sequencing read alignment.

Bioinformatics. 2011-8-19

[3]
Sequencing breakthroughs for genomic ecology and evolutionary biology.

Mol Ecol Resour. 2008-1

[4]
Integrated analysis of gene expression, CpG island methylation, and gene copy number in breast cancer cells by deep sequencing.

PLoS One. 2011-2-25

[5]
Improved variant discovery through local re-alignment of short-read next-generation sequencing data using SRMA.

Genome Biol. 2010-10-8

[6]
Advances in understanding cancer genomes through second-generation sequencing.

Nat Rev Genet. 2010-10

[7]
mrsFAST: a cache-oblivious algorithm for short-read mapping.

Nat Methods. 2010-8

[8]
De novo assembly of human genomes with massively parallel short read sequencing.

Genome Res. 2009-12-17

[9]
Personalized copy number and segmental duplication maps using next-generation sequencing.

Nat Genet. 2009-10

[10]
The Sequence Alignment/Map format and SAMtools.

Bioinformatics. 2009-6-8
