Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score.

Author affiliation

Department of Computer Science, Stony Brook University, Stony Brook, NY, USA.

Publication information

Bioinformatics. 2012 Aug 15;28(16):2097-105. doi: 10.1093/bioinformatics/bts330. Epub 2012 Jun 4.

DOI:10.1093/bioinformatics/bts330
PMID:22668792
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3413383/
Abstract

MOTIVATION

Genome resequencing and short read mapping are two of the primary tools of genomics and are used for many important applications. The current state-of-the-art in mapping uses the quality values and mapping quality scores to evaluate the reliability of the mapping. These attributes, however, are assigned to individual reads and do not directly measure the problematic repeats across the genome. Here, we present the Genome Mappability Score (GMS) as a novel measure of the complexity of resequencing a genome. The GMS is a weighted probability that any read could be unambiguously mapped to a given position and thus measures the overall composition of the genome itself.
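To make the idea of a per-position "weighted probability" more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each simulated read carries a Phred-scaled mapping quality, converts that quality into a probability that the read was placed correctly, and averages this probability over all reads covering a position. The exact weighting is not given in the abstract, so the formula, the function names, and the read tuples below are illustrative assumptions.

```python
def mapq_to_prob_correct(mapq):
    """Convert a Phred-scaled mapping quality into P(mapping is correct)."""
    return 1.0 - 10.0 ** (-mapq / 10.0)


def mappability_scores(reads, genome_length):
    """Return a per-position score in [0, 100].

    `reads` is an iterable of (start, length, mapq) tuples with 0-based starts
    (a hypothetical input format chosen for this sketch). Each position's score
    is the mean P(correct) of the reads covering it, scaled to a percentage.
    Positions with no covering reads get 0.0, which is an arbitrary choice.
    """
    total = [0.0] * genome_length
    count = [0] * genome_length
    for start, length, mapq in reads:
        p = mapq_to_prob_correct(mapq)
        # Clip the read to the genome boundaries before accumulating.
        for pos in range(max(start, 0), min(start + length, genome_length)):
            total[pos] += p
            count[pos] += 1
    return [100.0 * t / c if c else 0.0 for t, c in zip(total, count)]


# Toy usage: one confidently mapped read and one ambiguous read (MAPQ 0).
example_scores = mappability_scores([(0, 4, 40), (2, 4, 0)], genome_length=6)
```

In this toy input, positions covered only by the confident read score close to 100, positions covered by both reads score about half that, and positions covered only by the ambiguous read score 0, which matches the intuition that repeats pull the score down regardless of how many reads land there.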

RESULTS

We have developed the Genome Mappability Analyzer to compute the GMS of every position in a genome. It leverages the parallelism of cloud computing to analyze large genomes, and enabled us to identify the 5-14% of the human, mouse, fly and yeast genomes that are difficult to analyze with short reads. We examined the accuracy of the widely used BWA/SAMtools polymorphism discovery pipeline in the context of the GMS, and found that discovery errors are dominated by false negatives, especially in regions with poor GMS. These errors are fundamental to the mapping process and cannot be overcome by increasing coverage. As such, the GMS should be considered in every resequencing project to pinpoint the 'dark matter' of the genome, including known clinically relevant variations in these regions.
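Given a per-position score profile, the "difficult" fraction of a genome and the location of its low-scoring regions can be summarized in a few lines. The sketch below is only an illustration: the cutoff of 10.0 is a hypothetical threshold, not a value taken from the abstract, and the input is assumed to be a plain list of scores on a 0 to 100 scale.

```python
def low_mappability_fraction(scores, threshold=10.0):
    """Fraction of positions whose score falls below `threshold`."""
    if not scores:
        raise ValueError("scores must not be empty")
    low = sum(1 for s in scores if s < threshold)
    return low / len(scores)


def low_mappability_regions(scores, threshold=10.0):
    """Merge consecutive low-scoring positions into half-open (start, end) intervals."""
    regions = []
    start = None
    for pos, s in enumerate(scores):
        if s < threshold:
            if start is None:
                start = pos  # a new low-scoring run begins here
        elif start is not None:
            regions.append((start, pos))  # the run ended at the previous position
            start = None
    if start is not None:
        regions.append((start, len(scores)))  # run extends to the end of the profile
    return regions


# Toy profile: positions 1-2 and 4 fall below the assumed cutoff.
profile = [99.9, 5.0, 0.0, 80.0, 3.0]
fraction = low_mappability_fraction(profile)
regions = low_mappability_regions(profile)
```

Intervals produced this way could be intersected with variant calls to flag sites where a missing call may reflect poor mappability instead of a true absence of variation.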

AVAILABILITY

The source code and profiles of several model organisms are available at http://gma-bio.sourceforge.net
