In silico read normalization using set multi-cover optimization.

Affiliations

Cluster of Excellence on Multimodal Computing and Interaction, Saarland University, Saarbrücken, Germany.

Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany.

Publication information

Bioinformatics. 2018 Oct 1;34(19):3273-3280. doi: 10.1093/bioinformatics/bty307.

DOI:10.1093/bioinformatics/bty307
PMID:29912280
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6157080/
Abstract

MOTIVATION

De Bruijn graphs are a common assembly data structure for sequencing datasets. But with the advances in sequencing technologies, assembling high coverage datasets has become a computational challenge. Read normalization, which removes redundancy in datasets, is widely applied to reduce resource requirements. Current normalization algorithms, though efficient, provide no guarantee to preserve important k-mers that form connections between regions in the graph.

RESULTS

Here, normalization is phrased as a set multi-cover problem on reads, and a heuristic algorithm, the Optimized Read Normalization Algorithm (ORNA), is proposed. ORNA normalizes to the minimum number of reads required to retain all k-mers and their relative k-mer abundances from the original dataset. Hence, all connections from the original graph are preserved. ORNA was tested on various RNA-seq datasets with different coverage values. It was compared to current normalization algorithms and was found to perform better. Normalizing error-corrected data allows for more accurate assemblies than normalizing the uncorrected dataset. Further, an application is proposed in which multiple datasets are combined and normalized to predict novel transcripts that would otherwise have been missed. Finally, ORNA is a general-purpose normalization algorithm that is fast and significantly reduces datasets, with a loss of assembly quality between 1% and 30% depending on reduction stringency.
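
The set multi-cover idea can be illustrated with a small greedy sketch: every k-mer gets a required coverage, and a read is kept only if it still contributes to a k-mer whose requirement is not yet met. This is not the ORNA source code. The function names, the default k and base values, and the rule that a k-mer with abundance a needs about ceil(log_base(a)) kept reads are assumptions made for illustration.

```python
import math
from collections import Counter


def kmers(read, k):
    """Return the k-mers of a read, left to right (duplicates included)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def normalize_reads(reads, k=5, base=1.7):
    """Greedy set multi-cover sketch (illustrative, not the ORNA implementation).

    Assumption: a k-mer that occurs `a` times in the input must occur at
    least max(1, ceil(log_base(a))) times in the kept reads, capped at `a`.
    """
    # Count every k-mer occurrence in the full dataset.
    abundance = Counter(km for read in reads for km in kmers(read, k))
    # Per-k-mer coverage requirement (hypothetical log-scaled threshold).
    required = {
        km: max(1, min(a, math.ceil(math.log(a, base))))
        for km, a in abundance.items()
    }
    covered = Counter()  # k-mer occurrences in the reads kept so far
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        # Keep the read if any of its k-mers is still under-covered.
        if any(covered[km] < required[km] for km in read_kmers):
            kept.append(read)
            covered.update(read_kmers)
    return kept


if __name__ == "__main__":
    # Toy dataset: one highly redundant read and one rare read.
    reads = ["ACGTACGTAC"] * 50 + ["TTTTTGGGGG"]
    kept = normalize_reads(reads)
    print(len(reads), "reads reduced to", len(kept))
```

Because a read is skipped only when all of its k-mers already meet their requirement, every k-mer of the input survives in the reduced set, which is the property the abstract emphasizes. A single pass in input order is a simple heuristic and does not guarantee a minimum-size cover.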

AVAILABILITY AND IMPLEMENTATION

ORNA is available at https://github.com/SchulzLab/ORNA.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ab1/6157080/596eeaa59fd1/bty307f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ab1/6157080/14aeb8afd490/bty307f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ab1/6157080/f6095a36ec94/bty307f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9ab1/6157080/9b96efab33c8/bty307f3.jpg
