Improving in-silico normalization using read weights.

Affiliations

Cluster of Excellence on Multimodal Computing and Interaction (MMCI) and Max Planck Institute for Informatics (MPII), Saarland University, Saarbrücken, Germany.

Saarbrücken Graduate School for Computer Science, Saarland University and International Max Planck Research School for Computer Science, Saarland Informatics Campus, Saarbrücken, Germany.

Publication information

Sci Rep. 2019 Mar 26;9(1):5133. doi: 10.1038/s41598-019-41502-9.

DOI:10.1038/s41598-019-41502-9
PMID:30914698
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6435659/
Abstract

Specialized de novo assemblers for diverse datatypes have been developed and are in widespread use for the analyses of single-cell genomics, metagenomics and RNA-seq data. However, assembly of large sequencing datasets produced by modern technologies is challenging and computationally intensive. In-silico read normalization has been suggested as a computational strategy to reduce redundancy in read datasets, which leads to significant speedups and memory savings of assembly pipelines. Previously, we presented a set multi-cover optimization based approach, ORNA, where reads are reduced without losing important k-mer connectivity information, as used in assembly graphs. Here we propose extensions to ORNA, named ORNA-Q and ORNA-K, which consider a weighted set multi-cover optimization formulation for the in-silico read normalization problem. These novel formulations make use of the base quality scores obtained from sequencers (ORNA-Q) or k-mer abundances of reads (ORNA-K) to improve normalization further. We devise efficient heuristic algorithms for solving both formulations. In applications to human RNA-seq data, ORNA-Q and ORNA-K are shown to assemble more or equally many full length transcripts compared to other normalization methods at similar or higher read reduction values. The algorithm is implemented under the latest version of ORNA (v2.0, https://github.com/SchulzLab/ORNA ).
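
The abstract frames read normalization as a weighted set multi-cover problem solved with greedy heuristics. The Python below is a minimal, hypothetical sketch of that idea and is not ORNA's implementation. It assumes a read is kept only while it still supplies some k-mer whose required multiplicity is unmet. It uses an illustrative logarithmic requirement (`log_threshold`, base 2 chosen only for the example). In the spirit of ORNA-Q, it visits reads in order of an assumed mean Phred+33 quality weight (`mean_phred`). All function names and parameters are illustrative assumptions.

```python
import math
from collections import Counter
from typing import Callable, Dict, List


def kmers(read: str, k: int) -> List[str]:
    """Return all overlapping k-mers of a read (none if the read is shorter than k)."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def log_threshold(abundance: int, base: float = 2.0) -> int:
    """Hypothetical coverage requirement: keep ceil(log_base(abundance)) copies
    of a k-mer, but at least one copy so no k-mer disappears entirely."""
    return max(1, math.ceil(math.log(abundance, base)))


def mean_phred(quality: str, offset: int = 33) -> float:
    """Hypothetical read weight: mean base quality of a FASTQ quality string
    (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in quality) / len(quality)


def greedy_weighted_normalize(
    reads: List[str],
    k: int,
    required: Callable[[int], int],
    weight: Callable[[int], float],
) -> List[int]:
    """Greedy heuristic for a weighted set multi-cover view of read normalization.

    Every distinct k-mer must be covered at least required(abundance) times by
    the retained reads. Reads are visited in descending weight order (ties keep
    input order, since sorted() is stable); a read is kept only if it still
    contributes a k-mer whose requirement is unmet. Returns sorted kept indices.
    """
    # Abundance of each k-mer across the full (unnormalized) dataset.
    abundance = Counter(km for r in reads for km in kmers(r, k))
    need: Dict[str, int] = {km: required(a) for km, a in abundance.items()}
    covered: Counter = Counter()  # k-mer occurrences among retained reads
    kept: List[int] = []
    for idx in sorted(range(len(reads)), key=weight, reverse=True):
        read_kmers = kmers(reads[idx], k)
        if any(covered[km] < need[km] for km in read_kmers):
            kept.append(idx)
            covered.update(read_kmers)
    return sorted(kept)


if __name__ == "__main__":
    reads = ["ACGTAC", "ACGTAC", "ACGTAC", "TTTTGG"]
    quals = ["IIIIII", "######", "IIII##", "IIIIII"]
    by_quality = greedy_weighted_normalize(
        reads, 3, log_threshold, lambda i: mean_phred(quals[i]))
    unweighted = greedy_weighted_normalize(reads, 3, log_threshold, lambda i: 0.0)
    print(by_quality)  # [0, 2, 3]
    print(unweighted)  # [0, 1, 3]
```

On this toy input, the quality-weighted pass keeps reads 0, 2 and 3 and discards the low-quality duplicate (read 1). Processing reads in input order instead keeps reads 0, 1 and 3. An ORNA-K-style variant would pass a different `weight` function based on k-mer abundance. The abstract does not give its exact form, so none is assumed here.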

Similar articles

1. Improving in-silico normalization using read weights.
   Sci Rep. 2019 Mar 26;9(1):5133. doi: 10.1038/s41598-019-41502-9.
2. In silico read normalization using set multi-cover optimization.
   Bioinformatics. 2018 Oct 1;34(19):3273-3280. doi: 10.1093/bioinformatics/bty307.
3. Improving the sensitivity of long read overlap detection using grouped short k-mer matches.
   BMC Genomics. 2019 Apr 4;20(Suppl 2):190. doi: 10.1186/s12864-019-5475-x.
4. Informed kmer selection for de novo transcriptome assembly.
   Bioinformatics. 2016 Jun 1;32(11):1670-7. doi: 10.1093/bioinformatics/btw217. Epub 2016 Apr 28.
5. An improved filtering algorithm for big read datasets and its application to single-cell assembly.
   BMC Bioinformatics. 2017 Jul 3;18(1):324. doi: 10.1186/s12859-017-1724-7.
6. NeatFreq: reference-free data reduction and coverage normalization for De Novo sequence assembly.
   BMC Bioinformatics. 2014 Nov 19;15(1):357. doi: 10.1186/s12859-014-0357-3.
7. Benchmarking of de novo assembly algorithms for Nanopore data reveals optimal performance of OLC approaches.
   BMC Genomics. 2016 Aug 22;17 Suppl 7(Suppl 7):507. doi: 10.1186/s12864-016-2895-8.
8. PERGA: a paired-end read guided de novo assembler for extending contigs using SVM and look ahead approach.
   PLoS One. 2014 Dec 2;9(12):e114253. doi: 10.1371/journal.pone.0114253. eCollection 2014.
9. GenHap: a novel computational method based on genetic algorithms for haplotype assembly.
   BMC Bioinformatics. 2019 Apr 18;20(Suppl 4):172. doi: 10.1186/s12859-019-2691-y.
10. TraRECo: a greedy approach based de novo transcriptome assembler with read error correction using consensus matrix.
   BMC Genomics. 2018 Sep 4;19(1):653. doi: 10.1186/s12864-018-5034-x.

Cited by

1. Morphogenesis, starvation, and light responses in a mushroom-forming fungus revealed by long-read sequencing and extensive expression profiling.
   Cell Genom. 2025 Jun 11;5(6):100853. doi: 10.1016/j.xgen.2025.100853. Epub 2025 Apr 21.
2. VenomCap: An exon-capture probe set for the targeted sequencing of snake venom genes.
   Mol Ecol Resour. 2024 Nov;24(8):e14020. doi: 10.1111/1755-0998.14020. Epub 2024 Sep 19.
3. De novo assembly of transcriptomes and differential gene expression analysis using short-read data from emerging model organisms - a brief guide.
   Front Zool. 2024 Jun 20;21(1):17. doi: 10.1186/s12983-024-00538-y.
4. The long and short of it: benchmarking viromics using Illumina, Nanopore and PacBio sequencing technologies.
   Microb Genom. 2024 Feb;10(2). doi: 10.1099/mgen.0.001198.
5. Reference-free assembly of long-read transcriptome sequencing data with RNA-Bloom2.
   Nat Commun. 2023 May 22;14(1):2940. doi: 10.1038/s41467-023-38553-y.
6. Antimicrobial Resistance and Genetic Diversity of Strains Isolated from Equine and Other Veterinary Samples.
   Pathogens. 2022 Dec 30;12(1):64. doi: 10.3390/pathogens12010064.
7. Design of Hydrogel Silk-Based Microarrays and Molecular Beacons for Reagentless Point-of-Care Diagnostics.
   Front Bioeng Biotechnol. 2022 Jul 22;10:881679. doi: 10.3389/fbioe.2022.881679. eCollection 2022.
8. In vitro and in silico parameters for precise cgMLST typing of Listeria monocytogenes.
   BMC Genomics. 2022 Mar 26;23(1):235. doi: 10.1186/s12864-022-08437-4.
9. Bacteriophages Roam the Wheat Phyllosphere.
   Viruses. 2022 Jan 26;14(2):244. doi: 10.3390/v14020244.
10. A simple guide to de novo transcriptome assembly and annotation.
   Brief Bioinform. 2022 Mar 10;23(2). doi: 10.1093/bib/bbab563.

References

1. In silico read normalization using set multi-cover optimization.
   Bioinformatics. 2018 Oct 1;34(19):3273-3280. doi: 10.1093/bioinformatics/bty307.
2. K-mer clustering algorithm using a MapReduce framework: application to the parallelization of the Inchworm module of Trinity.
   BMC Bioinformatics. 2017 Nov 3;18(1):467. doi: 10.1186/s12859-017-1881-8.
3. An improved filtering algorithm for big read datasets and its application to single-cell assembly.
   BMC Bioinformatics. 2017 Jul 3;18(1):324. doi: 10.1186/s12859-017-1724-7.
4. A scalable and memory-efficient algorithm for de novo transcriptome assembly of non-model organisms.
   BMC Genomics. 2017 May 24;18(Suppl 4):387. doi: 10.1186/s12864-017-3735-1.
5. Ensembl 2017.
   Nucleic Acids Res. 2017 Jan 4;45(D1):D635-D642. doi: 10.1093/nar/gkw1104. Epub 2016 Nov 28.
6. Metagenomic Assembly: Overview, Challenges and Applications.
   Yale J Biol Med. 2016 Sep 30;89(3):353-362. eCollection 2016 Sep.
7. Assembly, Assessment, and Availability of De novo Generated Eukaryotic Transcriptomes.
   Front Genet. 2016 Jan 11;6:361. doi: 10.3389/fgene.2015.00361. eCollection 2015.
8. The khmer software package: enabling efficient nucleotide sequence analysis.
   F1000Res. 2015 Sep 25;4:900. doi: 10.12688/f1000research.6924.1. eCollection 2015.
9. Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads.
   Gigascience. 2015 Oct 19;4:48. doi: 10.1186/s13742-015-0089-y. eCollection 2015.
10. Evaluation of de novo transcriptome assemblies from RNA-Seq data.
   Genome Biol. 2014 Dec 21;15(12):553. doi: 10.1186/s13059-014-0553-5.