Minirmd: accurate and fast duplicate removal tool for short reads via multiple minimizers.

Affiliations

College of Information Science and Engineering, Hunan University, Changsha, Hunan 410012, China.

Advanced Analytics Institute, University of Technology Sydney, Broadway, NSW 2007, Australia.

Publication information

Bioinformatics. 2021 Jul 12;37(11):1604-1606. doi: 10.1093/bioinformatics/btaa915.

DOI:10.1093/bioinformatics/btaa915
PMID:33112385
Abstract

SUMMARY

Removing duplicate and near-duplicate reads generated by high-throughput sequencing technologies can reduce the computational resources required by downstream applications. Here we develop minirmd, a de novo tool that removes duplicate reads via multiple rounds of clustering using minimizers of different lengths. Experiments demonstrate that minirmd removes more near-duplicate reads than existing clustering approaches and is faster than existing multi-core tools. To the best of our knowledge, minirmd is the first tool to remove near-duplicates on the reverse-complementary strand.
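The summary describes clustering reads by minimizers over several rounds, each with a different minimizer length, and treating reverse-complement matches as near-duplicates. The Python sketch below illustrates that general idea only, under assumptions that are not taken from the paper: each read gets one bucket key (its lexicographically smallest canonical k-mer), reads are clustered greedily inside each bucket, a near-duplicate is an equal-length read within `max_mm` mismatches on either strand, and the `k` values are illustrative. It is not minirmd's implementation; every function name, parameter, and default here is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# Complement table for A/C/G/T (N maps to itself).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_minimizer(seq: str, k: int) -> str:
    """Smallest canonical k-mer of a read, in lexicographic order.

    A canonical k-mer is min(kmer, revcomp(kmer)), so a read and its
    reverse complement always get the same minimizer (same bucket).
    Assumes len(seq) >= k.
    """
    kmers = (seq[i:i + k] for i in range(len(seq) - k + 1))
    return min(min(km, revcomp(km)) for km in kmers)


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def is_near_duplicate(a: str, b: str, max_mm: int) -> bool:
    """Equal length and at most max_mm mismatches, on either strand."""
    if len(a) != len(b):
        return False
    return hamming(a, b) <= max_mm or hamming(a, revcomp(b)) <= max_mm


def dedup(reads: Sequence[str], ks: Sequence[int] = (31, 27, 23),
          max_mm: int = 1) -> List[int]:
    """Return the indices of reads kept after one clustering round per k.

    The ks and max_mm defaults are hypothetical, not minirmd's settings.
    """
    alive = list(range(len(reads)))
    for k in ks:
        # Reads shorter than k cannot be bucketed this round; keep them.
        survivors = [i for i in alive if len(reads[i]) < k]
        buckets: Dict[str, List[int]] = defaultdict(list)
        for i in alive:
            if len(reads[i]) >= k:
                buckets[read_minimizer(reads[i], k)].append(i)
        # Greedy clustering: a read is dropped if it nearly duplicates a
        # representative already kept in its bucket; otherwise it becomes one.
        for members in buckets.values():
            reps: List[int] = []
            for i in members:
                if not any(is_near_duplicate(reads[r], reads[i], max_mm)
                           for r in reps):
                    reps.append(i)
            survivors.extend(reps)
        alive = sorted(survivors)
    return alive


if __name__ == "__main__":
    r0 = "ACGTACGTTGCAACGTAGCT"
    toy = [r0, r0, revcomp(r0), "TTTTGGGGCCCCAAAATTTT"]
    # Toy reads are only 20 bp, so use small illustrative k values.
    print(dedup(toy, ks=(11, 9, 7)))  # -> [0, 3]
```

In the toy run, the exact copy (index 1) and the reverse complement (index 2) are dropped, while the unrelated read is kept. Several k values matter for near-duplicates: if a mismatch falls inside the minimizer chosen for one k, the two reads land in different buckets that round but may share a bucket in a later round with a different k. Lexicographic ordering is used here only for readability; practical minimizer schemes usually order k-mers by a hash to avoid favouring low-complexity k-mers such as poly-A runs.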

AVAILABILITY AND IMPLEMENTATION

https://github.com/yuansliu/minirmd.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1
Minirmd: accurate and fast duplicate removal tool for short reads via multiple minimizers.
Bioinformatics. 2021 Jul 12;37(11):1604-1606. doi: 10.1093/bioinformatics/btaa915.
2
Nubeam-dedup: a fast and RAM-efficient tool to de-duplicate sequencing reads without mapping.
Bioinformatics. 2020 May 1;36(10):3254-3256. doi: 10.1093/bioinformatics/btaa112.
3
Gencore: an efficient tool to generate consensus reads for error suppressing and duplicate removing of NGS data.
BMC Bioinformatics. 2019 Dec 27;20(Suppl 23):606. doi: 10.1186/s12859-019-3280-9.
4
Index suffix-prefix overlaps by (w, k)-minimizer to generate long contigs for reads compression.
Bioinformatics. 2019 Jun 1;35(12):2066-2074. doi: 10.1093/bioinformatics/bty936.
5
Alignment-free clustering of UMI tagged DNA molecules.
Bioinformatics. 2019 Jun 1;35(11):1829-1836. doi: 10.1093/bioinformatics/bty888.
6
Filtering duplicate reads from 454 pyrosequencing data.
Bioinformatics. 2013 Apr 1;29(7):830-6. doi: 10.1093/bioinformatics/btt047. Epub 2013 Feb 1.
7
ParDRe: faster parallel duplicated reads removal tool for sequencing studies.
Bioinformatics. 2016 May 15;32(10):1562-4. doi: 10.1093/bioinformatics/btw038. Epub 2016 Jan 22.
8
Toward perfect reads: self-correction of short reads via mapping on de Bruijn graphs.
Bioinformatics. 2020 Mar 1;36(5):1374-1381. doi: 10.1093/bioinformatics/btz102.
9
BFC: correcting Illumina sequencing errors.
Bioinformatics. 2015 Sep 1;31(17):2885-7. doi: 10.1093/bioinformatics/btv290. Epub 2015 May 6.
10
OGRE: Overlap Graph-based metagenomic Read clustEring.
Bioinformatics. 2021 May 17;37(7):905-912. doi: 10.1093/bioinformatics/btaa760.

Cited by

1
Phylogenetic relationships and the identification of allopolyploidy in circumpolar Silene sect. Physolychnis.
Am J Bot. 2025 Jun;112(6):e70051. doi: 10.1002/ajb2.70051. Epub 2025 May 22.
2
Construction of edit-distance graphs for large sets of short reads through minimizer-bucketing.
Bioinform Adv. 2025 Apr 10;5(1):vbaf081. doi: 10.1093/bioadv/vbaf081. eCollection 2025.
3
When less is more: sketching with minimizers in genomics.
Genome Biol. 2024 Oct 14;25(1):270. doi: 10.1186/s13059-024-03414-4.
4
Creating and Using Minimizer Sketches in Computational Genomics.
J Comput Biol. 2023 Dec;30(12):1251-1276. doi: 10.1089/cmb.2023.0094. Epub 2023 Aug 30.
5
Recall DNA methylation levels at low coverage sites using a CNN model in WGBS.
PLoS Comput Biol. 2023 Jun 14;19(6):e1011205. doi: 10.1371/journal.pcbi.1011205. eCollection 2023 Jun.
6
SparkGC: Spark based genome compression for large collections of genomes.
BMC Bioinformatics. 2022 Jul 25;23(1):297. doi: 10.1186/s12859-022-04825-5.
7
Fast-HBR: Fast hash based duplicate read remover.
Bioinformation. 2022 Jan 31;18(1):36-40. doi: 10.6026/97320630018036. eCollection 2022.
8
Identify DNA-Binding Proteins Through the Extreme Gradient Boosting Algorithm.
Front Genet. 2022 Jan 28;12:821996. doi: 10.3389/fgene.2021.821996. eCollection 2021.
9
Research on the Computational Prediction of Essential Genes.
Front Cell Dev Biol. 2021 Dec 6;9:803608. doi: 10.3389/fcell.2021.803608. eCollection 2021.
10
Hamming-shifting graph of genomic short reads: Efficient construction and its application for compression.
PLoS Comput Biol. 2021 Jul 19;17(7):e1009229. doi: 10.1371/journal.pcbi.1009229. eCollection 2021 Jul.