OrfM: a fast open reading frame predictor for metagenomic data.

Author information

Woodcroft Ben J, Boyd Joel A, Tyson Gene W

Affiliation

Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, QLD 4072, Australia.

Publication information

Bioinformatics. 2016 Sep 1;32(17):2702-3. doi: 10.1093/bioinformatics/btw241. Epub 2016 May 3.

DOI:10.1093/bioinformatics/btw241

PMID:27153669

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC5013905/

Abstract

Finding and translating stretches of DNA lacking stop codons is a common task in the analysis of sequence data. However, the computational tools for finding open reading frames are slow enough that they are becoming a bottleneck as the volume of sequence data grows. This bottleneck is especially problematic in metagenomics when searching unassembled reads, or screening assembled contigs for genes of interest. Here, we present OrfM, a tool to rapidly identify open reading frames (ORFs) in sequence data by applying the Aho-Corasick algorithm to find regions uninterrupted by stop codons. Benchmarking revealed that OrfM finds the same ORFs as similar tools ('GetOrf' and 'Translate') but is four to five times faster. While OrfM is sequencing platform-agnostic, it is best suited to large, high quality datasets such as those produced by Illumina sequencers.
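To make "regions uninterrupted by stop codons" concrete, the following is a minimal Python sketch, not OrfM's implementation. It uses a naive codon-by-codon scan of the three forward reading frames rather than the Aho-Corasick algorithm named in the abstract, and it assumes the standard stop codons TAA, TAG and TGA. The function name `stop_free_regions` and the `min_len` parameter (with its default) are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: a naive scan, NOT OrfM's Aho-Corasick implementation.
# Assumption: standard genetic code stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_free_regions(seq, min_len=96):
    """Yield (frame, start, end) for forward-strand stretches with no stop codon.

    start/end are 0-based, end-exclusive nucleotide coordinates.
    min_len is a hypothetical minimum region length in nucleotides.
    """
    seq = seq.upper()
    for frame in range(3):
        region_start = frame
        pos = frame
        # Walk the frame one complete codon at a time.
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOP_CODONS:
                # A stop codon closes the current region (stop excluded).
                if pos - region_start >= min_len:
                    yield (frame, region_start, pos)
                region_start = pos + 3
            pos += 3
        # Report the trailing region, which runs to the last complete codon.
        if pos - region_start >= min_len:
            yield (frame, region_start, pos)
```

Calling `list(stop_free_regions(read, min_len=6))` on a short read returns the stop-free stretches in each forward frame. The reverse strand could be covered by running the same function on the reverse complement of the read. A multi-pattern matcher such as Aho-Corasick would instead locate all stop codons in a single pass over the sequence, which is the approach the abstract attributes to OrfM.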

AVAILABILITY AND IMPLEMENTATION

Source code and binaries are freely available for download at http://github.com/wwood/OrfM or through GNU Guix under the LGPL 3+ license. OrfM is implemented in C and supported on GNU/Linux and OSX.

CONTACTS

b.woodcroft@uq.edu.au

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

OrfM: a fast open reading frame predictor for metagenomic data.

Bioinformatics. 2016 Sep 1;32(17):2702-3. doi: 10.1093/bioinformatics/btw241. Epub 2016 May 3.

FARAO: the flexible all-round annotation organizer.

Bioinformatics. 2016 Dec 1;32(23):3664-3666. doi: 10.1093/bioinformatics/btw499. Epub 2016 Aug 4.

UProC: tools for ultra-fast protein domain classification.

Bioinformatics. 2015 May 1;31(9):1382-8. doi: 10.1093/bioinformatics/btu843. Epub 2014 Dec 23.

Simulating Illumina metagenomic data with InSilicoSeq.

Bioinformatics. 2019 Feb 1;35(3):521-522. doi: 10.1093/bioinformatics/bty630.

LiveKraken--real-time metagenomic classification of illumina data.

Bioinformatics. 2018 Nov 1;34(21):3750-3752. doi: 10.1093/bioinformatics/bty433.

Pinstripe: a suite of programs for integrating transcriptomic and proteomic datasets identifies novel proteins and improves differentiation of protein-coding and non-coding genes.

Bioinformatics. 2012 Dec 1;28(23):3042-50. doi: 10.1093/bioinformatics/bts582. Epub 2012 Oct 7.

MMseqs software suite for fast and deep clustering and searching of large protein sequence sets.

Bioinformatics. 2016 May 1;32(9):1323-30. doi: 10.1093/bioinformatics/btw006. Epub 2016 Jan 6.

Metagenomic binning through low-density hashing.

Bioinformatics. 2019 Jan 15;35(2):219-226. doi: 10.1093/bioinformatics/bty611.

MetaFast: fast reference-free graph-based comparison of shotgun metagenomic data.

Bioinformatics. 2016 Sep 15;32(18):2760-7. doi: 10.1093/bioinformatics/btw312. Epub 2016 Jun 3.

Large scale microbiome profiling in the cloud.

Bioinformatics. 2019 Jul 15;35(14):i13-i22. doi: 10.1093/bioinformatics/btz356.

Cited by

Epigenetic silencing and genome dynamics determine the fate of giant virus endogenizations in Acanthamoeba.

BMC Biol. 2025 Jul 1;23(1):171. doi: 10.1186/s12915-025-02280-1.

Horizontal transmission of functionally diverse transposons is a major source of new introns.

Proc Natl Acad Sci U S A. 2025 May 27;122(21):e2414761122. doi: 10.1073/pnas.2414761122. Epub 2025 May 22.

Reprogramming site-specific retrotransposon activity to new DNA sites.

Nature. 2025 Apr 9. doi: 10.1038/s41586-025-08877-4.

Microproteins unveiling new dimensions in cancer.

Funct Integr Genomics. 2024 Sep 3;24(5):152. doi: 10.1007/s10142-024-01426-8.

Microbial community response to hydrocarbon exposure in iron oxide mats: an environmental study.

Front Microbiol. 2024 May 10;15:1388973. doi: 10.3389/fmicb.2024.1388973. eCollection 2024.

The low-temperature germinating spores of the thermophilic contribute to an extremely high sulfate reduction in burning coal seams.

Front Microbiol. 2023 Sep 15;14:1204102. doi: 10.3389/fmicb.2023.1204102. eCollection 2023.

Metagenomic approach to infer rumen microbiome derived traits of cattle.

World J Microbiol Biotechnol. 2023 Jul 13;39(9):250. doi: 10.1007/s11274-023-03694-1.

Alternative Splicing Analysis Revealed the Role of Alpha-Linolenic Acid and Carotenoids in Fruit Development of .

Int J Mol Sci. 2023 May 12;24(10):8666. doi: 10.3390/ijms24108666.

Methyltransferase-like (METTL) homologues participate in antiviral responses.

Plant Signal Behav. 2023 Dec 31;18(1):2214760. doi: 10.1080/15592324.2023.2214760.

Metagenome-derived virus-microbe ratios across ecosystems.

ISME J. 2023 Oct;17(10):1552-1563. doi: 10.1038/s41396-023-01431-y. Epub 2023 May 11.

References

Challenges and opportunities in understanding microbial communities with metagenome assembly (accompanied by IPython Notebook tutorial).

Front Microbiol. 2015 Jul 9;6:678. doi: 10.3389/fmicb.2015.00678. eCollection 2015.

Patterns in wetland microbial community composition and functional gene repertoire associated with methane emissions.

mBio. 2015 May 19;6(3):e00066-15. doi: 10.1128/mBio.00066-15.

Metagenomics using next-generation sequencing.

Methods Mol Biol. 2014;1096:183-201. doi: 10.1007/978-1-62703-712-9_15.

Updating benchtop sequencing performance comparison.

Nat Biotechnol. 2013 Apr;31(4):294-6. doi: 10.1038/nbt.2522.

Comparative metagenomic and rRNA microbial diversity characterization using archaeal and bacterial synthetic communities.

Environ Microbiol. 2013 Jun;15(6):1882-99. doi: 10.1111/1462-2920.12086. Epub 2013 Feb 6.

IMG: the Integrated Microbial Genomes database and comparative analysis system.

Nucleic Acids Res. 2012 Jan;40(Database issue):D115-22. doi: 10.1093/nar/gkr1044.

BLAST+: architecture and applications.

BMC Bioinformatics. 2009 Dec 15;10:421. doi: 10.1186/1471-2105-10-421.

Comparative metagenomics of microbial communities.

Science. 2005 Apr 22;308(5721):554-7. doi: 10.1126/science.1107851.
