OrthoFiller: utilising data from multiple species to improve the completeness of genome annotations.

Author information

Dunne Michael P, Kelly Steven

Affiliation

Department of Plant Sciences, University of Oxford, South Parks Road, Oxford, OX1 3RB, UK.

Publication information

BMC Genomics. 2017 May 18;18(1):390. doi: 10.1186/s12864-017-3771-x.

DOI:10.1186/s12864-017-3771-x

PMID:28521726

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5437544/

Abstract

BACKGROUND

Complete and accurate annotation of sequenced genomes is of paramount importance to their utility and analysis. Differences in gene prediction pipelines mean that genome annotations for a species can differ considerably in the quality and quantity of their predicted genes. Furthermore, genes that are present in genome sequences sometimes fail to be detected by computational gene prediction methods. Erroneously unannotated genes can lead to oversights and inaccurate assertions in biological investigations, especially for smaller-scale genome projects, which rely heavily on computational prediction.

RESULTS

Here we present OrthoFiller, a tool designed to address the problem of finding and adding such missing genes to genome annotations. OrthoFiller leverages information from multiple related species to identify those genes whose existence can be verified through comparison with known gene families, but which have not been predicted. By simulating missing gene annotations in real sequence datasets from both plants and fungi we demonstrate the accuracy and utility of OrthoFiller for finding missing genes and improving genome annotations. Furthermore, we show that applying OrthoFiller to existing "complete" genome annotations can identify and correct substantial numbers of erroneously missing genes in these two sets of species.
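The simulated-removal evaluation described above can be illustrated with a minimal sketch. This is not OrthoFiller's code: the `Gene` record, the `hold_out` and `score` helpers, and the "any same-strand overlap counts as recovered" criterion are all assumptions made for illustration. Note that under this scheme precision is only a lower bound, because a prediction that matches no removed gene may still be a genuine, previously unannotated gene.

```python
import random
from typing import NamedTuple


class Gene(NamedTuple):
    """Hypothetical minimal gene record (coordinates are inclusive)."""
    seqid: str
    start: int
    end: int
    strand: str


def hold_out(genes, fraction, seed=0):
    """Randomly split an annotation into (kept, removed) gene lists."""
    rng = random.Random(seed)
    shuffled = list(genes)
    rng.shuffle(shuffled)
    n_removed = round(len(shuffled) * fraction)
    return shuffled[n_removed:], shuffled[:n_removed]


def overlaps(a, b):
    """True if two genes share a sequence, a strand and at least one base."""
    return (a.seqid == b.seqid and a.strand == b.strand
            and a.start <= b.end and b.start <= a.end)


def score(predicted, removed):
    """Return (recall, precision) of predictions against held-out genes.

    Assumed criterion: a removed gene is 'recovered' if any prediction
    overlaps it; a prediction is 'supported' if it overlaps a removed gene.
    """
    recovered = [g for g in removed if any(overlaps(g, p) for p in predicted)]
    supported = [p for p in predicted if any(overlaps(p, g) for g in removed)]
    recall = len(recovered) / len(removed) if removed else 0.0
    precision = len(supported) / len(predicted) if predicted else 0.0
    return recall, precision
```

In use, the `kept` list would be written out as the reduced annotation and given to the gene-finding tool, and the tool's newly predicted genes would then be passed to `score` together with the `removed` list.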

CONCLUSIONS

We show that significant improvements in the completeness of genome annotations can be made by leveraging information from multiple species.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4353/5437544/38e4567d7373/12864_2017_3771_Fig1_HTML.jpg
