Orthograph: a versatile tool for mapping coding nucleotide sequences to clusters of orthologous genes.

Authors

Petersen Malte, Meusemann Karen, Donath Alexander, Dowling Daniel, Liu Shanlin, Peters Ralph S, Podsiadlowski Lars, Vasilikopoulos Alexandros, Zhou Xin, Misof Bernhard, Niehuis Oliver

Affiliations

Center for Molecular Biodiversity Research, Zoological Research Museum Alexander Koenig, Adenauerallee 160, Bonn, 53113, Germany.

Australian National Insect Collection, CSIRO National Research Collections Australia (NRCA), Clunies Ross Street, Canberra, ACT 2601, Australia.

Publication information

BMC Bioinformatics. 2017 Feb 16;18(1):111. doi: 10.1186/s12859-017-1529-8.

DOI:10.1186/s12859-017-1529-8

PMID:28209129

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC5312442/

Abstract

BACKGROUND

Orthology characterizes genes of different organisms that arose from a single ancestral gene via speciation, in contrast to paralogy, which is assigned to genes that arose via gene duplication. An accurate orthology assignment is a crucial step for comparative genomic studies. Orthologous genes in two organisms can be identified by applying a so-called reciprocal search strategy, given that complete information of the organisms' gene repertoire is available. In many investigations, however, only a fraction of the gene content of the organisms under study is examined (e.g., RNA sequencing). Here, identification of orthologous nucleotide or amino acid sequences can be achieved using a graph-based approach that maps nucleotide sequences to genes of known orthology. Existing implementations of this approach, however, suffer from algorithmic issues that may cause problems in downstream analyses.
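The reciprocal search strategy mentioned above can be illustrated with a minimal sketch. It assumes that pairwise similarity scores between the complete gene sets of two organisms are already available; the gene names and scores are hypothetical, and the code is not taken from any existing tool.

```python
from typing import Dict, Tuple

# Hypothetical similarity scores (higher is better), keyed by (query, subject).
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Return, for each query gene, the subject gene with the highest score."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> Dict[str, str]:
    """Keep pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return {a: b for a, b in forward.items() if reverse.get(b) == a}


# Made-up scores for two organisms A and B.
a_vs_b = {("a1", "b1"): 200.0, ("a1", "b2"): 80.0, ("a2", "b1"): 150.0}
b_vs_a = {("b1", "a1"): 195.0, ("b1", "a2"): 140.0, ("b2", "a1"): 75.0}
pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy input, a2 also hits b1, but b1's best hit is a1, so only the pair (a1, b1) is reported. The check relies on both gene sets being complete, which is the limitation the abstract raises for RNA sequencing data.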

RESULTS

We present a new software pipeline, Orthograph, that addresses and solves the above problems and implements useful features for a wide range of comparative genomic and transcriptomic analyses. Orthograph applies a best reciprocal hit search strategy using profile hidden Markov models and maps nucleotide sequences to the globally best matching cluster of orthologous genes, thus enabling researchers to conveniently and reliably delineate orthologs and paralogs from transcriptomic and genomic sequence data. We demonstrate the performance of our approach on de novo-sequenced and assembled transcript libraries of 24 species of apoid wasps (Hymenoptera: Aculeata) as well as on published genomic datasets.
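One way to read "globally best matching cluster" is sketched below. This is an illustration under stated assumptions, not Orthograph's implementation: the function name, the data layout, and all identifiers and scores are hypothetical. Forward hits stand in for profile hidden Markov model search results, and `reverse_best` stands in for the outcome of a reverse search against the reference gene sets.

```python
from typing import Dict, List, Set, Tuple


def assign_globally_best(
    forward_hits: List[Tuple[str, str, float]],  # (cluster, transcript, score)
    reverse_best: Dict[str, str],                # transcript -> best reference gene
    cluster_members: Dict[str, Set[str]],        # cluster -> its reference genes
) -> Dict[str, str]:
    """Map each transcript to at most one cluster.

    Hits are visited from highest to lowest score. A hit is accepted only if
    the transcript's best reverse hit is a member of the same cluster
    (reciprocity), and a transcript that is already assigned is never reused.
    """
    assigned: Dict[str, str] = {}
    for cluster, transcript, _score in sorted(
        forward_hits, key=lambda hit: hit[2], reverse=True
    ):
        if transcript in assigned:
            continue  # a better-scoring cluster already claimed this transcript
        if reverse_best.get(transcript) in cluster_members.get(cluster, set()):
            assigned[transcript] = cluster
    return assigned


# Made-up example: t1 hits two clusters, t3 fails the reciprocity check.
forward_hits = [
    ("COG1", "t1", 310.0),
    ("COG2", "t1", 120.0),
    ("COG2", "t2", 250.0),
    ("COG1", "t3", 90.0),
]
reverse_best = {"t1": "refA", "t2": "refC", "t3": "refZ"}
cluster_members = {"COG1": {"refA", "refB"}, "COG2": {"refC"}}
mapping = assign_globally_best(forward_hits, reverse_best, cluster_members)
print(mapping)
```

Sorting all hits before assigning is what makes the choice global in this sketch: t1 goes to COG1 because that is its highest-scoring confirmed hit across all clusters, rather than being assigned to whichever cluster happened to be searched first.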

CONCLUSION

With Orthograph, we implemented a best reciprocal hit approach to reference-based orthology prediction for coding nucleotide sequences such as RNAseq data. Orthograph is flexible, easy to use, open source and freely available at https://mptrsen.github.io/Orthograph. Additionally, we release 24 de novo-sequenced and assembled transcript libraries of apoid wasp species.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2612/5312442/94f077946ed9/12859_2017_1529_Fig1_HTML.jpg
