Quest for Orthologs in the Era of Biodiversity Genomics.

Affiliations

Department for Applied Bioinformatics, Institute of Cell Biology and Neuroscience, Goethe University, Frankfurt, Germany.

Institute of Structural and Molecular Biology, University College London, WC1E 6BT, London, UK.

Publication information

Genome Biol Evol. 2024 Oct 9;16(10). doi: 10.1093/gbe/evae224.

Abstract

The era of biodiversity genomics is characterized by large-scale genome sequencing efforts that aim to represent each living taxon with an assembled genome. Generating knowledge from this wealth of data has not kept pace. Here, we discuss major challenges to integrating these novel genomes into a comprehensive functional and evolutionary network spanning the tree of life. In summary, the expanding datasets create a need for scalable gene annotation methods. To trace gene function across species, new methods must seek to increase the resolution of ortholog analyses, e.g. by extending analyses to the protein domain level and by accounting for alternative splicing. Additionally, the scope of orthology prediction should be pushed beyond well-investigated proteomes. This demands the development of specialized methods for identifying orthologs of short proteins and noncoding RNAs and for functionally characterizing novel gene families. Furthermore, protein structures predicted by machine learning are now readily available, but this new information has yet to be integrated with orthology-based analyses. Finally, an increasing focus should be placed on making orthology assignments adhere to the findable, accessible, interoperable, and reusable (FAIR) principles. This fosters green bioinformatics by avoiding redundant computations and helps integrate diverse scientific communities that share the need for comparative genetics and genomics information. It should also help communicate orthology-related concepts in a format that is accessible to the public, to counteract existing misinformation about evolution.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6376/11523110/413f2747a362/evae224f1.jpg
