Gene function prediction in five model eukaryotes exclusively based on gene relative location through machine learning.

Author affiliations

Departamento de Biología del Neurodesarrollo, Instituto de Investigaciones Biológicas Clemente Estable, Av. Italia 3318, 11600, Montevideo, Uruguay.

Unidad de Bioquímica y Proteómica Analíticas, Instituto Pasteur de Montevideo, Montevideo, Uruguay.

Publication information

Sci Rep. 2022 Jul 8;12(1):11655. doi: 10.1038/s41598-022-15329-w.

Abstract

The function of most genes is unknown. The best results in automated function prediction are obtained with machine learning-based methods that combine multiple data sources, typically sequence-derived features, protein structure and interaction data. Even though there is ample evidence that a gene's function is not independent of its location, the few available examples of gene function prediction based on gene location rely on sequence identity between genes of different organisms and are thus subject to the limitations of the relationship between sequence and function. Here we predict thousands of gene functions in five model eukaryotes (Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Mus musculus and Homo sapiens) using machine learning models trained exclusively with features derived from the location of genes in the genomes to which they belong. Our aim was not to obtain the best-performing method for automated function prediction but to explore the extent to which a gene's location can predict its function in eukaryotes. We found that our models outperform BLAST when predicting terms from the Biological Process and Cellular Component ontologies, showing that, at least in some cases, gene location alone can be more useful than sequence to infer gene function.
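
The abstract does not spell out how the location-derived features are built. As an illustration only, the Python sketch below shows one way such features could be computed: for each gene, count how many of its nearest neighbors on the same chromosome already carry each annotation term. The function name `neighbor_term_features`, the window size `k`, the toy gene coordinates and the placeholder term labels `GO:A` and `GO:B` are all assumptions made for this example, not the authors' feature definition.

```python
from collections import defaultdict

def neighbor_term_features(genes, annotations, terms, k=2):
    """Hypothetical location-only features (an assumption, not the paper's).

    For each gene, count how many of its k nearest neighbors on each side
    (same chromosome, ordered by start coordinate) carry each term.
    The gene's own annotations and its sequence are never used.
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))

    features = {}
    for chrom_genes in by_chrom.values():
        # Order genes along the chromosome by start position.
        ordered = [gene_id for _, gene_id in sorted(chrom_genes)]
        for i, gene_id in enumerate(ordered):
            # Up to k genes upstream and k genes downstream.
            neighbors = ordered[max(0, i - k):i] + ordered[i + 1:i + 1 + k]
            features[gene_id] = [
                sum(term in annotations.get(n, set()) for n in neighbors)
                for term in terms
            ]
    return features

# Toy data: (gene id, chromosome, start). All values are made up.
genes = [
    ("g1", "chrI", 100),
    ("g2", "chrI", 500),
    ("g3", "chrI", 900),
    ("g4", "chrI", 1400),
    ("g5", "chrII", 200),
]
# Placeholder term labels, not real Gene Ontology identifiers.
annotations = {
    "g1": {"GO:A"},
    "g2": {"GO:A", "GO:B"},
    "g4": {"GO:B"},
}
terms = ["GO:A", "GO:B"]

features = neighbor_term_features(genes, annotations, terms, k=1)
for gene_id in sorted(features):
    print(gene_id, features[gene_id])
```

Feature vectors of this kind could then be passed to any standard classifier, with one binary label per term. Whether neighbor counts, distances, strand or other positional signals are the right inputs is exactly the kind of question the study explores, so this should be read as a sketch under stated assumptions.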

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bac6/9270439/b5abf5575650/41598_2022_15329_Fig1_HTML.jpg
