Computational algorithms to predict Gene Ontology annotations.

Author information

Pinoli Pietro, Chicco Davide, Masseroli Marco

Publication information

BMC Bioinformatics. 2015;16 Suppl 6(Suppl 6):S4. doi: 10.1186/1471-2105-16-S6-S4. Epub 2015 Apr 17.

DOI:10.1186/1471-2105-16-S6-S4

PMID:25916950

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC4416163/

Abstract

BACKGROUND

Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as those provided by the Gene Ontology Consortium, are used to design novel biological experiments and interpret their results. Despite their importance, these sources of information have some known issues: they are incomplete, since biological knowledge is far from definitive and evolves rapidly, and they may contain erroneous annotations. Since curating novel annotations is costly in terms of both money and time, computational tools that can reliably predict likely annotations, and thus speed up the discovery of new gene annotations, are very useful.

METHODS

We used a set of computational algorithms and weighting schemes to infer novel gene annotations from a set of known ones. We followed the latent semantic analysis approach, implementing two popular algorithms (Latent Semantic Indexing and Probabilistic Latent Semantic Analysis), and proposed a novel method, the Semantic IMproved Latent Semantic Analysis, which adds a clustering step on the set of considered genes. Furthermore, we proposed improving these algorithms by weighting the annotations in the input set.
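
As a rough illustration of the Latent Semantic Indexing idea mentioned above, the sketch below applies a truncated singular value decomposition to a small binary gene-by-term annotation matrix and flags absent annotations whose reconstructed score is high. The toy matrix, the function name `lsi_predict`, the rank `k` and the threshold `0.4` are all assumptions made for this example; they are not taken from the paper, and the sketch does not cover the probabilistic variant, the clustering step or the weighting schemes.

```python
import numpy as np

def lsi_predict(A, k, threshold):
    """Score annotations with a rank-k truncated SVD of a gene-by-term matrix.

    A is a binary matrix (rows: genes, columns: ontology terms).
    Returns the rank-k reconstruction and a boolean mask of candidate
    annotations: entries absent from A whose score reaches the threshold.
    The threshold is a hypothetical tuning parameter for this sketch.
    """
    U, s, Vt = np.linalg.svd(A.astype(float), full_matrices=False)
    # Keep only the k largest singular values and their vectors.
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    predicted = (A_k >= threshold) & (A == 0)
    return A_k, predicted

# Toy data (hypothetical): genes 0-2 share term 0; genes 0-1 also have term 1.
# Genes 3-4 form an unrelated group annotated with terms 2 and 3.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
])

scores, predicted = lsi_predict(A, k=2, threshold=0.4)
for gene, term in zip(*np.nonzero(predicted)):
    print(f"candidate annotation: gene {gene}, term {term}, "
          f"score {scores[gene, term]:.3f}")
```

In this toy case the only candidate is gene 2 with term 1, because term 1 co-occurs with term 0 in the other genes of the same group. In practice the rank and threshold would need to be tuned on the data.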

RESULTS

We tested our methods and their weighted variants on the Gene Ontology annotation sets of the genes of three model organisms (Bos taurus, Danio rerio and Drosophila melanogaster). The methods proved able to predict novel gene annotations, and the weighting procedures led to a valuable improvement, although the results vary with the size of the input annotation set and the algorithm considered.

CONCLUSIONS

Of the three methods considered, the Semantic IMproved Latent Semantic Analysis provides the best results. In particular, when coupled with a proper weighting policy, it is able to predict a significant number of novel annotations, proving to be a helpful tool for supporting scientists in the curation of gene functional annotations.
