Computational algorithms to predict Gene Ontology annotations.

Author information

Pinoli Pietro, Chicco Davide, Masseroli Marco

Publication information

BMC Bioinformatics. 2015;16 Suppl 6(Suppl 6):S4. doi: 10.1186/1471-2105-16-S6-S4. Epub 2015 Apr 17.

Abstract

BACKGROUND

Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as those provided by the Gene Ontology Consortium, are used to design novel biological experiments and interpret their results. Despite their importance, these sources of information have known issues: they are incomplete, since biological knowledge is far from definitive and evolves rapidly, and they may contain erroneous annotations. Since curating novel annotations is costly in both money and time, computational tools that can reliably predict likely annotations, and thus speed up the discovery of new gene annotations, are very useful.

METHODS

We used a set of computational algorithms and weighting schemes to infer novel gene annotations from a set of known ones. We took the latent semantic analysis approach, implementing two popular algorithms (Latent Semantic Indexing and Probabilistic Latent Semantic Analysis), and we propose a novel method, the Semantic IMproved Latent Semantic Analysis, which adds a clustering step on the set of considered genes. Furthermore, we propose improving these algorithms by weighting the annotations in the input set.
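
To make the Latent Semantic Indexing idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the known annotations are stored as a binary gene-by-term matrix, approximates that matrix with a truncated SVD, and treats zero entries whose reconstructed score is high as candidate annotations. The function names, the toy matrix, the threshold value and the rarity-based weighting are all illustrative assumptions; the abstract does not specify the weighting schemes actually used.

```python
import numpy as np


def lsi_scores(A, k):
    """Return the rank-k truncated SVD reconstruction of matrix A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular values and their vectors.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def candidate_annotations(A, k, threshold=0.5):
    """Flag gene-term pairs absent from A whose reconstructed score is high.

    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    scores = lsi_scores(A, k)
    return (A == 0) & (scores >= threshold)


def weight_by_term_rarity(A):
    """Hypothetical weighting: up-weight terms annotated to few genes.

    Assumes a binary matrix A. This IDF-like scheme only illustrates the
    idea of weighting the input annotations before factorization.
    """
    n_genes = A.shape[0]
    genes_per_term = np.maximum(A.sum(axis=0), 1)  # avoid division by zero
    return A * (1.0 + np.log(n_genes / genes_per_term))


# Toy data (invented): rows are genes, columns are ontology terms.
A = np.array([
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 0, 1],
], dtype=float)

# Unweighted and weighted variants share the same prediction step.
unweighted = candidate_annotations(A, k=2)
weighted_scores = lsi_scores(weight_by_term_rarity(A), k=2)
```

Choosing `k` smaller than the rank of the matrix is what lets the reconstruction assign non-zero scores to missing entries; with `k` equal to the full rank the input is reproduced exactly and nothing new is suggested.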

RESULTS

We tested our methods and their weighted variants on the Gene Ontology annotation sets of the genes of three model organisms (Bos taurus, Danio rerio and Drosophila melanogaster). The methods proved able to predict novel gene annotations, and the weighting procedures led to a valuable improvement, although the results vary with the size of the input annotation set and the algorithm considered.

CONCLUSIONS

Of the three methods considered, the Semantic IMproved Latent Semantic Analysis provides the best results. In particular, when coupled with a proper weighting policy, it is able to predict a significant number of novel annotations, proving to be a helpful tool for supporting scientists in the curation of gene functional annotations.
