Predicting novel human gene ontology annotations using semantic analysis.

Author information

Department of Computer Science, Wayne State University, 5143 Cass Ave., Detroit, MI 48202, USA.

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2010 Jan-Mar;7(1):91-9. doi: 10.1109/TCBB.2008.29.

Abstract

The correct interpretation of many molecular biology experiments depends in an essential way on the accuracy and consistency of the existing annotation databases. Such databases are meant to act as repositories for our biological knowledge as we acquire and refine it. Hence, by definition, they are incomplete at any given time. In this paper, we describe a technique that improves our previous method for predicting novel GO annotations by extracting implicit semantic relationships between genes and functions. In this work, we use a vector space model and a number of weighting schemes in addition to our previous latent semantic indexing approach. The technique described here is able to take into consideration the hierarchical structure of the Gene Ontology (GO) and can assign different weights to GO terms situated at different depths. The prediction abilities of 15 different weighting schemes are compared and evaluated. Nine of these schemes were previously used in other problem domains, while six are introduced in this paper. The best weighting scheme was a novel scheme, n2tn. Of the top 50 functional annotations predicted using this weighting scheme, we found support in the literature for 84 percent, while 6 percent of the predictions were contradicted by the existing literature. For the remaining 10 percent, we did not find any relevant publications to confirm or contradict the predictions. The n2tn weighting scheme also outperformed the simple binary scheme used in our previous approach.
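
The general idea of a weighted vector space model combined with latent semantic indexing can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not define the n2tn scheme, so the weighting used here is an ordinary inverse-document-frequency weight with cosine normalization, chosen only as a stand-in. The gene names, GO identifiers, the annotation matrix, and the function names `weight`, `low_rank`, and `rank_candidates` are all hypothetical.

```python
import numpy as np

# Hypothetical toy data: rows are genes, columns are GO terms.
genes = ["geneA", "geneB", "geneC", "geneD"]
terms = ["GO:1", "GO:2", "GO:3", "GO:4"]

# Binary gene-by-term annotation matrix (1 = gene is annotated with term).
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
], dtype=float)


def weight(A):
    """Apply an IDF-style term weight, then cosine-normalize each gene vector.

    Assumption: this is a generic stand-in scheme, not n2tn.
    """
    n_genes = A.shape[0]
    df = A.sum(axis=0)                          # genes annotated per term
    idf = np.log(n_genes / np.maximum(df, 1.0))  # rarer terms weigh more
    W = A * idf
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    return W / np.where(norms == 0, 1.0, norms)


def low_rank(W, k):
    """Rank-k reconstruction of W via truncated SVD (the LSI step)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def rank_candidates(A, genes, terms, k=2):
    """Score every missing gene-term pair and sort from strongest to weakest."""
    R = low_rank(weight(A), k)
    cands = [
        (genes[i], terms[j], float(R[i, j]))
        for i in range(A.shape[0])
        for j in range(A.shape[1])
        if A[i, j] == 0                         # only pairs not yet annotated
    ]
    return sorted(cands, key=lambda c: -c[2])


if __name__ == "__main__":
    for gene, term, score in rank_candidates(A, genes, terms):
        print(f"{gene}\t{term}\t{score:.3f}")
```

Pairs that receive a high reconstructed score despite having no existing annotation are the candidate predictions. In practice they would still need to be checked against the literature, as the abstract describes for the top 50 predictions.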

