Protein annotation as term categorization in the gene ontology using word proximity networks.

Authors

Verspoor Karin, Cohn Judith, Joslyn Cliff, Mniszewski Sue, Rechtsteiner Andreas, Rocha Luis M, Simas Tiago

Affiliation

Los Alamos National Laboratory, PO Box 1663, MS B256, Los Alamos, NM 87545, USA.

Publication information

BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S20. doi: 10.1186/1471-2105-6-S1-S20. Epub 2005 May 24.

DOI:10.1186/1471-2105-6-S1-S20

PMID:15960833

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1869013/

Abstract

BACKGROUND

We participated in the BioCreAtIvE Task 2, which addressed the annotation of proteins into the Gene Ontology (GO) based on the text of a given document and the selection of evidence text from the document justifying that annotation. We approached the task utilizing several combinations of two distinct methods: an unsupervised algorithm for expanding words associated with GO nodes, and an annotation methodology which treats annotation as categorization of terms from a protein's document neighborhood into the GO.
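The abstract names the word-expansion method but does not describe it. Purely as an illustration of the general idea of proximity-based word expansion, and not the authors' algorithm, the sketch below counts how often words co-occur within a small token window and then ranks words that appear near a set of seed words (standing in for the words attached to a GO node). The function names, the window size, the frequency normalization, and the toy documents are all assumptions made for this example.

```python
from collections import defaultdict


def proximity_weights(documents, window=5):
    """Count co-occurrences of word pairs that fall within `window` tokens.

    `documents` is a list of token lists. Returns (pair_counts, word_counts).
    """
    pair_counts = defaultdict(int)
    word_counts = defaultdict(int)
    for tokens in documents:
        for i, word in enumerate(tokens):
            word_counts[word] += 1
            # Look only forward so each nearby pair is counted once per position.
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    pair_counts[frozenset((word, other))] += 1
    return pair_counts, word_counts


def expand_terms(seed_words, documents, window=5, top_k=3):
    """Rank non-seed words by how strongly they co-occur with the seed words.

    The score is an assumed, simple normalization: co-occurrence count
    divided by the candidate word's overall frequency.
    """
    pair_counts, word_counts = proximity_weights(documents, window)
    scores = defaultdict(float)
    for pair, count in pair_counts.items():
        a, b = tuple(pair)
        for seed, candidate in ((a, b), (b, a)):
            if seed in seed_words and candidate not in seed_words:
                scores[candidate] += count / word_counts[candidate]
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_k]]


if __name__ == "__main__":
    toy_docs = [
        ["kinase", "activity", "phosphorylation", "of", "protein"],
        ["protein", "kinase", "phosphorylation", "signal"],
        ["membrane", "transport", "protein"],
    ]
    print(expand_terms({"kinase"}, toy_docs, window=2, top_k=3))
```

Dividing by the candidate's overall frequency is one simple way to keep very common words from dominating the ranking; a real system would likely use a more principled proximity measure and a much larger corpus.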

RESULTS

The evaluation results indicate that the method for expanding words associated with GO nodes is quite powerful; we were able to successfully select appropriate evidence text for a given annotation in 38% of Task 2.1 queries by building on this method. The term categorization methodology achieved a precision of 16% for annotation within the correct extended family in Task 2.2, though we show through subsequent analysis that this can be improved with a different parameter setting. Our architecture proved not to be very successful on the evidence text component of the task, in the configuration used to generate the submitted results.

CONCLUSION

The initial results show promise for both of the methods we explored, and we are planning to integrate the methods more closely to achieve better results overall.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4bd4/1869013/708154f4e764/1471-2105-6-S1-S20-1.jpg

Similar articles

Evaluation of BioCreAtIvE assessment of task 2.

BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S16. doi: 10.1186/1471-2105-6-S1-S16. Epub 2005 May 24.

An evaluation of GO annotation retrieval for BioCreAtIvE and GOA.

BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S17. doi: 10.1186/1471-2105-6-S1-S17. Epub 2005 May 24.

Finding genomic ontology terms in text using evidence content.

BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S21. doi: 10.1186/1471-2105-6-S1-S21. Epub 2005 May 24.

Gene ontology annotation by density and gravitation models.

Genome Inform. 2006;17(2):110-20.

Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks.

BMC Bioinformatics. 2007 Jul 10;8:243. doi: 10.1186/1471-2105-8-243.

Learning statistical models for annotating proteins with function information using biomedical text.

BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S18. doi: 10.1186/1471-2105-6-S1-S18. Epub 2005 May 24.

A sentence sliding window approach to extract protein annotations from biomedical articles.

BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S19. doi: 10.1186/1471-2105-6-S1-S19. Epub 2005 May 24.

Automatic extraction of gene/protein biological functions from biomedical text.

Bioinformatics. 2005 Apr 1;21(7):1227-36. doi: 10.1093/bioinformatics/bti084. Epub 2004 Oct 27.

Data-poor categorization and passage retrieval for gene ontology annotation in Swiss-Prot.

BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S23. doi: 10.1186/1471-2105-6-S1-S23. Epub 2005 May 24.

Cited by

Assessing the impact of case sensitivity and term information gain on biomedical concept recognition.

PLoS One. 2015 Mar 19;10(3):e0119091. doi: 10.1371/journal.pone.0119091. eCollection 2015.

Gene function prediction based on the Gene Ontology hierarchical structure.

PLoS One. 2014 Sep 5;9(9):e107187. doi: 10.1371/journal.pone.0107187. eCollection 2014.

Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters.

BMC Bioinformatics. 2014 Feb 26;15:59. doi: 10.1186/1471-2105-15-59.

Text mining improves prediction of protein functional sites.

PLoS One. 2012;7(2):e32171. doi: 10.1371/journal.pone.0032171. Epub 2012 Feb 29.

Mining semantic networks of bioinformatics e-resources from the literature.

J Biomed Semantics. 2011 Mar 7;2 Suppl 1(Suppl 1):S4. doi: 10.1186/2041-1480-2-S1-S4.

Multi-label literature classification based on the Gene Ontology graph.

BMC Bioinformatics. 2008 Dec 8;9:525. doi: 10.1186/1471-2105-9-525.

Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks.

Genome Biol. 2008;9 Suppl 2(Suppl 2):S11. doi: 10.1186/gb-2008-9-s2-s11. Epub 2008 Sep 1.

Novel metrics for evaluating the functional coherence of protein groups via protein semantic network.

Genome Biol. 2007;8(7):R153. doi: 10.1186/gb-2007-8-7-r153.

Automatic extraction of gene ontology annotation and its correlation with clusters in protein networks.

BMC Bioinformatics. 2007 Jul 10;8:243. doi: 10.1186/1471-2105-8-243.

A categorization approach to automated ontological function annotation.

Protein Sci. 2006 Jun;15(6):1544-9. doi: 10.1110/ps.062184006. Epub 2006 May 2.
