Verspoor Karin, Cohn Judith, Joslyn Cliff, Mniszewski Sue, Rechtsteiner Andreas, Rocha Luis M, Simas Tiago
Los Alamos National Laboratory, PO Box 1663, MS B256, Los Alamos, NM 87545, USA.
BMC Bioinformatics. 2005;6 Suppl 1(Suppl 1):S20. doi: 10.1186/1471-2105-6-S1-S20. Epub 2005 May 24.
We participated in BioCreAtIvE Task 2, which addressed the annotation of proteins with Gene Ontology (GO) terms based on the text of a given document, and the selection of evidence text from that document justifying each annotation. We approached the task using several combinations of two distinct methods: an unsupervised algorithm for expanding the words associated with GO nodes, and an annotation methodology that treats annotation as the categorization of terms from a protein's document neighborhood into the GO.
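The abstract does not specify either algorithm. As a rough illustration of what "treating annotation as categorization of terms into the GO" can mean, the following is a minimal sketch under stated assumptions, not the authors' method. The GO fragment, the per-node word lists, the ancestor-propagation step and the 50% coverage threshold are all hypothetical choices made for this example. Word hits from a protein's neighborhood are counted per node, propagated to ancestors, and the most specific node that still covers enough of the evidence is returned.

```python
from collections import Counter

# Hypothetical toy GO fragment: node -> list of parent nodes (is_a edges).
PARENTS = {
    "kinase activity": ["catalytic activity"],
    "catalytic activity": ["molecular_function"],
    "transporter activity": ["molecular_function"],
    "molecular_function": [],
}

# Hypothetical word sets per node; a word-expansion step would enlarge these.
NODE_WORDS = {
    "kinase activity": {"kinase", "phosphorylation", "phosphorylates"},
    "catalytic activity": {"catalytic", "enzyme"},
    "transporter activity": {"transporter", "transport", "channel"},
    "molecular_function": set(),
}


def ancestors(node):
    """Return every ancestor of `node`, excluding the node itself."""
    seen = set()
    stack = list(PARENTS[node])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS[parent])
    return seen


def propagate(terms):
    """Count direct word hits per node, then add each node's hits to its ancestors."""
    direct = Counter()
    for term in terms:
        for node, words in NODE_WORDS.items():
            if term.lower() in words:
                direct[node] += 1
    scores = Counter(direct)
    for node, hits in direct.items():
        for anc in ancestors(node):
            scores[anc] += hits
    return scores, sum(direct.values())


def categorize(terms, threshold=0.5):
    """Return the most specific node covering >= `threshold` of all hits, or None."""
    scores, total = propagate(terms)
    if total == 0:
        return None
    candidates = [n for n in scores if scores[n] >= threshold * total]
    if not candidates:
        return None
    # Ancestor count serves as a rough proxy for specificity (depth).
    return max(candidates, key=lambda n: (len(ancestors(n)), scores[n], n))


print(categorize(["kinase", "phosphorylates", "enzyme", "transport"]))
# prints "kinase activity"
```

Here the four terms yield 4 hits, "catalytic activity" accumulates 3 and the root 4, but "kinase activity" is chosen because it is the deepest node whose propagated score (2) reaches half of the total. Raising the threshold pushes the answer toward more general nodes, one simple way a parameter setting can trade specificity against precision.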
The evaluation results indicate that the method for expanding words associated with GO nodes is quite powerful: building on it, we selected appropriate evidence text for a given annotation in 38% of Task 2.1 queries. The term categorization methodology achieved a precision of 16% for annotation within the correct extended family in Task 2.2, although subsequent analysis shows that this can be improved with a different parameter setting. In the configuration used to generate the submitted results, our architecture was not very successful on the evidence text component of the task.
The initial results show promise for both of the methods we explored, and we are planning to integrate the methods more closely to achieve better results overall.