Florian Leitner, Alfonso Valencia
Structural Computational Biology Group, Spanish National Cancer Research Centre (CNIO), Madrid, Spain.
FEBS Lett. 2008 Apr 9;582(8):1178-81. doi: 10.1016/j.febslet.2008.02.072. Epub 2008 Mar 6.
We propose that human expertise and automatic text-mining systems can be combined to create a first generation of electronically annotated information (EAI) that can be added to journal abstracts and that is directly related to the information in the corresponding text. The first experiments have concentrated on annotating gene/protein names and organism names, as these are the best-resolved problems. A second generation of systems could then attempt to annotate protein interactions and protein/gene functions, a more difficult task for text-mining systems. EAI will permit easier categorization of this information, help in evaluating papers for curation in databases, and be invaluable for maintaining the links between the information in databases and the facts described in text. Additionally, it will contribute to efforts to complete database information and to create collections of annotated text that can be used to train new generations of text-mining systems. The recent introduction of the first meta-server for the annotation of biological text, which can collect annotations from available text-mining systems, lends credibility to the technical feasibility of this proposal.