Department of Biological Sciences and Institute of Molecular and Structural Biology, Birkbeck, University of London, London, UK.
BMC Bioinformatics. 2012 Jul 23;13:172. doi: 10.1186/1471-2105-13-172.
Biological text mining research is increasingly focused on extracting complex relationships relevant to the construction and curation of biological networks and pathways. However, one important category of pathway, the metabolic pathway, has been largely neglected. Here we present a relatively simple method for extracting metabolic reaction information from free text. The method scores different permutations of assigned entities (enzymes and metabolites) within a given sentence based on the presence and location of stemmed keywords. It extends an approach that has proved effective for the extraction of protein-protein interactions.
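The idea of scoring permutations of assigned entities can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the keyword stems, the prefix-matching "stemmer", the weights, and the function names `crude_stem`, `score_assignment`, and `best_reaction` are all hypothetical.

```python
from itertools import permutations

# Hypothetical keyword stems; the paper's actual keyword list is not given here.
KEYWORD_STEMS = ("catalys", "catalyz", "conver", "produc", "yield")


def crude_stem(token):
    """Return the keyword stem that prefixes the token, or None (toy stemmer)."""
    word = token.lower()
    for stem in KEYWORD_STEMS:
        if word.startswith(stem):
            return stem
    return None


def score_assignment(tokens, enzyme_idx, substrate_idx, product_idx):
    """Toy score based on keyword presence and position (weights are assumed)."""
    score = 0.0
    for i, tok in enumerate(tokens):
        if crude_stem(tok) is None:
            continue
        score += 0.5  # a keyword is present in the sentence
        if enzyme_idx < i < substrate_idx:
            score += 1.0  # pattern: enzyme ... keyword ... substrate
        if i < substrate_idx < product_idx:
            score += 1.0  # pattern: keyword ... substrate ... product
    return score


def best_reaction(tokens, enzymes, metabolites):
    """Score every (enzyme, substrate, product) assignment and keep the best."""
    best = None
    for e in enzymes:
        for s, p in permutations(metabolites, 2):
            sc = score_assignment(tokens, e, s, p)
            if best is None or sc > best[0]:
                best = (sc, tokens[e], tokens[s], tokens[p])
    return best


tokens = "Hexokinase catalyses the conversion of glucose to glucose-6-phosphate".split()
# Entity positions are assumed to come from an upstream tagging step.
print(best_reaction(tokens, enzymes=[0], metabolites=[5, 7]))
```

In this toy sentence the assignment with glucose as substrate and glucose-6-phosphate as product scores higher than the reversed assignment, because both keywords precede the substrate, which in turn precedes the product.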
When evaluated on a set of manually curated metabolic pathways using standard performance criteria, our method performs surprisingly well. Precision and recall are comparable to those previously achieved for the well-known protein-protein interaction extraction task.
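For readers unfamiliar with the evaluation criteria, precision and recall over extracted reactions can be computed as in the sketch below. The function name `precision_recall` and the example reactions are hypothetical and serve only to show the arithmetic; they are not results from the paper.

```python
def precision_recall(predicted, gold):
    """Compare extracted reactions against a manually curated reference set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: fraction of extracted reactions that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of curated reactions that were extracted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up (enzyme, substrate, product) triples, for illustration only.
gold = {("E1", "A", "B"), ("E2", "B", "C"), ("E3", "C", "D")}
predicted = {("E1", "A", "B"), ("E2", "B", "C"), ("E2", "C", "B"), ("E4", "D", "E")}
print(precision_recall(predicted, gold))
```

Here two of the four extracted triples are correct and two of the three curated triples are recovered, so precision is 0.5 and recall is 2/3.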
We conclude that automated metabolic pathway construction is more tractable than has often been assumed, and that (as in the case of protein-protein interaction extraction) relatively simple text-mining approaches can prove surprisingly effective. It is hoped that these results will provide an impetus to further research and act as a useful benchmark for judging the performance of more sophisticated methods that are yet to be developed.