Ding J, Berleant D, Nettleton D, Wurtele E
Department of Electrical and Computer Engineering, Iowa State University, Ames, Iowa 50011, USA.
Pac Symp Biocomput. 2002:326-37. doi: 10.1142/9789812799623_0031.
A growing body of work addresses the automated mining of biochemical knowledge from digital repositories of scientific literature, such as MEDLINE. Some of these studies use abstracts as the unit of text from which to extract facts. Others use sentences, and still others use phrases. Here we compare abstracts, sentences, and phrases in MEDLINE, using the standard information retrieval performance measures of recall, precision, and effectiveness, for the task of mining interactions among biochemical terms based on term co-occurrence. The results show statistically significant differences that can affect the choice of text unit.
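To make the comparison concrete, the following is a minimal sketch of co-occurrence mining at two text-unit granularities (abstract and sentence), scored against a hand-labeled set of interacting pairs. It is not the authors' implementation. The toy abstracts, the term list, the gold pairs, the naive sentence splitter, and all function names are invented for illustration. The abstract does not define "effectiveness", so the sketch assumes the harmonic mean of precision and recall (the F-measure). Phrase-level units are omitted because they would require a parser.

```python
import re
from itertools import combinations


def split_sentences(abstract):
    """Naive sentence splitter (assumption: sentences end in '.', '!' or '?')."""
    return [s for s in re.split(r"(?<=[.!?])\s+", abstract.strip()) if s]


def cooccurring_pairs(units, terms):
    """Return every unordered term pair that appears together in some text unit."""
    pairs = set()
    for unit in units:
        lowered = unit.lower()
        # Simple substring matching; sorting gives each pair a canonical order.
        present = sorted(t for t in terms if t in lowered)
        pairs.update(combinations(present, 2))
    return pairs


def score(predicted, gold):
    """Precision, recall, and an assumed effectiveness measure (F-measure)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    effectiveness = 2 * precision * recall / total if total else 0.0
    return precision, recall, effectiveness


# Invented toy data, not taken from MEDLINE or from the paper.
abstracts = [
    "Insulin stimulates glucose uptake. Leptin was measured separately.",
    "Leptin inhibits ghrelin release.",
]
terms = {"insulin", "glucose", "leptin", "ghrelin"}
gold = {("glucose", "insulin"), ("ghrelin", "leptin")}

# Abstract as the text unit: each whole abstract is one unit.
abstract_pairs = cooccurring_pairs(abstracts, terms)

# Sentence as the text unit: each sentence of each abstract is one unit.
sentences = [s for a in abstracts for s in split_sentences(a)]
sentence_pairs = cooccurring_pairs(sentences, terms)

for name, pairs in [("abstract", abstract_pairs), ("sentence", sentence_pairs)]:
    p, r, e = score(pairs, gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f} effectiveness={e:.2f}")
```

On this toy input the abstract-level unit proposes two extra pairs that the sentence-level unit does not, which lowers its precision. That outcome is a property of the invented data and says nothing about the results reported in the paper.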