Smith Larry H, Wilbur W John
Computational Biology Branch, National Center for Biotechnology Information, Building 38A, 8600 Rockville Pike, Bethesda, MD 20894 USA.
Inf Retr Boston. 2010 Dec;13(6):601-617. doi: 10.1007/s10791-010-9126-8. Epub 2010 Jan 23.
We explore the feasibility of automatically identifying sentences in different MEDLINE abstracts that are related in meaning. We compare traditional vector space models with machine learning methods for detecting relatedness and find that machine learning is superior. The Huber method, a variant of Support Vector Machines that minimizes the modified Huber loss function, achieves 73% precision when the score cutoff is set high enough to identify about one related sentence per abstract on average. We illustrate how an abstract viewed in PubMed might be modified to present the related sentences found in other abstracts by this automatic procedure.
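The abstract names the modified Huber loss and a score cutoff without spelling either out. The following is a minimal sketch, not the authors' implementation: it assumes the standard textbook definition of the modified Huber loss for labels in {-1, +1}, and the function names `modified_huber_loss` and `precision_at_cutoff` are hypothetical names chosen for illustration.

```python
def modified_huber_loss(y, score):
    """Modified Huber loss for a label y in {-1, +1} and a real-valued score.

    Assumed standard definition, with margin = y * score:
      0                  if margin >= 1
      (1 - margin) ** 2  if -1 <= margin < 1
      -4 * margin        if margin < -1
    """
    margin = y * score
    if margin >= 1.0:
        return 0.0                      # correct with enough margin: no penalty
    if margin >= -1.0:
        return (1.0 - margin) ** 2      # quadratic near the decision boundary
    return -4.0 * margin                # linear for badly misclassified examples


def precision_at_cutoff(scores, labels, cutoff):
    """Fraction of truly related pairs (label +1) among pairs scoring >= cutoff.

    Returns None when no pair reaches the cutoff.
    """
    selected = [lab for s, lab in zip(scores, labels) if s >= cutoff]
    if not selected:
        return None
    return sum(1 for lab in selected if lab == 1) / len(selected)
```

Raising the cutoff keeps fewer candidate sentence pairs, which is the trade-off the abstract describes: the cutoff is set so that roughly one related sentence per abstract survives, and precision is measured on what remains.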