A comparison of citation metrics to machine learning filters for the identification of high quality MEDLINE documents.

Author Information

Aphinyanaphongs Yindalon, Statnikov Alexander, Aliferis Constantin F

Affiliation Information

Department of Biomedical Informatics, Eskind Biomedical Library, room 412, Vanderbilt University, 2209 Garland Avenue, Nashville, TN 37232, USA.

Publication Information

J Am Med Inform Assoc. 2006 Jul-Aug;13(4):446-55. doi: 10.1197/jamia.M2031. Epub 2006 Apr 18.

Abstract

OBJECTIVE

The present study explores the discriminatory performance of existing and novel gold-standard-specific machine learning (GSS-ML) focused filter models (i.e., models built specifically for a retrieval task and the gold standard against which they are evaluated) and compares their performance to citation counts, impact factors, and non-specific machine learning (NS-ML) models (i.e., models built for a different task and/or a different gold standard).

DESIGN

Three gold standard corpora were constructed using the SSOAB bibliography, the ACPJ-cited treatment articles, and the ACPJ-cited etiology articles. Citation counts and impact factors were obtained for each article. Support vector machine models were used to classify the articles using combinations of content, impact factors, and citation counts as predictors.

MEASUREMENTS

Discriminatory performance was estimated using the area under the receiver operating characteristic curve and n-fold cross-validation.
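The evaluation approach described in the DESIGN and MEASUREMENTS sections can be sketched in a few lines of scikit-learn. This is an illustrative reconstruction, not the authors' actual pipeline: the documents, labels, and feature settings below are toy placeholders standing in for the MEDLINE corpora and gold-standard labels, and the study's exact preprocessing and SVM parameters are not reproduced here.

```python
# Illustrative sketch (not the authors' pipeline): a content-based SVM
# classifier for article text, evaluated by area under the ROC curve
# (AUC) with n-fold cross-validation. Documents and labels are toy
# placeholders for MEDLINE titles/abstracts and gold-standard labels.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy stand-in corpus; 1 = "high quality" per the gold standard.
docs = [
    "randomized controlled trial of treatment efficacy",
    "case report of a rare adverse event",
    "double blind placebo controlled study outcomes",
    "editorial commentary on clinical practice",
    "systematic review of randomized trials",
    "narrative opinion piece on guidelines",
] * 5  # repeated so every cross-validation fold contains both classes
labels = [1, 0, 1, 0, 1, 0] * 5

# Content features (TF-IDF-weighted terms) feeding a linear-kernel SVM;
# the SVM's decision_function scores are what the ROC analysis ranks.
model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))

# n-fold cross-validated AUC (here n = 5, stratified by class)
aucs = cross_val_score(model, docs, labels, cv=5, scoring="roc_auc")
print(f"mean AUC over 5 folds: {aucs.mean():.2f}")
```

To mirror the study's comparisons, the same `cross_val_score` call would be repeated with citation counts or impact factors as additional (or sole) predictors and the resulting AUCs compared across models.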

RESULTS

For all three gold standards and tasks, GSS-ML filters outperformed citation count, impact factors, and NS-ML filters. Combinations of content with impact factor or citation count produced no or negligible improvements to the GSS machine learning filters.

CONCLUSIONS

These experiments provide evidence that when building information retrieval filters focused on a retrieval task and corresponding gold standard, the filter models have to be built specifically for this task and gold standard. Under those conditions, machine learning filters outperform standard citation metrics. Furthermore, citation counts and impact factors add marginal value to discriminatory performance. Previous research that claimed better performance of citation metrics than machine learning in one of the corpora examined here is attributed to using machine learning filters built for a different gold standard and task.
