Aphinyanaphongs Yindalon, Statnikov Alexander, Aliferis Constantin F
Department of Biomedical Informatics, Eskind Biomedical Library, room 412, Vanderbilt University, 2209 Garland Avenue, Nashville, TN 37232, USA.
J Am Med Inform Assoc. 2006 Jul-Aug;13(4):446-55. doi: 10.1197/jamia.M2031. Epub 2006 Apr 18.
The present study explores the discriminatory performance of existing and novel gold-standard-specific machine learning (GSS-ML) focused filter models (i.e., models built specifically for a retrieval task and the gold standard against which they are evaluated) and compares their performance to that of citation counts, impact factors, and non-specific machine learning (NS-ML) models (i.e., models built for a different task and/or a different gold standard).
Three gold standard corpora were constructed using the SSOAB bibliography, the ACPJ-cited treatment articles, and the ACPJ-cited etiology articles. Citation counts and impact factors were obtained for each article. Support vector machine models were used to classify the articles using combinations of content, impact factors, and citation counts as predictors.
Discriminatory performance was estimated using the area under the receiver operating characteristic curve and n-fold cross-validation.
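The evaluation scheme described above can be sketched in a few lines. This is an illustrative example only: the study's actual corpora, feature extraction, number of folds, and SVM parameters are not given in this abstract, so synthetic data stand in for article content features, citation counts, and impact factors.

```python
# Hedged sketch of SVM classification evaluated by AUC under
# n-fold cross-validation; all data below are synthetic stand-ins.
import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 200
content = rng.normal(size=(n, 50))          # stand-in for article content features
citations = rng.poisson(5, size=(n, 1))     # stand-in citation counts
impact = rng.gamma(2.0, 2.0, size=(n, 1))   # stand-in impact factors
y = (content[:, 0] + 0.1 * citations[:, 0] > 0).astype(int)  # synthetic labels

# Combine content with citation metrics as predictors, as in the study design
X = np.hstack([content, citations, impact])
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Area under the ROC curve, estimated fold by fold
aucs = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
print(f"mean AUC over {len(aucs)} folds: {aucs.mean():.3f}")
```

Comparing feature combinations (content alone, content plus citation count, content plus impact factor) then amounts to rebuilding `X` with different column subsets and repeating the same cross-validated AUC estimate.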
For all three gold standards and tasks, GSS-ML filters outperformed citation counts, impact factors, and NS-ML filters. Combining content with impact factor or citation count produced no improvement, or only a negligible one, over the content-based GSS-ML filters.
These experiments provide evidence that information retrieval filters focused on a retrieval task and a corresponding gold standard must be built specifically for that task and gold standard. Under those conditions, machine learning filters outperform standard citation metrics. Furthermore, citation counts and impact factors add only marginal value to discriminatory performance. A previous claim that citation metrics outperform machine learning on one of the corpora examined here is attributable to the use of machine learning filters built for a different gold standard and task.