MScanner: a classifier for retrieving Medline citations.

Author information

Poulter Graham L, Rubin Daniel L, Altman Russ B, Seoighe Cathal

Affiliations

UCT NBN Node, Department of Molecular and Cell Biology, University of Cape Town, Cape Town, South Africa.

Publication information

BMC Bioinformatics. 2008 Feb 19;9:108. doi: 10.1186/1471-2105-9-108.

DOI:10.1186/1471-2105-9-108

PMID:18284683

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2263023/

Abstract

BACKGROUND

Keyword searching through PubMed and other systems is the standard means of retrieving information from Medline. However, ad-hoc retrieval systems do not meet all of the needs of databases that curate information from literature, or of text miners developing a corpus on a topic that has many terms indicative of relevance. Several databases have developed supervised learning methods that operate on a filtered subset of Medline, to classify Medline records so that fewer articles have to be manually reviewed for relevance. A few studies have considered generalisation of Medline classification to operate on the entire Medline database in a non-domain-specific manner, but existing applications lack speed, available implementations, or a means to measure performance in new domains.

RESULTS

MScanner is an implementation of a Bayesian classifier that provides a simple web interface for submitting a corpus of relevant training examples in the form of PubMed IDs and returning results ranked by decreasing probability of relevance. For maximum speed it uses the Medical Subject Headings (MeSH) and journal of publication as a concise document representation, and takes roughly 90 seconds to return results against the 16 million records in Medline. The web interface provides interactive exploration of the results, and cross validated performance evaluation on the relevant input against a random subset of Medline. We describe the classifier implementation, cross validate it on three domain-specific topics, and compare its performance to that of an expert PubMed query for a complex topic. In cross validation on the three sample topics against 100,000 random articles, the classifier achieved excellent separation of relevant and irrelevant article score distributions, ROC areas between 0.97 and 0.99, and averaged precision between 0.69 and 0.92.
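The scoring idea described above can be illustrated with a minimal sketch. It is not MScanner's actual code: the abstract does not give the exact model, so the sketch assumes a simple naive Bayes variant in which each document is a set of features (MeSH terms plus the journal), each feature gets a smoothed log-likelihood-ratio weight, and a document's score is the sum of the weights of the features it contains. Absent features are ignored, which is a simplification. All function names, the `mesh:`/`journal:` prefixes, and the toy data are hypothetical. The `roc_area` helper computes the ROC area as the fraction of relevant/irrelevant score pairs that are ranked correctly.

```python
import math
from collections import Counter


def train_feature_weights(relevant_docs, background_docs, pseudocount=1.0):
    """Return a log-likelihood-ratio weight for every observed feature.

    Each document is a set of feature strings (assumed here to be MeSH
    terms plus the journal of publication).
    """
    rel_counts = Counter(f for doc in relevant_docs for f in doc)
    bg_counts = Counter(f for doc in background_docs for f in doc)
    n_rel, n_bg = len(relevant_docs), len(background_docs)
    weights = {}
    for feature in set(rel_counts) | set(bg_counts):
        # Smoothed estimates of P(feature | relevant) and P(feature | background).
        p_rel = (rel_counts[feature] + pseudocount) / (n_rel + 2 * pseudocount)
        p_bg = (bg_counts[feature] + pseudocount) / (n_bg + 2 * pseudocount)
        weights[feature] = math.log(p_rel / p_bg)
    return weights


def score(doc, weights):
    """Sum the weights of the features present in the document."""
    return sum(weights.get(f, 0.0) for f in doc)


def rank(candidates, weights):
    """Return (doc_id, score) pairs sorted by decreasing score."""
    scored = [(doc_id, score(feats, weights)) for doc_id, feats in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def roc_area(relevant_scores, irrelevant_scores):
    """ROC area as the share of (relevant, irrelevant) pairs ranked correctly."""
    wins = 0.0
    for r in relevant_scores:
        for i in irrelevant_scores:
            if r > i:
                wins += 1.0
            elif r == i:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(relevant_scores) * len(irrelevant_scores))


# Toy data (hypothetical, for illustration only).
relevant = [
    {"mesh:Pharmacogenetics", "mesh:Humans", "journal:Pharmacogenomics J"},
    {"mesh:Pharmacogenetics", "mesh:Genotype", "journal:Clin Pharmacol Ther"},
]
background = [
    {"mesh:Humans", "mesh:Surgery", "journal:Ann Surg"},
    {"mesh:Mice", "mesh:Genotype", "journal:Genetics"},
    {"mesh:Humans", "journal:Lancet"},
]
candidates = {
    "A": {"mesh:Pharmacogenetics", "mesh:Humans"},
    "B": {"mesh:Surgery", "mesh:Humans"},
}

weights = train_feature_weights(relevant, background)
ranking = rank(candidates, weights)
```

In this toy run, candidate `A` ranks above `B` because it carries a feature that is common in the relevant set and absent from the background. Restricting the representation to a small set of categorical features per record is what makes a pass over millions of records cheap: scoring is a handful of dictionary lookups and additions per document.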

CONCLUSION

MScanner is an effective non-domain-specific classifier that operates on the entire Medline database, and is suited to retrieving topics for which many features may indicate relevance. Its web interface simplifies the task of classifying Medline citations, compared to building a pre-filter and classifier specific to the topic. The data sets and open source code used to obtain the results in this paper are available on-line and as supplementary material, and the web interface may be accessed at http://mscanner.stanford.edu.
