Systematic Evaluation of Research Progress on Natural Language Processing in Medicine Over the Past 20 Years: Bibliometric Study on PubMed.

Author information

Wang Jing, Deng Huan, Liu Bangtao, Hu Anbin, Liang Jun, Fan Lingye, Zheng Xu, Wang Tong, Lei Jianbo

Affiliations

School of Medical Informatics and Engineering, Southwest Medical University, Luzhou, China.

IT Center, Second Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou, China.

Publication information

J Med Internet Res. 2020 Jan 23;22(1):e16816. doi: 10.2196/16816.

Abstract

BACKGROUND

Natural language processing (NLP) is an important traditional field in computer science, but its application in medical research has faced many challenges. With the extensive digitalization of medical information globally and increasing importance of understanding and mining big data in the medical field, NLP is becoming more crucial.

OBJECTIVE

The goal of the research was to perform a systematic review on the use of NLP in medical research with the aim of understanding the global progress on NLP research outcomes, content, methods, and study groups involved.

METHODS

A systematic review was conducted using the PubMed database as a search platform. All published studies on the application of NLP in medicine (except biomedicine) during the 20 years between 1999 and 2018 were retrieved. The data obtained from these published studies were cleaned and structured. Excel (Microsoft Corp) and VOSviewer (Nees Jan van Eck and Ludo Waltman) were used to perform bibliometric analysis of publication trends, author orders, countries, institutions, collaboration relationships, research hot spots, diseases studied, and research methods.
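The tallying step of such a bibliometric analysis can be sketched in Python. This is a minimal illustration only, not the authors' workflow (they report using Excel and VOSviewer). The record fields `year` and `country`, the helper names `tally` and `shares`, and the sample records are all hypothetical.

```python
from collections import Counter

# Hypothetical cleaned records; field names and values are assumptions
# made for illustration and are not data from the study.
records = [
    {"year": 2016, "country": "United States"},
    {"year": 2016, "country": "France"},
    {"year": 2017, "country": "United States"},
    {"year": 2018, "country": "United Kingdom"},
    {"year": 2018, "country": "United States"},
]


def tally(rows, field):
    """Count how many records share each value of `field`."""
    return Counter(row[field] for row in rows)


def shares(counts):
    """Convert raw counts to percentages of the total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {key: round(100 * value / total, 2) for key, value in counts.items()}


per_year = tally(records, "year")        # publication trend by year
per_country = tally(records, "country")  # output by country
country_share = shares(per_country)      # each country's share of the total
```

The same `tally` call would apply to any other cleaned field, such as institution or disease, provided the records carry it.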

RESULTS

A total of 3498 articles were obtained during initial screening, and 2336 articles met the study criteria after manual screening. The number of publications increased every year, with marked growth after 2012 (annual publications ranged from 148 to a maximum of 302). The United States has held the leading position since the inception of the field, contributing 63.01% (1472/2336) of all publications, followed by France (5.44%, 127/2336) and the United Kingdom (3.51%, 82/2336). The author with the largest number of published articles was Hongfang Liu (70), while Stéphane Meystre (17) and Hua Xu (33) published the largest number of articles as first author and corresponding author, respectively. Among first-author affiliations, Columbia University accounted for the most articles, 4.54% (106/2336) of the total. Approximately one-fifth (17.68%, 413/2336) of the articles involved research on specific diseases, and the subject areas primarily focused on mental illness (16.46%, 68/413), breast cancer (5.81%, 24/413), and pneumonia (4.12%, 17/413).
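The percentages reported above follow directly from the stated counts. The short Python sketch below recomputes them; the counts are taken from the text, while the variable names and the `pct` helper are illustrative assumptions.

```python
TOTAL = 2336          # articles meeting the study criteria
DISEASE_TOTAL = 413   # articles involving research on specific diseases


def pct(part, whole):
    """Return `part` as a percentage of `whole`, rounded to 2 decimals."""
    return round(100 * part / whole, 2)


# Counts as reported in the Results section.
country_counts = {"United States": 1472, "France": 127, "United Kingdom": 82}
disease_counts = {"mental illness": 68, "breast cancer": 24, "pneumonia": 17}

country_pct = {name: pct(n, TOTAL) for name, n in country_counts.items()}
disease_pct = {name: pct(n, DISEASE_TOTAL) for name, n in disease_counts.items()}
disease_share = pct(DISEASE_TOTAL, TOTAL)  # share of articles on specific diseases
columbia_share = pct(106, TOTAL)           # Columbia University, first-author affiliation
```

Note that the disease percentages use 413 as the denominator, not 2336, which is why they are not directly comparable with the country shares.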

CONCLUSIONS

NLP is in a period of robust development in the medical field, with an average of approximately 100 publications annually. Electronic medical records were the most commonly used research material, but social media such as Twitter have become important research material since 2015. Cancer (24.94%, 103/413) was the most common subject area in NLP-assisted medical research on diseases, with breast cancer (23.30%, 24/103) and lung cancer (14.56%, 15/103) accounting for the highest proportions of studies. Columbia University and the researchers trained there were the most active and prolific research force on NLP in the medical field.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1a08/7005695/7e23a4438a79/jmir_v22i1e16816_fig1.jpg