Automated Classification of Online Sources for Infectious Disease Occurrences Using Machine-Learning-Based Natural Language Processing Approaches.

Affiliations

Department of Preventive Medicine, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea.

Department of Data and HPC Science, University of Science and Technology, Daejeon 34113, Korea.

Publication information

Int J Environ Res Public Health. 2020 Dec 17;17(24):9467. doi: 10.3390/ijerph17249467.

DOI:10.3390/ijerph17249467

PMID:33348764

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7766498/

Abstract

Collecting valid information from electronic sources to detect potential infectious disease outbreaks is time-consuming and labor-intensive, so automated identification of relevant information using machine learning is needed to respond to a potential outbreak. A total of 2864 documents were collected from various websites and then manually categorized and labeled by two reviewers. Accurate labels for the training and test data were assigned by reviewer consensus. Two machine learning algorithms (ConvNet and bidirectional long short-term memory, BiLSTM) and two classification methods (DocClass and SenClass) were used to classify the documents. Precision, recall, F1 score, accuracy, and area under the curve (AUC) were measured to evaluate the performance of each model. With DocClass, ConvNet achieved higher average, minimum, and maximum accuracies than BiLSTM (87.6%, 85.2%, and 91.1%, respectively), whereas with SenClass, BiLSTM outperformed ConvNet, with average, minimum, and maximum accuracies of 92.8%, 92.6%, and 93.3%, respectively. BiLSTM with SenClass achieved an overall accuracy of 92.9% in classifying infectious disease occurrences. Given a particular text extraction system, machine learning performed comparably to a human expert. This study suggests that analyzing information from websites with machine learning can achieve high accuracy when abundant articles/documents are available.
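The abstract names the evaluation metrics but not how they were computed. The sketch below shows the standard definitions, assuming a binary task (1 = document reports an infectious disease occurrence, 0 = it does not); that framing, the function names `binary_metrics` and `roc_auc`, and the toy data are hypothetical illustrations, not the authors' code. AUC is computed here as the probability that a randomly chosen positive document receives a higher model score than a randomly chosen negative one (ties count as one half), which equals the area under the ROC curve.

```python
from __future__ import annotations

from typing import Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Precision, recall, F1 and accuracy for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise ranking: share of (positive, negative) pairs ordered correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy labels, thresholded predictions, and model scores.
    y_true = [1, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0]
    scores = [0.9, 0.4, 0.2, 0.6, 0.8, 0.1]
    print(binary_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

On the toy data there are 2 true positives, 1 false positive, 1 false negative, and 2 true negatives, so precision, recall, F1, and accuracy all equal 2/3, and 8 of the 9 positive/negative score pairs are correctly ordered, giving an AUC of 8/9. Averaging such per-run accuracies over repeated train/test splits would yield average, minimum, and maximum figures like those reported, though the abstract does not state the exact resampling scheme.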

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4e2/7766498/0d296e524cf8/ijerph-17-09467-g001.jpg
