Automated Classification of Online Sources for Infectious Disease Occurrences Using Machine-Learning-Based Natural Language Processing Approaches.

Affiliations

Department of Preventive Medicine, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea.

Department of Data and HPC Science, University of Science and Technology, Daejeon 34113, Korea.

Publication information

Int J Environ Res Public Health. 2020 Dec 17;17(24):9467. doi: 10.3390/ijerph17249467.

DOI:10.3390/ijerph17249467
PMID:33348764
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7766498/
Abstract

Collecting valid information from electronic sources to detect potential outbreaks of infectious disease is time-consuming and labor-intensive. Automated identification of relevant information using machine learning is necessary to respond to a potential disease outbreak. A total of 2864 documents were collected from various websites and then manually categorized and labeled by two reviewers. Accurate labels for the training and test data were assigned based on reviewer consensus. Two machine learning algorithms (ConvNet and bidirectional long short-term memory, BiLSTM) and two classification methods (DocClass and SenClass) were used to classify the documents. Precision, recall, F1, accuracy, and area under the curve were measured to evaluate the performance of each model. With DocClass, ConvNet yielded higher average, minimum, and maximum accuracies (87.6%, 85.2%, and 91.1%, respectively) than BiLSTM; with SenClass, BiLSTM outperformed ConvNet, with average, minimum, and maximum accuracies of 92.8%, 92.6%, and 93.3%, respectively. BiLSTM with SenClass yielded an overall accuracy of 92.9% in classifying infectious disease occurrences. Given a particular text extraction system, machine learning performed comparably to a human expert. This study suggests that analyzing information from websites using machine learning can achieve high accuracy when abundant articles/documents are available.
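
The evaluation metrics named in the abstract (precision, recall, F1, accuracy, and area under the curve) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the labels, scores, 0.5 decision threshold, and function names are hypothetical and are not taken from the study, and the code is not the authors' evaluation pipeline.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1, and accuracy for binary labels (1 = relevant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical reviewer-consensus labels and model scores for eight documents.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(binary_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

Precision, recall, F1, and accuracy depend on the chosen threshold, whereas the pairwise AUC depends only on how the scores rank relevant documents against irrelevant ones.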

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4e2/7766498/0d296e524cf8/ijerph-17-09467-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4e2/7766498/e61c82273fbb/ijerph-17-09467-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4e2/7766498/7d3ca678b823/ijerph-17-09467-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4e2/7766498/4d02f8b943a8/ijerph-17-09467-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4e2/7766498/7fbf6bb9b9c0/ijerph-17-09467-g005.jpg
