Clinical assistant decision-making model of tuberculosis based on electronic health records.

Authors

Wang Mengying, Lee Cuixia, Wei Zhenhao, Ji Hong, Yang Yingyun, Yang Cheng

Affiliations

State Key Laboratory of Media Convergence and Communication, Communication University of China, No. 1 Dingfuzhuang East Street, Chaoyang District, Beijing, China.

Peking University Third Hospital, Beijing, China.

Publication information

BioData Min. 2023 Mar 16;16(1):11. doi: 10.1186/s13040-023-00328-y.

DOI: 10.1186/s13040-023-00328-y
PMID: 36927471
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10022184/
Abstract

BACKGROUND

Tuberculosis is a dangerous infectious disease with the largest number of reported cases in China every year. Preventing missed diagnoses is important for the prevention, treatment, and recovery of tuberculosis. The earliest pulmonary tuberculosis prediction models mainly combined traditional image data with neural network models. However, a single data source tends to miss important information, such as primary symptoms and laboratory test results, that is available in multi-source data like medical records and tests. In this study, we propose a multi-stream integrated pulmonary tuberculosis diagnosis model based on structured and unstructured multi-source data from electronic health records. Given the limited number of lung specialists and the high prevalence of tuberculosis, this auxiliary diagnosis model can make a substantial contribution in clinical settings.

METHODS

The subjects were patients at the respiratory department and infectious diseases department of a large comprehensive hospital in China between 2015 and 2020. A total of 95,294 medical records were selected through a quality control process. Each record contains structured and unstructured data. First, numerical representations of the structured-data features were created. Then, feature engineering was performed using decision tree, random forest, and gradient-boosted decision tree (GBDT) models. Features were added to the feature exclusion set in descending order of weight, and the process stopped once the importance of the set exceeded 0.7. Finally, the features in the set were used for model training. In addition, the unstructured free-text data was segmented at the character level, indexed, and input into the model. Tuberculosis prediction was conducted with a multi-stream integration pulmonary tuberculosis diagnosis model (MSI-PTDM), and its accuracy, AUC, sensitivity, and specificity were compared against the prediction results of XGBoost, Text-CNN, Random Forest, SVM, and other models.
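The two preprocessing steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the example importance scores, the normalization of importances to sum to 1, and the reserved `<pad>`/`<unk>` indices are all assumptions, and the sample strings assume Chinese free text.

```python
from typing import Dict, List, Tuple


def select_by_cumulative_importance(
    importances: Dict[str, float], threshold: float = 0.7
) -> List[str]:
    """Add features in descending order of importance until the
    cumulative normalized importance of the set exceeds `threshold`."""
    total = sum(importances.values())
    selected: List[str] = []
    cumulative = 0.0
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    for name, weight in ranked:
        selected.append(name)
        cumulative += weight / total
        if cumulative > threshold:
            break
    return selected


def index_characters(texts: List[str]) -> Tuple[Dict[str, int], List[List[int]]]:
    """Split each text into single characters and map them to integer ids.
    Ids 0 and 1 are reserved for padding and unknown characters (assumption)."""
    vocab: Dict[str, int] = {"<pad>": 0, "<unk>": 1}
    encoded: List[List[int]] = []
    for text in texts:
        ids = []
        for ch in text:
            if ch not in vocab:
                vocab[ch] = len(vocab)
            ids.append(vocab[ch])
        encoded.append(ids)
    return vocab, encoded


# Hypothetical importance scores, e.g. averaged over tree-based models.
scores = {"hemoptysis": 0.30, "cough": 0.25, "esr": 0.20, "age": 0.15, "sex": 0.10}
print(select_by_cumulative_importance(scores))  # ['hemoptysis', 'cough', 'esr']

# Hypothetical chief-complaint snippets ("cough, hemoptysis" and "cough").
vocab, encoded = index_characters(["咳嗽咯血", "咳嗽"])
print(encoded)  # [[2, 3, 4, 5], [2, 3]]
```

In practice the importance scores would come from the fitted tree models, and the indexed sequences would be padded to a common length before being fed to a text model.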

RESULTS

Through a variety of feature engineering methods, 20 features were selected, such as the chief complaints of hemoptysis and cough and the erythrocyte sedimentation rate test, and the influencing factors were analyzed using the Chinese diagnostic standard for pulmonary tuberculosis. The area under the curve (AUC) values for MSI-PTDM, XGBoost, Text-CNN, RF, and SVM were 0.9858, 0.9571, 0.9486, 0.9428, and 0.9429, respectively. The sensitivity, specificity, and accuracy of MSI-PTDM were 93.18%, 96.96%, and 96.96%, respectively. The MSI-PTDM prediction model was installed at a doctor workstation and operated in a real clinical environment for 4 months. A total of 692,949 patients were monitored, including 484 patients with confirmed pulmonary tuberculosis. The model predicted 440 cases of pulmonary tuberculosis. The positive sample recognition rate was 90.91%, the false-positive rate was 9.09%, the negative sample recognition rate was 96.17%, and the false-negative rate was 3.83%.
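The deployment figures can be checked with simple arithmetic. The sketch below assumes that the 440 predicted cases are counted among the 484 confirmed cases, which the abstract does not state explicitly. Under that reading, the reported 9.09% is the complement of the positive sample recognition rate, and the reported 3.83% is the complement of the negative sample recognition rate.

```python
confirmed_tb = 484   # confirmed pulmonary tuberculosis patients during deployment
predicted_tb = 440   # cases predicted by the model (assumed to be among the confirmed)

positive_recognition_rate = predicted_tb / confirmed_tb
complement_positive = 1 - positive_recognition_rate

print(f"{positive_recognition_rate:.2%}")  # 90.91%
print(f"{complement_positive:.2%}")        # 9.09%

# The two negative-sample figures reported in the abstract also sum to 100%.
negative_recognition_pct = 96.17
print(round(100 - negative_recognition_pct, 2))  # 3.83
```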

CONCLUSIONS

MSI-PTDM can process sparse data, dense data, and unstructured text data concurrently. The model adds a feature-field embedding for sparse medical features, so that each single-valued sparse vector is represented by a multi-dimensional dense hidden vector. This both enriches the feature representation and reduces the negative effect of sparsity on model training. Some information may be lost when features are extracted from text; processing the original unstructured text as well compensates for this loss to some extent, so that the model can learn from the data more comprehensively and effectively. In addition, MSI-PTDM allows interaction between features, accounts for combined effects among patient features, adds more complex nonlinear computation, and improves the learning ability of the model. It has been verified on a test set and through deployment in an actual outpatient environment.
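To illustrate the two ideas named above, mapping sparse categorical features to dense vectors and letting features interact, here is a minimal sketch. The abstract does not specify the architecture, so the factorization-machine-style pairwise interaction used here is a stand-in chosen for illustration. The field names, sizes, embedding dimension, and random initialization are all hypothetical.

```python
import random
from typing import Dict, List

random.seed(0)
EMBED_DIM = 4  # hypothetical embedding size

Table = Dict[str, List[List[float]]]


def make_embedding_table(field_sizes: Dict[str, int], dim: int) -> Table:
    """One randomly initialized dense vector per category of each field."""
    return {
        field: [[random.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(n)]
        for field, n in field_sizes.items()
    }


def embed(record: Dict[str, int], table: Table) -> List[List[float]]:
    """Replace each single-valued sparse feature with its dense vector."""
    return [table[field][index] for field, index in record.items()]


def pairwise_interaction(vectors: List[List[float]]) -> float:
    """Sum of dot products over all vector pairs, computed per dimension as
    0.5 * ((sum of values)^2 - sum of squared values)."""
    total = 0.0
    for k in range(len(vectors[0])):
        column = [v[k] for v in vectors]
        total += 0.5 * (sum(column) ** 2 - sum(x * x for x in column))
    return total


# Hypothetical fields: category counts per field, then one patient record.
table = make_embedding_table({"sex": 2, "hemoptysis": 2, "age_band": 5}, EMBED_DIM)
patient = {"sex": 1, "hemoptysis": 1, "age_band": 3}
dense = embed(patient, table)
print(len(dense), len(dense[0]))  # 3 4
print(pairwise_interaction([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]))  # 67.0
```

In a trained model the embedding vectors would be learned parameters, and the interaction term would be combined with the dense-feature and text streams before the final prediction layer.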
