Automated sample annotation for diabetes mellitus in healthcare integrated biobanking.

Authors

Stolp Johannes, Weber Christoph, Ammon Danny, Scherag André, Fischer Claudia, Kloos Christof, Wolf Gunter, Schulze P Christian, Settmacher Utz, Bauer Michael, Stallmach Andreas, Kiehntopf Michael, Betz Boris

Affiliations

Department of Clinical Chemistry and Laboratory Diagnostics and Integrated Biobank Jena (IBBJ), Jena University Hospital - Friedrich Schiller University Jena, Jena, Germany.

Data Integration Center, Jena University Hospital - Friedrich Schiller University Jena, Jena, Germany.

Publication information

Comput Struct Biotechnol J. 2024 Oct 23;24:724-733. doi: 10.1016/j.csbj.2024.10.033. eCollection 2024 Dec.

DOI:10.1016/j.csbj.2024.10.033
PMID:39668942
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11635603/
Abstract

Healthcare integrated biobanking describes the annotation and collection of residual samples from hospitalized patients for research purposes. The central idea of the current work is to establish an automated workflow for sample annotation, selection and storage for diabetes mellitus. This is challenging because the data are incomplete at the time of sample selection. The study evaluates a two-step procedure based on machine learning (ML) and natural language processing (NLP) for timely and precise sample annotation for diabetes mellitus. Electronic health record data of 785 persons were extracted from the hospital information system. In the first step, a conditional inference forest (CIF) model was trained and tested based on laboratory values from the first 72 h of the hospital stay using test- (n = 550) and training data sets (n = 235). Performance was compared with a simple laboratory cut-off classifier (LCC) and a logistic regression (LR) model. Algorithms based on laboratory values, ICD-10 codes or information from discharge summaries extracted by a natural language processing software (NLP-DS) were evaluated as a second (review) step designed to increase the precision of annotations. For the first step, recall/precision/F1-score/accuracy were 71 %/86 %/0.78/0.82 for CIF and 77 %/70 %/0.74/0.75 for LR compared to 73 %/68 %/0.70/0.72 for LCC. NLP-DS was the best-performing second (review) step (93 %/100 %/0.97/0.97). Combining first-step models with NLP-DS increased precision to 100 % for all procedures (66 %/100 %/0.80/0.85 for CIF&NLP-DS, 72 %/100 %/0.84/0.87 for LR&NLP-DS and 66 %/100 %/0.80/0.85 for LCC&NLP-DS). The number of samples removed by NLP-DS was higher for LR&NLP-DS and LCC&NLP-DS (removal rate 35 % and 38 % of initially selected samples) compared to CIF&NLP-DS (removal rate of 20 %). The developed two-step procedure is an efficient, implementable method for timely and precise annotation of samples from diabetic hospitalized patients.
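
The metrics reported in the abstract and the effect of a review step can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a sample is kept only when the first-step model selects it and the review step confirms it, which is one reading of "samples removed by NLP-DS". The labels and predictions are hypothetical toy data, and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for two equal-length lists of booleans."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Recall, precision, F1-score and accuracy, as listed in the abstract."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"recall": recall, "precision": precision,
            "f1": f1, "accuracy": accuracy}


def two_step(first_step, review):
    """Assumed combination rule: keep a sample only if both steps agree."""
    return [a and b for a, b in zip(first_step, review)]


def removal_rate(first_step, combined):
    """Share of initially selected samples that the review step removed."""
    selected = sum(first_step)
    kept = sum(combined)
    return (selected - kept) / selected if selected else 0.0


# Hypothetical toy data: 10 patients, 5 of them with diabetes.
y_true     = [True, True, True, True, True, False, False, False, False, False]
first_step = [True, True, True, True, False, True, False, False, False, False]
review     = [True, True, True, False, True, False, False, False, False, False]

combined = two_step(first_step, review)
first_metrics = metrics(y_true, first_step)
combined_metrics = metrics(y_true, combined)
rate = removal_rate(first_step, combined)

print("first step:", first_metrics)
print("two-step:  ", combined_metrics)
print("removal rate:", rate)
```

In this toy example the review step discards the single false positive, so precision rises from 0.8 to 1.0, while one true positive is also lost and recall falls from 0.8 to 0.6. The abstract reports the same direction of change, for example for CIF compared with CIF&NLP-DS.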


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ed0/11635603/0c72c92ca511/ga1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ed0/11635603/85c9dfeb2b3d/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ed0/11635603/a12ccb25f407/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ed0/11635603/d9cf72494c9d/gr3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ed0/11635603/871f1f4a6067/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ed0/11635603/7cb9db0a0309/gr5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7ed0/11635603/8ca369a683d9/gr6.jpg
