Developing a Standardization Algorithm for Categorical Laboratory Tests for Clinical Big Data Research: Retrospective Study.

Authors

Kim Mina, Shin Soo-Yong, Kang Mira, Yi Byoung-Kee, Chang Dong Kyung

Affiliations

Department of Digital Health, Samsung Advanced Institute for Health Sciences & Technology, Sungkyunkwan University, Seoul, Republic of Korea.

Health Information and Strategy Center, Samsung Medical Center, Seoul, Republic of Korea.

Publication information

JMIR Med Inform. 2019 Aug 29;7(3):e14083. doi: 10.2196/14083.

DOI: 10.2196/14083
PMID: 31469075
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6740165/
Abstract

BACKGROUND

Data standardization is essential in electronic health records (EHRs) for both clinical practice and retrospective research. However, it is still not easy to standardize EHR data because of nonidentical duplicates, typographical errors, or inconsistencies. To overcome this drawback, standardization efforts have been undertaken for collecting data in a standardized format as well as for curating the stored data in EHRs. To perform clinical big data research, the stored data in EHR should be standardized, starting from laboratory results, given their importance. However, most of the previous efforts have been based on labor-intensive manual methods.

OBJECTIVE

We aimed to develop an automatic standardization method that removes noise from categorical laboratory data, groups the cleaned data, and maps them to standard terminology.

METHODS

We developed a method called standardization algorithm for laboratory test-categorical result (SALT-C) that can process categorical laboratory data, such as pos +, 250 4+ (urinalysis results), and reddish (urinalysis color results). SALT-C consists of five steps. First, it applies data cleaning rules to categorical laboratory data. Second, it categorizes the cleaned data into 5 predefined groups (urine color, urine dipstick, blood type, presence-finding, and pathogenesis tests). Third, all data in each group are vectorized. Fourth, similarity is calculated between the vectors of data and those of each value in the predefined value sets. Finally, the value closest to the data is assigned.
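
The five steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state the cleaning rules, the vectorization scheme, the similarity measure, or the contents of the value sets, so the sketch assumes regex-based cleaning, character-bigram count vectors, and cosine similarity. `VALUE_SETS`, `clean`, `vectorize`, `cosine`, `standardize`, and the `min_similarity` cutoff are all hypothetical names, and the group (step 2) is passed in by the caller rather than inferred.

```python
import math
import re
from collections import Counter
from typing import Optional

# Hypothetical reference value sets; the real SALT-C value sets are not
# given in the abstract.
VALUE_SETS = {
    "urine_color": ["yellow", "red", "brown", "colorless"],
    "presence_finding": ["positive", "negative", "trace"],
}


def clean(raw: str) -> str:
    """Step 1 (assumed rules): lowercase, drop non-letters, collapse spaces."""
    text = re.sub(r"[^a-z ]+", " ", raw.lower())
    return " ".join(text.split())


def vectorize(text: str) -> Counter:
    """Step 3 (assumed scheme): character-bigram counts with boundary marks."""
    padded = f"^{text}$"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Step 4 (assumed measure): cosine similarity of two count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def standardize(raw: str, group: str,
                min_similarity: float = 0.3) -> Optional[str]:
    """Steps 3 to 5 for a raw value whose group (step 2) is already known.

    Returns the closest reference value, or None when nothing in the value
    set is similar enough (the cutoff is an assumption of this sketch).
    """
    vec = vectorize(clean(raw))
    best = max(VALUE_SETS[group], key=lambda ref: cosine(vec, vectorize(ref)))
    return best if cosine(vec, vectorize(best)) >= min_similarity else None


if __name__ == "__main__":
    for raw, group in [("Reddish", "urine_color"),
                       ("pos +", "presence_finding"),
                       ("**", "presence_finding")]:
        print(repr(raw), "->", standardize(raw, group))
```

With these assumed value sets, `Reddish` maps to `red` and `pos +` maps to `positive`, while a value such as `**` has no letters left after cleaning and falls below the cutoff, so it is left unmapped.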

RESULTS

The performance of SALT-C was validated using 59,213,696 data points (167,938 unique values) generated over 23 years at a tertiary hospital. Apart from data whose original meaning could not be interpreted (eg, ** and _^), SALT-C mapped unique raw values to the correct reference value in each group with an accuracy of 97.6% (123/126; urine color tests), 97.5% (198/203; urine dipstick tests), 95% (53/56; blood type tests), 99.68% (162,291/162,805; presence-finding tests), and 99.61% (4643/4661; pathogenesis tests).
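
The reported percentages can be recomputed from the counts in parentheses. The short sketch below does that arithmetic; the names `RESULTS` and `accuracy_pct` are illustrative, and the pooled figure it prints is derived here from the per-group counts and is not a number reported in the abstract.

```python
# (correct, total) counts of unique values per group, as given in the Results.
RESULTS = {
    "urine color": (123, 126),
    "urine dipstick": (198, 203),
    "blood type": (53, 56),
    "presence-finding": (162_291, 162_805),
    "pathogenesis": (4_643, 4_661),
}


def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy as a percentage of correctly mapped unique values."""
    return 100.0 * correct / total


pooled_correct = sum(correct for correct, _ in RESULTS.values())
pooled_total = sum(total for _, total in RESULTS.values())

if __name__ == "__main__":
    for group, (correct, total) in RESULTS.items():
        print(f"{group}: {accuracy_pct(correct, total):.2f}%")
    # Derived here, not reported in the abstract.
    print(f"pooled: {accuracy_pct(pooled_correct, pooled_total):.2f}% "
          f"({pooled_correct}/{pooled_total})")
```

The group totals sum to 167,851, slightly fewer than the 167,938 unique values in the data set, which would be consistent with the uninterpretable values having been set aside before scoring.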

CONCLUSIONS

The proposed SALT-C successfully standardized the categorical laboratory test results with high reliability. SALT-C can be beneficial for clinical big data research by reducing laborious manual standardization efforts.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a55b/6740165/38aa17afab28/medinform_v7i3e14083_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a55b/6740165/d70c11539aeb/medinform_v7i3e14083_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a55b/6740165/86f61f311d3f/medinform_v7i3e14083_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a55b/6740165/e4aa1f643dc4/medinform_v7i3e14083_fig4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a55b/6740165/2f699cfcbb8f/medinform_v7i3e14083_fig5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a55b/6740165/9e17c6bc6beb/medinform_v7i3e14083_fig6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a55b/6740165/db237ee8f2fc/medinform_v7i3e14083_fig7.jpg