Polar labeling: silver standard algorithm for training disease classifiers.

Affiliations

Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02114, USA.

Partners Healthcare, Somerville, MA 02145, USA.

Publication information

Bioinformatics. 2020 May 1;36(10):3200-3206. doi: 10.1093/bioinformatics/btaa088.

DOI: 10.1093/bioinformatics/btaa088

PMID: 32049335

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7214041/

Abstract

MOTIVATION

Expert-labeled data are essential for training phenotyping algorithms for cohort identification. However, expert labeling is time- and labor-intensive, and its cost remains prohibitive for scaling phenotyping to wider use cases.

RESULTS

We present an approach, referred to as polar labeling (PL), to create a silver standard for training machine learning (ML) models for disease classification. We test the hypothesis that ML models trained on the silver standard, created by applying PL to unlabeled patient records, perform comparably to ML models trained on the gold standard, created by clinical experts through manual review of patient records. We perform experimental validation using the health records of 38 023 patients spanning six diseases. Our results demonstrate the superior performance of the proposed approach.
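The comparison described above (a model trained on silver labels versus one trained on gold labels, both scored against expert labels) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the abstract does not describe how PL assigns labels, so `stand_in_silver_label` is a hypothetical placeholder rule, and the features, sample sizes, and the logistic-regression classifier are all assumptions made for the example.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def make_patients(n, rng):
    """Synthetic (features, expert_label) pairs; not real patient data."""
    rows = []
    for _ in range(n):
        y = 1 if rng.random() < 0.3 else 0
        x1 = rng.gauss(1.5 if y else -0.5, 1.0)  # hypothetical surrogate feature
        x2 = rng.gauss(1.0 if y else 0.0, 1.0)   # hypothetical second feature
        rows.append(([x1, x2], y))
    return rows

def stand_in_silver_label(features):
    """Placeholder labeling rule, NOT the PL algorithm from the paper."""
    return 1 if features[0] > 0.5 else 0

def train_logreg(X, y, epochs=200, lr=0.5):
    """Logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def score(model, X):
    """Linear logit per record; sufficient for ranking-based AUC."""
    w, b = model
    return [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]

def auc(scores, labels):
    """Probability that a random positive outscores a random negative."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(0)
train, test = make_patients(500, rng), make_patients(300, rng)
X_train = [x for x, _ in train]
gold = [y for _, y in train]                           # expert ("gold") labels
silver = [stand_in_silver_label(x) for x in X_train]   # automatic ("silver") labels
X_test = [x for x, _ in test]
y_test = [y for _, y in test]

# Both models are evaluated against held-out expert labels.
auc_gold = auc(score(train_logreg(X_train, gold), X_test), y_test)
auc_silver = auc(score(train_logreg(X_train, silver), X_test), y_test)
print(f"AUC, trained on gold labels:   {auc_gold:.3f}")
print(f"AUC, trained on silver labels: {auc_silver:.3f}")
```

The point of the design is that only the training labels differ between the two models; the held-out evaluation always uses expert labels, so any gap in AUC reflects the quality of the silver standard rather than a change in the test set.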

AVAILABILITY AND IMPLEMENTATION

We provide a Python implementation of the algorithm and the Python code developed for this study on GitHub.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
