PheValuator: Development and evaluation of a phenotype algorithm evaluator.

Affiliations

Janssen Research & Development, 920 Route 202, Raritan, NJ 08869, USA; OHDSI Collaborators, Observational Health Data Sciences and Informatics (OHDSI), 622 West 168th Street, PH-20, New York, NY 10032, USA.

OHDSI Collaborators, Observational Health Data Sciences and Informatics (OHDSI), 622 West 168th Street, PH-20, New York, NY 10032, USA; Columbia University, 622 West 168th Street, PH20, New York, NY 10032, USA.

Publication information

J Biomed Inform. 2019 Sep;97:103258. doi: 10.1016/j.jbi.2019.103258. Epub 2019 Jul 29.

PMID: 31369862
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7736922/

Abstract

BACKGROUND

The primary approach for defining disease in observational healthcare databases is to construct phenotype algorithms (PAs), rule-based heuristics predicated on the presence, absence, and temporal logic of clinical observations. However, a complete evaluation of PAs, i.e., determining sensitivity, specificity, and positive predictive value (PPV), is rarely performed. In this study, we propose a tool, PheValuator, that efficiently estimates all three characteristics and thus provides a complete PA evaluation.
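For reference, the three characteristics named above are the standard confusion-matrix ratios, where TP, FP, TN, and FN count a PA's true positives, false positives, true negatives, and false negatives against a reference standard (these are the textbook definitions, not formulas taken from this abstract):

```latex
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}, \qquad
\text{PPV} = \frac{TP}{TP + FP}
```

Sensitivity and specificity are conditioned on true disease status, whereas PPV depends on how many true cases are among the subjects a PA includes, so a PA can have very high specificity yet only modest PPV when the phenotype is rare.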

METHODS

We used 4 administrative claims datasets spanning 2000 to 2017: OptumInsight's de-identified Clinformatics™ Datamart (Eden Prairie, MN); IBM MarketScan Multi-State Medicaid; IBM MarketScan Medicare Supplemental Beneficiaries; and IBM MarketScan Commercial Claims and Encounters. Using PheValuator involves (1) creating a diagnostic predictive model for the phenotype, (2) applying the model to a large set of randomly selected subjects, and (3) comparing each subject's predicted probability for the phenotype to inclusion/exclusion in PAs. We used the predictions as a 'probabilistic gold standard' measure to classify positive/negative cases. We examined 4 phenotypes: myocardial infarction, cerebral infarction, chronic kidney disease, and atrial fibrillation. We examined several PAs for each phenotype, including a 1-time (1X) occurrence of the diagnosis code in the subject's record and a 1-time occurrence of the diagnosis in an inpatient setting with the diagnosis code as the primary reason for admission (1X-IP-1stPos).
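Step (3) and the two PAs can be illustrated with a small Python sketch. It is not PheValuator's own code. It assumes the diagnostic model from steps (1) and (2) has already produced a predicted probability per subject, and it assumes an expected-value reading of the 'probabilistic gold standard': a subject with probability p who is included by a PA contributes p to TP and 1 - p to FP, while an excluded subject contributes p to FN and 1 - p to TN. The `Visit`/`Subject` record layout, the function names, and the diagnosis codes are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Visit:
    code: str        # diagnosis code recorded at the visit
    inpatient: bool  # True if the visit is an inpatient admission
    primary: bool    # True if the code is the primary (first-position) diagnosis


@dataclass
class Subject:
    subject_id: int
    visits: List[Visit] = field(default_factory=list)


# Hypothetical rule-based PAs: each returns True if the subject is included.
def pa_1x(subject: Subject, codes: Set[str]) -> bool:
    """1X: at least one occurrence of a phenotype code anywhere in the record."""
    return any(v.code in codes for v in subject.visits)


def pa_1x_ip_1stpos(subject: Subject, codes: Set[str]) -> bool:
    """1X-IP-1stPos: at least one inpatient visit with a phenotype code as primary diagnosis."""
    return any(v.code in codes and v.inpatient and v.primary for v in subject.visits)


def _ratio(num: float, den: float) -> float:
    # Undefined metrics (empty denominator) are reported as NaN.
    return num / den if den > 0 else math.nan


def evaluate_pa(
    pa: Callable[[Subject, Set[str]], bool],
    subjects: List[Subject],
    probs: Dict[int, float],
    codes: Set[str],
) -> Dict[str, float]:
    """Score a PA against model probabilities used as a probabilistic gold standard."""
    tp = fp = tn = fn = 0.0
    for s in subjects:
        p = probs[s.subject_id]  # predicted probability that s has the phenotype
        if pa(s, codes):
            tp += p          # expected true-positive mass
            fp += 1.0 - p    # expected false-positive mass
        else:
            fn += p          # expected missed-case mass
            tn += 1.0 - p    # expected true-negative mass
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
    }


# Toy data: "I21" stands in for a myocardial infarction code (illustrative only).
mi_codes = {"I21"}
cohort = [
    Subject(1, [Visit("I21", inpatient=True, primary=True)]),
    Subject(2, [Visit("I21", inpatient=False, primary=False)]),
    Subject(3, [Visit("E11", inpatient=False, primary=True)]),
    Subject(4, []),
]
predicted = {1: 0.9, 2: 0.3, 3: 0.05, 4: 0.6}  # would come from the step (1) model

for name, pa in [("1X", pa_1x), ("1X-IP-1stPos", pa_1x_ip_1stpos)]:
    metrics = evaluate_pa(pa, cohort, predicted, mi_codes)
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

On this toy data, 1X also includes the outpatient-only subject, so it gains sensitivity at the cost of PPV, while 1X-IP-1stPos does the reverse. Thresholding p (for example at 0.5) instead of summing it is an equally simple alternative; the abstract does not say which scheme PheValuator uses.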

RESULTS

Across phenotypes, the 1X PA showed the highest sensitivity and lowest PPV among all PAs, whereas 1X-IP-1stPos yielded the highest PPV and lowest sensitivity. Specificity was very high across algorithms. We found similar results for the algorithms across datasets.

CONCLUSION

PheValuator appears to show promise as a tool to estimate PA performance characteristics.
