Wang Andrew, Fulton Rachel, Hwang Sy, Margolis David J, Mowery Danielle L
Department of Computer and Information Science, School of Engineering and Applied Sciences, University of Pennsylvania, Philadelphia, PA.
Lankenau Medical Center, Dermatology Services, Wynnewood, PA.
medRxiv. 2023 Dec 4:2023.08.25.23294636. doi: 10.1101/2023.08.25.23294636.
Atopic dermatitis (AD) is a chronic skin condition that affects millions of people worldwide. Research into its causes and treatment has great potential to benefit these individuals. However, recruiting for AD clinical trials is difficult: clinicians vary in diagnostic precision and in the phenotypic definitions they apply, and they spend considerable time finding, recruiting, and enrolling patients as study subjects. There is therefore a need for automatic, effective patient phenotyping for cohort recruitment.
Our study aims to present an approach for identifying patients whose electronic health records suggest that they may have AD.
We created a vectorized representation of each patient and trained several supervised machine learning methods to classify whether a patient has AD. Each patient is represented by a vector of either probabilities or binary values, where each value indicates whether the patient meets a different criterion for AD diagnosis.
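The representation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the criterion names, the 0.5 threshold, the toy patients, and the helper names (`to_vector`, `train_logistic`, `predict`) are all hypothetical and do not come from the study, and a small hand-written logistic regression stands in for the supervised classifiers that were actually trained.

```python
import math

# Hypothetical criterion names; the abstract does not list the actual criteria.
CRITERIA = ["pruritus", "flexural_rash", "personal_atopy_history", "chronic_course"]


def to_vector(evidence, binary=True, threshold=0.5):
    """Map per-criterion evidence scores in [0, 1] to a fixed-length vector.

    With binary=True each score is thresholded to 0.0 or 1.0; otherwise the
    raw probabilities are kept. Missing criteria default to 0.0.
    """
    scores = [float(evidence.get(name, 0.0)) for name in CRITERIA]
    if binary:
        return [1.0 if s >= threshold else 0.0 for s in scores]
    return scores


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on log loss (a stand-in for any supervised model)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x):
    """Return 1 (AD suspected) when the predicted probability is at least 0.5."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Toy, made-up patients: (evidence per criterion, label), label 1 = AD.
patients = [
    ({"pruritus": 0.9, "flexural_rash": 0.8,
      "personal_atopy_history": 0.7, "chronic_course": 0.9}, 1),
    ({"pruritus": 0.8, "flexural_rash": 0.6, "chronic_course": 0.7}, 1),
    ({"pruritus": 0.2}, 0),
    ({"personal_atopy_history": 0.6}, 0),
]
X = [to_vector(evidence) for evidence, _ in patients]
y = [label for _, label in patients]
w, b = train_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
```

Passing `binary=False` to `to_vector` keeps the raw probabilities, which corresponds to the probability-valued variant of the patient vector.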
The most accurate AD classifier, XGBoost (Extreme Gradient Boosting), achieved a class-balanced accuracy of 0.8036, a precision of 0.8400, and a recall of 0.7500.
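For readers less familiar with these metrics, the sketch below shows how each is computed from confusion-matrix counts. The counts are illustrative only, chosen so that the arithmetic lands on the figures quoted above; the abstract does not report the study's confusion matrix, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, specificity, and class-balanced accuracy."""
    precision = tp / (tp + fp)        # share of predicted AD cases that are true AD
    recall = tp / (tp + fn)           # share of true AD cases that were found
    specificity = tn / (tn + fp)      # share of non-AD cases correctly rejected
    balanced_accuracy = (recall + specificity) / 2
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
    }


# Illustrative counts (an assumption, not data from the study).
metrics = classification_metrics(tp=21, fp=4, fn=7, tn=24)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Class-balanced accuracy averages recall over the two classes, so it is not inflated when one class is much larger than the other.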
An automated approach for identifying patient cohorts has the potential to accelerate and standardize patient recruitment for AD studies, thereby reducing clinician burden and informing the discovery of better treatment options for AD.