Department of Bioengineering, Stanford University, Stanford, CA, USA.
Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
J Am Med Inform Assoc. 2023 Jan 18;30(2):245-255. doi: 10.1093/jamia/ocac226.
In the UK Biobank, standardized phenotype codes are available for patients who have been hospitalized but are missing for many patients who have been treated exclusively in an outpatient setting. We describe a method for phenotype recognition that imputes phenotype codes for all UK Biobank participants.
POPDx (Population-based Objective Phenotyping by Deep Extrapolation) is a bilinear machine learning framework for simultaneously estimating the probabilities of 1538 phenotype codes. We extracted phenotypic and health-related information of 392 246 individuals from the UK Biobank for POPDx development and evaluation. A total of 12 803 ICD-10 diagnosis codes of the patients were converted to 1538 phecodes as gold standard labels. The POPDx framework was evaluated and compared to other available methods on automated multiphenotype recognition.
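The label construction described above (ICD-10 diagnosis codes collapsed into phecodes, one binary label per phecode) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the three-entry mapping, the three-code vocabulary, and the function name `to_multi_hot` are assumptions made for the example, and the real mapping covers 12 803 ICD-10 codes and 1538 phecodes.

```python
import numpy as np

# Illustrative entries only (assumed for this sketch); the official
# ICD-10 -> phecode map should be used in practice.
ICD10_TO_PHECODE = {
    "E11.9": "250.2",
    "E11.65": "250.2",
    "I10": "401.1",
}

# Toy phecode vocabulary; the study uses 1538 phecodes.
PHECODES = ["250.2", "401.1", "495"]


def to_multi_hot(icd10_codes, mapping=ICD10_TO_PHECODE, vocab=PHECODES):
    """Convert one patient's ICD-10 codes into a multi-hot phecode vector."""
    index = {code: i for i, code in enumerate(vocab)}
    labels = np.zeros(len(vocab), dtype=np.int8)
    for icd in icd10_codes:
        phecode = mapping.get(icd)  # unmapped ICD-10 codes are skipped
        if phecode is not None and phecode in index:
            labels[index[phecode]] = 1
    return labels


# Two ICD-10 codes that map to the same phecode yield a single positive label.
patient_labels = to_multi_hot(["E11.9", "E11.65"])
```

Because several ICD-10 codes can map to one phecode, the label vector is much shorter than the ICD-10 code list, which is what makes simultaneous prediction of all phenotypes tractable.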
POPDx can predict phenotypes that are rare or even unobserved in training. We demonstrate substantial improvement in automated multiphenotype recognition across 22 disease categories, and we show how POPDx can be applied to identify key epidemiological features associated with each phenotype.
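One way a bilinear model can score a phenotype that never appeared in training is to represent every phenotype by an embedding vector, so that a new phenotype only needs an embedding rather than its own trained output unit. The sketch below illustrates that general idea under stated assumptions; the abstract does not specify the architecture, so the shapes, the random data, and the names `bilinear_probs`, `W`, and `E` are hypothetical and do not describe the published POPDx implementation.

```python
import numpy as np


def sigmoid(z):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def bilinear_probs(X, W, E):
    """Score every (patient, phenotype) pair with a bilinear form.

    X: (n_patients, d) patient feature matrix
    W: (d, m) projection from patient features to embedding space (learned)
    E: (n_phenotypes, m) phenotype embeddings (assumed to be given)
    Returns an (n_patients, n_phenotypes) matrix of probabilities.
    """
    return sigmoid(X @ W @ E.T)


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 10))          # 4 toy patients, 10 features each
W = rng.normal(size=(10, 5)) * 0.1    # stands in for trained weights
E_train = rng.normal(size=(3, 5))     # phenotypes seen during training
e_new = rng.normal(size=(1, 5))       # embedding of an unseen phenotype

# The unseen phenotype is scored with the same W; no new parameters are needed.
probs = bilinear_probs(X, W, np.vstack([E_train, e_new]))
```

In this formulation the quality of predictions for an unseen phenotype depends entirely on how informative its embedding is, which is why the embedding source matters more than the classifier head.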
POPDx helps provide well-defined cohorts for downstream studies. It is a general-purpose method that can be applied to other biobanks with diverse but incomplete data.