A conditional multi-label model to improve prediction of a rare outcome: An illustration predicting autism diagnosis.

Affiliations

Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, NC, USA; AI Health, Duke University School of Medicine, Durham, North Carolina, USA.

Department of Psychiatry, Duke University School of Medicine, Durham, NC, USA.

Publication information

J Biomed Inform. 2024 Sep;157:104711. doi: 10.1016/j.jbi.2024.104711. Epub 2024 Aug 30.

Abstract

OBJECTIVE

This study aimed to develop a novel approach that uses routinely collected electronic health record (EHR) data to improve the prediction of a rare event. We illustrated the approach with early prediction of an autism diagnosis, a low-prevalence outcome, by leveraging the correlations between autism and other neurodevelopmental conditions (NDCs).

METHODS

To achieve this, we introduced a conditional multi-label model that combines conditional learning with multi-label learning. The conditional learning approach breaks a hard task into more manageable pieces, one per stage, and the multi-label approach uses information from related neurodevelopmental conditions to learn predictive latent features. The study forecast autism diagnosis by age 5.5 years using data from the first 18 months of life, and analyzed correlations in feature importance to examine how closely the feature spaces of the different conditions align.
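
The abstract does not specify the architecture, so the following is only a minimal sketch of how the two ideas could be combined, under stated assumptions. It assumes (hypothetically) that stage 1 predicts whether a child has any NDC, that stage 2 is a small multi-label network whose shared hidden layer is trained jointly on the rare target and the other NDCs using only children with an NDC, and that the final risk is P(target) = P(any NDC) × P(target | any NDC), which holds here because the "any NDC" label includes the target by construction. The data are synthetic, generated from highly correlated label-specific weight vectors; `simulate`, `fit_multilabel`, and `fit_conditional_multilabel` are hypothetical names, not the authors' code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def simulate(n=3000, d=10, seed=1):
    """Hypothetical synthetic stand-in for EHR features; column 0 of Y is the rare target."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    shared = rng.normal(size=d)
    # Label-specific weight vectors = shared vector + small noise, so they are highly correlated.
    W = shared[:, None] + 0.3 * rng.normal(size=(d, 3))
    intercepts = np.array([-6.0, -3.0, -3.0])  # target is rarer than the other two NDCs
    Y = (rng.random((n, 3)) < sigmoid(X @ W + intercepts)).astype(float)
    return X, Y


def fit_multilabel(X, Y, n_hidden=8, lr=0.1, epochs=500, seed=0):
    """Fit every column of Y through one shared tanh layer with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.1, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    n = X.shape[0]
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)          # shared latent features used by every label
        P = sigmoid(H @ W2 + b2)          # one probability per label
        G = (P - Y) / n                   # gradient of row-averaged summed BCE w.r.t. logits
        dH = (G @ W2.T) * (1.0 - H ** 2)  # backpropagate through tanh (before W2 changes)
        W2 -= lr * (H.T @ G)
        b2 -= lr * G.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return lambda Xn: sigmoid(np.tanh(Xn @ W1 + b1) @ W2 + b2)


def fit_conditional_multilabel(X, Y):
    """Stage 1: any NDC vs none. Stage 2: multi-label model fitted on the NDC subset only."""
    any_ndc = Y.max(axis=1, keepdims=True)  # 1.0 if any label (target included) is 1
    stage1 = fit_multilabel(X, any_ndc)
    has_ndc = any_ndc[:, 0] == 1
    stage2 = fit_multilabel(X[has_ndc], Y[has_ndc])

    def predict_target(Xn):
        # P(target) = P(any NDC) * P(target | any NDC), since the target implies any NDC
        return stage1(Xn)[:, 0] * stage2(Xn)[:, 0]

    return predict_target


X, Y = simulate()
predict = fit_conditional_multilabel(X, Y)
p = predict(X)
print(f"mean predicted risk {p.mean():.3f} vs observed prevalence {Y[:, 0].mean():.3f}")
```

In this sketch, a one-stage, one-label baseline would be `fit_multilabel(X, Y[:, :1])` fitted on all children. Stage 2 sees a population in which the target is far less rare, which is one way to read "breaking a hard task into more manageable pieces"; a real comparison would use held-out data rather than the training set.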

RESULTS

After analyzing health records from 18,156 children, we generated a model that predicts a future autism diagnosis with moderate performance (AUROC = 0.76). The proposed conditional multi-label method significantly improved predictive performance, reaching an AUROC of 0.80 (p < 0.001). Further examination showed that the conditional approach and the multi-label approach, each used alone, provided only marginal lift over a one-stage, one-label approach. We also demonstrated the generalizability and applicability of the method on simulated data with high correlation between the feature vectors of different labels.
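
AUROC, the metric behind the 0.76 and 0.80 figures, is the probability that a randomly chosen child who later receives the diagnosis is scored higher than a randomly chosen child who does not. The sketch below computes that standard definition through the Mann-Whitney rank formula, AUROC = (R₊ − n₊(n₊ + 1)/2) / (n₊ n₋), where R₊ is the sum of the positives' ranks. It is a generic illustration, not the authors' evaluation code, and the test used to obtain p < 0.001 is not stated in the abstract, so it is not reproduced here. The function name `auroc` is hypothetical.

```python
def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic; labels must be 0/1, tied scores get average ranks."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of sorted positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Averaging tied ranks counts a tie as half a win, so a constant score always yields 0.5; sorting keeps the cost at O(n log n), which stays practical for a cohort of this size.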

CONCLUSION

Our findings underscore the effectiveness of the conditional multi-label model for early prediction of an autism diagnosis. The study introduces a versatile strategy for prediction tasks in which the target population is small but shares underlying features or etiology with related groups.
