Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.
TriNetX, LLC, Cambridge, MA, 02140, USA.
EBioMedicine. 2023 Dec;98:104888. doi: 10.1016/j.ebiom.2023.104888. Epub 2023 Nov 25.
Pancreatic ductal adenocarcinoma (PDAC) screening can enable early-stage disease detection and long-term survival. Current guidelines base screening eligibility on inherited predisposition, which makes only about 10% of PDAC cases eligible. Using Electronic Health Record (EHR) data from a multi-institutional federated network, we developed and validated a PDAC RISk Model (Prism) for the general US population to extend early PDAC detection.
A neural network (PrismNN) and a logistic regression model (PrismLR) were developed using EHR data from 55 US Health Care Organisations (HCOs) to predict PDAC risk 6-18 months before diagnosis in patients aged 40 years or older. Model performance was assessed using Area Under the Curve (AUC) and calibration plots. The models underwent internal-external validation by geographic location, race, and time. A simulated model deployment evaluated the Standardised Incidence Ratio (SIR) and other metrics.
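As an illustration of the two headline metrics named above, the following minimal Python sketch computes an AUC as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, and an SIR as observed cases divided by expected cases. The function names and the toy scores are hypothetical and are not taken from the study; the study's own evaluation code may differ.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties count as half a correct pair. This pairwise form is quadratic in
    the sample size, so it is suitable only for small illustrative inputs.
    """
    correct = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                correct += 1.0
            elif c == k:
                correct += 0.5
    return correct / (len(case_scores) * len(control_scores))


def standardised_incidence_ratio(observed_cases: float, expected_cases: float) -> float:
    """SIR: cases observed in the flagged group divided by the number
    expected if the group had the reference population's incidence."""
    return observed_cases / expected_cases


# Toy risk scores (hypothetical, not study data).
toy_case_scores = [0.9, 0.8, 0.4]
toy_control_scores = [0.7, 0.3, 0.2, 0.1]
toy_auc = auc(toy_case_scores, toy_control_scores)

# Toy SIR: 10 observed cases where 2 would be expected.
toy_sir = standardised_incidence_ratio(10, 2.0)
```

In this toy input, 11 of the 12 case-control pairs are ordered correctly, so `toy_auc` is 11/12, and `toy_sir` is 5.0.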
With 35,387 PDAC cases, 1,500,081 controls, and 87 features per patient, PrismNN obtained a test AUC of 0.826 (95% CI: 0.824-0.828), compared with 0.800 (95% CI: 0.798-0.802) for PrismLR. PrismNN's average internal-external validation AUCs were 0.740 for locations, 0.828 for races, and 0.789 (95% CI: 0.762-0.816) for time. At SIR = 5.10 (exceeding the current screening inclusion threshold) in simulated model deployment, PrismNN sensitivity was 35.9% (specificity 95.3%).
Prism models demonstrated good accuracy and generalizability across diverse populations. PrismNN could find 3.5 times more cases at comparable risk than current screening guidelines. The small number of features provided a basis for model interpretation. Integration with the federated network provided data from a large, heterogeneous patient population and a pathway to future clinical deployment.
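The "3.5 times" comparison can be checked roughly from the rounded figures in this abstract: PrismNN's sensitivity of 35.9% at SIR = 5.10, set against the roughly 10% of PDAC cases eligible under current guidelines. The short Python sketch below does that arithmetic and also defines sensitivity and specificity from confusion-matrix counts. The variable and function names are hypothetical, and treating "about 10%" as exactly 0.10 is an assumption.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of actual cases that the model flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of actual controls that the model leaves unflagged."""
    return true_neg / (true_neg + false_pos)


# Figures quoted in the abstract (rounded).
prism_nn_sensitivity = 0.359      # at SIR = 5.10
guideline_eligible_share = 0.10   # "about 10%", assumed exact here

# Ratio of cases reachable by PrismNN to cases reachable by guidelines.
case_ratio = prism_nn_sensitivity / guideline_eligible_share
```

With these rounded inputs the ratio is about 3.6, in line with the reported factor of 3.5; the small difference presumably reflects rounding of the 10% figure.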
Funding: Prevent Cancer Foundation, TriNetX, Boeing, DARPA, NSF, and Aarno Labs.