Leeds Institute for Data Analytics, University of Leeds, Leeds, UK
Leeds Institute of Cardiovascular and Metabolic Medicine, University of Leeds, Leeds, UK
BMJ Open. 2021 Nov 2;11(11):e052887. doi: 10.1136/bmjopen-2021-052887.
Atrial fibrillation (AF) is a major cardiovascular health problem: it is common, chronic and incurs substantial healthcare expenditure because of stroke. Oral anticoagulation reduces the risk of thromboembolic stroke in those at higher risk, but for a number of patients stroke is the first manifestation of undetected AF. There is a rationale for diagnosing AF early, before the first complication occurs, but population-based screening is not recommended. Previous prediction models have been limited by their data sources and methodologies. An accurate model that uses existing routinely collected data is needed to inform clinicians of patient-level risk of AF, inform national screening policy and highlight predictors that may be amenable to primary prevention.
We will investigate the application of a range of deep learning techniques, including an adapted convolutional neural network, recurrent neural network and Transformer, on routinely collected primary care data to create a personalised model predicting the risk of new-onset AF over a range of time periods. The Clinical Practice Research Datalink (CPRD)-GOLD dataset will be used for derivation, and the CPRD-AURUM dataset will be used for external geographical validation. Both comprise a sizeable representative population and are linked at patient-level to secondary care databases. The performance of the deep learning models will be compared against classic machine learning and traditional statistical predictive modelling methods. We will only use risk factors accessible in primary care and endow the model with the ability to update risk prediction as it is presented with new data, to make the model more useful in clinical practice.
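The idea of a model that updates its risk prediction as new data arrive can be illustrated with a minimal sketch of a recurrent cell. This is not the study's model: the class name `RunningRiskModel`, the two input features (a scaled age and a hypertension flag), and all weights are hypothetical values chosen only to show the mechanism, in which a hidden state summarises the records seen so far and each new record yields a revised risk estimate.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


class RunningRiskModel:
    """Toy recurrent cell (hypothetical, for illustration only).

    The hidden state carries a summary of every visit seen so far,
    so the risk estimate can be refreshed one visit at a time.
    """

    def __init__(
        self,
        w_in: List[List[float]],  # hidden_units x n_features input weights
        w_rec: List[float],       # one recurrent weight per hidden unit
        w_out: List[float],       # one output weight per hidden unit
        bias: float,              # output bias (baseline log-odds)
    ) -> None:
        self.w_in = w_in
        self.w_rec = w_rec
        self.w_out = w_out
        self.bias = bias
        self.hidden = [0.0] * len(w_in)  # no history before the first visit

    def update(self, visit: Sequence[float]) -> float:
        """Fold one new visit into the hidden state and return the updated risk."""
        self.hidden = [
            math.tanh(sum(w * x for w, x in zip(row, visit)) + r * h)
            for row, r, h in zip(self.w_in, self.w_rec, self.hidden)
        ]
        logit = self.bias + sum(w * h for w, h in zip(self.w_out, self.hidden))
        return sigmoid(logit)


# Hypothetical weights and visits: [scaled_age, hypertension_flag]
model = RunningRiskModel(
    w_in=[[0.8, 0.3], [0.1, 0.9]],
    w_rec=[0.5, 0.5],
    w_out=[1.2, 0.7],
    bias=-3.0,
)
visits = [[0.2, 0.0], [0.6, 1.0], [0.9, 1.0]]
risks = [model.update(v) for v in visits]
```

With these illustrative weights, each call to `update` returns a revised probability without reprocessing earlier visits. A trained recurrent network or Transformer would learn its weights from the derivation dataset rather than use fixed values.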
Permissions for the CPRD-GOLD and CPRD-AURUM datasets were obtained from CPRD (ref no: 19_076). The CPRD ethical approval committee approved the study. The results will be submitted to a peer-reviewed journal for publication as a research paper and presented at peer-reviewed conferences.
A systematic review to incorporate within the overall project was registered on PROSPERO (registration number CRD42021245093). The study was registered on ClinicalTrials.gov (NCT04657900).