Kiely David G, Doyle Orla, Drage Edmund, Jenner Harvey, Salvatelli Valentina, Daniels Flora A, Rigg John, Schmitt Claude, Samyshkin Yevgeniy, Lawrie Allan, Bergemann Rito
Sheffield Pulmonary Vascular Disease Unit, Royal Hallamshire Hospital, Sheffield, UK.
Department of Infection, Immunity & Cardiovascular Disease, University of Sheffield, Sheffield, UK.
Pulm Circ. 2019 Nov 20;9(4):2045894019890549. doi: 10.1177/2045894019890549. eCollection 2019 Oct-Dec.
Idiopathic pulmonary arterial hypertension is a rare and life-shortening condition often diagnosed at an advanced stage. Despite increased awareness, the delay to diagnosis remains unchanged. This study explores whether a predictive model based on healthcare resource utilisation can be used to screen large populations to identify patients at high risk of idiopathic pulmonary arterial hypertension. Hospital Episode Statistics from the National Health Service in England, providing close to full national coverage, were used as a measure of healthcare resource utilisation. Data for patients with idiopathic pulmonary arterial hypertension from the National Pulmonary Hypertension Service in Sheffield were linked to pre-diagnosis Hospital Episode Statistics records. A non-idiopathic pulmonary arterial hypertension control cohort was selected from the Hospital Episode Statistics population. Patient history was limited to ≤5 years pre-diagnosis. Information on demographics, timing/frequency of diagnoses, medical specialities visited and procedures undertaken was captured. For modelling, a bagged gradient boosting trees algorithm was used to discriminate between cohorts. Between 2008 and 2016, 709 patients with idiopathic pulmonary arterial hypertension were identified and compared with a stratified cohort of 2,812,458 patients classified as non-idiopathic pulmonary arterial hypertension with ≥1 ICD-10 coded diagnosis of relevance to idiopathic pulmonary arterial hypertension. A predictive model was developed and validated using cross-validation. The timing and frequency of the clinical speciality seen, secondary diagnoses and age were key variables driving the algorithm's performance. To identify the 100 patients at highest risk of idiopathic pulmonary arterial hypertension, 969 patients would need to be screened with a specificity of 99.99% and sensitivity of 14.10% based on a prevalence of 5.5/million. The positive predictive and negative predictive values were 10.32% and 99.99%, respectively. This study highlights the potential application of artificial intelligence to readily available real-world data to screen for rare diseases such as idiopathic pulmonary arterial hypertension. This algorithm could provide low-cost screening at a population level, facilitating earlier diagnosis, improved diagnostic rates and patient outcomes. Studies to further validate this approach are warranted.
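The reported screening figures can be checked with simple arithmetic. The sketch below assumes one reading of the abstract: that the 969 patients are those flagged by the model, that 100 of them are true cases, and that sensitivity is measured against the 709 identified patients. The function name and its arguments are illustrative and do not come from the study. Specificity and negative predictive value are not recomputed, because they depend on a prevalence-adjusted population that the abstract does not describe in enough detail.

```python
def screening_metrics(true_positives, flagged, total_cases):
    """Return (ppv, sensitivity, screened_per_case) for a screening cut-off.

    Hypothetical helper: the argument names are assumptions made for this
    illustration, not terms defined by the study.
    """
    if not 0 < true_positives <= min(flagged, total_cases):
        raise ValueError("true_positives must be positive and no larger "
                         "than flagged or total_cases")
    ppv = true_positives / flagged              # share of flagged patients who are cases
    sensitivity = true_positives / total_cases  # share of all cases that are flagged
    screened_per_case = flagged / true_positives
    return ppv, sensitivity, screened_per_case


# Figures taken from the abstract, under the reading stated above.
ppv, sensitivity, per_case = screening_metrics(
    true_positives=100, flagged=969, total_cases=709
)
print(f"PPV: {ppv:.2%}")                  # 10.32%
print(f"Sensitivity: {sensitivity:.2%}")  # 14.10%
print(f"Patients screened per case found: {per_case:.2f}")  # 9.69
```

Under these assumptions, the computed values match the positive predictive value of 10.32% and the sensitivity of 14.10% given in the abstract, which suggests the reading is consistent with the reported numbers.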