Cox Andrew Paul, Raluy-Callado Mireia, Wang Meng, Bakheit Abdel Magid, Moore Austen Peter, Dinet Jerome
Evidera, Metro Building, 6th Floor, 1 Butterwick, London W6 8DL, United Kingdom.
J Biomed Inform. 2016 Apr;60:328-33. doi: 10.1016/j.jbi.2016.02.012. Epub 2016 Feb 27.
Spasticity is a well-recognized complication of stroke that may cause pain and limit patients' ability to perform daily activities. The predisposing factors and direct effects of post-stroke spasticity also carry high management costs in terms of healthcare resources, and case-control designs are required to establish such differences. In 'The Health Improvement Network' (THIN) database, however, such a study would not provide reliable estimates, since the recorded prevalence of post-stroke spasticity was found to be 2%, substantially below the most conservative previously reported estimates. The objective of this study was to use predictive analysis techniques to determine whether a substantial number of patients with post-stroke spasticity are potentially under-recorded.
This study used retrospective data from adult patients registered in THIN with a diagnostic code for stroke between 2007 and 2011. Two algorithmic approaches were developed and compared: a statistically validated data-trained algorithm and a clinician-trained algorithm.
The data-trained algorithm, which used Random Forest, showed better prediction performance than the clinician-trained algorithm, with higher sensitivity and only marginally lower specificity. Overall accuracy was 75% and 72%, respectively. The data-trained algorithm predicted an additional 3912 records consistent with patients developing spasticity in the 12 months following a stroke.
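The comparison above rests on sensitivity, specificity and overall accuracy. As a minimal sketch of how these three measures relate to one another, the Python below computes them from true and predicted labels. The label lists, the names `pred_a` and `pred_b`, and the helper functions are hypothetical and purely illustrative; they are not the study's data or code, and the numbers do not reproduce the reported 75% and 72%.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # spasticity present, predicted present
    fp: int  # spasticity absent, predicted present
    tn: int  # spasticity absent, predicted absent
    fn: int  # spasticity present, predicted absent


def confusion_counts(y_true, y_pred):
    """Tally the four confusion-matrix cells for binary labels (1 = case)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1
        elif not truth and pred:
            fp += 1
        elif not truth and not pred:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def sensitivity(c):
    # Share of true cases that the classifier flags.
    return c.tp / (c.tp + c.fn)


def specificity(c):
    # Share of true non-cases that the classifier leaves unflagged.
    return c.tn / (c.tn + c.fp)


def accuracy(c):
    # Share of all records classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


# Made-up labels for ten records (assumption: 1 = recorded spasticity).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred_a = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]  # hypothetical classifier A
pred_b = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]  # hypothetical classifier B

for name, pred in (("A", pred_a), ("B", pred_b)):
    c = confusion_counts(y_true, pred)
    print(name, round(sensitivity(c), 3), round(specificity(c), 3), round(accuracy(c), 3))
```

In this toy data, classifier A has higher sensitivity and lower specificity than classifier B while both have the same accuracy, which illustrates why the three measures are reported together rather than accuracy alone.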
Using machine learning techniques, additional unrecorded post-stroke spasticity patients were identified, increasing the condition's prevalence in THIN from 2% to 13%. This work shows the potential for under-reporting of post-stroke spasticity in primary care data, and provides a method for improved identification of case and control records for future studies.