Zhelev Zhivko, Walker Greg, Henschke Nicholas, Fridhandler Jonathan, Yip Samuel
NIHR CLAHRC South West Peninsula (PenCLAHRC), University of Exeter Medical School, University of Exeter, St Luke's Campus, South Cloisters (Room 3.09), Exeter, Devon, UK, EX1 2LU.
Cochrane Database Syst Rev. 2019 Apr 9;4(4):CD011427. doi: 10.1002/14651858.CD011427.pub2.
BACKGROUND: Rapid and accurate detection of stroke by paramedics or other emergency clinicians at the time of first contact is crucial for timely initiation of appropriate treatment. Several stroke recognition scales have been developed to support initial triage. However, their accuracy remains uncertain and there is no agreement on which scale performs best.
OBJECTIVES: To systematically identify and review the evidence pertaining to the test accuracy of validated stroke recognition scales, as used in a prehospital or emergency room (ER) setting to screen people suspected of having stroke.
SEARCH METHODS: We searched CENTRAL, MEDLINE (Ovid), Embase (Ovid) and the Science Citation Index to 30 January 2018. We handsearched the reference lists of all included studies and other relevant publications and contacted experts in the field to identify additional studies or unpublished data.
SELECTION CRITERIA: We included studies evaluating the accuracy of stroke recognition scales used in a prehospital or ER setting to identify stroke and transient ischemic attack (TIA) in people suspected of stroke. The scales had to be applied to actual people and the results compared to a final diagnosis of stroke or TIA. We excluded studies that applied the scales to patient records, enrolled only screen-positive participants, or did not report complete 2 × 2 data.
DATA COLLECTION AND ANALYSIS: Two review authors independently conducted a two-stage screening of all publications identified by the searches, extracted data and assessed the methodologic quality of the included studies using a tailored version of QUADAS-2. A third review author acted as an arbiter. We recalculated study-level sensitivity and specificity with 95% confidence intervals (CI), and presented them in forest plots and in the receiver operating characteristics (ROC) space. When a sufficient number of studies reported the accuracy of the test in the same setting (prehospital or ER) and the level of heterogeneity was relatively low, we pooled the results using the bivariate random-effects model. We plotted the results in the summary ROC (SROC) space presenting an estimate point (mean sensitivity and specificity) with 95% CI and prediction regions. Because of the small number of studies, we did not conduct meta-regression to investigate between-study heterogeneity and the relative accuracy of the scales. Instead, we summarized the results in tables and diagrams, and presented our findings narratively.
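The study-level recalculation described above is a standard computation from each study's 2 × 2 table. A minimal sketch of it is below; the counts are hypothetical illustration values (not data from any included study), and the Wilson score interval is one common choice of CI method (the review does not state which method it used).

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score 95% confidence interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

def accuracy_from_2x2(tp, fp, fn, tn):
    """Sensitivity and specificity, each with a 95% CI, from 2 x 2 counts."""
    sens = tp / (tp + fn)   # proportion of stroke/TIA cases the scale detects
    spec = tn / (tn + fp)   # proportion of non-cases the scale rules out
    return {
        "sensitivity": (sens, wilson_ci(tp, tp + fn)),
        "specificity": (spec, wilson_ci(tn, tn + fp)),
    }

# Hypothetical counts for one study, chosen only for illustration
result = accuracy_from_2x2(tp=88, fp=20, fn=12, tn=80)
```

Each study contributes one such sensitivity/specificity pair, which the review then plots in ROC space and, where heterogeneity allowed, pooled with a bivariate random-effects model.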
MAIN RESULTS: We selected 23 studies for inclusion (22 journal articles and one conference abstract). We evaluated the following scales: Cincinnati Prehospital Stroke Scale (CPSS; 11 studies), Recognition of Stroke in the Emergency Room (ROSIER; eight studies), Face Arm Speech Time (FAST; five studies), Los Angeles Prehospital Stroke Scale (LAPSS; five studies), Melbourne Ambulance Stroke Scale (MASS; three studies), Ontario Prehospital Stroke Screening Tool (OPSST; one study), Medic Prehospital Assessment for Code Stroke (MedPACS; one study) and PreHospital Ambulance Stroke Test (PreHAST; one study). Nine studies compared the accuracy of two or more scales. We considered 12 studies at high risk of bias and one with applicability concerns in the patient selection domain; 14 at unclear risk of bias and one with applicability concerns in the reference standard domain; and the risk of bias in the flow and timing domain was high in one study and unclear in another 16.
We pooled the results from five studies evaluating ROSIER in the ER and five studies evaluating LAPSS in a prehospital setting. The studies included in the meta-analysis of ROSIER were of relatively good methodologic quality and produced a summary sensitivity of 0.88 (95% CI 0.84 to 0.91), with the prediction interval ranging from approximately 0.75 to 0.95. This means that the test will miss on average 12% of people with stroke/TIA which, depending on the circumstances, could range from 5% to 25%. We could not obtain a reliable summary estimate of specificity due to extreme heterogeneity in study-level results. The summary sensitivity of LAPSS was 0.83 (95% CI 0.75 to 0.89) and summary specificity 0.93 (95% CI 0.88 to 0.96). However, we were uncertain about the validity of these results as four of the studies were at high and one at uncertain risk of bias.
We did not report summary estimates for the rest of the scales, as the number of studies per test per setting was small, the risk of bias was high or uncertain, the results were highly heterogeneous, or a combination of these. Studies comparing two or more scales in the same participants reported that ROSIER and FAST had similar accuracy when used in the ER. In the field, CPSS was more sensitive than MedPACS and LAPSS, but had similar sensitivity to that of MASS; and MASS was more sensitive than LAPSS. In contrast, MASS, ROSIER and MedPACS were more specific than CPSS; and the difference in the specificities of MASS and LAPSS was not statistically significant.
AUTHORS' CONCLUSIONS: In the field, CPSS had consistently the highest sensitivity and, therefore, should be preferred to other scales. Further evidence is needed to determine its absolute accuracy and whether alternative scales, such as MASS and ROSIER, which might have comparable sensitivity but higher specificity, should be used instead to achieve better overall accuracy. In the ER, ROSIER should be the test of choice, as it was evaluated in more studies than FAST and showed consistently high sensitivity. In a cohort of 100 people of whom 62 have stroke/TIA, the test will miss on average seven people with stroke/TIA (ranging from three to 16). We were unable to obtain an estimate of its summary specificity. Because of the small number of studies per test per setting, high risk of bias, substantial differences in study characteristics and large between-study heterogeneity, these findings should be treated as provisional hypotheses that need further verification in better-designed studies.
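The "seven people missed (ranging from three to 16)" figure follows directly from the abstract's own numbers: 62 true stroke/TIA cases combined with the ROSIER summary sensitivity of 0.88 and its approximate prediction interval of 0.75 to 0.95. A minimal sketch of that arithmetic:

```python
# Worked example behind the conclusion: a cohort of 100 people,
# 62 of whom have stroke/TIA (figures taken from the abstract).
with_stroke = 62
summary_sens = 0.88                # ROSIER summary sensitivity
pred_low, pred_high = 0.75, 0.95   # approximate prediction interval

# A test with sensitivity s misses a fraction (1 - s) of true cases.
missed_avg = with_stroke * (1 - summary_sens)   # about 7 people
missed_best = with_stroke * (1 - pred_high)     # about 3 people
missed_worst = with_stroke * (1 - pred_low)     # about 16 people
```

The wide range (3 to 16 missed cases) reflects the prediction interval, i.e. where the sensitivity of a new study in a comparable setting might plausibly fall, rather than uncertainty in the summary estimate alone.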