Deeks Jonathan J, Dinnes Jacqueline, Takwoingi Yemisi, Davenport Clare, Spijker René, Taylor-Phillips Sian, Adriano Ada, Beese Sophie, Dretzke Janine, Ferrante di Ruffano Lavinia, Harris Isobel M, Price Malcolm J, Dittrich Sabine, Emperador Devy, Hooft Lotty, Leeflang Mariska Mg, Van den Bruel Ann
Test Evaluation Research Group, Institute of Applied Health Research, University of Birmingham, Birmingham, UK.
NIHR Birmingham Biomedical Research Centre, University Hospitals Birmingham NHS Foundation Trust and University of Birmingham, Birmingham, UK.
Cochrane Database Syst Rev. 2020 Jun 25;6(6):CD013652. doi: 10.1002/14651858.CD013652.
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus and resulting COVID-19 pandemic present important diagnostic challenges. Several diagnostic strategies are available to identify current infection, rule out infection, identify people in need of care escalation, or to test for past infection and immune response. Serology tests to detect the presence of antibodies to SARS-CoV-2 aim to identify previous SARS-CoV-2 infection, and may help to confirm the presence of current infection.
To assess the diagnostic accuracy of antibody tests to determine if a person presenting in the community or in primary or secondary care has SARS-CoV-2 infection, or has previously had SARS-CoV-2 infection, and the accuracy of antibody tests for use in seroprevalence surveys.
We undertook electronic searches in the Cochrane COVID-19 Study Register and the COVID-19 Living Evidence Database from the University of Bern, which is updated daily with published articles from PubMed and Embase and with preprints from medRxiv and bioRxiv. In addition, we checked repositories of COVID-19 publications. We did not apply any language restrictions. We conducted searches for this review iteration up to 27 April 2020.
We included test accuracy studies of any design that evaluated antibody tests (including enzyme-linked immunosorbent assays, chemiluminescence immunoassays, and lateral flow assays) in people suspected of current or previous SARS-CoV-2 infection, or where tests were used to screen for infection. We also included studies of people either known to have, or not to have SARS-CoV-2 infection. We included all reference standards to define the presence or absence of SARS-CoV-2 (including reverse transcription polymerase chain reaction tests (RT-PCR) and clinical diagnostic criteria).
We assessed possible bias and applicability of the studies using the QUADAS-2 tool. We extracted 2x2 contingency table data and present sensitivity and specificity for each antibody (or combination of antibodies) using paired forest plots. We pooled data using random-effects logistic regression where appropriate, stratifying by time since symptom onset. We tabulated available data by test manufacturer. We present uncertainty in estimates of sensitivity and specificity using 95% confidence intervals (CIs).
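As an illustration of the per-study quantities described above, the following minimal sketch computes sensitivity and specificity with 95% confidence intervals from a single 2x2 table. It is not the review's analysis code: the counts are hypothetical, the function names are invented for this example, and the Wilson score interval is an assumption, since the abstract does not say which interval method was used. Pooling across studies (random-effects logistic regression) is not shown.

```python
import math


def wilson_ci(successes, total, z=1.959964):
    """Wilson score interval for a proportion (z defaults to a 95% interval)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half_width, centre + half_width


def accuracy_from_2x2(tp, fp, fn, tn):
    """Return (estimate, lower, upper) for sensitivity and specificity.

    tp/fn are counted among reference-standard positives,
    tn/fp among reference-standard negatives.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": (sensitivity, *wilson_ci(tp, tp + fn)),
        "specificity": (specificity, *wilson_ci(tn, tn + fp)),
    }


if __name__ == "__main__":
    # Hypothetical counts for one study and one antibody target.
    result = accuracy_from_2x2(tp=90, fp=2, fn=10, tn=98)
    for name, (est, lo, hi) in result.items():
        print(f"{name}: {est:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

The Wilson interval is used here because it behaves sensibly when a proportion is close to 0% or 100%, which is common for specificity; an exact (Clopper-Pearson) interval would be an equally reasonable choice.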
We included 57 publications reporting on a total of 54 study cohorts with 15,976 samples, of which 8526 were from cases of SARS-CoV-2 infection. Studies were conducted in Asia (n = 38), Europe (n = 15), and the USA and China (n = 1). We identified data from 25 commercial tests and numerous in-house assays, a small fraction of the 279 antibody assays listed by the Foundation for Innovative Diagnostics. More than half (n = 28) of the studies included were only available as preprints. We had concerns about risk of bias and applicability. Common issues were use of multi-group designs (n = 29), inclusion of only COVID-19 cases (n = 19), lack of blinding of the index test (n = 49) and reference standard (n = 29), differential verification (n = 22), and the lack of clarity about participant numbers, characteristics and study exclusions (n = 47). Most studies (n = 44) only included people hospitalised due to suspected or confirmed COVID-19 infection. There were no studies exclusively in asymptomatic participants. Two-thirds of the studies (n = 33) defined COVID-19 cases based on RT-PCR results alone, ignoring the potential for false-negative RT-PCR results. We observed evidence of selective publication of study findings through omission of the identity of tests (n = 5). We observed substantial heterogeneity in sensitivities of IgA, IgM and IgG antibodies, or combinations thereof, for results aggregated across different time periods post-symptom onset (range 0% to 100% for all target antibodies). We thus based the main results of the review on the 38 studies that stratified results by time since symptom onset. The numbers of individuals contributing data within each study each week are small and are usually not based on tracking the same groups of patients over time. 
Pooled results for IgG, IgM, IgA, total antibodies and IgG/IgM all showed low sensitivity during the first week since onset of symptoms (none exceeding 30.1%), rising in the second week and reaching their highest values in the third week. The combination of IgG/IgM had a sensitivity of 30.1% (95% CI 21.4 to 40.7) for 1 to 7 days, 72.2% (95% CI 63.5 to 79.5) for 8 to 14 days, and 91.4% (95% CI 87.0 to 94.4) for 15 to 21 days. Estimates of accuracy beyond three weeks are based on smaller sample sizes and fewer studies. For 21 to 35 days, the pooled sensitivity for IgG/IgM was 96.0% (95% CI 90.6 to 98.3). There are insufficient studies to estimate sensitivity of tests beyond 35 days post-symptom onset. Summary specificities (provided in 35 studies) exceeded 98% for all target antibodies with confidence intervals no more than 2 percentage points wide. False-positive results were more common where COVID-19 had been suspected and ruled out, but numbers were small and the difference was within the range expected by chance. Assuming a prevalence of 50%, a value considered possible in healthcare workers who have suffered respiratory symptoms, we would anticipate that 43 cases (28 to 65) would be missed and 7 people (3 to 14) would be falsely positive in 1000 people undergoing IgG/IgM testing at days 15 to 21 post-symptom onset. At a prevalence of 20%, a likely value in surveys in high-risk settings, 17 (11 to 26) would be missed per 1000 people tested and 10 (5 to 22) would be falsely positive. At a lower prevalence of 5%, a likely value in national surveys, 4 (3 to 7) would be missed per 1000 tested, and 12 (6 to 27) would be falsely positive. Analyses showed small differences in sensitivity between assay types, but methodological concerns and sparse data prevent comparisons between test brands.
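The per-1000 figures above follow from simple arithmetic on prevalence, sensitivity and specificity, sketched below. The sensitivity of 91.4% is the pooled IgG/IgM value reported for days 15 to 21. The specificity of 98.7% is an assumption made for this example: the abstract states only that summary specificities exceeded 98%. The function name `expected_errors_per_n` is hypothetical.

```python
def expected_errors_per_n(prevalence, sensitivity, specificity, n=1000):
    """Expected missed cases and false positives among n people tested."""
    infected = n * prevalence
    uninfected = n - infected
    missed = infected * (1 - sensitivity)            # false negatives
    false_positives = uninfected * (1 - specificity)
    return missed, false_positives


SENSITIVITY = 0.914   # pooled IgG/IgM sensitivity, days 15 to 21 (from the text)
SPECIFICITY = 0.987   # ASSUMED value; the abstract only reports "exceeded 98%"

if __name__ == "__main__":
    for prevalence in (0.50, 0.20, 0.05):
        missed, false_pos = expected_errors_per_n(prevalence, SENSITIVITY, SPECIFICITY)
        print(f"prevalence {prevalence:.0%}: "
              f"missed {missed:.1f}, false positives {false_pos:.1f} per 1000")
```

With the reported sensitivity, the missed counts come out at roughly 43, 17 and 4 per 1000, in line with the text. The false-positive counts depend on the assumed specificity and should be read as illustrative only. The pattern is the useful part: as prevalence falls, fewer cases are missed but false positives make up a larger share of all positive results.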
AUTHORS' CONCLUSIONS: The sensitivity of antibody tests is too low in the first week since symptom onset to have a primary role for the diagnosis of COVID-19, but they may still have a role complementing other testing in individuals presenting later, when RT-PCR tests are negative, or are not done. Antibody tests are likely to have a useful role for detecting previous SARS-CoV-2 infection if used 15 or more days after the onset of symptoms. However, the duration of antibody rises is currently unknown, and we found very little data beyond 35 days post-symptom onset. We are therefore uncertain about the utility of these tests for seroprevalence surveys for public health management purposes. Concerns about high risk of bias and applicability make it likely that the accuracy of tests when used in clinical care will be lower than reported in the included studies. Sensitivity has mainly been evaluated in hospitalised patients, so it is unclear whether the tests are able to detect lower antibody levels likely seen with milder and asymptomatic COVID-19 disease. The design, execution and reporting of studies of the accuracy of COVID-19 tests require considerable improvement. Studies must report data on sensitivity disaggregated by time since onset of symptoms. COVID-19-positive cases who are RT-PCR-negative should be included as well as those confirmed by RT-PCR, in accordance with the World Health Organization (WHO) and National Health Commission of the People's Republic of China (CDC) case definitions. We were only able to obtain data from a small proportion of available tests, and action is needed to ensure that all results of test evaluations are available in the public domain to prevent selective reporting. This is a fast-moving field and we plan ongoing updates of this living systematic review.