Beynon Rebecca, Leeflang Mariska M G, McDonald Steve, Eisinga Anne, Mitchell Ruth L, Whiting Penny, Glanville Julie M
School of Social and Community Medicine, Canynge Hall, 39 Whatley Road, Bristol, UK, BS8 2PS.
Cochrane Database Syst Rev. 2013 Sep 11;2013(9):MR000022. doi: 10.1002/14651858.MR000022.pub3.
BACKGROUND: A systematic and extensive search for as many eligible studies as possible is essential in any systematic review. When searching for diagnostic test accuracy (DTA) studies in bibliographic databases, it is recommended that terms for disease (target condition) are combined with terms for the diagnostic test (index test). Researchers have developed methodological filters to try to increase the precision of these searches. These consist of text words and database indexing terms and would be added to the target condition and index test searches. Efficiently identifying reports of DTA studies presents challenges because the methods are often not well reported in their titles and abstracts, suitable indexing terms may not be available and relevant indexing terms do not seem to be consistently assigned. A consequence of using search filters to identify records for diagnostic reviews is that relevant studies might be missed, while the number of irrelevant studies that need to be assessed may not be reduced. The current guidance for Cochrane DTA reviews recommends against adding a methodological search filter to the target condition and index test searches as the only search approach.
OBJECTIVES: To systematically review empirical studies that report the development or evaluation, or both, of methodological search filters designed to retrieve DTA studies in MEDLINE and EMBASE.
SEARCH METHODS: We searched MEDLINE (1950 to week 1 November 2012); EMBASE (1980 to 2012 Week 48); the Cochrane Methodology Register (Issue 3, 2012); ISI Web of Science (11 January 2013); PsycINFO (13 March 2013); Library and Information Science Abstracts (LISA) (31 May 2010); and Library, Information Science & Technology Abstracts (LISTA) (13 March 2013). We undertook citation searches on Web of Science, checked the reference lists of relevant studies, and searched the Search Filters Resource website of the InterTASC Information Specialists' Sub-Group (ISSG).
SELECTION CRITERIA: Studies reporting the development or evaluation, or both, of a MEDLINE or EMBASE search filter aimed at retrieving DTA studies were eligible if they reported a measure of the filter's performance.
DATA COLLECTION AND ANALYSIS: The main outcome was a measure of filter performance, such as sensitivity or precision. We extracted data on the identification of the reference set (including the gold standard and, if used, the non-gold standard records), how the reference set was used and any limitations, the identification and combination of the search terms in the filters, internal and external validity testing, the number of filters evaluated, the date the study was conducted, the date the searches were completed, and the databases and search interfaces used. Where 2 x 2 data were available on filter performance, we used these to calculate sensitivity, specificity, precision and Number Needed to Read (NNR), and 95% confidence intervals (CIs). We compared the performance of a filter as reported by the original development study and any subsequent studies that evaluated the same filter.
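The calculation of filter performance from 2 x 2 data can be sketched as follows. This is a minimal illustration, not the review's own analysis code: the function names, the example counts and the choice of the Wilson score interval for the 95% CI are assumptions, since the abstract does not state which interval method was used. NNR is taken here as the reciprocal of precision.

```python
from math import sqrt


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed CI method; z=1.96 gives ~95%)."""
    if total == 0:
        return (float("nan"), float("nan"))
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def filter_performance(tp, fp, fn, tn):
    """Performance of a search filter from its 2 x 2 table.

    tp: gold standard (relevant) records retrieved by the filter
    fp: non-gold standard records retrieved by the filter
    fn: gold standard records missed by the filter
    tn: non-gold standard records not retrieved by the filter
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "sensitivity_ci": wilson_ci(tp, tp + fn),
        "specificity": specificity,
        "specificity_ci": wilson_ci(tn, tn + fp),
        "precision": precision,
        "precision_ci": wilson_ci(tp, tp + fp),
        # Number Needed to Read: records screened per relevant record found
        "nnr": 1 / precision,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic
result = filter_performance(tp=90, fp=810, fn=10, tn=9090)
for name, value in result.items():
    print(name, value)
```

With these hypothetical counts the filter retrieves 90 of 100 relevant records (sensitivity 90%) among 900 retrieved records (precision 10%), so about 10 records would need to be read for each relevant one.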
MAIN RESULTS: Nineteen studies were included, reporting on 57 MEDLINE filters and 13 EMBASE filters. Thirty MEDLINE and four EMBASE filters were tested in an evaluation study where the performance of one or more filters was tested against one or more gold standards. The reported outcome measures varied. Some studies reported specificity as well as sensitivity if a reference set containing non-gold standard records in addition to gold standard records was used. In some cases, the original development study did not report any performance data on the filters. Original performance from the development study was not available for 17 filters that were subsequently tested in evaluation studies. All 19 studies reported the sensitivity of the filters that they developed or evaluated, nine studies reported the specificities and 14 studies reported the precision. No filter that had original performance data from its development study, and was subsequently tested in an evaluation study, had what we defined a priori as acceptable sensitivity (> 90%) and precision (> 10%). In studies that developed MEDLINE filters that were evaluated in another study (n = 13), the sensitivity ranged from 55% to 100% (median 86%) and specificity from 73% to 98% (median 95%). Estimates of performance were lower in eight studies that evaluated the same 13 MEDLINE filters, with sensitivities ranging from 14% to 100% (median 73%) and specificities ranging from 15% to 96% (median 81%). Precision ranged from 1.1% to 40% (median 9.5%) in studies that developed MEDLINE filters and from 0.2% to 16.7% (median 4%) in studies that evaluated these filters. A similar range of specificities and precision was reported among the evaluation studies of MEDLINE filters without an original performance measure: sensitivities ranged from 31% to 100% (median 71%), specificity ranged from 13% to 90% (median 55.5%) and precision from 1.0% to 11.0% (median 3.35%). For the EMBASE filters, the original sensitivities reported in two development studies ranged from 74% to 100% (median 90%) for three filters, and precision ranged from 1.2% to 17.6% (median 3.7%). Evaluation studies of these filters had sensitivities from 72% to 97% (median 86%) and precision from 1.2% to 9% (median 3.7%). The performance of EMBASE search filters was more similar between development and evaluation studies than the performance of MEDLINE filters. None of the EMBASE filters in either type of study had a sensitivity above 90% and precision above 10%.
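The a priori acceptability criterion (sensitivity > 90% and precision > 10%) can be expressed as a simple check. The sketch below is illustrative only: the function name is hypothetical, and the inputs are the median values reported above, which are summary figures across filters rather than the paired results of any single filter.

```python
def is_acceptable(sensitivity, precision, min_sensitivity=0.90, min_precision=0.10):
    """True only if both thresholds are strictly exceeded, as in the a priori definition."""
    return sensitivity > min_sensitivity and precision > min_precision


# Median (sensitivity, precision) values quoted in the results, as proportions
reported_medians = {
    "MEDLINE, development studies": (0.86, 0.095),
    "MEDLINE, evaluation studies": (0.73, 0.04),
    "EMBASE, development studies": (0.90, 0.037),
    "EMBASE, evaluation studies": (0.86, 0.037),
}

for label, (sens, prec) in reported_medians.items():
    print(label, is_acceptable(sens, prec))
```

Because both conditions must hold at once, a filter with high sensitivity but low precision (or the reverse) fails the check, which is the pattern the results describe.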
AUTHORS' CONCLUSIONS: None of the current methodological filters designed to identify reports of primary DTA studies in MEDLINE or EMBASE combine sufficiently high sensitivity, required for systematic reviews, with a reasonable degree of precision. This finding supports the current recommendation in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy that the combination of methodological filter search terms with terms for the index test and target condition should not be used as the only approach when conducting formal searches to inform systematic reviews of DTA.