Lazarus Jeffrey V, Palayew Adam, Rasmussen Lauge Neimann, Andersen Tue Helms, Nicholson Joey, Norgaard Ole
Barcelona Institute for Global Health (ISGlobal), Hospital Clínic, University of Barcelona, Barcelona, Spain.
Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, QC, Canada.
J Med Internet Res. 2020 Nov 26;22(11):e23449. doi: 10.2196/23449.
Since it was declared a pandemic on March 11, 2020, COVID-19 has dominated headlines around the world, and researchers have generated thousands of scientific articles about the disease. The rapid pace of publication has made it difficult for researchers and other stakeholders to keep up with the volume of published articles. To search this literature effectively, researchers use databases such as PubMed.
The aim of this study is to evaluate the performance of different PubMed searches for COVID-19 records and to assess how complex these searches need to be.
We evaluated the performance of 8 different PubMed searches for COVID-19 records during the first 10 weeks of the pandemic. The goals were to identify which search string performed best on standard metrics (sensitivity, precision, and F-score) and to determine how complex a search string needs to be. We also tested the effect of omitting hyphens and space characters and of enclosing search terms in quotation marks.
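The three metrics can be made concrete with a short sketch. It assumes, purely for illustration, that each search is reduced to the set of PMIDs it retrieves and is compared with a reference set of relevant PMIDs. The helper names `evaluate_search` and `f_score` are hypothetical, and the toy PMIDs are invented. Sensitivity is the share of relevant records that a search retrieves. Precision is the share of retrieved records that are relevant. The F-score is their harmonic mean.

```python
"""Sketch: scoring a PubMed search against a reference set of relevant records.

Assumption (not taken from the study): each search is represented by the set
of PMIDs it retrieved, and relevance is judged against a hypothetical
gold-standard set of PMIDs.
"""
from __future__ import annotations


def f_score(precision: float, sensitivity: float) -> float:
    """Balanced F-score: the harmonic mean of precision and sensitivity (recall)."""
    if precision + sensitivity == 0:
        return 0.0
    return 2 * precision * sensitivity / (precision + sensitivity)


def evaluate_search(retrieved: set[str], relevant: set[str]) -> dict[str, float]:
    """Return sensitivity, precision, and F-score for one search."""
    hits = len(retrieved & relevant)  # relevant records the search actually found
    sensitivity = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "f_score": f_score(precision, sensitivity),
    }


if __name__ == "__main__":
    # Toy PMIDs, invented for illustration only.
    relevant = {"1", "2", "3", "4", "5"}
    retrieved = {"1", "2", "3", "9"}
    print(evaluate_search(retrieved, relevant))
    # Recompute the F-score from the rates reported for "Wuhan virus".
    print(round(f_score(0.781, 0.077), 3))  # → 0.14
```

The harmonic-mean formula reproduces the reported F-scores from the reported sensitivity and precision. For example, it gives 94.8% for COVID-19 and 14.0% for Wuhan virus. This suggests that the abstract's F-score is the balanced F1.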
The two most comprehensive search strings, which combined several free-text and indexed search terms, performed best in terms of sensitivity (98.4%/98.7%) and F-score (96.5%/95.7%). However, the single-term search COVID-19 performed best in terms of precision (95.3%) and also did well on sensitivity (94.4%) and F-score (94.8%). The term Wuhan virus performed worst, with 7.7% sensitivity, 78.1% precision, and a 14.0% F-score. We found that deleting a hyphen or space character could cause a search to miss a substantial number of records, especially when SARS-CoV-2 was searched as a single term.
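The hyphen and space finding can be illustrated by enumerating the ways a hyphenated term such as SARS-CoV-2 might be typed. The sketch below uses hypothetical helpers (`spelling_variants` and `or_query` are illustrative names, not part of PubMed or the study). Each separator is written as a hyphen, a space, or nothing, and the quoted variants are joined with OR. Quoting asks PubMed to treat each variant as a phrase. Whether a given variant actually retrieves additional records would have to be checked in PubMed itself.

```python
"""Sketch: expanding a hyphenated search term into its spelling variants.

Assumption: the helpers and the OR-combined query are hypothetical
illustrations, not the search strings used in the study or an NLM recommendation.
"""
from __future__ import annotations

import itertools
import re


def spelling_variants(term: str) -> list[str]:
    """Write each hyphen/space separator in `term` as a hyphen, a space, or nothing.

    For example, SARS-CoV-2 yields SARS-CoV-2, SARS CoV 2, SARSCoV2, and so on.
    """
    parts = re.split(r"[-\s]+", term.strip())
    separators = ["-", " ", ""]
    variants: list[str] = []
    # One separator choice per gap between consecutive parts.
    for seps in itertools.product(separators, repeat=len(parts) - 1):
        candidate = parts[0] + "".join(s + p for s, p in zip(seps, parts[1:]))
        if candidate not in variants:  # keep first occurrence, preserve order
            variants.append(candidate)
    return variants


def or_query(terms: list[str]) -> str:
    """Combine terms into one Boolean query, quoting each as a phrase."""
    return " OR ".join(f'"{t}"' for t in terms)


if __name__ == "__main__":
    variants = spelling_variants("SARS-CoV-2")
    print(len(variants))  # → 9 (3 separator choices for each of 2 gaps)
    print(or_query(variants))
```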
Comprehensive search strings combining free-text and indexed search terms performed better than single-term searches in PubMed, but only by a small margin over the single term COVID-19. For everyday searches, certain single-term searches are probably sufficient if they are entered correctly, whereas systematic reviews should use more comprehensive searches. Nonetheless, we suggest additional measures that the US National Library of Medicine could take to support all PubMed users in searching the COVID-19 literature.