COVID-19-CT-CXR: A Freely Accessible and Weakly Labeled Chest X-Ray and CT Image Collection on COVID-19 From Biomedical Literature.

Authors

Peng Yifan, Tang Yuxing, Lee Sungwon, Zhu Yingying, Summers Ronald M, Lu Zhiyong

Affiliations

NCBI/NLM/NIH and Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065 USA.

Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Radiology and Imaging Sciences Department, National Institutes of Health (NIH) Clinical Center, Bethesda, MD 20892 USA.

Publication information

IEEE Trans Big Data. 2021 Mar 1;7(1):3-12. doi: 10.1109/tbdata.2020.3035935. Epub 2020 Nov 4.

Abstract

The latest threat to global health is the COVID-19 outbreak. Although there exist large datasets of chest X-rays (CXR) and computed tomography (CT) scans, few COVID-19 image collections are currently available due to patient privacy. At the same time, there is a rapid growth of COVID-19-relevant articles in the biomedical literature, including those that report findings on radiographs. Here, we present COVID-19-CT-CXR, a public database of COVID-19 CXR and CT images, which are automatically extracted from COVID-19-relevant articles from the PubMed Central Open Access (PMC-OA) Subset. We extracted figures, associated captions, and relevant figure descriptions in the article and separated compound figures into subfigures. Because a large portion of figures in COVID-19 articles are not CXR or CT, we designed a deep-learning model to distinguish them from other figure types and to classify them accordingly. The final database includes 1,327 CT and 263 CXR images (as of May 9, 2020) with their relevant text. To demonstrate the utility of COVID-19-CT-CXR, we conducted four case studies. (1) We show that COVID-19-CT-CXR, when used as additional training data, is able to contribute to improved deep-learning (DL) performance for the classification of COVID-19 and non-COVID-19 CT. (2) We collected CT images of influenza, another common infectious respiratory illness that may present similarly to COVID-19, and fine-tuned a baseline deep neural network to distinguish a diagnosis of COVID-19, influenza, or normal or other types of diseases on CT. (3) We fine-tuned an unsupervised one-class classifier from non-COVID-19 CXR and performed anomaly detection to detect COVID-19 CXR. (4) From text-mined captions and figure descriptions, we compared 15 clinical symptoms and 20 clinical findings of COVID-19 versus those of influenza to demonstrate the disease differences in the scientific publications. 
Our database is unique, as the figures are retrieved along with relevant text with fine-grained descriptions, and it can be extended easily in the future. We believe that our work is complementary to existing resources and hope that it will contribute to medical image analysis of the COVID-19 pandemic. The dataset, code, and DL models are publicly available at https://github.com/ncbi-nlp/COVID-19-CT-CXR.
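
Case study (3) relies on one-class anomaly detection: a model is fit only on non-COVID-19 examples, and a test image is flagged when it lies far from that learned "normal" region. The sketch below illustrates the general idea only and is not the authors' model (the abstract describes a fine-tuned deep one-class classifier). It assumes images have already been reduced to small numeric feature vectors. The function names, the z-score distance, and the 95% threshold are all hypothetical choices made for illustration.

```python
import math
import random
from statistics import mean, stdev


def fit_one_class(normal_features):
    """Estimate per-dimension mean and standard deviation from normal-only samples."""
    dims = list(zip(*normal_features))
    mu = [mean(d) for d in dims]
    # Guard against zero variance so the division below is always defined.
    sigma = [stdev(d) or 1e-9 for d in dims]
    return mu, sigma


def anomaly_score(x, model):
    """Distance of x from the normal-class centre, in units of standard deviation."""
    mu, sigma = model
    return math.sqrt(sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, mu, sigma)))


def choose_threshold(normal_features, model, quantile=0.95):
    """Pick the score below which roughly `quantile` of the normal samples fall."""
    scores = sorted(anomaly_score(x, model) for x in normal_features)
    idx = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[idx]


# Synthetic stand-in for feature vectors of non-COVID-19 images (assumption).
rng = random.Random(0)
normal = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]

model = fit_one_class(normal)
threshold = choose_threshold(normal, model)

# A point far from the normal cluster versus one at its centre.
outlier = [6.0, -6.0]
inlier = [0.0, 0.0]
is_outlier_flagged = anomaly_score(outlier, model) > threshold
is_inlier_flagged = anomaly_score(inlier, model) > threshold
```

In this toy setup only the normal class is seen during fitting, so the threshold trades off false alarms on normal images against missed anomalies. A real pipeline would replace the synthetic vectors with features extracted from CXR images.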
