COVID-19-CT-CXR: a freely accessible and weakly labeled chest X-ray and CT image collection on COVID-19 from biomedical literature.

Author information

Peng Yifan, Tang Yu-Xing, Lee Sungwon, Zhu Yingying, Summers Ronald M, Lu Zhiyong

Publication information

ArXiv. 2020 Oct 22:arXiv:2006.06177v2.

Abstract

The latest threat to global health is the COVID-19 outbreak. Although large datasets of chest X-rays (CXR) and computed tomography (CT) scans exist, few COVID-19 image collections are currently available because of patient privacy concerns. At the same time, the number of COVID-19-relevant articles in the biomedical literature is growing rapidly. Here, we present COVID-19-CT-CXR, a public database of COVID-19 CXR and CT images that are automatically extracted from COVID-19-relevant articles in the PubMed Central Open Access (PMC-OA) Subset. We extracted figures, their associated captions, and relevant figure descriptions from each article, and separated compound figures into subfigures. We also designed a deep learning (DL) model to distinguish CXR and CT images from other figure types and to classify them accordingly. The final database includes 1,327 CT and 263 CXR images (as of May 9, 2020) with their relevant text. To demonstrate the utility of COVID-19-CT-CXR, we conducted four case studies. (1) We show that COVID-19-CT-CXR, when used as additional training data, contributes to improved DL performance for the classification of COVID-19 and non-COVID-19 CT. (2) We collected CT images of influenza and trained a DL baseline to distinguish among COVID-19, influenza, and normal or other types of diseases on CT. (3) We trained an unsupervised one-class classifier on non-COVID-19 CXR and performed anomaly detection to detect COVID-19 CXR. (4) From text-mined captions and figure descriptions, we compared the clinical symptoms and clinical findings of COVID-19 with those of influenza to demonstrate the disease differences reported in scientific publications. We believe that our work is complementary to existing resources and hope that it will contribute to medical image analysis of the COVID-19 pandemic. The dataset, code, and DL models are publicly available at https://github.com/ncbi-nlp/COVID-19-CT-CXR.
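
Case study (3) relies on one-class anomaly detection: a model is fitted only on non-COVID-19 examples, and inputs that fall far from that learned "normal" region are flagged. The abstract does not say which one-class classifier or image features the authors used, so the following is only a minimal sketch of the general idea, not their method. It swaps in a simple standardized-distance score over synthetic feature vectors; the function names (`fit_one_class`, `anomaly_score`, `pick_threshold`), the 4-dimensional features, and the 0.95 quantile are all assumptions made for illustration.

```python
import math
import random


def fit_one_class(normal_features):
    """Estimate per-dimension mean and standard deviation from normal-only samples."""
    n = len(normal_features)
    dims = len(normal_features[0])
    mean = [sum(x[d] for x in normal_features) / n for d in range(dims)]
    std = []
    for d in range(dims):
        var = sum((x[d] - mean[d]) ** 2 for x in normal_features) / n
        std.append(math.sqrt(var) or 1.0)  # fall back to 1.0 if variance is zero
    return mean, std


def anomaly_score(x, mean, std):
    """Distance from the training mean, measured in standardized units."""
    return math.sqrt(sum(((xi - m) / s) ** 2 for xi, m, s in zip(x, mean, std)))


def pick_threshold(normal_features, mean, std, quantile=0.95):
    """Use a high quantile of the training scores as the anomaly cutoff (assumed value)."""
    scores = sorted(anomaly_score(x, mean, std) for x in normal_features)
    index = min(len(scores) - 1, int(quantile * len(scores)))
    return scores[index]


# Synthetic stand-ins for feature vectors of non-COVID-19 images (hypothetical data).
rng = random.Random(0)
normal_train = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(200)]

mean, std = fit_one_class(normal_train)
threshold = pick_threshold(normal_train, mean, std)

typical = [0.0, 0.0, 0.0, 0.0]   # resembles the training distribution
outlier = [6.0, 6.0, 6.0, 6.0]   # far from anything seen in training

is_typical_anomalous = anomaly_score(typical, mean, std) > threshold
is_outlier_anomalous = anomaly_score(outlier, mean, std) > threshold
```

In this sketch only the far-away vector exceeds the threshold. A real pipeline would presumably replace the synthetic vectors with features extracted from CXR images and might use a different one-class model altogether.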
