
Building RadiologyNET: an unsupervised approach to annotating a large-scale multimodal medical database.

Authors

Napravnik Mateja, Hržić Franko, Tschauner Sebastian, Štajduhar Ivan

Affiliations

Faculty of Engineering, University of Rijeka, Vukovarska 58, Rijeka, 51000, Croatia.

Center for Artificial Intelligence and Cybersecurity, Radmile Matejcic 2, Rijeka, 51000, Croatia.

Publication information

BioData Min. 2024 Jul 12;17(1):22. doi: 10.1186/s13040-024-00373-1.

DOI:10.1186/s13040-024-00373-1
PMID:38997749
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11245804/

Abstract

BACKGROUND

The use of machine learning in medical diagnosis and treatment has grown significantly in recent years with the development of computer-aided diagnosis systems, often based on annotated medical radiology images. However, the lack of large annotated image datasets remains a major obstacle, as the annotation process is time-consuming and costly. This study aims to overcome this challenge by proposing an automated method for annotating a large database of medical radiology images based on their semantic similarity.

RESULTS

An automated, unsupervised approach is used to create a large annotated dataset of medical radiology images originating from the Clinical Hospital Centre Rijeka, Croatia. The pipeline is built by data-mining three different types of medical data: images, DICOM metadata and narrative diagnoses. The optimal feature extractors are then integrated into a multimodal representation, which is then clustered to create an automated pipeline for labelling a precursor dataset of 1,337,926 medical images into 50 clusters of visually similar images. The quality of the clusters is assessed by examining their homogeneity and mutual information, taking into account the anatomical region and modality representation.
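The two cluster-quality measures named above, homogeneity and mutual information, can be illustrated with a minimal pure-Python sketch. It assumes each image carries a reference label (for example its imaging modality) and a cluster id; the function names and the toy labels are hypothetical and are not taken from the paper, which may compute or normalise these scores differently.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(classes, clusters):
    """Mutual information I(C; K) between reference classes and cluster ids."""
    n = len(classes)
    joint = Counter(zip(classes, clusters))
    class_counts = Counter(classes)
    cluster_counts = Counter(clusters)
    mi = 0.0
    for (c, k), n_ck in joint.items():
        mi += (n_ck / n) * math.log(n_ck * n / (class_counts[c] * cluster_counts[k]))
    return mi


def homogeneity(classes, clusters):
    """Homogeneity = 1 - H(C|K) / H(C), which equals I(C; K) / H(C)."""
    h_c = entropy(classes)
    if h_c == 0.0:
        # A single reference class is trivially homogeneous.
        return 1.0
    return mutual_information(classes, clusters) / h_c


if __name__ == "__main__":
    # Hypothetical toy data: modality as the reference label.
    modality = ["CT", "CT", "MR", "MR"]
    print(homogeneity(modality, [0, 0, 1, 1]))  # clusters separate the modalities
    print(homogeneity(modality, [0, 1, 0, 1]))  # clusters mix the modalities
```

A homogeneity near 1 means each cluster contains images of mostly one reference class, while a value near 0 means the clusters say little about that class.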

CONCLUSIONS

The results indicate that fusing the embeddings of all three data sources together provides the best results for the task of unsupervised clustering of large-scale medical data and leads to the most concise clusters. Hence, this work marks the initial step towards building a much larger and more fine-grained annotated dataset of medical radiology images.
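As a rough illustration of what fusing the embeddings of the three data sources could look like, the sketch below L2-normalises each per-source vector and concatenates them into one multimodal vector. Concatenation is an assumption made here for illustration; the abstract does not state how the fusion is performed, and the function names and toy vectors are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(image_emb, dicom_emb, text_emb):
    """Concatenate the normalised image, DICOM-metadata and diagnosis embeddings.

    Normalising first keeps one source from dominating distance computations
    in a later clustering step simply because its values are larger.
    """
    fused = []
    for part in (image_emb, dicom_emb, text_emb):
        fused.extend(l2_normalize(part))
    return fused


if __name__ == "__main__":
    # Hypothetical toy embeddings of different lengths.
    print(fuse([3.0, 4.0], [0.0, 0.0], [2.0]))
```

The fused vectors would then be passed to a clustering algorithm of choice; other fusion strategies, such as learning a joint representation, are equally plausible readings of the text.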
