Building Flexible, Scalable, and Machine Learning-Ready Multimodal Oncology Datasets.

Affiliations

Department of Machine Learning, Moffitt Cancer Center & Research Institute, Tampa, FL 33612, USA.

Department of Electrical Engineering, University of South Florida, Tampa, FL 33620, USA.

Publication information

Sensors (Basel). 2024 Mar 2;24(5):1634. doi: 10.3390/s24051634.

DOI:10.3390/s24051634
PMID:38475170
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10933897/
Abstract

Advances in data acquisition, storage, and processing techniques have resulted in the rapid growth of heterogeneous medical data. Integrating radiological scans, histopathology images, and molecular information with clinical data is essential for developing a holistic understanding of the disease and optimizing treatment. The need to integrate data from multiple sources is even more pronounced in complex diseases such as cancer, where it enables precision medicine and personalized treatments. This work proposes the Multimodal Integration of Oncology Data System (MINDS), a flexible, scalable, and cost-effective metadata framework for efficiently fusing disparate data from public sources such as the Cancer Research Data Commons (CRDC) into an interconnected, patient-centric framework. MINDS consolidates over 41,000 cases from across repositories while achieving a high compression ratio relative to the 3.78 PB source data size. It offers sub-5-s query response times for interactive exploration. MINDS offers an interface for exploring relationships across data types and building cohorts for developing large-scale multimodal machine learning models. By harmonizing multimodal data, MINDS aims to empower researchers with greater analytical ability to uncover diagnostic and prognostic insights and enable evidence-based personalized care. MINDS tracks granular end-to-end data provenance, ensuring reproducibility and transparency. The cloud-native architecture of MINDS can handle exponential data growth in a secure, cost-optimized manner while ensuring substantial storage optimization, replication avoidance, and dynamic access capabilities. Auto-scaling, access controls, and other mechanisms guarantee the pipelines' scalability and security. MINDS overcomes the limitations of existing biomedical data silos via an interoperable metadata-driven approach that represents a pivotal step toward the future of oncology data integration.
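
The patient-centric, metadata-driven cohort building described in the abstract can be illustrated with a minimal sketch. It is not the MINDS schema or API: the table names (`cases`, `files`), the column names, the modality labels, the `build_cohort` helper, and the example URIs are all hypothetical, and an in-memory SQLite database stands in for the cloud-hosted metadata store. The idea shown is only that a small metadata table keyed by case ID can point to large files held elsewhere, so a cohort can be selected without copying the underlying data.

```python
import sqlite3


def build_cohort(conn, primary_site, required_modalities):
    """Return case IDs at a primary site that have every required modality.

    Hypothetical helper: the schema below is illustrative, not the MINDS schema.
    """
    modalities = sorted(set(required_modalities))
    placeholders = ",".join("?" for _ in modalities)
    query = f"""
        SELECT c.case_id
        FROM cases AS c
        JOIN files AS f ON f.case_id = c.case_id
        WHERE c.primary_site = ?
          AND f.modality IN ({placeholders})
        GROUP BY c.case_id
        HAVING COUNT(DISTINCT f.modality) = ?
        ORDER BY c.case_id
    """
    params = [primary_site, *modalities, len(modalities)]
    return [row[0] for row in conn.execute(query, params)]


# In-memory stand-in for a metadata store; only pointers (URIs) are kept,
# not the images or molecular files themselves.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cases (case_id TEXT PRIMARY KEY, primary_site TEXT);
    CREATE TABLE files (
        file_id  TEXT PRIMARY KEY,
        case_id  TEXT REFERENCES cases(case_id),
        modality TEXT,
        uri      TEXT
    );
    """
)
conn.executemany(
    "INSERT INTO cases VALUES (?, ?)",
    [("C1", "Lung"), ("C2", "Lung"), ("C3", "Breast")],
)
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?)",
    [
        ("F1", "C1", "radiology", "s3://example-bucket/C1/ct.dcm"),
        ("F2", "C1", "histopathology", "s3://example-bucket/C1/slide.svs"),
        ("F3", "C1", "molecular", "s3://example-bucket/C1/rna.tsv"),
        ("F4", "C2", "radiology", "s3://example-bucket/C2/ct.dcm"),
        ("F5", "C3", "radiology", "s3://example-bucket/C3/mr.dcm"),
        ("F6", "C3", "histopathology", "s3://example-bucket/C3/slide.svs"),
    ],
)

# Lung cases that have both imaging modalities available.
cohort = build_cohort(conn, "Lung", ["radiology", "histopathology"])
print(cohort)
```

Grouping by case and counting distinct modalities keeps the query a single pass over the metadata, which is one plausible way to keep interactive cohort queries fast; the system's actual query path is not described in the abstract.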


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3344/10933897/9c5b13813151/sensors-24-01634-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3344/10933897/25d16f6cdd79/sensors-24-01634-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3344/10933897/7d8c728fb214/sensors-24-01634-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3344/10933897/9392d16bd939/sensors-24-01634-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3344/10933897/f90fa7097358/sensors-24-01634-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3344/10933897/5970af0d4663/sensors-24-01634-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3344/10933897/f33f00afbb0f/sensors-24-01634-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3344/10933897/a74df292e1ec/sensors-24-01634-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3344/10933897/e5550996a7fe/sensors-24-01634-g008.jpg