Building Flexible, Scalable, and Machine Learning-Ready Multimodal Oncology Datasets.

Affiliations

Department of Machine Learning, Moffitt Cancer Center & Research Institute, Tampa, FL 33612, USA.

Department of Electrical Engineering, University of South Florida, Tampa, FL 33620, USA.

Publication information

Sensors (Basel). 2024 Mar 2;24(5):1634. doi: 10.3390/s24051634.

Abstract

Advances in data acquisition, storage, and processing techniques have resulted in the rapid growth of heterogeneous medical data. Integrating radiological scans, histopathology images, and molecular information with clinical data is essential for developing a holistic understanding of the disease and optimizing treatment. The need to integrate data from multiple sources is even more pronounced in complex diseases such as cancer, where it enables precision medicine and personalized treatments. This work proposes the Multimodal Integration of Oncology Data System (MINDS), a flexible, scalable, and cost-effective metadata framework for efficiently fusing disparate data from public sources such as the Cancer Research Data Commons (CRDC) into an interconnected, patient-centric framework. MINDS consolidates over 41,000 cases from across repositories while achieving a high compression ratio relative to the 3.78 PB source data size. It offers sub-5-second query response times for interactive exploration. MINDS offers an interface for exploring relationships across data types and building cohorts for developing large-scale multimodal machine learning models. By harmonizing multimodal data, MINDS aims to empower researchers with greater analytical ability to uncover diagnostic and prognostic insights and enable evidence-based personalized care. MINDS tracks granular end-to-end data provenance, ensuring reproducibility and transparency. The cloud-native architecture of MINDS can handle exponential data growth in a secure, cost-optimized manner while ensuring substantial storage optimization, replication avoidance, and dynamic access capabilities. Auto-scaling, access controls, and other mechanisms guarantee the pipelines' scalability and security. MINDS overcomes the limitations of existing biomedical data silos via an interoperable metadata-driven approach that represents a pivotal step toward the future of oncology data integration.
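To make the idea of a patient-centric metadata layer concrete, the following is a minimal sketch and not the MINDS implementation. It assumes a hypothetical schema (the tables `clinical`, `radiology`, and `pathology`, the `case_id` key, and the `s3://example-bucket/...` URIs are all invented for illustration) and uses Python's built-in `sqlite3` module in place of a cloud database. Only metadata and pointers to the source files are stored, so a cohort can be selected without copying the underlying images.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with hypothetical per-modality metadata tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE clinical (case_id TEXT PRIMARY KEY, primary_site TEXT, age INTEGER);
        CREATE TABLE radiology (case_id TEXT, modality TEXT, uri TEXT);
        CREATE TABLE pathology (case_id TEXT, stain TEXT, uri TEXT);
    """)
    # Invented example rows; real repositories would supply these records.
    conn.executemany("INSERT INTO clinical VALUES (?, ?, ?)", [
        ("C1", "Lung", 64), ("C2", "Lung", 58), ("C3", "Breast", 71),
    ])
    conn.executemany("INSERT INTO radiology VALUES (?, ?, ?)", [
        ("C1", "CT", "s3://example-bucket/C1/ct"),
        ("C3", "MR", "s3://example-bucket/C3/mr"),
    ])
    conn.executemany("INSERT INTO pathology VALUES (?, ?, ?)", [
        ("C1", "H&E", "s3://example-bucket/C1/slide"),
        ("C2", "H&E", "s3://example-bucket/C2/slide"),
    ])
    return conn


def build_cohort(conn, primary_site):
    """Return case IDs at a primary site that have both radiology and pathology records."""
    rows = conn.execute("""
        SELECT c.case_id
        FROM clinical AS c
        WHERE c.primary_site = ?
          AND EXISTS (SELECT 1 FROM radiology AS r WHERE r.case_id = c.case_id)
          AND EXISTS (SELECT 1 FROM pathology AS p WHERE p.case_id = c.case_id)
        ORDER BY c.case_id
    """, (primary_site,)).fetchall()
    return [row[0] for row in rows]


conn = build_demo_db()
print(build_cohort(conn, "Lung"))  # only C1 has both modalities
```

Keying every table on a shared case identifier is what makes the query patient-centric in this sketch: each modality can be added or updated independently, and a multimodal cohort is simply the set of cases present in every required table.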


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3344/10933897/25d16f6cdd79/sensors-24-01634-g001.jpg
