Generation of synthetic whole-slide image tiles of tumours from RNA-sequencing data via cascaded diffusion models.

Authors

Carrillo-Perez Francisco, Pizurica Marija, Zheng Yuanning, Nandi Tarak Nath, Madduri Ravi, Shen Jeanne, Gevaert Olivier

Affiliations

Stanford Center for Biomedical Informatics Research (BMIR), Stanford University, School of Medicine, Stanford, CA, USA.

Internet technology and Data science Lab (IDLab), Ghent University, Ghent, Belgium.

Publication information

Nat Biomed Eng. 2025 Mar;9(3):320-332. doi: 10.1038/s41551-024-01193-8. Epub 2024 Mar 21.

Abstract

Training machine-learning models with synthetically generated data can alleviate the problem of data scarcity when acquiring diverse and sufficiently large datasets is costly and challenging. Here we show that cascaded diffusion models can be used to synthesize realistic whole-slide image tiles from latent representations of RNA-sequencing data from human tumours. Alterations in gene expression affected the composition of cell types in the generated synthetic image tiles, which accurately preserved the distribution of cell types and maintained the cell fraction observed in bulk RNA-sequencing data, as we show for lung adenocarcinoma, kidney renal papillary cell carcinoma, cervical squamous cell carcinoma, colon adenocarcinoma and glioblastoma. Machine-learning models pretrained with the generated synthetic data performed better than models trained from scratch. Synthetic data may accelerate the development of machine-learning models in scarce-data settings and allow for the imputation of missing data modalities.
