Generation of synthetic whole-slide image tiles of tumours from RNA-sequencing data via cascaded diffusion models.

Authors

Carrillo-Perez Francisco, Pizurica Marija, Zheng Yuanning, Nandi Tarak Nath, Madduri Ravi, Shen Jeanne, Gevaert Olivier

Affiliations

Stanford Center for Biomedical Informatics Research (BMIR), Stanford University, School of Medicine, Stanford, CA, USA.

Internet technology and Data science Lab (IDLab), Ghent University, Ghent, Belgium.

Publication

Nat Biomed Eng. 2025 Mar;9(3):320-332. doi: 10.1038/s41551-024-01193-8. Epub 2024 Mar 21.

Abstract

Training machine-learning models with synthetically generated data can alleviate the problem of data scarcity when acquiring diverse and sufficiently large datasets is costly and challenging. Here we show that cascaded diffusion models can be used to synthesize realistic whole-slide image tiles from latent representations of RNA-sequencing data from human tumours. Alterations in gene expression affected the composition of cell types in the generated synthetic image tiles, which accurately preserved the distribution of cell types and maintained the cell fraction observed in bulk RNA-sequencing data, as we show for lung adenocarcinoma, kidney renal papillary cell carcinoma, cervical squamous cell carcinoma, colon adenocarcinoma and glioblastoma. Machine-learning models pretrained with the generated synthetic data performed better than models trained from scratch. Synthetic data may accelerate the development of machine-learning models in scarce-data settings and allow for the imputation of missing data modalities.
