
Self-supervised learning for characterising histomorphological diversity and spatial RNA expression prediction across 23 human tissue types.

Affiliations

Human Technopole, Viale Rita Levi-Montalcini 1, 20157, Milan, Italy.

Research Department of Pathology, University College London, London, UK.

Publication information

Nat Commun. 2024 Jul 13;15(1):5906. doi: 10.1038/s41467-024-50317-w.

Abstract

As vast histological archives are digitised, there is a pressing need to be able to associate specific tissue substructures and incident pathology to disease outcomes without arduous annotation. Here, we learn self-supervised representations using a Vision Transformer, trained on 1.7 M histology images across 23 healthy tissues in 838 donors from the Genotype Tissue Expression consortium (GTEx). Using these representations, we can automatically segment tissues into their constituent tissue substructures and pathology proportions across thousands of whole slide images, outperforming other self-supervised methods (43% increase in silhouette score). Additionally, we can detect and quantify histological pathologies present, such as arterial calcification (AUROC = 0.93) and identify missing calcification diagnoses. Finally, to link gene expression to tissue morphology, we introduce RNAPath, a set of models trained on 23 tissue types that can predict and spatially localise individual RNA expression levels directly from H&E histology (mean genes significantly regressed = 5156, FDR 1%). We validate RNAPath spatial predictions with matched ground truth immunohistochemistry for several well characterised control genes, recapitulating their known spatial specificity. Together, these results demonstrate how self-supervised machine learning when applied to vast histological archives allows researchers to answer questions about tissue pathology, its spatial organisation and the interplay between morphological tissue variability and gene expression.
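The abstract reports three kinds of quantitative readouts: a silhouette score for unsupervised segmentation of tissue substructures, an AUROC for pathology detection (e.g. arterial calcification), and a count of genes significantly regressed from histology at FDR 1%. Below is a minimal sketch, not the authors' code, showing how each metric might be computed with scikit-learn, SciPy and statsmodels; random arrays stand in for the self-supervised ViT tile and slide embeddings, and all variable names are hypothetical.

```python
# Illustrative sketch of the three evaluation steps named in the abstract,
# run on random stand-in data rather than real GTEx histology embeddings.
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.metrics import silhouette_score, roc_auc_score
from sklearn.model_selection import cross_val_predict
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(0)

# Stand-in for self-supervised ViT embeddings of H&E tiles (n_tiles x embed_dim).
tile_emb = rng.normal(size=(2000, 384))

# 1) Unsupervised segmentation: cluster tile embeddings into putative
#    substructures and score the clustering with the silhouette coefficient.
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(tile_emb)
print("silhouette score:", silhouette_score(tile_emb, labels))

# 2) Pathology detection: a binary classifier on tile embeddings, evaluated
#    with cross-validated probabilities and AUROC (labels here are random).
y_path = rng.integers(0, 2, size=len(tile_emb))
path_prob = cross_val_predict(
    LogisticRegression(max_iter=1000), tile_emb, y_path,
    cv=5, method="predict_proba",
)[:, 1]
print("AUROC:", roc_auc_score(y_path, path_prob))

# 3) RNAPath-style readout: regress each gene's expression from slide-level
#    embeddings, test the cross-validated prediction against ground truth,
#    and count genes surviving Benjamini-Hochberg correction at FDR 1%.
slide_emb = rng.normal(size=(300, 384))   # one embedding per slide/donor
expr = rng.normal(size=(300, 500))        # matched expression for 500 genes
pvals = []
for g in range(expr.shape[1]):
    pred = cross_val_predict(Ridge(alpha=1.0), slide_emb, expr[:, g], cv=5)
    _, p = stats.pearsonr(pred, expr[:, g])
    pvals.append(p)
significant = multipletests(pvals, alpha=0.01, method="fdr_bh")[0]
print("genes significant at FDR 1%:", int(significant.sum()))
```

With random data the gene-level step correctly reports almost no significant genes; the point of the sketch is only the structure of the evaluation (cluster, classify, regress per gene, then control the false discovery rate), not the reported numbers.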


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c314/11246527/66a36f069974/41467_2024_50317_Fig1_HTML.jpg
