Khodaee Farhan, Zandie Rohola, Edelman Elazer R
Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA.
Department of Medicine (Cardiovascular Medicine), Brigham and Women's Hospital, Boston, MA, USA.
Nat Comput Sci. 2025 Apr;5(4):333-344. doi: 10.1038/s43588-024-00765-7. Epub 2025 Jan 28.
How complex phenotypes emerge from intricate gene expression patterns is a fundamental question in biology. Integrating high-content genotyping approaches, such as single-cell RNA sequencing, with advanced learning methods, such as language models, offers an opportunity to dissect this complex relationship. Here we present a computational integrated genetics framework designed to analyze and interpret the high-dimensional landscape of genotypes and their associated phenotypes simultaneously. We applied this approach to develop a multimodal foundation model that explores the genotype-phenotype relationship manifold for human transcriptomics at the cellular level. Analysis of this joint manifold revealed cellular heterogeneity at a finer resolution, uncovered potential cross-tissue biomarkers, and provided contextualized embeddings for investigating gene polyfunctionality, as demonstrated for the von Willebrand factor (VWF) gene in endothelial cells. Overall, this study advances our understanding of the dynamic interplay between gene expression and phenotypic manifestation and demonstrates the potential of integrated genetics to uncover new dimensions of cellular function and complexity.