Zhang Yingying, Leung Alden K, Kang Jin Joo, Sun Yu, Wu Guanxi, Li Le, Sun Jiayang, Cheng Lily, Qiu Tian, Zhang Junke, Wierbowski Shayne, Gupta Shagun, Booth James, Yu Haiyuan
Department of Computational Biology, Cornell University; Ithaca, 14853, USA.
Weill Institute for Cell and Molecular Biology, Cornell University; Ithaca, 14853, USA.
bioRxiv. 2024 Aug 8:2023.03.06.531441. doi: 10.1101/2023.03.06.531441.
A major goal of cancer biology is to understand the mechanisms underlying tumorigenesis driven by somatically acquired mutations. Two distinct types of computational methodologies have emerged: one analyzes the clustering of mutations within protein sequences and 3D structures, while the other characterizes mutations by leveraging the topology of protein-protein interaction networks. Their insights are largely non-overlapping and offer complementary strengths. Here, we established NetFlow3D, a unified, end-to-end, 3D structurally informed protein interaction network propagation framework that systematically maps the multiscale mechanistic effects of somatic mutations in cancer. NetFlow3D hinges on the Human Protein Structurome, a comprehensive repository we compiled that incorporates the 3D structures of every human protein as well as the binding interfaces of all known human protein interactions. NetFlow3D leverages the Structurome to integrate information across the atomic, residue, protein and network levels. It first conducts 3D clustering of mutations at the atomic and residue levels on protein structures to identify potential driver mutations. It then anisotropically propagates their impacts across the protein interaction network, with propagation guided by the specific 3D structural interfaces involved, to identify significantly interconnected network "modules", thereby uncovering key biological processes underlying disease etiology. Applied to 1,038,899 somatic protein-altering mutations in 9,946 TCGA tumors across 33 cancer types, NetFlow3D identified 14,444 significant 3D clusters throughout the Human Protein Structurome, of which ~55% would not have been found using only experimentally determined structures. It then identified 26 significantly interconnected modules that encompass ~8-fold more proteins than those found by standard network analyses. NetFlow3D and our pan-cancer results can be accessed at http://netflow3d.yulab.org.
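The two stages described in the abstract (3D clustering of mutated residues, then direction-dependent propagation over the interaction network) can be illustrated with a toy sketch. This is not the NetFlow3D implementation. The function names `cluster_residues` and `propagate`, the 6 Å distance cutoff, the single-linkage grouping, the random-walk-with-restart update, and the per-direction edge weights standing in for interface information are all assumptions chosen for illustration.

```python
import math
from collections import defaultdict


def cluster_residues(coords, mutated, cutoff=6.0):
    """Group mutated residues into 3D clusters (single linkage).

    coords  : residue index -> (x, y, z) in angstroms (hypothetical input)
    mutated : set of residue indices carrying mutations
    cutoff  : illustrative distance threshold, not a value from the paper
    """
    parent = {r: r for r in mutated}

    def find(r):
        # Union-find lookup with path halving.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    residues = sorted(mutated)
    for i, a in enumerate(residues):
        for b in residues[i + 1:]:
            if math.dist(coords[a], coords[b]) <= cutoff:
                parent[find(a)] = find(b)

    groups = defaultdict(set)
    for r in residues:
        groups[find(r)].add(r)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


def propagate(heat, edges, restart=0.5, steps=50):
    """Random walk with restart on a directed, weighted graph.

    heat  : protein -> initial score (e.g. from its mutation clusters)
    edges : (source, target) -> weight; weights may differ per direction,
            which is the toy stand-in for interface-guided (anisotropic) flow
    Proteins without outgoing edges simply leak their share in this toy.
    """
    out_total = defaultdict(float)
    for (u, _), w in edges.items():
        out_total[u] += w

    current = dict(heat)
    for _ in range(steps):
        nxt = {n: restart * heat[n] for n in heat}
        for (u, v), w in edges.items():
            nxt[v] += (1 - restart) * current[u] * w / out_total[u]
        current = nxt
    return current


# Toy data: three mutated residues close together, one far away.
coords = {10: (0, 0, 0), 12: (3, 0, 0), 15: (5, 0, 0), 80: (40, 0, 0)}
print(cluster_residues(coords, {10, 12, 15, 80}))  # [[10, 12, 15], [80]]

# Protein A holds the cluster; its interface with B is weighted above C.
heat = {"A": 1.0, "B": 0.0, "C": 0.0}
edges = {("A", "B"): 0.9, ("A", "C"): 0.1, ("B", "A"): 0.9, ("C", "A"): 0.1}
scores = propagate(heat, edges)
print(scores["B"] > scores["C"])  # True
```

In this sketch, more of A's score reaches B than C because the A-to-B edge carries the larger weight. That is the only sense in which the toy propagation is anisotropic; how NetFlow3D actually derives and applies interface-specific weights is not stated in the abstract.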