Shadman Hossain, Gomrok Saghar, Litle Christopher, Cheng Qianyi, Jiang Yu, Huang Xiaohua, Ziebarth Jesse D, Wang Yongmei
Department of Chemistry, The University of Memphis, Memphis, TN, 38152, USA.
School of Public Health, The University of Memphis, Memphis, TN, 38152, USA.
Sci Rep. 2025 Feb 12;15(1):5270. doi: 10.1038/s41598-025-89497-w.
Integrins, a family of transmembrane receptor proteins, are well known to play important roles in cancer development and metastasis. However, a comprehensive understanding of these roles has not been achieved because of the complex relationships among specific integrins, cancer types, and stages of cancer progression. Publicly accessible repositories from the Genotype-Tissue Expression (GTEx) and The Cancer Genome Atlas (TCGA) projects provide rich datasets for exploring these relationships using machine learning (ML). In this study, integrin RNA-Seq expression data were selected for approximately 8 healthy tissues in GTEx and the corresponding tumors in TCGA. Integrin expression was used to train ML models to distinguish among different healthy tissues, among different solid tumors, and between normal and tumor samples from the same tissue type. These ML models classified samples by tissue origin or disease status with high accuracy, and the integrins essential to these classifiers were identified. In some cases, the expression of only one or two integrins was sufficient to classify tissue type, tumor type, or disease status with accuracy > 0.9. For example, expression of ITGA7 alone can distinguish healthy from cancerous breast tissue. Additionally, integrin co-expression networks in healthy and cancerous breast tissues were compared and found to differ significantly, indicating that the functional involvement of integrins changes in cancer. Integrin expression in metastatic tumors was further examined using data from the AURORA project for Metastatic Breast Cancer (MBC), and several integrins, including ITGAD, ITGA4, ITGAL, and ITGA11, were found to have significantly lower expression in metastases than in primary tumors.
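The claim that a single integrin can separate healthy from tumor samples can be illustrated with a one-feature threshold classifier (a decision stump). The sketch below is only an illustration under stated assumptions: the abstract does not say which ML models were used, the expression values are synthetic rather than GTEx or TCGA data, the direction of the expression difference is arbitrary, and all function names are hypothetical.

```python
import random


def make_synthetic_samples(n_per_class, seed=0):
    """Return shuffled (expression, label) pairs; label 0 = healthy, 1 = tumor.

    The values are invented for illustration and are NOT real GTEx/TCGA data.
    The two class means (6.0 and 3.0) are arbitrary assumptions.
    """
    rng = random.Random(seed)
    healthy = [(rng.gauss(6.0, 0.8), 0) for _ in range(n_per_class)]
    tumor = [(rng.gauss(3.0, 0.8), 1) for _ in range(n_per_class)]
    samples = healthy + tumor
    rng.shuffle(samples)
    return samples


def predict(x, threshold, tumor_if_below):
    """Classify one expression value with a single threshold."""
    return 1 if (x < threshold) == tumor_if_below else 0


def accuracy(samples, threshold, tumor_if_below):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(x, threshold, tumor_if_below) == y for x, y in samples)
    return correct / len(samples)


def fit_stump(samples):
    """Pick the threshold and direction that maximize training accuracy."""
    values = sorted({x for x, _ in samples})
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_acc, best_threshold, best_direction = -1.0, None, None
    for t in candidates:
        for tumor_if_below in (True, False):
            acc = accuracy(samples, t, tumor_if_below)
            if acc > best_acc:
                best_acc, best_threshold, best_direction = acc, t, tumor_if_below
    return best_threshold, best_direction


samples = make_synthetic_samples(n_per_class=200)
split = int(0.7 * len(samples))
train, held_out = samples[:split], samples[split:]

threshold, tumor_if_below = fit_stump(train)
held_out_acc = accuracy(held_out, threshold, tumor_if_below)
print(f"threshold={threshold:.2f}, tumor_if_below={tumor_if_below}, "
      f"held-out accuracy={held_out_acc:.3f}")
```

With real data, the synthetic generator would be replaced by per-sample expression values for one integrin and the corresponding tissue or disease labels; evaluating on a held-out split, as here, keeps the reported accuracy from being inflated by fitting the threshold on the same samples.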