Department of Medical Biotechnology, School of Advanced Technologies, Shahrekord University of Medical Sciences, Shahrekord, Iran.
Laboratory of Systems Biology and Bioinformatics (LBB), University of Tehran, Tehran, Iran.
BMC Bioinformatics. 2023 Jul 4;24(1):275. doi: 10.1186/s12859-023-05400-2.
P4 medicine (predict, prevent, personalize, and participate) is a new approach to diagnosing and predicting disease on a patient-by-patient basis. Prediction plays a fundamental role in both the prevention and the treatment of disease. One promising strategy is to design deep learning models that predict disease state from gene expression data.
We created an autoencoder-based deep learning model, DeeP4med, consisting of a Classifier and a Transferor, that predicts a cancer's gene expression (mRNA) matrix from its matched normal sample and vice versa. Depending on tissue type, the F1 score ranges from 0.935 to 0.999 for the Classifier and from 0.944 to 0.999 for the Transferor. DeeP4med achieved accuracies of 0.986 and 0.992 for tissue and disease classification, respectively, outperforming seven classic machine learning models (Support Vector Classifier, Logistic Regression, Linear Discriminant Analysis, Naive Bayes, Decision Tree, Random Forest, K Nearest Neighbors).
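The per-tissue F1 range and the overall accuracy reported above are standard classification metrics. The following is a minimal sketch of how such figures are typically computed, using only the Python standard library. The function names and the toy tissue labels are hypothetical and serve only as illustration; this is not the authors' evaluation code.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label gains a false positive
            fn[t] += 1  # true label gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical tissue labels, for illustration only (not data from the paper).
y_true = ["lung", "lung", "breast", "breast", "colon", "colon"]
y_pred = ["lung", "breast", "breast", "breast", "colon", "colon"]

f1_by_tissue = per_class_f1(y_true, y_pred)
acc = accuracy(y_true, y_pred)
print(f1_by_tissue, acc)
```

Reporting the minimum and maximum of `f1_by_tissue` gives a per-tissue range of the kind quoted in the abstract.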
The idea behind DeeP4med is that, given the gene expression matrix of a normal tissue, we can predict its tumor gene expression matrix and thereby find genes involved in transforming normal tissue into tumor tissue. Differentially Expressed Genes (DEGs) and enrichment analysis on the predicted matrices for 13 types of cancer correlated well with the literature and biological databases. Thus, by training the model on gene expression matrices that capture each person's features in the normal and cancer states, the model could predict diagnosis from gene expression data of healthy tissue and could be used to identify possible therapeutic interventions for those patients.
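The comparison of a normal matrix with its predicted tumor matrix can be illustrated with a simple log2 fold-change ranking. This is a minimal sketch under stated assumptions: the abstract does not specify the DEG method, so the fold-change criterion, the pseudocount, the threshold, and the gene names below are all hypothetical. A real DEG analysis would also apply a statistical test with multiple-testing correction.

```python
import math


def log2_fold_changes(normal, tumor, genes, pseudocount=1.0):
    """Per-gene log2 fold change of mean expression (tumor vs normal).

    normal, tumor: lists of samples, each a list of expression values
    in the same gene order as `genes`.
    """
    def column_mean(matrix, j):
        return sum(row[j] for row in matrix) / len(matrix)

    result = {}
    for j, gene in enumerate(genes):
        n = column_mean(normal, j) + pseudocount  # pseudocount avoids log of zero
        t = column_mean(tumor, j) + pseudocount
        result[gene] = math.log2(t / n)
    return result


def candidate_genes(lfc, threshold=1.0):
    """Genes whose absolute log2 fold change meets the (assumed) threshold."""
    hits = [g for g, v in lfc.items() if abs(v) >= threshold]
    return sorted(hits, key=lambda g: -abs(lfc[g]))


# Hypothetical gene names and expression values, for illustration only.
genes = ["GENE_A", "GENE_B", "GENE_C"]
normal = [[7.0, 15.0, 3.0], [7.0, 15.0, 3.0]]
predicted_tumor = [[31.0, 3.0, 3.0], [31.0, 3.0, 3.0]]

lfc = log2_fold_changes(normal, predicted_tumor, genes)
top = candidate_genes(lfc)
print(lfc, top)
```

In this toy input, `GENE_A` is up-regulated and `GENE_B` is down-regulated in the predicted tumor matrix, while `GENE_C` is unchanged and is filtered out.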