PDAC-ANN: an artificial neural network to predict pancreatic ductal adenocarcinoma based on gene expression.
Author information
Núcleo de Biointegração, Instituto Multidisciplinar em Saúde, Universidade Federal da Bahia, Vitória da Conquista, Brazil.
Faculdade Santo Agostinho, Vitória da Conquista, Brazil.
Publication information
BMC Cancer. 2020 Jan 31;20(1):82. doi: 10.1186/s12885-020-6533-0.
BACKGROUND
Although pancreatic ductal adenocarcinoma (PDAC) has high mortality and metastatic potential, effective therapies are lacking and survival rates remain low. This scenario calls for new strategies for diagnosis, drug targets, and treatment.
METHODS
We performed a meta-analysis of gene expression microarray data comparing tumor and normal tissues to identify differentially expressed genes (DEG) shared among all datasets, which we named core-genes (CG). We confirmed CG protein expression in pancreatic tissue through The Human Protein Atlas. From the proteins with confirmed expression in the tumor group, we selected the five genes with the highest area under the curve (AUC) to train an artificial neural network (ANN) to classify samples.
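The AUC-based gene selection step can be illustrated with a minimal sketch. It is not the authors' script: the gene names (`GENE_A`, `GENE_B`, `GENE_C`), the synthetic expression values, and the helper functions `auc` and `top_genes_by_auc` are all hypothetical. It assumes the AUC is computed per gene from raw expression values, using the rank-based definition (the probability that a random tumor sample has higher expression than a random normal sample).

```python
import random


def auc(pos, neg):
    """AUC as the probability that a random positive value exceeds
    a random negative value; ties count as 0.5."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_genes_by_auc(expr, labels, k=5):
    """Rank genes by per-gene AUC and return the top k.

    expr   -- dict mapping gene name to a list of expression values
    labels -- list of 1 (tumor) or 0 (normal), aligned with the values
    """
    scores = {}
    for gene, values in expr.items():
        tumor = [v for v, y in zip(values, labels) if y == 1]
        normal = [v for v, y in zip(values, labels) if y == 0]
        scores[gene] = auc(tumor, normal)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Synthetic data (hypothetical): each gene's tumor samples are shifted
# upward by a different amount relative to normal samples.
rng = random.Random(0)
labels = [1] * 30 + [0] * 20
shifts = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 0.0}
expr = {
    gene: [rng.gauss(shift * y, 1.0) for y in labels]
    for gene, shift in shifts.items()
}

selected, scores = top_genes_by_auc(expr, labels, k=2)
print(selected)
print({gene: round(score, 3) for gene, score in scores.items()})
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a few hundred samples per group. A library routine would be the usual choice for larger data.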
RESULTS
The microarray meta-analysis included 461 tumor and 187 normal samples. We identified a CG of 40 genes, 39 upregulated and one downregulated. The upregulated CG included proteins and extracellular matrix receptors linked to actin cytoskeleton reorganization. Using The Human Protein Atlas, we verified that fourteen CG genes are translated, with high or medium expression in most pancreatic tumor samples. To train our ANN, we selected the five genes with the best AUC for classifying samples from mRNA expression (AHNAK2, KRT19, LAMB3, LAMC2, and S100P). The network classified samples with an F1-score of 0.83 for normal samples and 0.88 for PDAC samples, with an average of 0.86. The PDAC-ANN classified the test samples with a sensitivity of 87.6% and a specificity of 83.1%.
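For readers less familiar with these metrics, the following sketch shows how sensitivity, specificity, and per-class F1-scores follow from a binary confusion matrix. The counts are hypothetical and are not the study's data. The unweighted (macro) average is an assumption, since the abstract does not state which averaging was used.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute classification metrics with tumor as the positive class.

    tp -- tumor samples predicted as tumor
    fn -- tumor samples predicted as normal
    tn -- normal samples predicted as normal
    fp -- normal samples predicted as tumor
    """
    sensitivity = tp / (tp + fn)  # recall for the tumor class
    specificity = tn / (tn + fp)  # recall for the normal class
    # F1 = 2*TP / (2*TP + FP + FN), taking each class in turn as "positive".
    f1_tumor = 2 * tp / (2 * tp + fp + fn)
    f1_normal = 2 * tn / (2 * tn + fn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1_tumor": f1_tumor,
        "f1_normal": f1_normal,
        # Assumption: unweighted mean of the two per-class F1-scores.
        "f1_macro": (f1_tumor + f1_normal) / 2,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=90, fn=10, tn=40, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the F1-score depends on which class is treated as positive, a classifier has one value per class, which is why the abstract reports separate scores for normal and PDAC samples.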
CONCLUSION
The gene expression meta-analysis and confirmation of protein expression allowed us to select five genes that are highly expressed in PDAC samples. We built a Python script to classify samples based on RNA expression. This software may be useful in PDAC diagnosis.