Wang Charles C N, Jin Jennifer, Chang Jan-Gowth, Hayakawa Masahiro, Kitazawa Atsushi, Tsai Jeffrey J P, Sheu Phillip C-Y
Department of Bioinformatics and Medical Engineering, Asia University, Taichung, Taiwan.
Center for Artificial Intelligence in Precision Medicine, Asia University, Taichung, Taiwan.
BMC Med Inform Decis Mak. 2020 Sep 3;20(1):208. doi: 10.1186/s12911-020-01227-6.
Gastrointestinal (GI) cancers, including colorectal, gastric, and pancreatic cancer, are among the most frequently diagnosed malignancies each year and represent a major public health problem worldwide.
This paper reports an assisted curation pipeline for identifying potentially influential genes in gastrointestinal cancer. The pipeline mines the biomedical literature, recognizing named entities with a Bi-LSTM-CNN-CRF model. The extracted entities and their associations are used to construct a graph, from which an influence maximization algorithm computes the most influential sets of co-occurring genes.
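The graph construction and influence maximization steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the diffusion model or the optimization strategy, so the sketch assumes an independent cascade model with a single uniform activation probability and a plain greedy seed selection. The function names, the `prob` and `runs` parameters, and the placeholder node labels are all hypothetical, and the named-entity recognition step is assumed to have already produced one set of gene symbols per abstract.

```python
import itertools
import random
from collections import defaultdict


def build_cooccurrence_graph(docs):
    """Build an undirected graph from per-document gene sets.

    docs: iterable of sets of gene symbols (assumed NER output).
    Returns {gene: {neighbor: number of shared documents}}.
    """
    weights = defaultdict(int)
    for genes in docs:
        # Every unordered pair in one document counts as one co-occurrence.
        for a, b in itertools.combinations(sorted(genes), 2):
            weights[(a, b)] += 1
    graph = defaultdict(dict)
    for (a, b), count in weights.items():
        graph[a][b] = count
        graph[b][a] = count
    return dict(graph)


def simulate_spread(graph, seeds, prob, rng):
    """Run one independent cascade and return the number of activated nodes.

    Assumption: every edge activates with the same probability `prob`;
    the co-occurrence counts are not used as edge probabilities here.
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in graph.get(node, {}):
                if neighbor not in active and rng.random() < prob:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return len(active)


def greedy_influence_max(graph, k, prob=0.2, runs=200, seed=0):
    """Greedily pick up to k seeds that maximize the estimated spread."""
    rng = random.Random(seed)
    seeds = []
    for _ in range(min(k, len(graph))):
        best, best_spread = None, -1.0
        for candidate in sorted(graph):
            if candidate in seeds:
                continue
            # Monte Carlo estimate of the spread with this candidate added.
            total = sum(
                simulate_spread(graph, seeds + [candidate], prob, rng)
                for _ in range(runs)
            )
            spread = total / runs
            if spread > best_spread:
                best, best_spread = candidate, spread
        seeds.append(best)
    return seeds
```

For example, `greedy_influence_max(build_cooccurrence_graph(docs), k=3)` returns three node labels whose joint estimated spread is greedily maximized under these assumptions. A real pipeline would likely derive edge probabilities from co-occurrence strength rather than use one uniform value.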
The most influential sets of co-occurring genes that we discovered include RARA - CRBP1, CASP3 - BCL2, BCL2 - CASP3 - CRBP1, RARA - CASP3 - CRBP1, FOXJ1 - RASSF3 - ESR1, FOXJ1 - RASSF1A - ESR1, and FOXJ1 - RASSF1A - TNFAIP8 - ESR1. Using TCGA together with functional and pathway enrichment analysis, we show that the proposed approach works well in the context of gastrointestinal cancer.
Our pipeline uses text mining to identify objects and relationships and construct a graph, then applies graph-based influence maximization to discover the most influential co-occurring genes. It presents a viable direction for assisting knowledge discovery for clinical applications.