Identification of most influential co-occurring gene suites for gastrointestinal cancer using biomedical literature mining and graph-based influence maximization.

Author information

Wang Charles C N, Jin Jennifer, Chang Jan-Gowth, Hayakawa Masahiro, Kitazawa Atsushi, Tsai Jeffrey J P, Sheu Phillip C-Y

Affiliations

Department of Bioinformatics and Medical Engineering, Asia University, Taichung, Taiwan.

Center for Artificial Intelligence in Precision Medicine, Asia University, Taichung, Taiwan.

Publication information

BMC Med Inform Decis Mak. 2020 Sep 3;20(1):208. doi: 10.1186/s12911-020-01227-6.

BACKGROUND

Gastrointestinal (GI) cancers, including colorectal cancer, gastric cancer, and pancreatic cancer, are among the most frequently diagnosed malignancies each year and represent a major public health problem worldwide.

METHODS

This paper reports an assisted curation pipeline for identifying potentially influential genes in gastrointestinal cancer. The pipeline mines biomedical literature and recognizes named entities with a Bi-LSTM-CNN-CRF model. The entities and their associations are used to construct a graph, from which we compute the most influential sets of co-occurring genes using an influence maximization algorithm.
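
The graph construction and influence maximization steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the diffusion model or the selection algorithm, so the sketch assumes an independent cascade model with Monte Carlo greedy seed selection. It also assumes the NER step has already produced one set of gene symbols per abstract. The function names, the parameters prob, runs, and seed, and the toy symbols G1 to G6 are all hypothetical.

```python
import itertools
import random


def build_cooccurrence_graph(docs):
    """Build an undirected co-occurrence graph.

    docs: iterable of sets of gene symbols, one set per abstract
          (assumed to come from an upstream NER step, not shown here).
    Returns {gene: {neighbor: number of abstracts mentioning both}}.
    """
    graph = {}
    for genes in docs:
        for gene in genes:
            graph.setdefault(gene, {})  # keep genes that have no partner
        for a, b in itertools.combinations(sorted(genes), 2):
            graph[a][b] = graph[a].get(b, 0) + 1
            graph[b][a] = graph[b].get(a, 0) + 1
    return graph


def simulate_spread(graph, seeds, prob, rng):
    """Run one independent cascade and return the number of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph[u]:
                # Each newly active node gets one chance to activate a neighbor.
                if v not in active and rng.random() < prob:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def greedy_influence_max(graph, k, prob=0.1, runs=200, seed=0):
    """Greedily pick k seed genes that maximize the estimated average spread.

    Assumption: a uniform activation probability `prob` on every edge;
    co-occurrence counts are not used as weights in this sketch.
    """
    rng = random.Random(seed)
    chosen = []
    for _ in range(min(k, len(graph))):
        best, best_spread = None, -1.0
        for node in sorted(graph):  # sorted for reproducible tie-breaking
            if node in chosen:
                continue
            total = sum(
                simulate_spread(graph, chosen + [node], prob, rng)
                for _ in range(runs)
            )
            if total / runs > best_spread:
                best, best_spread = node, total / runs
        chosen.append(best)
    return chosen


# Toy input only; these are placeholder symbols, not data from the paper.
toy_docs = [{"G1", "G2"}, {"G1", "G3"}, {"G1", "G4"}, {"G5", "G6"}]
toy_graph = build_cooccurrence_graph(toy_docs)
print(greedy_influence_max(toy_graph, k=2, prob=1.0, runs=1))
```

With prob=1.0 every edge fires, so the toy run is deterministic and selects one gene from each connected component. On real literature-derived graphs a smaller probability and more runs would be needed, and the edge counts could plausibly be turned into per-edge probabilities, though the abstract does not say how the authors weight associations.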

RESULTS

The most influential sets of co-occurring genes that we discover include RARA - CRBP1, CASP3 - BCL2, BCL2 - CASP3 - CRBP1, RARA - CASP3 - CRBP1, FOXJ1 - RASSF3 - ESR1, FOXJ1 - RASSF1A - ESR1, and FOXJ1 - RASSF1A - TNFAIP8 - ESR1. Using TCGA data together with functional and pathway enrichment analysis, we show that the proposed approach works well in the context of gastrointestinal cancer.

CONCLUSIONS

Our pipeline uses text mining to identify objects and relationships and construct a graph, then applies graph-based influence maximization to discover the most influential co-occurring genes. It presents a viable direction for assisting knowledge discovery for clinical applications.
