Zhen Cheng, Zhu Caizhong, Chen Haoyang, Xiong Yiru, Tan Junyuan, Chen Dong, Li Jin
Beijing 302 Hospital, Beijing, 100039, China.
Oncotarget. 2017 Feb 21;8(8):13909-13916. doi: 10.18632/oncotarget.14692.
To systematically explore the molecular mechanism for hepatocellular carcinoma (HCC) metastasis and identify regulatory genes with text mining methods.
The genes with the highest frequencies and the significant pathways related to HCC metastasis were listed. Several proteins, such as EGFR, MDM2, TP53 and APP, were identified as hub nodes in the protein-protein interaction (PPI) network. Genes unique to HCV-HCCs were fewer than those unique to HBV-HCCs, but may participate in more extensive signaling processes. VEGFA, PIK3CA, MAPK1, MMP9 and other genes may play important roles in multiple metastasis phenotypes.
Genes mentioned in the abstracts of the HCC-metastasis literature were identified. Word frequency analysis, KEGG pathway analysis and PPI network analysis were performed. Co-occurrence analysis between genes and metastasis-related phenotypes was then carried out.
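The word frequency and co-occurrence steps can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstracts, the gene and phenotype dictionaries, and the function names (mentions, gene_frequency, cooccurrence) are all hypothetical, and matching is done by simple whole-word lookup rather than by a dedicated gene-recognition tool.

```python
import re
from collections import Counter
from itertools import product

# Hypothetical toy inputs for illustration only; not data from the study.
ABSTRACTS = [
    "EGFR and MMP9 promote invasion in HCC cells.",
    "VEGFA drives angiogenesis; MMP9 is linked to invasion and migration.",
    "TP53 mutation and EGFR signalling were associated with migration.",
]
GENES = ["EGFR", "MMP9", "VEGFA", "TP53"]
PHENOTYPES = ["invasion", "migration", "angiogenesis"]


def mentions(text, terms, ignore_case=False):
    """Return the set of terms that occur in text as whole words."""
    flags = re.IGNORECASE if ignore_case else 0
    return {
        term
        for term in terms
        if re.search(r"\b" + re.escape(term) + r"\b", text, flags)
    }


def gene_frequency(abstracts, genes):
    """Count, for each gene, the number of abstracts that mention it."""
    counts = Counter()
    for text in abstracts:
        counts.update(mentions(text, genes))
    return counts


def cooccurrence(abstracts, genes, phenotypes):
    """Count abstracts in which a gene and a phenotype term appear together."""
    pairs = Counter()
    for text in abstracts:
        found_genes = mentions(text, genes)
        found_phenotypes = mentions(text, phenotypes, ignore_case=True)
        pairs.update(product(found_genes, found_phenotypes))
    return pairs


if __name__ == "__main__":
    print(gene_frequency(ABSTRACTS, GENES).most_common())
    print(cooccurrence(ABSTRACTS, GENES, PHENOTYPES).most_common())
```

Counting each gene once per abstract, rather than once per mention, keeps a single long abstract from dominating the ranking. Gene symbols are matched case-sensitively here because many symbols collide with ordinary words when case is ignored.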
Text mining is effective for revealing potential regulators or pathways, but its purpose should be specific, and combining various methods will be more useful.