Wang Deling, Li Jia-Rui, Zhang Yu-Hang, Chen Lei, Huang Tao, Cai Yu-Dong
Institute of Health Sciences, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China.
Department of Medical Imaging, Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China; Collaborative Innovation Center for Cancer Medicine, Guangzhou 510060, China.
Genes (Basel). 2018 Mar 12;9(3):155. doi: 10.3390/genes9030155.
Breast cancer is one of the most common malignancies in women. The patient-derived tumor xenograft (PDX) model is a cutting-edge approach for drug research on breast cancer. However, PDX tumors still differ from the original human tumors, which complicates the molecular understanding of tumorigenesis. In particular, gene expression changes after tissue is transplanted from human into the mouse model. In this study, we propose a novel computational method that combines several machine learning algorithms, including Monte Carlo feature selection (MCFS), random forest (RF), and rough set-based rule learning, to identify genes with significant expression differences between PDX and original human tumors. First, 831 breast tumors (657 PDX and 174 human tumors) were collected. Based on MCFS and RF, 32 genes were then identified as informative for distinguishing PDX from human tumors; these genes can be used to construct a prediction model. The prediction model achieves a Matthews correlation coefficient of 0.777. Seven interpretable interactions among the informative genes were detected by rough set-based rule learning, and these interactions are well supported by previous experimental studies. Our study not only presents a method for identifying informative genes with differential expression but also provides insights into the mechanism through which gene expression changes after tumor tissue is transplanted from human into the mouse model. This work should be helpful for research and drug development for breast cancer.
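The Matthews correlation coefficient reported above suits this dataset because the two classes are unbalanced (657 PDX versus 174 human tumors). The sketch below is not the authors' code; it implements only the standard binary formula, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and applies it to two illustrative extreme cases built from the class sizes in the abstract. The function names and the choice of PDX as the positive class are assumptions made for illustration, and no confusion matrix from the study is reproduced here.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Standard binary Matthews correlation coefficient.

    Returns 0.0 when the denominator is zero (a common convention
    when one row or column of the confusion matrix is empty).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Class sizes taken from the abstract; PDX is treated as the positive
# class here purely by assumption.
N_PDX, N_HUMAN = 657, 174

# Illustrative case 1: a trivial classifier that labels every sample PDX.
always_pdx = dict(tp=N_PDX, tn=0, fp=N_HUMAN, fn=0)

# Illustrative case 2: a perfect classifier.
perfect = dict(tp=N_PDX, tn=N_HUMAN, fp=0, fn=0)

if __name__ == "__main__":
    for name, cm in [("always PDX", always_pdx), ("perfect", perfect)]:
        print(f"{name}: accuracy={accuracy(**cm):.3f}, "
              f"MCC={matthews_corrcoef(**cm):.3f}")
```

The trivial classifier is correct on about 79% of samples yet has an MCC of 0, which is why MCC is a more informative summary than accuracy for a class split like this one.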