The Cancer Genomics and BioComputing of Complex Diseases lab, Azrieli Faculty of Medicine, Bar-Ilan University, Safed, Israel.
Cellular Network Biology Group, Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark.
PLoS Comput Biol. 2019 Aug 22;15(8):e1007239. doi: 10.1371/journal.pcbi.1007239. eCollection 2019 Aug.
Tailored therapy aims to cure cancer patients effectively and safely, based on the complex interactions between patients' genomic features, disease pathology and drug metabolism. The continual growth of the scientific literature therefore drives the need for efficient data-mining methods that improve the extraction of useful information from texts, based on patients' genomic features. An important application of text mining to tailored cancer therapy involves mutations and cancer fusion genes, which alter patients' cellular networks during cancer development and also affect drug metabolism. Fusion proteins, which are derived from the slippage of two parental genes, are produced in cancer by chromosomal aberrations and trans-splicing. Given that the two parental proteins of a predicted fusion protein are known, we used our previously developed method for identifying chimeric protein-protein interactions (ChiPPIs) associated with fusion proteins. Here, we present a validation approach that receives fusion proteins of interest, predicts their cellular network alterations with ChiPPI, and validates the predictions with our new method, ProtFus, using an online literature search. This process yielded a set of 358 fusion proteins and their corresponding protein interactions, which served as a training set for a Naïve Bayes classifier that identifies predicted fusion proteins with reliable, experimentally confirmed evidence in the literature. Next, for a test group of 1817 fusion proteins, we identified 2908 PPIs in total from the literature, across 18 cancer types. ProtFus can be used to screen the literature for unique cases of fusion proteins and their PPIs, as a means of studying alterations of protein networks in cancers. Availability: http://protfus.md.biu.ac.il/.
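To illustrate the kind of Naïve Bayes text classification mentioned above, the following is a minimal sketch only, not the ProtFus implementation. It assumes a simple bag-of-words multinomial model with Laplace smoothing; the class name `NaiveBayesText`, the labels `evidence` and `no_evidence`, the placeholder gene and protein names (GENEA, PROT1, and so on), and the toy sentences are all hypothetical and were written for this example. The actual ProtFus features and training data are not described in the abstract.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesText:
    """Multinomial Naive Bayes over bag-of-words tokens (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                       # Laplace smoothing constant
        self.class_counts = Counter()            # documents seen per label
        self.token_counts = defaultdict(Counter) # token counts per label
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.token_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def log_scores(self, text):
        """Return the unnormalised log posterior for each label."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        # Tokens never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_tokens = sum(self.token_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokens:
                count = self.token_counts[label][tok]
                score += math.log(
                    (count + self.alpha)
                    / (total_tokens + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy training sentences with placeholder names (not real data).
TRAIN = [
    ("GENEA-GENEB fusion protein interacts with PROT1 in tumor cells", "evidence"),
    ("GENEC-GENED fusion binds PROT2 as shown by co-immunoprecipitation", "evidence"),
    ("The fusion protein was shown to interact with PROT3 in a pull-down assay", "evidence"),
    ("Patients were enrolled between 2010 and 2015 at three hospitals", "no_evidence"),
    ("Statistical analysis was performed using standard software", "no_evidence"),
    ("Samples were stored at low temperature before sequencing", "no_evidence"),
]

model = NaiveBayesText().fit(
    [text for text, _ in TRAIN],
    [label for _, label in TRAIN],
)

if __name__ == "__main__":
    query = "A new fusion protein interacts with PROT9 in a pull-down assay"
    print(query, "->", model.predict(query))
```

In a realistic setting the training set would be far larger (the abstract reports 358 fusion proteins with their interactions), and the features would likely include more than raw tokens; this sketch only shows how smoothed per-class token frequencies combine with class priors to score a sentence.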