Xu David, Jalal Shadia I, Sledge George W, Meroueh Samy O
Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, USA.
Mol Biosyst. 2016 Oct 20;12(10):3067-87. doi: 10.1039/c6mb00231e. Epub 2016 Jul 25.
The Cancer Genome Atlas (TCGA) offers an unprecedented opportunity to identify small-molecule binding sites on proteins whose overexpressed mRNA levels correlate with poor survival. Here, we analyze RNA-seq and clinical data for 10 tumor types to identify genes that are both overexpressed and correlated with patient survival. The protein products of these genes were scanned for druggable binding sites, that is, sites whose shape and physicochemical properties can accommodate small-molecule probes or therapeutic agents. These binding sites were classified as enzyme active sites (ENZ), protein-protein interaction sites (PPI), or other sites whose function is unknown (OTH). Interestingly, the overwhelming majority of binding sites were classified as OTH. We find that ENZ, PPI, and OTH binding sites often occur on the same structure, suggesting that many of these OTH cavities can be used for allosteric modulation of enzyme activity or protein-protein interactions with small molecules. We discovered several ENZ (PYCR1, QPRT, and HSPA6) and PPI (CASC5, ZBTB32, and CSAD) binding sites on proteins that have seldom been explored in cancer. We also found ENZ (PKMYT1, STEAP3, and NNMT) and PPI (HNF4A, MEF2B, and CBX2) binding sites on proteins that have been extensively studied in cancer but have not previously been explored with small molecules. Using KEGG, all binding sites were classified by the signaling pathways to which their host proteins belong. In addition, binding sites were mapped onto structural protein-protein interaction networks to identify promising sites for drug discovery. Finally, we identify pockets that harbor missense mutations previously identified from analysis of TCGA data. The occurrence of mutations in these binding sites provides new opportunities to develop small-molecule probes to explore their function in cancer.