Department of Biological Sciences, Royal Holloway University of London, Egham, Surrey TW20 0EX, UK.
School of Life Sciences, University of Lincoln, Lincoln LN6 7TS, UK.
Genes (Basel). 2022 Aug 26;13(9):1538. doi: 10.3390/genes13091538.
Early detection of cancer facilitates treatment and improves patient survival. We hypothesized that molecular biomarkers of cancer could be rationally predicted from even partial knowledge of transcriptional regulation, functional pathways and gene co-expression networks. To test this data mining approach, we focused on breast cancer, one of the best-studied models of the disease. We were particularly interested in whether such a 'guilt by association' approach would recover pan-cancer markers already known in the field, or whether molecular subtype-specific 'seed' markers would yield subtype-specific extended sets of breast cancer markers. The key challenge was to use a small number of well-characterized, largely intracellular, breast cancer-related proteins to uncover similarly regulated and functionally related genes and proteins, with a view to predicting a much-expanded range of disease markers, especially extracellular molecular markers potentially suitable for early non-invasive detection of the disease. We selected 23 previously characterized proteins specific to three major molecular subtypes of breast cancer and analyzed their established transcription factor networks, their known metabolic and functional pathways, and existing experimentally derived protein co-expression data. Starting from largely intracellular and transmembrane marker 'seeds', we predicted as many as 150 novel biomarker genes associated with the three selected molecular subtypes of breast cancer, all coding for extracellularly targeted or secreted proteins and therefore potentially the most suitable for molecular diagnosis of the disease. Of these 150 predicted protein markers, 114 were linked through the combination of regulatory networks to basal breast cancer, 48 to luminal and 7 to Her2-positive breast cancer. The reported approach to mining molecular markers is not limited to breast cancer and therefore offers a widely applicable strategy for biomarker mining.
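The 'guilt by association' expansion described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `expand_markers`, the gene identifiers and the association table are all hypothetical placeholders. The table stands in for whatever links the transcription factor networks, pathways and co-expression data would supply.

```python
def expand_markers(seeds_by_subtype, associations, extracellular):
    """For each subtype, collect non-seed genes linked to any of its seeds,
    keeping only those annotated as extracellular or secreted."""
    all_seeds = set().union(*seeds_by_subtype.values())
    candidates = {}
    for subtype, seeds in seeds_by_subtype.items():
        linked = set()
        for seed in seeds:
            # Genes sharing a regulator, pathway or co-expression link with the seed.
            linked |= associations.get(seed, set())
        # Drop the seeds themselves, then keep extracellular or secreted products.
        candidates[subtype] = (linked - all_seeds) & extracellular
    return candidates


# Toy input: every identifier below is invented for illustration only.
seeds_by_subtype = {
    "basal": {"SEED_B1", "SEED_B2"},
    "luminal": {"SEED_L1"},
    "her2": {"SEED_H1"},
}
associations = {
    "SEED_B1": {"GENE_1", "GENE_2", "SEED_B2"},
    "SEED_B2": {"GENE_2", "GENE_3"},
    "SEED_L1": {"GENE_3", "GENE_4"},
    "SEED_H1": {"GENE_5"},
}
extracellular = {"GENE_1", "GENE_3", "GENE_4"}

candidates = expand_markers(seeds_by_subtype, associations, extracellular)

# A gene linked to seeds of two subtypes is counted once per subtype,
# so the per-subtype counts can sum to more than the number of distinct genes.
per_subtype_total = sum(len(genes) for genes in candidates.values())
distinct_total = len(set().union(*candidates.values()))
```

In this toy input, one gene is reached from both a basal and a luminal seed, so the per-subtype counts add up to more than the number of distinct candidates. The same counting effect would account for the reported per-subtype figures (114, 48 and 7) summing to 169 rather than 150.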