Genes and Environment Laboratory, School of Public Health, University of California, Berkeley, CA 94720, USA.
Int J Environ Res Public Health. 2012 Jul;9(7):2479-503. doi: 10.3390/ijerph9072479. Epub 2012 Jul 12.
We applied bioinformatic approaches to identify pathways common to chemical leukemogens and to determine whether leukemogens can be distinguished from non-leukemogenic carcinogens. From all known and probable carcinogens classified by IARC and NTP, we identified 35 carcinogens associated with leukemia risk in human studies and 16 non-leukemogenic carcinogens. Using the gene/protein target data available in the Comparative Toxicogenomics Database (CTD) for 29 of the leukemogens and 11 of the non-leukemogenic carcinogens, we tested for enrichment across all 250 human biochemical pathways in the Kyoto Encyclopedia of Genes and Genomes (KEGG) database. The top pathways targeted by the leukemogens included metabolism of xenobiotics by cytochrome P450, glutathione metabolism, neurotrophin signaling, apoptosis, MAPK signaling, Toll-like receptor signaling and various cancer pathways. The 29 leukemogens formed 18 distinct clusters of 1 to 3 chemicals each; these clusters did not correlate with known mechanisms of action or with structural similarity as determined by 2D Tanimoto coefficients in the PubChem database. Unsupervised clustering and one-class support vector machines, both based on the pathway data, were unable to distinguish the 29 leukemogens from the 11 non-leukemogenic known and probable IARC carcinogens. However, using two-class random forests to estimate leukemogen and non-leukemogen patterns, we estimated a 76% chance of correctly distinguishing the two members of a randomly chosen leukemogen/non-leukemogen pair.
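Two quantities in this abstract can be illustrated with a minimal Python sketch: the Tanimoto coefficient used to compare chemical structures, and the "chance of distinguishing a random pair" reported for the random forest. The abstract does not say how either was computed, so everything below is an assumption made for illustration. The function names, the fingerprint bit sets and the classifier scores are hypothetical, and the pairwise statistic is assumed to be the fraction of leukemogen/non-leukemogen pairs in which the leukemogen receives the higher score (equivalent to the area under the ROC curve).

```python
from itertools import product


def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def pairwise_discrimination(pos_scores, neg_scores) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pairs = list(product(pos_scores, neg_scores))
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs)
    return correct / len(pairs)


# Hypothetical fingerprints: 2 shared bits out of 5 distinct bits overall.
fp_a = {1, 4, 7, 9}
fp_b = {1, 4, 8}
similarity = tanimoto(fp_a, fp_b)

# Hypothetical classifier scores (higher means "more leukemogen-like").
leukemogen_scores = [0.9, 0.7, 0.4]
non_leukemogen_scores = [0.6, 0.3]
discrimination = pairwise_discrimination(leukemogen_scores, non_leukemogen_scores)

print(f"Tanimoto similarity: {similarity:.2f}")
print(f"Pairwise discrimination: {discrimination:.2f}")
```

Under this reading, a value of 0.5 corresponds to guessing and 1.0 to perfect separation, which puts the reported 76% between the two.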