Kosnik Marissa B, Planchart Antonio, Marvel Skylar W, Reif David M, Mattingly Carolyn J
Toxicology Program, North Carolina State University, Raleigh, NC 27695-7617, United States.
Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695-7617, United States.
Comput Toxicol. 2019 Nov;12. doi: 10.1016/j.comtox.2019.100094. Epub 2019 Jun 27.
Addressing the complex relationship between public health and environmental exposure requires multiple types and sources of data. An important source of chemical data derives from high-throughput screening (HTS) efforts, such as the Tox21/ToxCast program, which aim to identify chemical hazard primarily by using in vitro assays to probe toxicity. While most of these assays target specific genes, assessing their disease relevance remains challenging. Integration with additional data sets may help to resolve these questions by providing broader context for individual assay results. The Comparative Toxicogenomics Database (CTD), a publicly available database that builds networks of chemical, gene, and disease information from manually curated literature sources, offers a promising solution for contextual integration with HTS data. Here, we tested the value of integrating data across Tox21/ToxCast and CTD by linking elements common to both databases (i.e., assays, genes, and chemicals). Using polymarcine and Parkinson's disease as a case study, we found that the union of the two databases significantly increased chemical-gene associations and disease-pathway coverage. Integration also enabled new disease associations to be made with HTS assays, expanding coverage of chemical-gene data associated with diseases. Using 4-nonylphenol, branched, as an example, we demonstrate how integration enables development of predictive adverse outcome pathways. Thus, we demonstrate enhancements to each data source through database integration, including scenarios where HTS data can efficiently probe chemical space that may be understudied in the literature, as well as how CTD can add biological context to those results.