Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA.
Center for Computational Science, University of Miami, Miami, FL 33136, USA.
Molecules. 2019 Apr 23;24(8):1604. doi: 10.3390/molecules24081604.
The Toxicology in the 21st Century (Tox21) project seeks to develop and test methods for high-throughput examination of the effects that certain chemical compounds have on biological systems. Although primary and toxicity assay data were readily available for multiple reporter gene modified cell lines, extensive annotation and curation were required to make these datasets more FAIR (Findable, Accessible, Interoperable, and Reusable). In this study, we fully annotated the published Tox21 data with relevant and accepted controlled vocabularies. After removing unreliable data points, we aggregated the results and created three sets of signatures reflecting activity in the reporter gene assays, cytotoxicity, and selective reporter gene activity, respectively. We benchmarked these signatures using the chemical structures of the tested compounds and obtained generally high receiver operating characteristic (ROC) scores, suggesting good quality and utility of the signatures and the underlying data. We analyzed the results to identify promiscuous individual compounds and chemotypes for the three signature categories, and we interpreted the results to illustrate the utility and reusability of the datasets. With this study, we aimed to demonstrate the importance of data standards and high-quality annotations in reporting screening results, so that these data can be reused and interpreted. To improve the data with respect to all FAIR criteria, all assay annotations, cleaned and aggregated datasets, and signatures were made available as standardized dataset packages (Aggregated Tox21 bioactivity data, 2019).
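The abstract reports ROC scores as the benchmark metric but does not describe how they were computed. As a minimal sketch of the metric itself, and not the authors' pipeline, the following computes a ROC AUC for a single assay from binary activity labels and predicted scores. The function name `roc_auc` and the toy `labels` and `scores` values are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen active compound
    receives a higher score than a randomly chosen inactive one.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = active in the assay, 0 = inactive.
labels = [1, 1, 0, 1, 0, 0]
# Hypothetical predicted scores, e.g. from a structure-based model.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

auc = roc_auc(labels, scores)
print(f"ROC AUC: {auc:.3f}")
```

A value near 1.0 means actives are ranked almost entirely above inactives, while 0.5 corresponds to random ranking. The pairwise loop is quadratic in the number of compounds, so a rank-based implementation would be preferable for datasets of Tox21 size.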