Department of Bioinformatics, Institute of Health Service and Transfusion Medicine, Beijing 100850, China.
Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin 300072, China.
Nucleic Acids Res. 2023 Jan 6;51(D1):D1432-D1445. doi: 10.1093/nar/gkac1074.
The toxic effects of compounds on the environment, humans and other organisms have been a major focus of many research areas, including drug discovery and ecological research. Identifying potential toxicity at an early stage of compound/drug discovery is critical. The rapid development of computational methods for evaluating various toxicity categories has increased the need for a comprehensive, system-level collection of toxicological data, associated attributes and benchmarks. To contribute toward this goal, we propose TOXRIC (https://toxric.bioinforai.tech/), a database with comprehensive toxicological data, standardized attribute data, practical benchmarks, informative visualization of molecular representations and an intuitive function interface. TOXRIC contains 113 372 compounds, 13 toxicity categories, 1474 toxicity endpoints covering both in vivo and in vitro endpoints, and 39 feature types covering structural, target, transcriptome and metabolic data as well as other descriptors. All curated endpoint and feature datasets can be retrieved, downloaded and used directly as the outputs or inputs of machine learning (ML)-based prediction models. In addition to serving as a data repository, TOXRIC provides visualizations of benchmarks and molecular representations for all endpoint datasets. Using these results, researchers can better understand and select the optimal feature types, molecular representations and baseline algorithms for each endpoint prediction task. We believe that the rich information on compound toxicology, ML-ready datasets, benchmarks and molecular representation distributions can greatly facilitate toxicological investigations, the interpretation of toxicological mechanisms, compound/drug discovery and the development of computational methods.
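The abstract says endpoint datasets serve as model outputs and feature datasets as model inputs. One way to picture that pairing is the minimal sketch below. It is illustrative only. The real TOXRIC download formats, file names and column headers are not described here. The sketch assumes two hypothetical CSV exports that share a compound identifier column, which it calls `compound_id`. It also assumes the endpoint file has a numeric `label` column and that every other column in the feature file is numeric. Under those assumptions, the code inner-joins the two files into a feature matrix `X` and a label vector `y`.

```python
import csv
import io


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by header name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def build_ml_dataset(endpoint_csv, feature_csv, id_col="compound_id", label_col="label"):
    """Inner-join a (hypothetical) endpoint table with a (hypothetical) feature table.

    Returns (ids, X, y): compound IDs, feature vectors (model inputs) and
    endpoint labels (model outputs). Column names are assumptions, not TOXRIC's.
    """
    # Map each compound ID to its numeric feature vector (all non-ID columns, header order).
    features = {}
    for row in load_rows(feature_csv):
        cid = row.pop(id_col)
        features[cid] = [float(v) for v in row.values()]

    ids, X, y = [], [], []
    for row in load_rows(endpoint_csv):
        cid = row[id_col]
        if cid not in features:
            continue  # inner join: skip compounds that have no feature vector
        ids.append(cid)
        X.append(features[cid])
        y.append(float(row[label_col]))
    return ids, X, y
```

The join keeps only compounds present in both tables. This is the usual choice when a given feature type, such as transcriptome data, is available for just a subset of the compounds labelled for an endpoint. The resulting `X` and `y` can then be handed to any baseline learner.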