Laboratory for Molecular Modeling, Eshelman School of Pharmacy, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA.
Chem Res Toxicol. 2010 Jan;23(1):171-83. doi: 10.1021/tx900326k.
Drug-induced liver injury is one of the main causes of drug attrition. The ability to predict the liver effects of drug candidates from their chemical structures is critical to help guide experimental drug discovery projects toward safer medicines. In this study, we have compiled a data set of 951 compounds reported to produce a wide range of effects in the liver in different species, comprising humans, rodents, and nonrodents. The liver effects for this data set were obtained as assertional metadata, generated from MEDLINE abstracts using a unique combination of lexical and linguistic methods and ontological rules. We have analyzed this data set using conventional cheminformatics approaches and addressed several questions pertaining to cross-species concordance of liver effects, chemical determinants of liver effects in humans, and the prediction of whether a given compound is likely to cause a liver effect in humans. We found that the concordance of liver effects was relatively low (ca. 39-44%) between different species, raising the possibility that species specificity could depend on specific features of chemical structure. Compounds were clustered by their chemical similarity, and similar compounds were examined for the expected similarity of their species-dependent liver effect profiles. In most cases, similar profiles were observed for members of the same cluster, but some compounds appeared as outliers. The outliers were the subject of focused assertion regeneration from MEDLINE as well as other data sources. In some cases, additional biological assertions were identified, which were in line with expectations based on compounds' chemical similarities. The assertions were further converted to binary annotations of underlying chemicals (i.e., liver effect vs no liver effect), and binary quantitative structure-activity relationship (QSAR) models were generated to predict whether a compound would be expected to produce liver effects in humans. 
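The abstract does not say how cross-species concordance was calculated. As a minimal sketch, assume it is the fraction of compounds annotated in both species whose binary liver-effect calls agree. The function name, compound identifiers, and toy annotations below are hypothetical and are not data from the study.

```python
def concordance(effects_a, effects_b):
    """Fraction of shared compounds whose binary liver-effect calls agree.

    Assumed definition for illustration; the study may have used another one.
    """
    shared = effects_a.keys() & effects_b.keys()
    if not shared:
        raise ValueError("no compounds annotated in both species")
    agree = sum(1 for c in shared if effects_a[c] == effects_b[c])
    return agree / len(shared)


# Hypothetical toy annotations (1 = liver effect, 0 = no liver effect).
human = {"cmpd_A": 1, "cmpd_B": 0, "cmpd_C": 1, "cmpd_D": 1}
rodent = {"cmpd_A": 1, "cmpd_B": 1, "cmpd_C": 0, "cmpd_D": 1, "cmpd_E": 0}

# cmpd_E is ignored because it has no human annotation.
print(concordance(human, rodent))
```

Under this definition, compounds annotated in only one species do not count for or against agreement, which matters when coverage differs widely between species.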
Despite the apparent heterogeneity of the data, the models showed good predictive power, as assessed by external 5-fold cross-validation. The external predictive power of the binary QSAR models was further confirmed by applying them to compounds that were retrieved or studied after the models were developed. To the best of our knowledge, this is the first chemical toxicity prediction study to apply QSAR modeling and other cheminformatics techniques to observational data generated by automated text mining with limited manual curation, opening up new opportunities for generating and modeling chemical toxicology data.
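External 5-fold cross-validation can be sketched as follows: the data set is split into five folds, each fold is held out once as an external set, and a model built only on the other four folds predicts it. The classifier here is a 1-nearest-neighbor rule on Tanimoto similarity between fingerprint bit sets, used purely as a stand-in; the abstract does not name the descriptors or learning algorithms used in the study. All function names and the toy fingerprints are hypothetical.

```python
import random


def five_fold_splits(n_items, n_folds=5, seed=0):
    """Yield (train_indices, test_indices); each item is held out exactly once."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, folds[k]


def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict_1nn(train_fps, train_labels, query_fp):
    """Placeholder model: return the label of the most similar training compound."""
    best = max(range(len(train_fps)),
               key=lambda i: tanimoto(train_fps[i], query_fp))
    return train_labels[best]


def external_cv_accuracy(fps, labels, n_folds=5, seed=0):
    """Accuracy over all held-out folds; the model never sees its test fold."""
    correct = 0
    for train, test in five_fold_splits(len(fps), n_folds, seed):
        tr_fps = [fps[i] for i in train]
        tr_labels = [labels[i] for i in train]
        for i in test:
            correct += predict_1nn(tr_fps, tr_labels, fps[i]) == labels[i]
    return correct / len(fps)


# Hypothetical toy data: two well-separated structural classes.
fps = [{1, 2, 3, 10 + i} for i in range(5)] + [{7, 8, 9, 20 + i} for i in range(5)]
labels = [1] * 5 + [0] * 5  # 1 = liver effect, 0 = no liver effect
print(external_cv_accuracy(fps, labels))
```

The point of the design is that model building, including any descriptor selection or parameter tuning, happens only on the four training folds, so the held-out fold acts as a genuinely external test.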