Combining Data with Predictions for Modeling Hepatic Steatosis by Using Stratified Bagging and Conformal Prediction.

Affiliations

Department of Pharmaceutical Chemistry, Division of Drug Design and Medicinal Chemistry, University of Vienna, 1090 Vienna, Austria.

Unit of Toxicology Sciences, Swetox, Karolinska Institutet, SE-15136 Södertälje, Sweden.

Publication information

Chem Res Toxicol. 2021 Feb 15;34(2):656-668. doi: 10.1021/acs.chemrestox.0c00511. Epub 2020 Dec 21.

Abstract

Hepatic steatosis (fatty liver) is a severe liver disease induced by the excessive accumulation of fatty acids in hepatocytes. In this study, we developed reliable models for predicting hepatic steatosis on the basis of a data set of 1041 compounds measured in rodent studies with repeated oral exposure. The imbalanced nature of the data set (1:8, with the "steatotic" compounds belonging to the minority class) required the use of meta-classifiers (bagging with stratified under-sampling and Mondrian conformal prediction) on top of the base classifier random forest. One major goal was the investigation of the influence of different descriptor combinations on model performance (tested by predicting an external validation set): physicochemical descriptors (RDKit), ToxPrint features, as well as predictions from nuclear receptor and transporter models. All models based upon descriptor combinations including physicochemical features led to reasonable balanced accuracies (BAs between 0.65 and 0.69 for the respective models). Combining physicochemical features with transporter predictions and further with ToxPrint features gave the best performing model (BAs up to 0.7 and efficiencies of 0.82). Whereas both meta-classifiers proved useful for this highly imbalanced toxicity data set, the conformal prediction framework also guarantees the error level and thus might be favored for future studies in the field of predictive toxicology.
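The two meta-classifier ideas named in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the function names are hypothetical, the base classifier is left out, and the nonconformity scores are assumed to come from some fitted model (for example, one minus the random forest's predicted probability for a label, which is a common choice but an assumption here). The first function draws one class-balanced bag by under-sampling the majority class; the other two compute class-conditional (Mondrian) conformal p-values and the resulting prediction set at a chosen significance level.

```python
import random
from collections import defaultdict


def stratified_undersample(labels, rng):
    """Return sorted indices of one balanced bag.

    Every class contributes as many items as the smallest class has,
    drawn without replacement. Repeating this with fresh draws and
    training one base model per bag gives a bagging ensemble.
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    bag = []
    for idx in by_class.values():
        bag.extend(rng.sample(idx, n_min))
    return sorted(bag)


def mondrian_p_values(cal_scores, cal_labels, test_scores):
    """Class-conditional (Mondrian) conformal p-values for one test item.

    cal_scores[i]  nonconformity score of calibration item i for its true label
    cal_labels[i]  true label of calibration item i
    test_scores    dict: candidate label -> nonconformity score of the test item
    Each label is calibrated only against calibration items of that label,
    which is what keeps the error rate controlled per class.
    """
    p = {}
    for label, alpha in test_scores.items():
        same = [s for s, y in zip(cal_scores, cal_labels) if y == label]
        n_ge = sum(1 for s in same if s >= alpha)
        p[label] = (n_ge + 1) / (len(same) + 1)
    return p


def prediction_set(p_values, significance):
    """Keep every label whose p-value exceeds the significance level."""
    return {label for label, p in p_values.items() if p > significance}


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [1] * 10 + [0] * 80          # toy 1:8 imbalance, as in the abstract
    bag = stratified_undersample(labels, rng)
    print(len(bag))                        # 20 indices: 10 per class

    # Made-up scores, for illustration only.
    p = mondrian_p_values([0.1, 0.2, 0.9, 0.3, 0.4], [1, 1, 1, 0, 0],
                          {1: 0.15, 0: 0.5})
    print(prediction_set(p, 0.5))
```

With a two-class problem, the prediction set can contain one label, both labels, or none; lowering the significance level makes the sets larger and the guaranteed error rate smaller.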


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9dbc/7887803/382d432b3806/tx0c00511_0001.jpg
