In silico prediction of toxic action mechanisms of phenols for imbalanced data with Random Forest learner.

Affiliation

College of Computer Science, Chongqing University, Chongqing 400030, China.

Publication information

J Mol Graph Model. 2012 May;35:21-7. doi: 10.1016/j.jmgm.2012.01.002. Epub 2012 Jan 17.

DOI: 10.1016/j.jmgm.2012.01.002
PMID: 22481075
Abstract

With an increasing need for rapid and effective safety assessment of compounds in industrial and civil-use products, in silico toxicity exploration techniques provide an economical approach to environmental hazard assessment. Over the last decade, in silico research has produced many quantitative structure-activity relationship (QSAR) models for predicting toxicity mechanisms. Most of these methods benefit from data analysis and machine learning techniques, which depend heavily on the characteristics of the data sets. For Tetrahymena pyriformis toxicity data sets, there is a major technical challenge: data imbalance. A skewed class distribution greatly degrades prediction performance on rare classes. Most previous work on predicting the mechanisms of toxic action of phenols did not consider this practical problem. In this work, we addressed the problem by considering the difference between the two types of misclassification. A Random Forest learner was employed within a cost-sensitive learning framework to construct prediction models based on selected molecular descriptors. In computational experiments, both the global and local models achieved appreciable overall prediction accuracy, and performance on rare classes in particular improved. Moreover, for practical use of these models, the balance between the two types of misclassification can be adjusted by using different cost matrices according to the application goals.
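
The abstract does not say how the cost matrix enters the Random Forest models, so the following is only a minimal sketch of one common cost-sensitive scheme (minimum expected cost prediction), not the authors' implementation. The function name, the class probabilities, and both cost matrices are hypothetical values chosen for illustration.

```python
from typing import Sequence


def min_expected_cost_class(
    probs: Sequence[float], cost: Sequence[Sequence[float]]
) -> int:
    """Return the index of the predicted class with the lowest expected cost.

    probs[i]   : estimated probability that the true class is i
    cost[i][j] : cost of predicting class j when the true class is i
    """
    n_classes = len(probs)
    expected = [
        sum(probs[i] * cost[i][j] for i in range(n_classes))
        for j in range(n_classes)
    ]
    return min(range(n_classes), key=expected.__getitem__)


# Hypothetical two-class case: class 0 = common mechanism, class 1 = rare mechanism.
# The probabilities could be, for example, the fraction of trees voting for each class.
probs = [0.7, 0.3]

# Assumed cost matrices (not taken from the paper).
equal_cost = [[0, 1],
              [1, 0]]
rare_miss_costly = [[0, 1],
                    [5, 0]]  # missing a rare-class compound costs 5x a false alarm

pred_equal = min_expected_cost_class(probs, equal_cost)
pred_weighted = min_expected_cost_class(probs, rare_miss_costly)
print(pred_equal, pred_weighted)
```

With equal costs the majority class wins; raising the cost of missing the rare class shifts the same probability estimate to a rare-class prediction, which is the kind of trade-off the abstract describes adjusting through the cost matrix.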

Similar articles

1. In silico prediction of toxic action mechanisms of phenols for imbalanced data with Random Forest learner.
   J Mol Graph Model. 2012 May;35:21-7. doi: 10.1016/j.jmgm.2012.01.002. Epub 2012 Jan 17.
2. In silico prediction of Tetrahymena pyriformis toxicity for diverse industrial chemicals with substructure pattern recognition and machine learning methods.
   Chemosphere. 2011 Mar;82(11):1636-43. doi: 10.1016/j.chemosphere.2010.11.043. Epub 2010 Dec 9.
3. Exploring QSTR analysis of the toxicity of phenols and thiophenols using machine learning methods.
   Environ Toxicol Pharmacol. 2012 Nov;34(3):826-31. doi: 10.1016/j.etap.2012.09.003. Epub 2012 Sep 15.
4. In silico prediction of toxicity of phenols to Tetrahymena pyriformis by using genetic algorithm and decision tree-based modeling approach.
   Chemosphere. 2017 Apr;172:249-259. doi: 10.1016/j.chemosphere.2016.12.095. Epub 2017 Jan 2.
5. In silico prediction of toxicity of non-congeneric industrial chemicals using ensemble learning based modeling approaches.
   Toxicol Appl Pharmacol. 2014 Mar 15;275(3):198-212. doi: 10.1016/j.taap.2014.01.006. Epub 2014 Jan 23.
6. An evaluation of global QSAR models for the prediction of the toxicity of phenols to Tetrahymena pyriformis.
   Chemosphere. 2008 Apr;71(7):1225-32. doi: 10.1016/j.chemosphere.2007.12.011. Epub 2008 Feb 7.
7. Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: focusing on applicability domain and overfitting by variable selection.
   J Chem Inf Model. 2008 Sep;48(9):1733-46. doi: 10.1021/ci800151m. Epub 2008 Aug 26.
8. Application of support vector machine (SVM) for prediction toxic activity of different data sets.
   Toxicology. 2006 Jan 16;217(2-3):105-19. doi: 10.1016/j.tox.2005.08.019. Epub 2005 Oct 5.
9. Application of random forest approach to QSAR prediction of aquatic toxicity.
   J Chem Inf Model. 2009 Nov;49(11):2481-8. doi: 10.1021/ci900203n.
10. Prediction of antibacterial compounds by machine learning approaches.
    J Comput Chem. 2009 Jun;30(8):1202-11. doi: 10.1002/jcc.21148.

Cited by

1. Adjusted imbalance ratio leads to effective AI-based drug discovery against infectious disease.
   Sci Rep. 2025 Aug 12;15(1):29563. doi: 10.1038/s41598-025-15265-5.
2. Data-Driven Quantitative Structure-Activity Relationship Modeling for Human Carcinogenicity by Chronic Oral Exposure.
   Environ Sci Technol. 2023 Apr 25;57(16):6573-6588. doi: 10.1021/acs.est.3c00648. Epub 2023 Apr 11.
3. Large-Scale Modeling of Multispecies Acute Toxicity End Points Using Consensus of Multitask Deep Learning Methods.
   J Chem Inf Model. 2021 Feb 22;61(2):653-663. doi: 10.1021/acs.jcim.0c01164. Epub 2021 Feb 3.
4. Structure-activity relationship-based chemical classification of highly imbalanced Tox21 datasets.
   J Cheminform. 2020 Oct 27;12(1):66. doi: 10.1186/s13321-020-00468-x.
5. Comparing the performance of meta-classifiers: a case study on selected imbalanced data sets relevant for prediction of liver toxicity.
   J Comput Aided Mol Des. 2018 May;32(5):583-590. doi: 10.1007/s10822-018-0116-z. Epub 2018 Apr 6.
6. CURE-SMOTE algorithm and hybrid algorithm for feature selection and parameter optimization based on random forests.
   BMC Bioinformatics. 2017 Mar 14;18(1):169. doi: 10.1186/s12859-017-1578-z.
7. QSAR modeling of imbalanced high-throughput screening data in PubChem.
   J Chem Inf Model. 2014 Mar 24;54(3):705-12. doi: 10.1021/ci400737s. Epub 2014 Feb 28.