

A comparison of machine learning algorithms for chemical toxicity classification using a simulated multi-scale data model.

Authors

Judson Richard, Elloumi Fathi, Setzer R Woodrow, Li Zhen, Shah Imran

Affiliation

National Center for Computational Toxicology, Office of Research and Development, US Environmental Protection Agency, Research Triangle Park, North Carolina 27711, USA.

Publication

BMC Bioinformatics. 2008 May 19;9:241. doi: 10.1186/1471-2105-9-241.

DOI: 10.1186/1471-2105-9-241
PMID: 18489778
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2409339/
Abstract

BACKGROUND

Bioactivity profiling using high-throughput in vitro assays can reduce the cost and time required for toxicological screening of environmental chemicals and can also reduce the need for animal testing. Several public efforts are aimed at discovering patterns or classifiers in high-dimensional bioactivity space that predict tissue, organ or whole animal toxicological endpoints. Supervised machine learning is a powerful approach to discover combinatorial relationships in complex in vitro/in vivo datasets. We present a novel model to simulate complex chemical-toxicology data sets and use this model to evaluate the relative performance of different machine learning (ML) methods.

RESULTS

The classification performance of Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), Linear Discriminant Analysis (LDA), Naïve Bayes (NB), Recursive Partitioning and Regression Trees (RPART), and Support Vector Machines (SVM) in the presence and absence of filter-based feature selection was analyzed using K-way cross-validation testing and independent validation on simulated in vitro assay data sets with varying levels of model complexity, number of irrelevant features and measurement noise. While the prediction accuracy of all ML methods decreased as non-causal (irrelevant) features were added, some ML methods performed better than others. In the limit of using a large number of features, ANN and SVM were always in the top performing set of methods while RPART and KNN (k = 5) were always in the poorest performing set. The addition of measurement noise and irrelevant features decreased the classification accuracy of all ML methods, with LDA suffering the greatest performance degradation. LDA performance is especially sensitive to the use of feature selection. Filter-based feature selection generally improved performance, most strikingly for LDA.

CONCLUSION

We have developed a novel simulation model to evaluate machine learning methods for the analysis of data sets in which in vitro bioassay data is being used to predict in vivo chemical toxicology. From our analysis, we can recommend that several ML methods, most notably SVM and ANN, are good candidates for use in real world applications in this area.
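The evaluation design summarized in the Results (simulated data with causal and irrelevant features, measurement noise, filter-based feature selection, and K-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' simulation model or code: the data generator, the t-like filter score, the parameter values, and all function names (`simulate`, `filter_select`, `knn_predict`, `cv_accuracy`) are assumptions made for illustration, and only KNN with k = 5 is shown.

```python
import random
import statistics


def simulate(n_samples, n_causal, n_irrelevant, noise_sd, rng):
    """Toy generator (an assumption, not the paper's model): causal features
    are shifted by +1 in the positive class; irrelevant ones are pure noise."""
    X, y = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        causal = [label + rng.gauss(0, noise_sd) for _ in range(n_causal)]
        irrelevant = [rng.gauss(0, 1) for _ in range(n_irrelevant)]
        X.append(causal + irrelevant)
        y.append(label)
    return X, y


def filter_select(X, y, n_keep):
    """Filter-based selection: rank features by a t-like class-separation score."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, lab in zip(X, y) if lab == 1]
        neg = [row[j] for row, lab in zip(X, y) if lab == 0]
        spread = statistics.pstdev(pos) + statistics.pstdev(neg) + 1e-9
        gap = abs(statistics.fmean(pos) - statistics.fmean(neg))
        scores.append((gap / spread, j))
    return [j for _, j in sorted(scores, reverse=True)[:n_keep]]


def knn_predict(train_X, train_y, x, k=5):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), lab)
        for row, lab in zip(train_X, train_y)
    )
    votes = sum(lab for _, lab in dists[:k])
    return 1 if 2 * votes > k else 0


def cv_accuracy(X, y, n_folds, n_keep=None, k=5):
    """K-fold cross-validation; features are selected on each training fold only."""
    correct = 0
    for fold in range(n_folds):
        test_idx = range(fold, len(X), n_folds)
        train_idx = [i for i in range(len(X)) if i not in test_idx]
        tr_X = [X[i] for i in train_idx]
        tr_y = [y[i] for i in train_idx]
        if n_keep is None:
            cols = list(range(len(X[0])))
        else:
            cols = filter_select(tr_X, tr_y, n_keep)
        tr_sel = [[row[j] for j in cols] for row in tr_X]
        for i in test_idx:
            x = [X[i][j] for j in cols]
            correct += int(knn_predict(tr_sel, tr_y, x, k) == y[i])
    return correct / len(X)


rng = random.Random(0)
X, y = simulate(n_samples=120, n_causal=5, n_irrelevant=100, noise_sd=1.0, rng=rng)
acc_all = cv_accuracy(X, y, n_folds=5)
acc_filtered = cv_accuracy(X, y, n_folds=5, n_keep=5)
print(f"KNN (k=5), all features:      {acc_all:.2f}")
print(f"KNN (k=5), filtered features: {acc_filtered:.2f}")
```

Selecting features inside each training fold, rather than once on the full data set, keeps the held-out samples from influencing the filter. Varying `n_irrelevant` and `noise_sd` in this sketch gives a rough feel for the kind of degradation the abstract reports, although results from a single seed and a toy generator should not be read as reproducing the paper's findings.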


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e9e/2409339/cb9d6047fb83/1471-2105-9-241-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e9e/2409339/64fa2fe904a6/1471-2105-9-241-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e9e/2409339/90a2545ef74c/1471-2105-9-241-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e9e/2409339/106ae40c1e62/1471-2105-9-241-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e9e/2409339/b4d87974d7cf/1471-2105-9-241-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e9e/2409339/131cc99e27de/1471-2105-9-241-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e9e/2409339/1094350f78f5/1471-2105-9-241-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e9e/2409339/87d1f0b7aa0f/1471-2105-9-241-8.jpg
