Effect of training data size and noise level on support vector machines virtual screening of genotoxic compounds from large compound libraries.

Affiliation

Bioinformatics and Drug Design Group, Centre for Computational Science and Engineering, Department of Pharmacy, National University of Singapore.

Publication information

J Comput Aided Mol Des. 2011 May;25(5):455-67. doi: 10.1007/s10822-011-9431-3. Epub 2011 May 10.

DOI: 10.1007/s10822-011-9431-3
PMID: 21556903

Abstract

Various in vitro and in silico methods have been used to test drugs for genotoxicity, but they identify genotoxic (GT+) and non-genotoxic (GT-) compounds at limited rates. New methods and combinatorial approaches have been explored to improve collective identification capability. The rates of in silico methods may be further improved by substantially more diverse training data enriched with the large number of recently reported GT+ and GT- compounds, but a major concern is the increased noise that arises from the high false-positive rates of in vitro data. In this work, we evaluated the effect of training data size and noise level on the performance of support vector machines (SVM), a method known to tolerate high noise levels in training data. Two SVMs trained on data of different diversity and noise levels were developed and tested. H-SVM, trained on higher-diversity, higher-noise data (GT+ in any in vivo or in vitro test), outperforms L-SVM, trained on lower-noise, lower-diversity data (GT+ in in vivo or Ames tests only). H-SVM, trained on 4,763 GT+ compounds reported before 2008 and 8,232 GT- compounds excluding clinical trial drugs, correctly identified 81.6% of the 38 GT+ compounds reported since 2008, predicted 83.1% of the 2,008 clinical trial drugs as GT-, and predicted 23.96% of the 168K MDDR compounds and 27.23% of the 17.86M PubChem compounds as GT+. These figures are comparable to the 43.1-51.9% GT+ and 75-93% GT- rates of existing in silico methods, the 58.8% GT+ and 79% GT- rates of the Ames method, and the estimated 23% in vivo and 31-33% in vitro GT+ compounds in the "universe of chemicals". There is substantial agreement between the GT+ and GT- MDDR compounds predicted by H-SVM and L-SVM and the predictions from TOPKAT. SVM showed good potential for identifying GT+ compounds from large compound libraries when trained on higher-diversity, higher-noise data.
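
To give a sense of scale for the percentages quoted in the abstract, the short sketch below converts each reported rate into an approximate compound count. The counts are not stated in the abstract; they are back-calculated here from rounded percentages and rounded library sizes ("168K", "17.86M"), so they should be read as estimates. The names `reported` and `implied_count` are illustrative only.

```python
# Rates and set sizes as quoted in the abstract. The implied counts are
# back-calculated estimates, not figures reported by the authors.
reported = {
    "GT+ compounds reported since 2008, identified as GT+": (0.816, 38),
    "clinical trial drugs predicted as GT-": (0.831, 2_008),
    "MDDR compounds predicted as GT+": (0.2396, 168_000),        # "168K" is rounded
    "PubChem compounds predicted as GT+": (0.2723, 17_860_000),  # "17.86M" is rounded
}


def implied_count(rate: float, total: int) -> int:
    """Approximate number of compounds corresponding to a reported rate."""
    return round(rate * total)


for label, (rate, total) in reported.items():
    count = implied_count(rate, total)
    print(f"{label}: about {count:,} of {total:,} ({rate:.2%})")
```

For the two small sets the back-calculation is consistent with the reported figures: 31 of 38 rounds to 81.6%, and 1,669 of 2,008 rounds to 83.1%. For the two large libraries the same arithmetic implies roughly 40 thousand MDDR compounds and roughly 4.9 million PubChem compounds flagged as GT+.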

Similar articles

1
Effect of training data size and noise level on support vector machines virtual screening of genotoxic compounds from large compound libraries.
J Comput Aided Mol Des. 2011 May;25(5):455-67. doi: 10.1007/s10822-011-9431-3. Epub 2011 May 10.
2
Virtual screening of Abl inhibitors from large compound libraries by support vector machines.
J Chem Inf Model. 2009 Sep;49(9):2101-10. doi: 10.1021/ci900135u.
3
Identification of small molecule aggregators from large compound libraries by support vector machines.
J Comput Chem. 2010 Mar;31(4):752-63. doi: 10.1002/jcc.21347.
4
Evaluation of virtual screening performance of support vector machines trained by sparsely distributed active compounds.
J Chem Inf Model. 2008 Jun;48(6):1227-37. doi: 10.1021/ci800022e. Epub 2008 Jun 6.
5
Prediction of genotoxicity of chemical compounds by statistical learning methods.
Chem Res Toxicol. 2005 Jun;18(6):1071-80. doi: 10.1021/tx049652h.
6
Combinatorial support vector machines approach for virtual screening of selective multi-target serotonin reuptake inhibitors from large compound libraries.
J Mol Graph Model. 2012 Feb;32:49-66. doi: 10.1016/j.jmgm.2011.09.002. Epub 2011 Oct 5.
7
Development and experimental test of support vector machines virtual screening method for searching Src inhibitors from large compound libraries.
Chem Cent J. 2012 Nov 23;6(1):139. doi: 10.1186/1752-153X-6-139.
8
Virtual screening of selective multitarget kinase inhibitors by combinatorial support vector machines.
Mol Pharm. 2010 Oct 4;7(5):1545-60. doi: 10.1021/mp100179t. Epub 2010 Aug 26.
9
Identifying Novel Type ZBGs and Nonhydroxamate HDAC Inhibitors Through a SVM Based Virtual Screening Approach.
Mol Inform. 2010 May 17;29(5):407-20. doi: 10.1002/minf.200900014.
10
SVM model for virtual screening of Lck inhibitors.
J Chem Inf Model. 2009 Apr;49(4):877-85. doi: 10.1021/ci800387z.
