Cytochrome P450 site of metabolism prediction from 2D topological fingerprints using GPU accelerated probabilistic classifiers.

Authors

Tyzack Jonathan D, Mussa Hamse Y, Williamson Mark J, Kirchmair Johannes, Glen Robert C

Affiliations

Unilever Centre for Molecular Science Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, CB2 1EW Cambridge, UK.

ETH Zurich, Department of Chemistry and Applied Biosciences, Institute of Pharmaceutical Sciences, HCI G 474.2, Vladimir-Prelog-Weg 1-5/10, 8093 Zurich, Switzerland.

Publication information

J Cheminform. 2014 May 27;6:29. doi: 10.1186/1758-2946-6-29. eCollection 2014.

DOI:10.1186/1758-2946-6-29
PMID:24959208
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4047555/
Abstract

BACKGROUND

The prediction of sites and products of metabolism in xenobiotic compounds is key to the development of new chemical entities, where screening potential metabolites for toxicity or unwanted side-effects is of crucial importance. In this work 2D topological fingerprints are used to encode atomic sites and three probabilistic machine learning methods are applied: Parzen-Rosenblatt Window (PRW), Naive Bayesian (NB) and a novel approach called RASCAL (Random Attribute Subsampling Classification ALgorithm). These are implemented by randomly subsampling descriptor space, which alleviates a problem data mining methods often suffer from, namely having to match fingerprints exactly; PRW additionally measures a distance between feature vectors rather than requiring an exact match. The classifiers have been implemented in CUDA/C++ to exploit the parallel architecture of graphics processing units (GPUs) and are freely available in a public repository.
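
To make the PRW idea concrete, here is a minimal CPU sketch in Python of a Parzen-Rosenblatt window score averaged over random subsamples of the fingerprint bits. It is not the authors' CUDA/C++ code. The Gaussian kernel on Hamming distance, the bandwidth `h`, the subsample fraction `frac`, the ensemble size `n_models`, and all function names are assumptions made for illustration; the abstract does not specify them. RASCAL is not sketched because the abstract gives no algorithmic detail for it.

```python
import math
import random


def prw_score(query, train_X, train_y, cols, h=1.0):
    """Kernel-weighted fraction of SoM-labelled training sites near the query.

    Only the fingerprint bits listed in `cols` are compared (assumed scheme).
    """
    num = 0.0
    den = 0.0
    for x, y in zip(train_X, train_y):
        # Hamming distance restricted to the subsampled bits
        d = sum(1 for c in cols if x[c] != query[c])
        w = math.exp(-(d * d) / (2.0 * h * h))  # Gaussian window (assumption)
        den += w
        if y == 1:
            num += w
    return num / den if den > 0.0 else 0.0


def ensemble_prw_score(query, train_X, train_y,
                       n_models=25, frac=0.5, h=1.0, seed=0):
    """Average prw_score over n_models random subsamples of the bit positions."""
    rng = random.Random(seed)
    n_bits = len(query)
    k = max(1, int(frac * n_bits))
    total = 0.0
    for _ in range(n_models):
        cols = rng.sample(range(n_bits), k)
        total += prw_score(query, train_X, train_y, cols, h)
    return total / n_models


# Toy data: 4-bit fingerprints of atomic sites, label 1 = site of metabolism
train_X = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
train_y = [1, 1, 0, 0]

s_som = ensemble_prw_score([1, 1, 0, 0], train_X, train_y)
s_non = ensemble_prw_score([0, 0, 1, 1], train_X, train_y)
```

Because each ensemble member sees only part of the fingerprint, a query site can still score well when it shares some, but not all, bits with the training sites, which is the motivation the abstract gives for subsampling.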

RESULTS

It is shown that for PRW a SoM (Site of Metabolism) is identified in the top two predictions for 85%, 91% and 88% of the CYP 3A4, 2D6 and 2C9 data sets respectively, with RASCAL giving similar performance of 83%, 91% and 88%, respectively. These results put PRW and RASCAL performance ahead of NB which gave a much lower classification performance of 51%, 73% and 74%, respectively.
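
The "top two" figures above count a molecule as correctly predicted when at least one experimentally observed SoM is among its two highest-scoring atoms. A minimal Python sketch of that metric follows; the function names, the data layout and the toy scores are hypothetical, and ties are broken by atom index, which is an assumption.

```python
def top_k_hit(atom_scores, som_atoms, k=2):
    """True if any known SoM atom index is among the k highest-scoring atoms."""
    ranked = sorted(range(len(atom_scores)),
                    key=lambda i: atom_scores[i], reverse=True)
    return any(i in som_atoms for i in ranked[:k])


def top_k_rate(molecules, k=2):
    """Fraction of molecules with at least one SoM in the top k predictions."""
    hits = sum(1 for scores, soms in molecules if top_k_hit(scores, soms, k))
    return hits / len(molecules)


# Each entry: (per-atom scores, set of atom indices that are known SoMs)
molecules = [
    ([0.9, 0.2, 0.1], {0}),        # SoM ranked 1st: hit
    ([0.3, 0.8, 0.6, 0.1], {2}),   # SoM ranked 2nd: hit
    ([0.7, 0.5, 0.4], {2}),        # SoM ranked 3rd: miss at k=2
]
rate = top_k_rate(molecules, k=2)
```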

CONCLUSIONS

2D topological fingerprints calculated to a bond depth of 4-6 contain sufficient information to allow the identification of SoMs using classifiers based on relatively small data sets. Thus, the machine learning methods outlined in this paper are conceptually simpler and more efficient than other methods tested, and the use of simple topological descriptors derived from 2D structure gives results competitive with other approaches using more expensive quantum chemical descriptors. The descriptor space subsampling approach and ensemble methodology allow the methods to be applied to molecules more distant from the training data, where data mining would be more likely to fail due to the lack of common fingerprints. The RASCAL algorithm is shown to give equivalent classification performance to PRW but at lower computational expense, allowing it to be applied more efficiently in the ensemble scheme.
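
As an illustration of what encoding an atomic site "to a bond depth" can mean, the Python sketch below walks the molecular graph breadth-first from one atom and counts element symbols at each bond distance up to a chosen depth. This is a generic atom-environment descriptor written for illustration; the exact fingerprint definition used in the paper is not given in the abstract, and the function name and data layout are assumptions.

```python
from collections import Counter, deque


def atom_environment(adj, elements, start, max_depth=4):
    """Count (bond distance, element) pairs within max_depth bonds of `start`.

    adj: dict mapping atom index -> list of bonded atom indices
    elements: list of element symbols indexed by atom
    """
    dist = {start: 0}
    queue = deque([start])
    env = Counter()
    while queue:
        a = queue.popleft()
        env[(dist[a], elements[a])] += 1
        if dist[a] == max_depth:
            continue  # do not expand beyond the requested depth
        for b in adj[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return env


# Heavy atoms of ethanol: C0 - C1 - O2
elements = ["C", "C", "O"]
adj = {0: [1], 1: [0, 2], 2: [1]}

env_depth1 = atom_environment(adj, elements, start=0, max_depth=1)
env_depth2 = atom_environment(adj, elements, start=0, max_depth=2)
```

With a depth of 1 the methyl carbon sees only its carbon neighbour; at depth 2 the oxygen enters the environment, which shows how a larger depth captures more of the chemical context around a candidate site.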
