He Kedan
Physical Sciences, Eastern Connecticut State University, 83 Windham St, Willimantic, CT, 06226, USA.
J Cheminform. 2022 Jun 7;14(1):35. doi: 10.1186/s13321-022-00607-6.
New psychoactive substances (NPS) continue to emerge and threaten public health, so more effective methods for NPS prediction and identification are critical. In this study, the pharmacological affinity fingerprints (Ph-fp) of NPS compounds were predicted by Random Forest classification models using bioactivity data from the ChEMBL database. The binary Ph-fp is a vector of a compound's activity against a list of molecular targets reported to be responsible for the pharmacological effects of NPS. The performance of Ph-fp in similarity searching and unsupervised clustering was assessed and compared with that of two 2D structure fingerprints, Morgan and MACCS (the 1024-bit ECFP4 and the 166-bit SMARTS-based MACCS implementations in RDKit). Performance in retrieving compounds according to their pharmacological categorizations is influenced by the number of assays predicted active in the Ph-fp and by the choice of similarity metric. Overall, the comparative unsupervised clustering analysis supports using a classification model with Morgan fingerprints as input to construct the Ph-fp. This combination gives satisfactory clustering performance according to external and internal clustering validation indices.
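To make the similarity-searching step concrete, the following is a minimal sketch of ranking compounds by the similarity of their binary fingerprints. It is not the study's implementation: the six-target fingerprints, the compound names (`cmpd_A`, `cmpd_B`, `cmpd_C`), and the helper functions are hypothetical, and Tanimoto and Dice are used only as two common examples of similarity metrics for binary vectors.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def _check_lengths(a: Sequence[int], b: Sequence[int]) -> None:
    """Binary fingerprints can only be compared position by position."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto coefficient: shared on-bits / on-bits in either vector."""
    _check_lengths(a, b)
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def dice(a: Sequence[int], b: Sequence[int]) -> float:
    """Dice coefficient: 2 * shared on-bits / total on-bits in both vectors."""
    _check_lengths(a, b)
    both = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(1 for x in a if x) + sum(1 for y in b if y)
    return 2 * both / total if total else 0.0


def rank_by_similarity(
    query: Sequence[int],
    library: Dict[str, Sequence[int]],
    metric: Callable[[Sequence[int], Sequence[int]], float],
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs sorted from most to least similar."""
    scores = [(name, metric(query, fp)) for name, fp in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical binary fingerprints: 1 = predicted active against a target.
query = [1, 1, 0, 0, 1, 0]
library = {
    "cmpd_A": [1, 1, 0, 0, 1, 1],  # shares all three of the query's on-bits
    "cmpd_B": [1, 0, 0, 0, 0, 0],  # only one predicted active target
    "cmpd_C": [0, 0, 1, 1, 0, 0],  # no predicted actives in common
}

for metric in (tanimoto, dice):
    print(metric.__name__, rank_by_similarity(query, library, metric))
```

In this toy data, `cmpd_B` has a single on-bit and scores about 0.33 with Tanimoto but 0.5 with Dice, a small illustration of how the number of predicted actives and the choice of metric can both change the scores.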