Applied BioSciences, Macquarie University, Sydney, NSW 2109, Australia.
Department of Molecular Genetics and Microbiology, Duke University School of Medicine, Durham, NC 27710, USA.
Int J Mol Sci. 2021 Oct 26;22(21):11546. doi: 10.3390/ijms222111546.
Olfactory receptors (ORs) constitute the largest superfamily of G protein-coupled receptors (GPCRs). ORs sense odorants and also play ectopic roles in non-nasal tissues. Matching the enormous repertoire of olfactory stimuli to their cognate ORs through machine learning (ML) will advance understanding of the olfactory system, support receptor characterization, and help exploit the therapeutic potential of these receptors. In the current study, we selected two broadly tuned ectopic human OR proteins, OR1A1 and OR2W1, and expanded their known chemical space using molecular descriptors. We present a scheme for selecting the optimal features required to train an ML-based model, on the basis of which we selected random forest (RF) as the best performer. To predict high-activity agonists, we screened five databases comprising ~23 M compounds with the trained RF classifier. To evaluate the effectiveness of the ML-based virtual screening and to check receptor binding site compatibility, we docked the top target ligands to carefully developed receptor model structures. Finally, experimental validation through in vitro assays of selected compounds with significant docking scores revealed two novel high-activity agonists for OR1A1 and one for OR2W1.
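The screening workflow described in the abstract (descriptor features, feature selection, an RF classifier, then ranking a compound library) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the descriptor matrix is synthetic, the univariate `SelectKBest` filter stands in for the feature selection scheme (which the abstract does not specify), and the hyperparameters and the `top_n` cutoff are arbitrary. It is not the authors' implementation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

# ASSUMPTION: synthetic stand-in for a molecular descriptor matrix
# (rows = compounds, columns = descriptors; label 1 = agonist, 0 = non-agonist).
X, y = make_classification(
    n_samples=400, n_features=50, n_informative=8, n_redundant=4, random_state=0
)

# Half of the compounds play the role of labelled training data,
# the other half the role of an unscreened compound library.
X_train, X_library, y_train, y_library = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

# ASSUMPTION: a univariate ANOVA F-test filter keeping 10 descriptors;
# the study's actual selection scheme is not given in the abstract.
model = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=10)),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
])
model.fit(X_train, y_train)

# Score every library compound by its predicted probability of being an agonist.
agonist_probability = model.predict_proba(X_library)[:, 1]

# Keep the highest-scoring compounds as candidates for a follow-up step such as docking.
top_n = 20  # hypothetical cutoff
top_idx = np.argsort(agonist_probability)[::-1][:top_n]

n_selected = int(model.named_steps["select"].get_support().sum())
print(f"descriptors kept: {n_selected}")
print(f"candidates forwarded: {len(top_idx)}")
```

Wrapping the selector and the classifier in one `Pipeline` keeps the feature selection fitted on the training compounds only, so the library compounds never influence which descriptors are retained.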