Burggraaff Lindsey, Oranje Paul, Gouka Robin, van der Pijl Pieter, Geldof Marian, van Vlijmen Herman W T, IJzerman Adriaan P, van Westen Gerard J P
Division of Drug Discovery & Safety, Leiden Academic Centre for Drug Research, Leiden University, Einsteinweg 55, 2333 CC, Leiden, The Netherlands.
Unilever Research & Development, Olivier van Noortlaan 120, 3133 AT, Vlaardingen, The Netherlands.
J Cheminform. 2019 Feb 14;11(1):15. doi: 10.1186/s13321-019-0337-8.
Sodium-dependent glucose co-transporter 1 (SGLT1) is a solute carrier responsible for active glucose absorption. SGLT1 is present in both the renal tubules and the small intestine. In contrast, the closely related sodium-dependent glucose co-transporter 2 (SGLT2), a protein targeted in the treatment of type 2 diabetes, is expressed only in the renal tubules. Although dual inhibitors of SGLT1 and SGLT2 have been developed, no drug on the market is aimed at decreasing dietary glucose uptake by SGLT1 in the gastrointestinal tract. Here we aim to identify SGLT1 inhibitors in silico by applying a machine learning approach that does not require structural information, which is unavailable for SGLT1. We applied proteochemometrics by incorporating compound- and protein-based information into random forest models. We obtained a predictive model with a sensitivity of 0.64 ± 0.06, specificity of 0.93 ± 0.01, positive predictive value of 0.47 ± 0.07, negative predictive value of 0.96 ± 0.01, and Matthews correlation coefficient of 0.49 ± 0.05. After model training, we applied the model in virtual screening to identify novel SGLT1 inhibitors. Of the 77 compounds tested, 30 were experimentally confirmed to inhibit SGLT1 in vitro, giving a hit rate of 39% with activities in the low micromolar range. Moreover, the hit compounds included novel molecules, as reflected by their low similarity to the training set (< 0.3). In conclusion, proteochemometric modeling of SGLT1 is a viable strategy for identifying active small molecules. This method may therefore also be applied to detect novel small molecules for other transporter proteins.
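The performance measures quoted above all derive from the four cells of a binary confusion matrix, and the hit rate is a simple ratio. The following is a minimal sketch of those definitions, not the authors' code: the function names are hypothetical, and the confusion-matrix counts are illustrative values chosen only so that the ratios fall near the reported means. They are not the study's actual counts.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of true actives recovered
    specificity = tn / (tn + fp)   # fraction of true inactives rejected
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "mcc": mcc,
    }


def hit_rate(confirmed, tested):
    """Fraction of experimentally tested compounds confirmed as active."""
    return confirmed / tested


# Illustrative counts only (assumption, not taken from the paper).
metrics = classification_metrics(tp=64, fp=70, tn=930, fn=36)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")

# Screening result stated in the abstract: 30 confirmed out of 77 tested.
print(f"hit rate: {hit_rate(30, 77):.0%}")  # prints "hit rate: 39%"
```

With a strong class imbalance such as the one in these illustrative counts, a high specificity and negative predictive value can coexist with a modest positive predictive value, which is why the Matthews correlation coefficient is a useful single summary.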