Lambev Momchil, Dimitrova Dimana, Mihaylova Silviya
Medical College, Medical University of Varna, 84 Tzar Osvoboditel Str., 9002 Varna, Bulgaria.
Int J Mol Sci. 2025 Aug 29;26(17):8407. doi: 10.3390/ijms26178407.
Peptide therapeutics often fall outside classical small-molecule heuristics, such as Lipinski's Rule of Five (Ro5), motivating the development of adapted filters and data-driven approaches to early drug-likeness assessment. We curated more than 300,000 drug (small-molecule and peptide) and non-drug molecules from PubChem, extracted key molecular descriptors with RDKit, and generated three rule-violation counters: one for Ro5, one for the peptide-oriented beyond-Ro5 (bRo5) extension, and one for Muegge's criteria. Random Forest (RF) classifier and regressor models (with 10, 20, and 30 trees) were trained and evaluated. Predictions for 26 peptide test molecules were compared with those from SwissADME, Molinspiration, and manual calculations. Model metrics were uniformly high (Ro5 accuracy/precision/recall = 1.0; Muegge ≈ 0.99), indicating effective learning. Ro5 violation counts matched reference values for 23/26 peptides; the remaining cases differed by +1 violation, reflecting larger structures and platform limits. bRo5 predictions showed near-complete agreement with manual values; minor discrepancies occurred in isolated peptides. Muegge predictions were internally consistent but tended to fall about 1 violation below the SwissADME values in several molecules. Four peptides (ML13-16) satisfied bRo5 boundaries; three also fully met Ro5. RF models thus provide fast and reliable in silico filters for peptide drug-likeness and can support the prioritisation of orally developable candidates.
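To make the idea of a rule-violation counter concrete, the following is a minimal sketch of a Ro5 counter, not the authors' implementation. It assumes the four descriptors have already been computed (for example with RDKit, as in the study) and uses the standard Lipinski thresholds: molecular weight ≤ 500 Da, logP ≤ 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors. The `Descriptors` class, the `ro5_violations` function, and the example descriptor values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float        # molecular weight in Da
    logp: float              # octanol-water partition coefficient
    h_bond_donors: int       # number of hydrogen-bond donors
    h_bond_acceptors: int    # number of hydrogen-bond acceptors


def ro5_violations(d: Descriptors) -> int:
    """Count how many of the four Lipinski Ro5 criteria are violated."""
    checks = (
        d.mol_weight > 500,        # MW must not exceed 500 Da
        d.logp > 5,                # logP must not exceed 5
        d.h_bond_donors > 5,       # at most 5 H-bond donors
        d.h_bond_acceptors > 10,   # at most 10 H-bond acceptors
    )
    return sum(checks)  # each True counts as one violation


# Illustrative, made-up descriptor values (not taken from the study).
small_molecule = Descriptors(mol_weight=180.2, logp=1.2,
                             h_bond_donors=1, h_bond_acceptors=4)
large_peptide = Descriptors(mol_weight=1200.0, logp=-3.0,
                            h_bond_donors=15, h_bond_acceptors=18)

print(ro5_violations(small_molecule))  # 0
print(ro5_violations(large_peptide))   # 3
```

A counter of this kind yields an integer label per molecule, which is the sort of target a classifier or regressor can then be trained to predict from the descriptors. Counters for bRo5 or Muegge's criteria would follow the same pattern with their own descriptor sets and thresholds.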