Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa094.
Scoring functions (SFs) based on complex machine learning (ML) algorithms have gradually emerged as a promising way to overcome the weaknesses of classical SFs. However, most efforts have gone into developing SFs from new protein-ligand interaction representations and advanced ML algorithms, rather than from the energy components obtained by decomposing existing SFs. Here, we propose a new method named energy auxiliary terms learning (EATL), in which the scoring components are extracted and used as input to develop three levels of ML SFs, namely EATL SFs, docking-EATL SFs and comprehensive SFs, with ascending virtual screening (VS) performance. The EATL approach not only outperforms classical SFs in terms of absolute performance (ROC) and initial enrichment (BEDROC) but also performs comparably to other advanced ML-based methods on the diverse subset of the Directory of Useful Decoys: Enhanced (DUD-E). A test on the relatively unbiased actives as decoys (AD) dataset further supports the effectiveness of EATL. Furthermore, the idea of learning from SF components to improve screening power can be extended to other available docking programs and SFs.
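The core idea, feeding decomposed energy terms to a learner and judging the result by ROC and BEDROC, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the term names in `TERMS`, the synthetic data from `make_toy_set`, and the plain logistic regression stand in for the actual energy terms, the DUD-E data and the ML algorithms of the paper, none of which the abstract specifies. BEDROC is computed as the min-max normalised robust initial enhancement, with the commonly used default `alpha=20`.

```python
import math
import random

# Hypothetical energy-term names; the real terms depend on the docking program.
TERMS = ["vdw", "hbond", "electrostatic", "desolvation"]

def roc_auc(labels, scores):
    """ROC AUC as P(active scored above decoy); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bedroc(labels, scores, alpha=20.0):
    """BEDROC; assumes the list holds both actives (1) and decoys (0)."""
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda t: -t[0])]
    n_total, n_act = len(ranked), sum(ranked)
    ratio = n_act / n_total
    # Exponentially weighted sum over the 1-based ranks of the actives.
    weight_sum = sum(math.exp(-alpha * (i + 1) / n_total)
                     for i, y in enumerate(ranked) if y == 1)
    # Expected weight of one active under a uniformly random ranking.
    random_sum = (1 - math.exp(-alpha)) / (math.exp(alpha / n_total) - 1) / n_total
    rie = weight_sum / (n_act * random_sum)
    rie_max = (1 - math.exp(-alpha * ratio)) / (ratio * (1 - math.exp(-alpha)))
    rie_min = (1 - math.exp(alpha * ratio)) / (ratio * (1 - math.exp(alpha)))
    return (rie - rie_min) / (rie_max - rie_min)

def sigmoid(z):
    """Numerically safe logistic function."""
    return 1 / (1 + math.exp(-z)) if z >= 0 else math.exp(z) / (1 + math.exp(z))

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def make_toy_set(rng, n_act, n_dec):
    """Synthetic energy terms (not real data): actives get lower vdw/hbond."""
    X, y = [], []
    for label, count in ((1, n_act), (0, n_dec)):
        shift = -1.0 if label else 0.0
        for _ in range(count):
            X.append([rng.gauss(shift, 1.0), rng.gauss(shift, 1.0),
                      rng.gauss(0.0, 3.0), rng.gauss(0.0, 1.0)])
            y.append(label)
    return X, y

def demo(seed=0):
    """Compare a summed-energy score with a model learned on the terms."""
    rng = random.Random(seed)
    X_train, y_train = make_toy_set(rng, 30, 300)
    X_test, y_test = make_toy_set(rng, 30, 300)
    w, b = fit_logistic(X_train, y_train)
    total = [-sum(x) for x in X_test]  # classical-style score: negated total energy
    learned = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) for x in X_test]
    return {
        "total_energy": (roc_auc(y_test, total), bedroc(y_test, total)),
        "learned": (roc_auc(y_test, learned), bedroc(y_test, learned)),
    }

if __name__ == "__main__":
    for name, (auc, bed) in demo().items():
        print(f"{name}: ROC AUC={auc:.3f}, BEDROC={bed:.3f}")
```

The toy set makes one term noisy and uninformative, so a fixed sum of terms can be compared with learned per-term weights. The numbers it prints describe only this synthetic setup and say nothing about the results reported for EATL.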