AutoPeptideML: a study on how to build more trustworthy peptide bioactivity predictors.

Author information

IBM Research, Dublin, Dublin D15 HN66, Ireland.

School of Medicine, University College Dublin, Dublin D04 C1P1, Ireland.

Publication information

Bioinformatics. 2024 Sep 2;40(9). doi: 10.1093/bioinformatics/btae555.

Abstract

MOTIVATION

Automated machine learning (AutoML) solutions can bridge the gap between new computational advances and their real-world applications by enabling experimental scientists to build their own custom models. We examine the steps in the development life-cycle of binary peptide bioactivity predictors and identify key steps where automation can yield not only a more accessible method, but also a more robust and interpretable evaluation, leading to more trustworthy models.

RESULTS

We present a new automated method for drawing negative peptides that achieves a better balance between specificity and generalization than current alternatives. We study the effect of homology-based partitioning for generating the training and testing data subsets and demonstrate that model performance is overestimated when no such homology correction is used, which indicates that prior studies may have overestimated how well their models perform on new peptide sequences. We also conduct a systematic analysis of different protein language models as peptide representation methods and find that they serve as better descriptors than a naive alternative, but that there is no significant difference across models of different sizes or algorithms. Finally, we demonstrate that an ensemble of optimized traditional machine learning algorithms can compete with more complex neural network models while being more computationally efficient. We integrate these findings into AutoPeptideML, an easy-to-use AutoML tool that allows researchers without a computational background to build new predictive models for peptide bioactivity in a matter of minutes.
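To illustrate the idea of homology-based partitioning mentioned in the results, here is a minimal Python sketch. It is not the AutoPeptideML implementation. The function names (`similarity`, `cluster_by_similarity`, `homology_split`), the thresholds, and the example peptides are all hypothetical, and `difflib.SequenceMatcher` is used only as a crude stand-in for a real alignment-based identity measure. The point it shows is that similar sequences are first grouped into clusters and each cluster is then assigned whole to either the training or the testing subset, so that no test peptide has a close relative in the training data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Crude stand-in for alignment-based sequence identity (assumption)."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def cluster_by_similarity(seqs, threshold=0.6):
    """Single-linkage clustering: any pair at or above threshold is linked."""
    parent = list(range(len(seqs)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if similarity(seqs[i], seqs[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(seqs)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


def homology_split(seqs, test_fraction=0.2, threshold=0.6):
    """Assign whole clusters to train or test; never split a cluster."""
    clusters = sorted(cluster_by_similarity(seqs, threshold), key=len)
    target = test_fraction * len(seqs)
    train_idx, test_idx = [], []
    for cluster in clusters:
        # Fill the test set with the smallest clusters until the target size.
        if len(test_idx) + len(cluster) <= target:
            test_idx.extend(cluster)
        else:
            train_idx.extend(cluster)
    return [seqs[i] for i in train_idx], [seqs[i] for i in test_idx]


if __name__ == "__main__":
    # Made-up example sequences, for illustration only.
    example = ["GIGKFLHSAKKFGKAFVGEIMNS", "GIGKFLHSAKKFGKAFVGEIMNT",
               "ACDEFGHIKL", "WWPPYYWWPPYY", "RRRRQQQQNNNN"]
    train, test = homology_split(example, test_fraction=0.4)
    print("train:", train)
    print("test:", test)
```

Because clustering is single-linkage, every pair of sequences at or above the threshold ends up in the same cluster, so no train/test pair can exceed it. A random split offers no such guarantee, which is one way near-duplicate sequences can inflate the measured performance.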

AVAILABILITY AND IMPLEMENTATION

Source code, documentation, and data are available at https://github.com/IBM/AutoPeptideML and a dedicated web-server at http://peptide.ucd.ie/AutoPeptideML. A static version of the software to ensure the reproduction of the results is available at https://zenodo.org/records/13363975.
