A systematic analysis of regression models for protein engineering.

Affiliations

Department of Computer Science, University of Copenhagen, Copenhagen, Denmark.

Department of Chemistry, University of Copenhagen, Copenhagen, Denmark.

Publication information

PLoS Comput Biol. 2024 May 3;20(5):e1012061. doi: 10.1371/journal.pcbi.1012061. eCollection 2024 May.

Abstract

Optimizing proteins for particular traits holds great promise for industrial and pharmaceutical purposes. Machine learning is increasingly applied in this field to predict properties of proteins, thereby guiding the experimental optimization process. A natural question is: how much progress are we making with such predictions, and how important is the choice of regressor and representation? In this paper, we demonstrate that different assessment criteria for regressor performance can lead to dramatically different conclusions, depending on the choice of metric and on how generalization is defined. We highlight the fundamental issue of sample bias in typical regression scenarios and how it can lead to misleading conclusions about regressor performance. Finally, we make the case for the importance of calibrated uncertainty in this domain.
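The claim that the choice of metric can change which regressor looks best can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the target values, the two hypothetical predictors `pred_a` and `pred_b`, and the choice of mean squared error and Spearman rank correlation as the two metrics. None of it is taken from the paper's data or code.

```python
# Toy illustration (hypothetical values, not from the paper):
# two predictors ranked differently by MSE and by Spearman correlation.

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def ranks(values):
    """Ranks 1..n in ascending order; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(y_true, y_pred):
    """Spearman rank correlation via the no-ties formula."""
    n = len(y_true)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(y_true), ranks(y_pred)))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Assumed measured property values for five protein variants.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]

# Predictor A: close in magnitude, but swaps neighbouring variants.
pred_a = [2.0, 1.0, 4.0, 3.0, 5.0]
# Predictor B: perfect ordering, but shifted by a constant offset.
pred_b = [11.0, 12.0, 13.0, 14.0, 15.0]

preds = {"A": pred_a, "B": pred_b}
best_by_mse = min(preds, key=lambda k: mse(y_true, preds[k]))
best_by_spearman = max(preds, key=lambda k: spearman(y_true, preds[k]))

print("best by MSE:", best_by_mse)
print("best by Spearman:", best_by_spearman)
```

In this toy setting the error-based metric favours predictor A and the rank-based metric favours predictor B, so a conclusion about which model is better depends on which criterion matches the downstream goal, for example selecting the top-ranked variants versus estimating absolute property values.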

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1e1b/11095727/d87ec7d57fb0/pcbi.1012061.g001.jpg
