The peptide quantitative structure-activity relationship (QSAR), also known as the quantitative sequence-activity model (QSAM), has attracted much attention in the bio- and chemoinformatics communities. It is a well-developed computational peptidology strategy for statistically correlating the sequence/structure of functional peptides with their activity/property. Amino acid descriptors (AADs) are among the most widely used methods for characterizing peptide structure: the peptide is decomposed into its residue building blocks, and each residue is parametrized in sequence with a vector of amino acid principal properties. Various AADs have been proposed over the past decades, and new ones are still emerging today, so we herein ask: is it necessary to develop so many AADs, and do we need to keep developing new ones? In this study, we exhaustively collect 80 published AADs and comprehensively evaluate their modeling performance (including fitting ability, internal stability, and predictive power). The evaluation covers 8 QSAR-oriented peptide sample sets (QPSs) and 2 sophisticated machine learning methods (MLMs), building and systematically comparing a total of 1280 (80 AADs × 8 QPSs × 2 MLMs) peptide QSAR models. The results reveal the following:

(i) No AAD works best on all or most peptide sets; an AAD usually performs well for some peptides but poorly for others.

(ii) Modeling performance is determined primarily by the peptide samples and secondarily by the MLMs used, while the AADs have only a moderate influence on performance.

(iii) There is no essential difference in modeling performance among AAD types (physicochemical, topological, 3D-structural, etc.).

(iv) Two random descriptors, generated separately from the standard normal distribution N(0, 1) and the uniform distribution U(-1, +1), do not perform significantly worse than the carefully developed AADs.

(v) A secondary descriptor, which carries the major information contained in the 80 (primary) AADs, does not perform significantly better than these AADs.

Overall, we conclude that various AADs are already available and together they cover numerous amino acid properties. Developing new AADs is therefore not an essential route to improving peptide QSAR modeling, and we believe the traditional AAD methodology has nearly reached its theoretical limit. In addition, AADs behave more like vector symbols than informative data: they serve to mark and distinguish the 20 amino acids but contribute little original property information about them.
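The AAD workflow described in the abstract encodes each residue with a property vector and concatenates the vectors in sequence order. It then fits a QSAR model and scores it. The sketch below illustrates that workflow and is not the authors' code.

It rests on several assumptions:

- The helper names `random_aad`, `encode_peptide`, and `loo_q2_ridge` are hypothetical.
- Ridge regression stands in for the two MLMs, which the abstract does not name.
- Leave-one-out Q² is used as one common measure of internal stability.
- The peptide data in the demo are synthetic.

The random tables mirror the N(0, 1) and U(-1, +1) random descriptors mentioned above.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def random_aad(dim, kind, seed=0):
    """Hypothetical helper: map each of the 20 amino acids to a `dim`-length
    vector drawn from N(0, 1) ("normal") or U(-1, +1) ("uniform")."""
    rng = np.random.default_rng(seed)
    if kind == "normal":
        values = rng.standard_normal((len(AMINO_ACIDS), dim))
    elif kind == "uniform":
        values = rng.uniform(-1.0, 1.0, size=(len(AMINO_ACIDS), dim))
    else:
        raise ValueError(f"unknown kind: {kind}")
    return {aa: values[i] for i, aa in enumerate(AMINO_ACIDS)}


def encode_peptide(seq, aad):
    """Parametrize each residue with its AAD vector and concatenate the vectors
    in sequence order (fixed-length peptides give fixed-length features)."""
    return np.concatenate([aad[aa] for aa in seq])


def loo_q2_ridge(X, y, alpha=1.0):
    """Leave-one-out Q^2 = 1 - PRESS / SS_tot, using ridge regression with an
    intercept as an assumed stand-in learner."""
    n = len(y)
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i
        X_tr, y_tr = X[mask], y[mask]
        x_mean, y_mean = X_tr.mean(axis=0), y_tr.mean()
        Xc = X_tr - x_mean  # center so the intercept is not penalized
        w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]),
                            Xc.T @ (y_tr - y_mean))
        pred = (X[i] - x_mean) @ w + y_mean
        press += (y[i] - pred) ** 2
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss_tot


if __name__ == "__main__":
    # Synthetic illustration only: random 5-residue peptides whose made-up
    # activity is a sum of hidden per-residue scores.
    rng = np.random.default_rng(42)
    peptides = ["".join(rng.choice(list(AMINO_ACIDS), size=5)) for _ in range(40)]
    hidden = dict(zip(AMINO_ACIDS, rng.standard_normal(len(AMINO_ACIDS))))
    y = np.array([sum(hidden[aa] for aa in p) for p in peptides])
    for kind in ("normal", "uniform"):
        aad = random_aad(dim=3, kind=kind, seed=1)
        X = np.vstack([encode_peptide(p, aad) for p in peptides])
        print(kind, round(loo_q2_ridge(X, y), 3))
```

A real comparison would replace the random tables with published AAD tables. It would also add fitting (R²) and external predictive metrics.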

肽的定量构效关系（QSAR），也称为定量序列活性模型（QSAM），在生物信息学和化学生物信息学领域引起了广泛关注，是一种成熟的计算肽化学策略，用于统计相关功能肽的序列/结构和活性/性质关系。氨基酸描述符（AAD）是通过将肽分解为其残基构建块并顺序地用氨基酸主要性质的向量参数化来描述肽结构的最广泛使用的方法之一。考虑到过去几十年已经提出了各种 AAD，并且今天仍在不断涌现新的 AAD，我们在此询问：是否有必要开发这么多 AAD，我们是否需要不断开发更多新的 AAD？在这项研究中，我们详尽地收集了 80 种已发表的 AAD，并通过使用 2 种先进的机器学习方法（MLM），在 8 个面向 QSAR 的肽样本集（QPS）上全面评估了它们的建模性能（包括拟合能力、内部稳定性和预测能力），总共构建和系统地比较了 1280 个（80 个 AAD×8 个 QPS×2 个 MLM）肽 QSAR 模型。以下是揭示的结果：（i）没有一种 AAD 可以在所有或大多数肽集上表现最佳；一种 AAD 通常对某些肽表现良好，但对其他肽表现不佳。（ii）建模性能主要由肽样本决定，然后由使用的 MLM 决定，而 AAD 对性能只有适度的影响。（iii）不同 AAD 类型（生理化学、拓扑、3D 结构等）的建模性能没有本质区别。（iv）两个随机描述符，分别在标准正态分布（0，1）和均匀分布（-1，+1）中随机生成，其性能并不明显差于这些精心开发的 AAD。（v）一个二次描述符，包含 80 个（主要）AAD 中涉及的主要信息，其性能并不明显优于这些 AAD。总体而言，我们得出结论，由于目前已经有各种 AAD 可供使用，并且它们已经涵盖了许多氨基酸性质，因此开发新的 AAD 并不是改进肽 QSAR 建模的必要选择；传统的 AAD 方法学如今已接近理论极限。此外，AAD 更有可能是一个向量符号而不是信息数据；它们用于标记和区分 20 种氨基酸，但并没有真正为这些氨基酸带来多少原始属性信息。

Suppr 超能文献

文献检索

文件翻译

深度研究

Suppr 超能文献

文献检索

文件翻译

深度研究

Systematic Comparison and Comprehensive Evaluation of 80 Amino Acid Descriptors in Peptide QSAR Modeling.

机构信息

出版信息

相似文献