Searching for discrimination rules in protease proteolytic cleavage activity using genetic programming with a min-max scoring function.

Author information

Yang Zheng Rong, Thomson Rebecca, Hodgman T Charles, Dry Jonathan, Doyle Austin K, Narayanan Ajit, Wu XiKun

Affiliation

School of Engineering and Computer Science, Exeter University, Northcote House The Queen's Drive, Exeter EX4 4QJ, UK.

Publication information

Biosystems. 2003 Nov;72(1-2):159-76. doi: 10.1016/s0303-2647(03)00141-2.

Abstract

This paper presents an algorithm which is able to extract discriminant rules from oligopeptides for protease proteolytic cleavage activity prediction. The algorithm is developed using genetic programming. Three important components in the algorithm are a min-max scoring function, the reverse Polish notation (RPN) and the use of minimum description length. The min-max scoring function is developed using amino acid similarity matrices for measuring the similarity between an oligopeptide and a rule, which is a complex algebraic equation of amino acids rather than a simple pattern sequence. The Fisher ratio is then calculated on the scoring values using the class label associated with the oligopeptides. The discriminant ability of each rule can therefore be evaluated. The use of RPN makes the evolutionary operations simpler and therefore reduces the computational cost. To prevent overfitting, the concept of minimum description length is used to penalize over-complicated rules. A fitness function is therefore composed of the Fisher ratio and the use of minimum description length for an efficient evolutionary process. In the application to four protease datasets (Trypsin, Factor Xa, Hepatitis C Virus and HIV protease cleavage site prediction), our algorithm is superior to C5, a conventional method for deriving decision trees.
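
The way these components fit together can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact min-max scoring formula, the similarity matrix, or the form of the description-length penalty, so all of the following are assumptions made for illustration. A rule is taken to be an RPN token list whose operands are hypothetical (position, residue) pairs and whose operators are "min" and "max"; `similarity` is a toy identity match standing in for a real amino acid similarity matrix; the penalty is simply linear in rule length; and the names `score_rule`, `fisher_ratio`, and `fitness` are invented here.

```python
from statistics import mean, pvariance


def similarity(a, b):
    # Toy stand-in for an amino acid similarity matrix (assumption).
    return 1.0 if a == b else 0.0


def score_rule(rpn_rule, peptide):
    """Score one oligopeptide against a rule written in RPN, using a stack."""
    stack = []
    for token in rpn_rule:
        if token in ("min", "max"):
            right = stack.pop()
            left = stack.pop()
            stack.append(min(left, right) if token == "min" else max(left, right))
        else:
            # Operand: (position in the peptide, expected residue).
            position, residue = token
            stack.append(similarity(peptide[position], residue))
    if len(stack) != 1:
        raise ValueError("malformed RPN rule")
    return stack[0]


def fisher_ratio(scores, labels):
    """Squared difference of class means over the sum of class variances."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    spread = pvariance(pos) + pvariance(neg)
    # The small constant avoids division by zero for perfectly separated classes.
    return (mean(pos) - mean(neg)) ** 2 / (spread + 1e-9)


def fitness(rpn_rule, peptides, labels, penalty=0.01):
    """Fisher ratio minus a length penalty (a crude proxy for description length)."""
    scores = [score_rule(rpn_rule, p) for p in peptides]
    return fisher_ratio(scores, labels) - penalty * len(rpn_rule)


# Toy usage: "residue 0 is K or R", written in RPN as  K R max.
rule = [(0, "K"), (0, "R"), "max"]
peptides = ["KA", "RG", "AG", "LV", "KP"]  # invented data, not from the paper
labels = [1, 1, 0, 0, 0]
print(fitness(rule, peptides, labels))
```

Because an RPN rule is a flat token list, evaluating it needs only a stack, and in a genetic-programming setting the evolutionary operators can work on list slices rather than on tree structures. This is one plausible reading of the claim that RPN simplifies the evolutionary operations.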
