Du Qi-Shi, Jiang Zhi-Qin, He Wen-Zhang, Li Da-Peng, Chou Kou-Chen
Tianjin University of Technology and Education, Mathematical Department, Liulin East, Hexi District, Tianjin, 300222, China.
J Biomol Struct Dyn. 2006 Jun;23(6):635-40. doi: 10.1080/07391102.2006.10507088.
The extremely complicated nature of many biological problems gives them the features of fuzzy sets: the available information is vague, imprecise, noisy, ambiguous, or incomplete. For instance, the data currently used to classify protein structural classes typically form a fuzzy set. To deal with this kind of problem, the AAPCA (Amino Acid Principal Component Analysis) approach was introduced. In the AAPCA approach, the 20-dimensional amino acid composition space is reduced to an orthogonal space with fewer dimensions, and the original base functions are converted into a set of orthogonal and normalized base functions. The advantage of such an approach is that it can minimize the random errors and redundant information in a protein dataset through principal component selection, remarkably improving the success rates in predicting protein structural classes. It is anticipated that the AAPCA approach can also be used to deal with many other classification problems in proteins.
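The dimensionality reduction described in the abstract can be illustrated with ordinary principal component analysis applied to amino acid composition vectors. The following is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the exact AAPCA procedure, so standard PCA via singular value decomposition is swapped in, the function names `composition` and `pca_reduce` are hypothetical, and the sequences are made-up toy strings rather than real proteins.

```python
import numpy as np

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the 20-dimensional amino acid composition (fractions summing to 1)."""
    counts = np.array([seq.count(a) for a in AMINO_ACIDS], dtype=float)
    total = counts.sum()
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return counts / total


def pca_reduce(X, k):
    """Project the rows of X onto the first k principal components.

    Assumption: plain PCA on mean-centered data, used here as a stand-in
    for the AAPCA step; the paper's own procedure may differ in detail.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are orthonormal directions, ordered by decreasing variance.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    scores = Xc @ components.T
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, components, explained[:k]


# Hypothetical toy sequences, for illustration only.
seqs = [
    "AAELLKKAEELLKRAEEL",
    "VTVKVYVEGTTYRVEVKV",
    "MKALAEELGVKVTVYDGS",
    "GSPCCKDGWTCYPGQCNN",
    "LLEEAAKKLLEEAVKKGG",
]
X = np.vstack([composition(s) for s in seqs])   # shape: (5, 20)
scores, components, explained = pca_reduce(X, k=3)
```

Here `scores` holds each sequence's coordinates in the reduced orthogonal space, and the rows of `components` play the role of the orthogonal, normalized base functions. Because composition fractions sum to 1, the centered data span at most 19 dimensions, so at least one of the original 20 directions is always redundant. A classifier for structural classes would then be trained on `scores` instead of the raw compositions; the choice of `k` is a tuning decision not specified in the abstract.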