Zhang Lichao, Kong Liang
School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, China.
College of Sciences, Northeastern University, Shenyang, China.
Protein Pept Lett. 2020;27(4):287-294. doi: 10.2174/0929866526666190718151753.
Amino acid physicochemical properties encoded in protein primary structure play a crucial role in protein folding. However, it is not yet clear which of the properties are the most suitable for protein fold classification.
To avoid exhaustively searching the entire property space, this study proposes an amino acid property selection method that rapidly identifies a suitable combination of properties for protein fold classification.
The proposed method is based on the sequential floating forward selection (SFFS) strategy. Beginning with an empty set, a variable number of features is added at each iteration until the termination condition is met.
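The selection strategy described above can be illustrated with a minimal sketch of generic sequential floating forward selection. This is not the authors' implementation: the function name `sffs`, the `max_size` stopping rule, the property names, and the toy scoring function are all assumptions made for illustration. In practice the score would be something like the cross-validated accuracy of a fold classifier trained on the selected properties.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Subset = FrozenSet[str]


def sffs(features: Iterable[str],
         score: Callable[[Subset], float],
         max_size: int) -> Tuple[Subset, float]:
    """Generic SFFS sketch: returns the best-scoring subset seen and its score.

    The stopping rule (reaching max_size) is an assumption; the source only
    says that iteration continues until a termination condition is met.
    """
    pool = frozenset(features)
    selected: Subset = frozenset()
    best: Dict[int, Tuple[float, Subset]] = {}  # subset size -> (score, subset)

    while len(selected) < max_size and pool - selected:
        # Forward step: add the single feature that maximizes the score.
        # sorted() makes tie-breaking deterministic.
        add = max(sorted(pool - selected),
                  key=lambda f: score(selected | {f}))
        selected = selected | {add}
        s = score(selected)
        if len(selected) not in best or s > best[len(selected)][0]:
            best[len(selected)] = (s, selected)

        # Floating (backward) step: drop features while doing so strictly
        # beats the best subset already recorded at the smaller size.
        while len(selected) > 2:
            drop = max(sorted(selected),
                       key=lambda f: score(selected - {f}))
            reduced = selected - {drop}
            r = score(reduced)
            if r > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (r, reduced)
            else:
                break

    top_score, top_subset = max(best.values(), key=lambda t: t[0])
    return top_subset, top_score


# Hypothetical toy score: each property contributes a fixed gain, and one
# redundant pair is penalized. Values are made up for illustration only.
GAIN = {"hydrophobicity": 50, "polarity": 40, "volume": 30, "charge": 10}


def toy_score(subset: Subset) -> float:
    total = sum(GAIN[f] for f in subset)
    if {"hydrophobicity", "polarity"} <= subset:
        total -= 35  # assumed redundancy penalty
    return total


best_subset, best_value = sffs(GAIN, toy_score, max_size=2)
print(sorted(best_subset), best_value)
```

Because the backward step only fires on a strict improvement over the best subset recorded at the smaller size, the loop cannot cycle, and the number of selected features can both grow and shrink between iterations.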
Experimental results indicate that, with appropriately selected amino acid properties, the proposed method improved prediction accuracy by 0.26% to 5% on a widely used benchmark dataset.
The proposed property selection method can be extended to other classification problems in bioinformatics that depend on biomolecular properties.