Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, 610054 Chengdu, Sichuan, China.
School of Applied Chemistry and Biological Technology, Shenzhen Polytechnic, 518055 Shenzhen, Guangdong, China.
Front Biosci (Landmark Ed). 2022 Jun 2;27(6):177. doi: 10.31083/j.fbl2706177.
Channel proteins transport molecules across the plasma membrane by free diffusion. Because identifying them experimentally is costly in both labor and laboratory methods, a computational tool for identifying channel proteins is needed to support biological research on them.
Seventeen feature encoding methods were combined with four machine learning classifiers to generate 68-dimensional probability features. A two-step feature selection strategy was then used to optimize these features, and the final prediction model, M16-LGBM (light gradient boosting machine), was obtained on the 16-dimensional optimal feature vector.
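The layout of the probability features can be sketched as follows, assuming that each of the 68 dimensions is the probability output by one classifier trained on one encoding (17 × 4 = 68); the abstract does not state this explicitly. Every encoder, scorer, name, and the toy sequence below is a hypothetical stand-in, not an encoding or model from the paper, and the 16 selected indices are arbitrary placeholders for whatever the two-step selection would choose.

```python
from typing import Callable, Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

Encoder = Callable[[str], List[float]]    # protein sequence -> descriptor vector
Scorer = Callable[[List[float]], float]   # descriptor vector -> P(channel protein)


def probability_features(
    seq: str,
    encoders: Dict[str, Encoder],
    scorers: Dict[Tuple[str, str], Scorer],
    clf_names: List[str],
) -> Tuple[List[str], List[float]]:
    """Return one probability per (encoding, classifier) pair, in a fixed order."""
    names: List[str] = []
    values: List[float] = []
    for enc_name, encode in encoders.items():
        x = encode(seq)  # encode once, reuse for every classifier
        for clf_name in clf_names:
            names.append(f"{enc_name}|{clf_name}")
            values.append(scorers[(enc_name, clf_name)](x))
    return names, values


# --- Toy stand-ins (hypothetical; NOT the encodings or models from the paper) ---
def make_encoder(i: int) -> Encoder:
    residue = AMINO_ACIDS[i]

    def encode(seq: str) -> List[float]:
        # Placeholder descriptor: frequency of one residue type and its complement.
        f = seq.count(residue) / len(seq)
        return [f, 1.0 - f]

    return encode


def make_scorer(weight: float) -> Scorer:
    def score(x: List[float]) -> float:
        # Placeholder "trained model": stays in [0, 1] for weight in [0, 1].
        return weight * x[0] + (1.0 - weight) * 0.5

    return score


encoders = {f"enc{i + 1:02d}": make_encoder(i) for i in range(17)}
clf_names = ["clf1", "clf2", "clf3", "clf4"]
scorers = {
    (enc, clf): make_scorer(0.2 * (j + 1))
    for enc in encoders
    for j, clf in enumerate(clf_names)
}

# Toy sequence, for illustration only.
names, feats = probability_features("MKTLLVAGAVLLSACDE", encoders, scorers, clf_names)

# Arbitrary placeholder for the 16 features a two-step selection would keep.
selected_idx = list(range(0, 64, 4))
m16_input = [feats[i] for i in selected_idx]

print(len(feats), len(m16_input))
```

In a real pipeline the scorers would be fitted classifiers, and the probabilities used for training would normally come from held-out folds so that the second-stage model does not see predictions made on its own training labels.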
A new predictor, CAPs-LGBM, was proposed to identify channel proteins effectively.
CAPs-LGBM is the first machine learning predictor for channel proteins whose final prediction model is constructed from protein primary sequences. The classifier performed well on both the training and test sets.