Identifying Plant Pentatricopeptide Repeat Proteins Using a Variable Selection Method.

Author information

Zhao Xudong, Wang Hanxu, Li Hangyu, Wu Yiming, Wang Guohua

Affiliations

College of Information and Computer Engineering, Northeast Forestry University, Harbin, China.

State Key Laboratory of Tree Genetics and Breeding, Northeast Forestry University, Harbin, China.

Publication information

Front Plant Sci. 2021 Mar 1;12:506681. doi: 10.3389/fpls.2021.506681. eCollection 2021.

Abstract

Pentatricopeptide repeat (PPR) proteins, which carry tandem repeats of a degenerate 35-amino-acid motif, play an important role in plant growth. Features extracted from protein sequences can be combined with classification methods to identify PPR proteins. However, it has not been investigated which components of a multidimensional feature (namely, its variables) are most effective for discriminating these proteins. We therefore propose a variable selection framework that selects, from a multidimensional feature, the variables that are effective for identifying PPR proteins. PPR-positive and PPR-negative protein samples are split equally between a training set and a testing set. The importance of each variable is taken as the score obtained by iterating resampling, training, and scoring steps on the training set. A model selection method based on a Gaussian mixture model then automatically chooses the variables that are effective for identifying PPR proteins. Performance measures on the testing set demonstrate the effectiveness of the selected variables. Certain individual variables, rather than the whole multidimensional features to which they belong, are effective in discriminating PPR-positive from PPR-negative proteins. In addition, methionine content may play an important role in predicting PPR proteins.
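The abstract outlines the selection pipeline but does not name the scoring classifier or the model-selection criterion. The Python sketch below illustrates the two core steps under assumptions of our own. It is not the authors' implementation.

The helper names `importance_scores`, `fit_gmm_1d` and `select_variables` are hypothetical. The per-variable nearest-class-mean rule, scored by out-of-bag accuracy, is a stand-in for whatever classifier the paper trains. Choosing the number of mixture components by the Bayesian information criterion (BIC), then keeping the variables in the highest-mean component, is an assumed reading of "model selection based on a Gaussian mixture model".

```python
# Hypothetical sketch: the function names, the nearest-class-mean scorer and the
# BIC criterion are illustrative assumptions, not the paper's published method.
import math
import random
from statistics import mean


def importance_scores(X, y, n_iter=50, seed=0):
    """Average out-of-bag accuracy of a one-variable nearest-class-mean rule.

    X is a list of feature vectors and y a list of 0/1 labels (training set only).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    totals = [0.0] * d
    rounds = 0
    for _ in range(n_iter):
        boot = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        oob = sorted(set(range(n)) - set(boot))      # samples left out this round
        pos = [i for i in boot if y[i] == 1]
        neg = [i for i in boot if y[i] == 0]
        if not oob or not pos or not neg:
            continue                                 # skip degenerate draws
        rounds += 1
        for j in range(d):
            # Train: class means of variable j on the bootstrap sample.
            m1 = mean(X[i][j] for i in pos)
            m0 = mean(X[i][j] for i in neg)
            # Score: accuracy of "nearer class mean" on the out-of-bag samples.
            hits = sum((abs(X[i][j] - m1) < abs(X[i][j] - m0)) == (y[i] == 1) for i in oob)
            totals[j] += hits / len(oob)
    return [t / max(rounds, 1) for t in totals]


def _pdf(v, m, s2):
    """Density of the normal distribution N(m, s2) at v."""
    return math.exp(-(v - m) ** 2 / (2 * s2)) / math.sqrt(2 * math.pi * s2)


def fit_gmm_1d(x, k, n_iter=100, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture to x by EM.

    Returns (weights, means, variances, log_likelihood).
    """
    n = len(x)
    xs = sorted(x)
    m = mean(xs)
    # Spread the initial means from the smallest to the largest value.
    mu = [xs[c * (n - 1) // (k - 1)] for c in range(k)] if k > 1 else [m]
    var = [max(sum((v - m) ** 2 for v in xs) / n, var_floor)] * k
    w = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for v in x:
            p = [w[c] * _pdf(v, mu[c], var[c]) for c in range(k)]
            s = sum(p) or 1e-300
            resp.append([pc / s for pc in p])
        # M-step: update weights, means and floored variances.
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-12
            w[c] = nc / n
            mu[c] = sum(r[c] * v for r, v in zip(resp, x)) / nc
            var[c] = max(sum(r[c] * (v - mu[c]) ** 2 for r, v in zip(resp, x)) / nc, var_floor)
    ll = sum(math.log(sum(w[c] * _pdf(v, mu[c], var[c]) for c in range(k)) or 1e-300) for v in x)
    return w, mu, var, ll


def select_variables(scores, max_k=3):
    """Pick the mixture size by BIC, then keep the variables assigned to the
    component with the highest mean importance score."""
    n = len(scores)
    best = None
    for k in range(1, min(max_k, n) + 1):
        w, mu, var, ll = fit_gmm_1d(scores, k)
        bic = -2 * ll + (3 * k - 1) * math.log(n)  # k means, k variances, k-1 weights
        if best is None or bic < best[0]:
            best = (bic, w, mu, var)
    _, w, mu, var = best
    labels = [max(range(len(mu)), key=lambda c: w[c] * _pdf(v, mu[c], var[c])) for v in scores]
    top = max(set(labels), key=lambda c: mu[c])
    return [j for j, c in enumerate(labels) if c == top]


if __name__ == "__main__":
    # Toy data (made up, not from the paper): variable 0 separates the classes,
    # variables 1-9 are pure noise.
    rng = random.Random(1)
    X, y = [], []
    for i in range(100):
        label = i % 2
        X.append([rng.gauss(3.0 * label, 0.5)] + [rng.gauss(0.0, 1.0) for _ in range(9)])
        y.append(label)
    scores = importance_scores(X, y)
    print([round(s, 3) for s in scores])
    print("selected variables:", select_variables(scores))
```

The sketch omits the equal split into training and testing sets. In the framework described above, the scores would be computed on the training half only, and the selected variables evaluated on the testing half. The variance floor keeps EM from collapsing when a mixture component holds a single variable. That situation is common when only a few variables stand out from the rest.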
