Bhasin Manoj, Raghava Gajendra P S
Institute of Microbial Technology, Chandigarh 160036, India.
J Biol Chem. 2004 May 28;279(22):23262-6. doi: 10.1074/jbc.M401932200. Epub 2004 Mar 23.
Nuclear receptors are key transcription factors that regulate crucial gene networks responsible for cell growth, differentiation, and homeostasis. They form a superfamily of phylogenetically related proteins and control functions associated with major diseases (e.g. diabetes, osteoporosis, and cancer). In this study, a novel method was developed for classifying the subfamilies of nuclear receptors. Classification was based on the amino acid and dipeptide composition of receptor sequences, using support vector machines. Training and testing were done on a non-redundant data set of 282 proteins obtained from the NucleaRDB database (1). The performance of all classifiers was evaluated using 5-fold cross-validation: the data set was randomly partitioned into five equal sets, and each set in turn was used for testing while the remaining four sets were used for training. It was found that different subfamilies of nuclear receptors were quite closely correlated in terms of amino acid composition as well as dipeptide composition. The overall accuracies of the amino acid composition-based and dipeptide composition-based classifiers were 82.6 and 97.5%, respectively. These results show that different subfamilies of nuclear receptors are predictable with considerable accuracy using amino acid or dipeptide composition. Furthermore, based on the above approach, an online web service, NRpred, was developed, which is available at www.imtech.res.in/raghava/nrpred.
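The two feature types named in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions: amino acid composition as a 20-value vector of residue frequencies, and dipeptide composition as a 400-value vector of frequencies of ordered adjacent residue pairs. The abstract does not state how the vectors were normalized or how non-standard residues were handled, so expressing values as percentages and skipping non-standard residues are assumptions here, and the function names are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def amino_acid_composition(seq):
    """Return 20 percentages: the share of each standard residue in seq.

    Assumption: non-standard characters (e.g. X) are ignored.
    """
    seq = seq.upper()
    counts = [seq.count(a) for a in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [100.0 * c / total for c in counts]


def dipeptide_composition(seq):
    """Return 400 percentages: the share of each ordered adjacent pair in seq.

    Assumption: pairs containing a non-standard character are ignored.
    """
    seq = seq.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [100.0 * counts[p] / total for p in DIPEPTIDES]
```

Both functions map a sequence of any length to a fixed-length vector, which is what allows sequences of different lengths to be fed to a support vector machine. Unlike the amino acid vector, the dipeptide vector retains some information about local residue order.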
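The 5-fold cross-validation procedure described in the abstract can be sketched as follows. This is an illustrative partitioning routine only, not the authors' code: the function name, the fixed random seed, and the round-robin assignment of shuffled indices to folds are assumptions. Because 282 is not divisible by 5, the folds here differ in size by at most one protein.

```python
import random


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Indices are shuffled once, then dealt round-robin into k folds,
    so fold sizes differ by at most one. Each fold is used exactly
    once for testing while the other k - 1 folds are used for training.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seed is an arbitrary choice
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Example with the data set size reported in the abstract (282 proteins).
splits = list(k_fold_splits(282, k=5))
```

Overall accuracy under this scheme would be the fraction of all 282 proteins whose subfamily is predicted correctly when each protein is scored by the classifier that did not see it during training.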