IEEE/ACM Trans Comput Biol Bioinform. 2018 Nov-Dec;15(6):1844-1852. doi: 10.1109/TCBB.2017.2773063. Epub 2017 Nov 14.
The nuclear receptor (NR) superfamily plays an important role in key biological, developmental, and physiological processes. Developing a method for classifying NR proteins is an important step towards understanding the structure and function of newly discovered NR proteins. Recent studies on NR classification either fail to achieve optimal accuracy or are not designed to cover all known NR subfamilies. In this study, we developed RF-NR, a Random Forest based approach for improved classification of nuclear receptors. RF-NR predicts whether a query protein sequence belongs to one of the eight NR subfamilies or is a non-NR sequence. RF-NR uses spectrum-like features, namely Amino Acid Composition, Dipeptide Composition, and Tripeptide Composition. When benchmarked on two independent datasets built with different sequence redundancy reduction criteria, RF-NR achieves better than or comparable accuracy to other existing methods. An added advantage of our approach is that it also yields biological insight into which features are important for classifying NR subfamilies. RF-NR is freely available at http://bcb.ncat.edu/RF_NR.
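The three spectrum-like features can be illustrated with a short sketch. It assumes each composition is the relative frequency of every possible k-mer (k = 1, 2, 3) over overlapping windows of the sequence. That gives 20 + 400 + 8,000 = 8,420 values per protein. The function names `kmer_composition` and `spectrum_features` are hypothetical. Two further details are also assumptions and may differ from what RF-NR actually does: the normalization, and skipping windows that contain non-standard residues.

```python
from itertools import product

# The 20 standard amino acids; k-mers are enumerated in this lexicographic order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(seq: str, k: int) -> list[float]:
    """Relative frequency of every one of the 20**k possible k-mers in seq.

    Assumption: frequencies are normalized by the number of overlapping
    windows made only of standard residues; windows containing other
    symbols (e.g. X, B, U) are skipped.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # window contains only standard residues
            counts[j] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def spectrum_features(seq: str) -> list[float]:
    """Concatenate amino acid (k=1), dipeptide (k=2) and tripeptide (k=3) compositions."""
    seq = seq.upper()
    return (
        kmer_composition(seq, 1)    # 20 amino acid composition values
        + kmer_composition(seq, 2)  # 400 dipeptide composition values
        + kmer_composition(seq, 3)  # 8,000 tripeptide composition values
    )
```

For example, `spectrum_features("AAC")` returns 8,420 values. Within the first 20, the frequency of A is 2/3 and that of C is 1/3. Vectors like this could then be passed to a random forest implementation, such as scikit-learn's `RandomForestClassifier`, with one label per NR subfamily plus a non-NR label. Tree ensembles also expose per-feature importances, which is one way to look at which k-mers drive the classification.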