Department of Computer Science, Abdul Wali Khan University Mardan, Pakistan; School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, China.
Genomics. 2020 Jan;112(1):276-285. doi: 10.1016/j.ygeno.2019.02.006. Epub 2019 Feb 16.
Nuclear receptor proteins (NRPs) play a vital role in regulating gene expression. With the rapid growth in the number of known NRPs in the post-genomic era, it is highly desirable to identify NRPs and their sub-families accurately from their primary sequences. Several conventional methods have been used to discriminate NRPs and their sub-families, but they have not achieved satisfactory results. Consequently, a new two-level computational model, "iNR-2L", was developed. Two discrete methods, Dipeptide Composition and Tripeptide Composition, were used to formulate NRP sequences. The two descriptor spaces were then merged to construct a hybrid space. Next, the minimum redundancy maximum relevance (mRMR) feature selection technique was employed to select salient features and to reduce noise and redundancy. The experimental results showed that the proposed model, iNR-2L, achieved outstanding results. It is anticipated that the proposed computational model might be a practical and effective tool for academia and the research community.
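As an illustration of the sequence formulation described above, the following is a minimal sketch of how dipeptide and tripeptide composition vectors might be computed and concatenated into a hybrid feature space. It is not the authors' implementation. The function names `kmer_composition` and `hybrid_features`, the example sequence, the restriction to the 20 standard amino acids, and the normalization by the number of valid windows are all assumptions made for this example. The mRMR selection step and the classifier are not shown.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_composition(sequence, k):
    """Return the normalized frequency of every possible k-mer (20**k values).

    Hypothetical helper: k=2 gives dipeptide composition (400 values),
    k=3 gives tripeptide composition (8000 values).
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    # Slide a window of width k along the sequence.
    for i in range(len(sequence) - k + 1):
        pos = index.get(sequence[i:i + k])
        if pos is not None:  # skip windows containing non-standard residues
            counts[pos] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


def hybrid_features(sequence):
    """Concatenate dipeptide and tripeptide composition (400 + 8000 values)."""
    seq = sequence.upper()
    return kmer_composition(seq, 2) + kmer_composition(seq, 3)


# Made-up example sequence, for illustration only.
features = hybrid_features("MKTAYIAKQRQISFVKSHFSRQ")
print(len(features))
```

A hybrid vector of this size is large and sparse for typical protein lengths, which is consistent with the abstract's use of a feature selection step to reduce noise and redundancy before classification.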