Wang Chao, Zou Quan, Ju Ying, Shi Hua
IEEE/ACM Trans Comput Biol Bioinform. 2023 Mar-Apr;20(2):967-975. doi: 10.1109/TCBB.2022.3204365. Epub 2023 Apr 3.
Enhancers are crucial for the precise regulation of gene expression, but enhancer identification and strength prediction remain challenging because enhancers are freely distributed and the genome contains a tremendous number of similar fragments. Although several bioinformatics tools have been developed, these models still have shortcomings, and their performance needs further improvement. In the present study, a two-layer predictor called Enhancer-FRL was proposed: the first layer identifies enhancers (enhancer or nonenhancer), and the second predicts their activity (strong or weak). More specifically, to build an efficient model, a feature representation learning scheme was applied to generate a 50-dimensional probabilistic vector from 10 feature encodings and five machine learning algorithms. The multiview probabilistic features were then integrated to construct the final prediction model. Compared with single-feature-based models, Enhancer-FRL showed significantly improved performance and model robustness. Performance assessment on the independent test dataset indicated that the proposed model outperformed available state-of-the-art toolkits. The Enhancer-FRL webserver is freely accessible at http://lab.malab.cn/~wangchao/softwares/Enhancer-FRL/. The code and datasets can be downloaded from the webserver page or from GitHub at https://github.com/wangchao-malab/Enhancer-FRL/.
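The feature representation learning scheme described in the abstract (several feature encodings crossed with several learners, each pair contributing one probability to a combined vector) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: it uses two k-mer encodings and two simple learners in place of the paper's 10 encodings and five algorithms, so the vector has 4 dimensions rather than 50, and the sequences are synthetic toy data. The names `kmer_freq`, `Logistic`, `Centroid`, `probabilistic_features`, and `toy_sequences` are hypothetical.

```python
import math
import random
from itertools import product


def kmer_freq(seq, k):
    """Normalized k-mer frequencies over the DNA alphabet (4**k values)."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        sub = seq[i:i + k]
        if sub in counts:
            counts[sub] += 1
    total = max(1, len(seq) - k + 1)
    return [counts[m] / total for m in kmers]


# Stand-ins for the paper's 10 feature encodings (assumption: only 2 here).
ENCODERS = {"mono": lambda s: kmer_freq(s, 1), "di": lambda s: kmer_freq(s, 2)}


class Logistic:
    """Tiny logistic regression trained by batch gradient descent."""

    def __init__(self, lr=0.5, epochs=200):
        self.lr, self.epochs = lr, epochs

    def fit(self, X, y):
        self.w, self.b = [0.0] * len(X[0]), 0.0
        for _ in range(self.epochs):
            errs = [self.predict_proba(x) - t for x, t in zip(X, y)]
            for j in range(len(self.w)):
                self.w[j] -= self.lr * sum(e * x[j] for e, x in zip(errs, X)) / len(X)
            self.b -= self.lr * sum(errs) / len(X)
        return self

    def predict_proba(self, x):
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))


class Centroid:
    """Nearest-centroid scorer; the score is the relative distance to class 0."""

    def fit(self, X, y):
        self.c = {}
        for label in (0, 1):
            rows = [x for x, t in zip(X, y) if t == label]
            self.c[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_proba(self, x):
        d0, d1 = math.dist(x, self.c[0]), math.dist(x, self.c[1])
        return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5


# Stand-ins for the paper's five machine learning algorithms (assumption).
LEARNERS = {"logistic": Logistic, "centroid": Centroid}


def probabilistic_features(seqs, labels, folds=3):
    """Out-of-fold P(enhancer) for every (encoding, learner) pair."""
    n = len(seqs)
    feats = [[] for _ in range(n)]
    for encode in ENCODERS.values():
        X = [encode(s) for s in seqs]
        for make in LEARNERS.values():
            for f in range(folds):
                train = [i for i in range(n) if i % folds != f]
                model = make().fit([X[i] for i in train], [labels[i] for i in train])
                for i in range(f, n, folds):  # held-out samples of this fold
                    feats[i].append(model.predict_proba(X[i]))
    return feats


def toy_sequences(n_per_class=12, length=60, seed=0):
    """Synthetic data: GC-rich 'enhancers' (1) vs AT-rich 'nonenhancers' (0)."""
    rng = random.Random(seed)
    seqs, labels = [], []
    for _ in range(n_per_class):
        seqs.append("".join(rng.choices("ACGT", weights=[1, 4, 4, 1], k=length)))
        labels.append(1)
        seqs.append("".join(rng.choices("ACGT", weights=[4, 1, 1, 4], k=length)))
        labels.append(0)
    return seqs, labels


seqs, labels = toy_sequences()
feats = probabilistic_features(seqs, labels)   # one probability vector per sequence
final_model = Logistic().fit(feats, labels)    # final model on the combined vector
```

In this sketch, each probability is computed on a held-out fold so that the final model is not trained on scores a base learner produced for its own training samples. Under the same assumptions, a second layer for strong versus weak activity would repeat the pipeline on enhancer sequences only, with activity labels in place of enhancer labels.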