Al-Saggaf Ubaid M, Usman Muhammad, Naseem Imran, Moinuddin Muhammad, Jiman Ahmad A, Alsaggaf Mohammed U, Alshoubaki Hitham K, Khan Shujaat
Center of Excellence in Intelligent Engineering Systems, King Abdulaziz University, Jeddah, Saudi Arabia.
Electrical and Computer Engineering Department, King Abdulaziz University, Jeddah, Saudi Arabia.
Front Bioeng Biotechnol. 2021 Oct 14;9:752658. doi: 10.3389/fbioe.2021.752658. eCollection 2021.
Extracellular matrix (ECM) proteins form complex networks of macromolecules that fill the extracellular spaces of living tissues. They provide structural support and play an important role in maintaining cellular functions. Identifying ECM proteins can therefore play a vital role in the study of various diseases. Conventional wet-lab methods are reliable; however, they are expensive and time-consuming and are therefore not scalable. In this research, we propose a novel sequence-based machine learning approach for the prediction of ECM proteins. In the proposed method, composition of k-spaced amino acid pair (CKSAAP) features are encoded into a classifiable latent space (LS) with the help of deep latent space encoding (LSE). A comprehensive ablation analysis is conducted to evaluate the performance of the proposed method. Results are compared with other state-of-the-art methods on the benchmark dataset, and the proposed ECM-LSE approach is shown to comprehensively outperform contemporary methods.
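To make the CKSAAP features mentioned in the abstract concrete, the following is a minimal Python sketch of the commonly used definition: for each gap k, count how often each ordered pair of the 20 standard amino acids occurs with exactly k residues between them, then normalize the counts. It is an illustration under stated assumptions, not the authors' implementation. The function name `cksaap`, the default `k_max=2`, and the choice to skip pairs containing non-standard residues and normalize by the number of valid pairs are all assumptions; the abstract does not specify them. The deep latent space encoding step is not covered here, since the abstract gives no architectural details.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 400 ordered pairs, in a fixed order, and a lookup from pair to index.
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
PAIR_INDEX = {pair: i for i, pair in enumerate(PAIRS)}


def cksaap(sequence, k_max=2):
    """Return a CKSAAP feature vector of length 400 * (k_max + 1).

    Assumption: k_max=2 is an illustrative default, not a value from the paper.
    Assumption: pairs containing non-standard residues (e.g. 'X') are skipped,
    and frequencies are normalized by the number of valid pairs for each gap.
    """
    sequence = sequence.upper()
    features = []
    for k in range(k_max + 1):
        counts = [0] * len(PAIRS)
        total = 0
        # Residue i is paired with residue i + k + 1 (k residues in between).
        for i in range(len(sequence) - k - 1):
            idx = PAIR_INDEX.get(sequence[i] + sequence[i + k + 1])
            if idx is not None:
                counts[idx] += 1
                total += 1
        # Convert counts to frequencies; all zeros if no valid pair exists.
        features.extend(c / total if total else 0.0 for c in counts)
    return features


if __name__ == "__main__":
    vec = cksaap("ACAC", k_max=1)
    print(len(vec))  # 800 features: 400 pairs for each of k = 0 and k = 1
```

For the toy sequence `"ACAC"` with `k_max=1`, the k = 0 block holds frequency 2/3 for the pair `AC` and 1/3 for `CA`, while the k = 1 block holds 1/2 each for `AA` and `CC`. A vector like this would then be the input to whatever encoder produces the latent space.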