Lv Zhibin, Wang Pingping, Zou Quan, Jiang Qinghua
Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.
Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin 150000, China.
Bioinformatics. 2021 Apr 5;36(24):5600-5609. doi: 10.1093/bioinformatics/btaa1074.
The Golgi apparatus plays a key functional role in protein biosynthesis within the eukaryotic cell, and its malfunction results in various neurodegenerative diseases. To better understand the Golgi apparatus, it is essential to identify the sub-Golgi localization of proteins. Some machine learning methods have identified sub-Golgi proteins by fusing multiple sequence representations. However, more accurate identification of sub-Golgi proteins remains challenging with existing methods.
We developed a protocol for identifying protein sub-Golgi localization that uses deep representation learning features with 107 dimensions. With this protocol, we demonstrated that a single type of feature representation is sufficient for more accurate identification of sub-Golgi proteins. Previous state-of-the-art sub-Golgi protein localization classifiers instead fused multiple types of protein sequence feature representations. Based on independent testing results on benchmark datasets, our protocol generalizes well and performs reliably and robustly in sub-Golgi protein localization prediction.
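The following minimal sketch makes concrete the input and output of a single-representation protocol like the one described above. It assumes that some deep representation learning model, which is not specified here, has already converted each protein into one 107-dimensional vector. It then uses a plain nearest-centroid classifier as a stand-in. This is not the isGP-DRLF classifier. The function names `fit_centroids` and `predict` and the `"cis"`/`"trans"` labels are hypothetical, chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence

FEATURE_DIM = 107  # feature dimensionality stated in the abstract


def check_dim(vec: Sequence[float]) -> None:
    # Every protein must be described by exactly one 107-dimensional vector.
    if len(vec) != FEATURE_DIM:
        raise ValueError(f"expected {FEATURE_DIM} features, got {len(vec)}")


def fit_centroids(X: Sequence[Sequence[float]], y: Sequence[str]) -> Dict[str, List[float]]:
    # Hypothetical stand-in classifier: average the feature vectors of each class.
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in zip(X, y):
        check_dim(vec)
        acc = sums.setdefault(label, [0.0] * FEATURE_DIM)
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids: Dict[str, List[float]], vec: Sequence[float]) -> str:
    # Assign the label whose class centroid is closest in Euclidean distance.
    check_dim(vec)
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))


# Toy, made-up vectors (not real protein embeddings), used only to exercise the code.
cis_examples = [[0.0] * FEATURE_DIM, [0.1] * FEATURE_DIM]
trans_examples = [[1.0] * FEATURE_DIM, [0.9] * FEATURE_DIM]
model = fit_centroids(cis_examples + trans_examples, ["cis"] * 2 + ["trans"] * 2)
print(predict(model, [0.2] * FEATURE_DIM))  # → cis
```

The design point this sketch reflects is the abstract's claim that a single type of feature representation is enough: the classifier consumes one fixed-length vector per protein rather than a fusion of several descriptor types. Any real classifier could replace the nearest-centroid stand-in without changing that interface.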
A user-friendly web server is freely accessible at http://isGP-DRLF.aibiochem.net, and the prediction code is available at https://github.com/zhibinlv/isGP-DRLF.
Supplementary data are available at Bioinformatics online.