HITS-PR-HHblits: protein remote homology detection by combining PageRank and Hyperlink-Induced Topic Search.

Author information

Liu Bin, Jiang Shuangyan, Zou Quan

Affiliations

School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China.

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China.

Publication information

Brief Bioinform. 2020 Jan 17;21(1):298-308. doi: 10.1093/bib/bby104.

DOI:10.1093/bib/bby104

PMID:30403770

Abstract

As one of the most important fundamental problems in protein sequence analysis, protein remote homology detection is critical for both theoretical research (protein structure and function studies) and real-world applications (drug design). Although several computational predictors have been proposed, their detection performance is still limited. In this study, we treat protein remote homology detection as a document retrieval task, in which proteins are considered as documents and the aim is to find the documents in a database that are highly related to the query documents. A protein similarity network was constructed based on the true labels of the proteins in the database, and the query proteins were then connected to the network based on the similarity scores calculated by three ranking methods: PSI-BLAST, HMMER and HHblits. The PageRank algorithm and the Hyperlink-Induced Topic Search (HITS) algorithm were each run on this network to move the homologous proteins of the query proteins into the neighborhood of the query proteins in the network. Finally, the PageRank and HITS algorithms were combined, and a predictor called HITS-PR-HHblits was proposed to further improve the predictive performance. Tested on the SCOP and SCOPe benchmark datasets, the proposed protocols outperformed other state-of-the-art methods. For the convenience of most experimental scientists, a web server for HITS-PR-HHblits was established at http://bioinformatics.hitsz.edu.cn/HITS-PR-HHblits, through which users can easily obtain results without going through the mathematical details. The HITS-PR-HHblits predictor is a protocol for protein remote homology detection using different sets of programs, which will become a very useful computational tool for proteome analysis.
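The network-based re-ranking idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the update rules, edge weights, or the way the two algorithms are combined, so everything below is an assumption made for illustration. Specifically, database proteins that share a label are linked with weight 1.0, the query is attached using made-up similarity scores (standing in for HHblits output), PageRank is run with restart at the query (a personalized variant chosen for this example), HITS is the textbook power iteration, and the two scores are merged by a simple weighted average. The names `build_network`, `pagerank`, `hits`, and `combine` and the protein identifiers are hypothetical.

```python
def build_network(labels, query_scores, query="query"):
    """Weighted undirected graph as an adjacency dict: node -> {neighbor: weight}."""
    graph = {p: {} for p in labels}
    graph[query] = {}
    proteins = list(labels)
    # Assumption: database proteins with the same label get an edge of weight 1.0.
    for i, a in enumerate(proteins):
        for b in proteins[i + 1:]:
            if labels[a] == labels[b]:
                graph[a][b] = graph[b][a] = 1.0
    # Assumption: the query is linked to its hits with the similarity score as weight.
    for p, score in query_scores.items():
        graph[query][p] = graph[p][query] = score
    return graph


def pagerank(graph, restart, damping=0.85, iterations=100):
    """PageRank by power iteration; `restart` is the teleport distribution."""
    rank = {v: restart.get(v, 0.0) for v in graph}
    for _ in range(iterations):
        new = {v: (1.0 - damping) * restart.get(v, 0.0) for v in graph}
        for v, neighbors in graph.items():
            total = sum(neighbors.values())
            if total == 0:
                # Isolated node: return its mass to the restart distribution.
                for u in graph:
                    new[u] += damping * rank[v] * restart.get(u, 0.0)
            else:
                for u, w in neighbors.items():
                    new[u] += damping * rank[v] * w / total
        rank = new
    return rank


def hits(graph, iterations=100):
    """Textbook HITS; on an undirected graph hubs and authorities coincide."""
    hubs = {v: 1.0 for v in graph}
    auths = {v: 1.0 for v in graph}
    for _ in range(iterations):
        auths = {v: sum(w * hubs[u] for u, w in graph[v].items()) for v in graph}
        norm = sum(auths.values()) or 1.0
        auths = {v: a / norm for v, a in auths.items()}
        hubs = {v: sum(w * auths[u] for u, w in graph[v].items()) for v in graph}
        norm = sum(hubs.values()) or 1.0
        hubs = {v: h / norm for v, h in hubs.items()}
    return hubs, auths


def combine(pr, auth, candidates, alpha=0.5):
    """Assumed combination rule: weighted average of scores normalized over candidates."""
    pr_total = sum(pr[c] for c in candidates) or 1.0
    auth_total = sum(auth[c] for c in candidates) or 1.0
    return {c: alpha * pr[c] / pr_total + (1.0 - alpha) * auth[c] / auth_total
            for c in candidates}


# Toy data: labels and scores are invented for illustration only.
labels = {"d1": "fam_a", "d2": "fam_a", "d3": "fam_a", "d4": "fam_b", "d5": "fam_b"}
query_scores = {"d1": 0.9, "d4": 0.2}

graph = build_network(labels, query_scores)
pr = pagerank(graph, restart={"query": 1.0})
_, auth = hits(graph)
scores = combine(pr, auth, candidates=list(labels))
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy network, `d2` and `d3` have no direct link to the query, yet they are pulled up the ranking because they share a label with the strong hit `d1`. That is the intuition behind moving homologous proteins into the query's neighborhood, shown here under the stated assumptions only.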
