Molecular Biophysics Unit, Indian Institute of Science, Bangalore, 560012, India.
Centre for Development of Advanced Computing, Knowledge Park, Byappanahalli, Bangalore, 560038, India.
Sci Rep. 2021 Apr 22;11(1):8746. doi: 10.1038/s41598-021-87833-4.
Genome sequencing projects reveal the sequences of all the proteins encoded in a genome. As a first step, homology detection is employed to obtain clues to the structure and function of these proteins. However, high evolutionary divergence between homologous proteins challenges our ability to detect distant relationships. In the past, an approach involving multiple Position Specific Scoring Matrices (PSSMs) was found to be more effective than traditional single PSSMs. Cascaded search is another successful approach, in which the hits of a search are themselves used as queries to detect more homologues. We propose a protocol, 'Master Blaster', which combines the principles adopted in these two approaches to further enhance our ability to detect remote homologues. The approach was assessed using known relationships available in the SCOP70 database, and the results were compared against those of PSI-BLAST and HHblits, a hidden Markov model-based method. Compared to PSI-BLAST, Master Blaster resulted in a 10% improvement in the detection of cross-superfamily connections, a nearly 35% improvement in cross-family connections and a more than 80% improvement in intra-family connections. The results showed that HHblits is more sensitive than Master Blaster in detecting remote homologues. However, there are true hits from 46 folds for which Master Blaster reported homologues that HHblits did not report even with optimal parameters, indicating that using multiple methods that combine different approaches can be more effective for detecting remote homologues. Master Blaster stand-alone code is available for download in the supplementary archive.
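The cascaded-search idea mentioned in the abstract (hits of one search become the queries of the next) can be illustrated with a minimal Python sketch. This is not the Master Blaster implementation: the function `cascaded_search`, the `max_rounds` parameter, and the toy lookup table `TOY_HITS` are hypothetical names introduced here for illustration, and the `search` callable merely stands in for a real homology search such as PSI-BLAST.

```python
from typing import Callable, Dict, Iterable, List


def cascaded_search(
    query: str,
    search: Callable[[str], Iterable[str]],
    max_rounds: int = 3,
) -> Dict[str, int]:
    """Map every sequence id reached from `query` to the round
    (search generation) in which it was first reported."""
    found: Dict[str, int] = {query: 0}
    frontier: List[str] = [query]
    for round_no in range(1, max_rounds + 1):
        next_frontier: List[str] = []
        for current in frontier:
            # Each hit from the previous round is used as a new query.
            for hit in search(current):
                if hit not in found:
                    found[hit] = round_no
                    next_frontier.append(hit)
        if not next_frontier:
            break  # no new homologues were detected; stop early
        frontier = next_frontier
    return found


# Hypothetical toy "database": each id lists what a direct search would report.
TOY_HITS = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": [],
}


def toy_search(seq_id: str) -> List[str]:
    """Stand-in for a real homology search (assumption for this sketch)."""
    return TOY_HITS.get(seq_id, [])


result = cascaded_search("A", toy_search, max_rounds=3)
print(result)  # {'A': 0, 'B': 1, 'C': 2, 'D': 3}
```

In this toy example a single direct search from `A` reports only `B`, whereas three cascaded rounds also reach `C` and `D`; `E` is never reached because no search reports it. A real protocol would additionally need significance thresholds to keep false positives from propagating through the rounds.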