Reid Adam James, Yeats Corin, Orengo Christine Anne
Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK.
Bioinformatics. 2007 Sep 15;23(18):2353-60. doi: 10.1093/bioinformatics/btm355. Epub 2007 Aug 20.
A recent development in sequence-based remote homologue detection is the introduction of profile-profile comparison methods. These are more powerful than earlier approaches and can detect potentially homologous relationships missed by structural classifications such as CATH and SCOP. Because structural classifications traditionally serve as the gold standard of homology, this poses a challenge for benchmarking the new methods.
We present a novel approach that allows these methods to be benchmarked accurately against the CATH structural classification. We then apply it to assess the accuracy of a range of publicly available methods for remote homology detection, including several profile-profile methods (COMPASS, HHSearch, PRC), from two perspectives: first, in distinguishing homologous domains from non-homologues and, second, in annotating proteomes with structural domain families. PRC is shown to be the best method for distinguishing homologues. We show that SAM is the best practical method for annotating genomes, whilst using COMPASS for the most remote homologues would increase coverage. Finally, we introduce a simple approach that increases the sensitivity of remote homologue detection by up to 10% by combining multiple methods with a jury vote.
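The abstract does not specify how the jury vote is computed, so the following is only a minimal sketch of one plausible scheme, not the authors' implementation. It assumes each method reports at most one family assignment per query domain (for example its top hit at a permissive score threshold) and accepts the assignment only when enough methods agree. The function name `jury_vote`, the `min_votes` parameter, the tie handling and the family labels used below are all illustrative assumptions.

```python
from collections import Counter
from typing import Dict, Optional


def jury_vote(predictions: Dict[str, Optional[str]], min_votes: int = 2) -> Optional[str]:
    """Return the family backed by at least `min_votes` methods, else None.

    `predictions` maps a method name to the family it assigns to the query,
    or to None when that method reports no hit. (Hypothetical input format.)
    """
    # Count one vote per method that actually made an assignment.
    votes = Counter(family for family in predictions.values() if family is not None)
    if not votes:
        return None

    ranked = votes.most_common(2)
    top_family, top_count = ranked[0]

    # Assumption: a tie between the two leading families yields no call.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None

    return top_family if top_count >= min_votes else None
```

Called with `{"PRC": "3.40.50.300", "HHSearch": "3.40.50.300", "COMPASS": None}`, the sketch returns `"3.40.50.300"` because two methods agree. A single unsupported hit, or two methods that disagree, yields `None`. Requiring agreement is what would let each individual method run at a looser threshold than it could tolerate alone.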
Supplementary data are available at Bioinformatics online.