Schaeffer Dustin, Grishin Nick V
Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, TX, USA.
Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX, USA.
Methods Mol Biol. 2019;1851:277-286. doi: 10.1007/978-1-4939-8736-8_15.
Evolutionary domains are protein regions with observable sequence similarity to other known domains. Here we describe how to use common sequence and profile alignment algorithms (i.e., BLAST, HHsearch) to delineate putative domains in novel protein sequences, given a reference library of protein domains. In this chapter, we use our database of evolutionary domains (ECOD) as the reference, but other domain sequence libraries (e.g., SCOP, CATH) could be used instead. We describe our domain partition algorithm, along with specific notes on avoiding domain indexing errors when combining multiple data sources and software whose outputs follow differing conventions.
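A common source of the indexing errors mentioned above is mixing coordinate conventions: BLAST and HHsearch report query ranges as 1-based, inclusive positions, whereas Python slicing is 0-based and half-open. The sketch below is a minimal illustration under stated assumptions, not the authors' partition algorithm. The `Hit` record, the `to_slice`, `to_segments`, and `partition` helpers, and the thresholds (`min_new_coverage=0.7`, `min_len=25`) are all hypothetical. It converts hit coordinates explicitly, rejects out-of-range hits, and greedily assigns residues to the best-scoring reference domains.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one query-to-reference-domain alignment."""
    domain_id: str
    qstart: int   # 1-based, inclusive (BLAST/HHsearch style)
    qend: int     # 1-based, inclusive
    score: float


def to_slice(qstart: int, qend: int, seq_len: int) -> slice:
    """Convert 1-based inclusive coordinates to a 0-based half-open slice,
    raising ValueError instead of silently mis-indexing."""
    if not (1 <= qstart <= qend <= seq_len):
        raise ValueError(f"range {qstart}-{qend} invalid for length {seq_len}")
    return slice(qstart - 1, qend)


def to_segments(indices):
    """Collapse sorted 0-based indices into 1-based inclusive (start, end) segments."""
    segments = []
    for i in indices:
        # The previous segment's 1-based end equals the next contiguous 0-based index.
        if segments and i == segments[-1][1]:
            segments[-1][1] = i + 1
        else:
            segments.append([i + 1, i + 1])
    return [tuple(s) for s in segments]


def partition(hits, seq_len, min_new_coverage=0.7, min_len=25):
    """Greedy sketch (hypothetical thresholds): accept hits in descending score
    order if enough of their residues are still unassigned."""
    assigned = [None] * seq_len
    domains = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        positions = range(seq_len)[to_slice(hit.qstart, hit.qend, seq_len)]
        free = [i for i in positions if assigned[i] is None]
        if len(free) < min_len or len(free) < min_new_coverage * len(positions):
            continue  # hit mostly overlaps domains already assigned
        for i in free:
            assigned[i] = hit.domain_id
        domains.append((hit.domain_id, to_segments(free)))
    return domains
```

For example, with hits A (1-100), B (90-200), and C (10-60) on a 200-residue query, A is accepted first, B is trimmed to residues 101-200, and C is discarded because all of its residues are already assigned. Validating ranges at the conversion boundary means a hit parsed with the wrong convention fails loudly rather than shifting a domain boundary by one residue.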