Heringa J, Argos P
European Molecular Biology Laboratory, Heidelberg, Germany.
Proteins. 1993 Dec;17(4):391-41. doi: 10.1002/prot.340170407.
An automated algorithm is presented that delineates protein sequence fragments that display similarity. The method selects a number of local, nonoverlapping sequence alignments with the highest similarity scores and applies a graph-theoretical approach to determine consistent start and end points for the fragments that make up one or more ensembles of related subsequences. The procedure allows different types of repeats within one sequence to be identified simultaneously. A multiple alignment of the resulting fragments is performed, and a consensus sequence is derived from the ensemble(s). Finally, a profile is constructed from the multiple alignment to detect possible additional, more distant members within the sequence. The method tolerates mutations in the repeats as well as insertions and deletions. The sequence spans between the various repeats or repeat clusters may be of different lengths. The technique has been applied to a number of proteins in which the repeating fragments had been established from information other than the protein sequence itself.
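The first step described above, collecting several high-scoring, nonoverlapping local alignments of a sequence against itself, can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes a toy scoring scheme (the hypothetical constants `MATCH`, `MISMATCH` and `GAP` stand in for a residue-exchange matrix and gap penalties), and it enforces nonoverlap in the style of Waterman and Eggert: matrix cells used by an earlier alignment are banned and the matrix is recomputed. The paper's exact procedure and parameters may differ, the function name `top_self_alignments` is hypothetical, and the later steps (graph-based delineation of repeat boundaries, multiple alignment, consensus and profile search) are not shown.

```python
from typing import List, Set, Tuple

# Hypothetical toy scores; a real run would use a residue-exchange matrix.
MATCH, MISMATCH, GAP = 2, -1, -2

# (score, start_a, end_a, start_b, end_b): 0-based, inclusive fragment bounds
Alignment = Tuple[int, int, int, int, int]


def _fill(seq: str, banned: Set[Tuple[int, int]]) -> List[List[int]]:
    """Smith-Waterman matrix of seq against itself, upper triangle only (j > i).

    The main diagonal and the lower triangle stay 0, so the trivial
    self-match and mirror-image duplicates are never reported.
    """
    n = len(seq)
    H = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            if (i, j) in banned:
                continue  # cell used by an earlier alignment: forced to 0
            s = MATCH if seq[i - 1] == seq[j - 1] else MISMATCH
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + GAP, H[i][j - 1] + GAP)
    return H


def top_self_alignments(seq: str, k: int) -> List[Alignment]:
    """Return up to k best local self-alignments that share no matrix cell."""
    banned: Set[Tuple[int, int]] = set()
    hits: List[Alignment] = []
    for _ in range(k):
        H = _fill(seq, banned)
        # Locate the highest-scoring cell; it marks the alignment's end.
        best, bi, bj = 0, 0, 0
        for i in range(len(seq) + 1):
            for j in range(len(seq) + 1):
                if H[i][j] > best:
                    best, bi, bj = H[i][j], i, j
        if best == 0:
            break  # no positive-scoring alignment remains
        # Trace back until the score drops to 0, recording the path cells.
        i, j = bi, bj
        path = []
        while H[i][j] > 0:
            path.append((i, j))
            s = MATCH if seq[i - 1] == seq[j - 1] else MISMATCH
            if H[i][j] == H[i - 1][j - 1] + s:
                i, j = i - 1, j - 1
            elif H[i][j] == H[i - 1][j] + GAP:
                i -= 1
            else:
                j -= 1
        si, sj = path[-1]  # first aligned cell (1-based matrix indices)
        banned.update(path)  # later alignments may not reuse these cells
        hits.append((best, si - 1, bi - 1, sj - 1, bj - 1))
    return hits


if __name__ == "__main__":
    # Two copies of ACDEFG separated by a PP linker.
    print(top_self_alignments("ACDEFGPPACDEFG", k=1))  # [(12, 0, 5, 8, 13)]
```

Restricting the matrix to cells with j > i discards the trivial diagonal and the symmetric half of the self-comparison. Because banning cells can only lower matrix values, the reported scores never increase from one hit to the next. Recomputing the full matrix after each hit costs O(k·n²) time; a practical implementation would update only the region affected by the banned path.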