Wheelan S J, Marchler-Bauer A, Bryant S H
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA.
Bioinformatics. 2000 Jul;16(7):613-8. doi: 10.1093/bioinformatics/16.7.613.
The sizes of protein domains observed in the 3D-structure database follow a surprisingly narrow distribution. Structural domains are furthermore formed from a single-chain continuous segment in over 80% of instances. These observations imply that some choices of domain boundaries on an otherwise uncharacterized sequence are more likely than others, based solely on the size and segment number of predicted domains. This property might be used to guess the locations of protein domain boundaries.
To test this possibility we enumerate putative domain boundaries and calculate their relative likelihood under a probability model that considers only the size and segment number of predicted domains. We ask, in a cross-validated test using sequences with known 3D structure, whether the most likely guesses agree with the observed domain structure. We find that domain boundary predictions are surprisingly successful for sequences up to 400 residues long and that guessing domain boundaries in this way can improve the sensitivity of threading analysis.
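The enumeration and scoring step described above can be illustrated with a minimal sketch. It is not the authors' implementation and makes several simplifying assumptions: only splits into two continuous, single-segment domains are considered, and the empirical size distribution from the structure database is replaced by a Gaussian stand-in. The parameters `MEAN_SIZE`, `SD_SIZE` and `MIN_DOMAIN`, and both function names, are hypothetical values chosen for illustration.

```python
import math

# Hypothetical parameters for illustration only; the abstract describes a
# size distribution observed in the 3D-structure database, not these values.
MEAN_SIZE = 150.0   # assumed typical domain size, in residues
SD_SIZE = 50.0      # assumed spread of domain sizes
MIN_DOMAIN = 30     # assumed smallest domain worth considering


def size_log_likelihood(size, mean=MEAN_SIZE, sd=SD_SIZE):
    """Log density of a Gaussian stand-in for the domain size distribution."""
    z = (size - mean) / sd
    return -0.5 * z * z - math.log(sd * math.sqrt(2.0 * math.pi))


def rank_two_domain_boundaries(seq_len, min_domain=MIN_DOMAIN):
    """Enumerate every single cut point and rank cuts by relative likelihood.

    Returns (boundary, probability) pairs, most likely first, where
    boundary is the length of the N-terminal domain. Probabilities are
    normalised over the enumerated cuts only.
    """
    cuts = list(range(min_domain, seq_len - min_domain + 1))
    if not cuts:
        return []  # sequence too short to hold two domains

    # Score each cut by the joint size likelihood of the two domains.
    log_scores = [
        size_log_likelihood(c) + size_log_likelihood(seq_len - c) for c in cuts
    ]

    # Normalise in a numerically stable way (subtract the maximum first).
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    probs = [w / total for w in weights]

    return sorted(zip(cuts, probs), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for boundary, prob in rank_two_domain_boundaries(300)[:3]:
        print(boundary, round(prob, 4))
```

A fuller treatment in the spirit of the abstract would also enumerate splits into more than two domains and include a term for the number of segments per domain, since a minority of structural domains are discontinuous.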