Chan Simon K, Hsing Michael, Hormozdiari Fereydoun, Cherkasov Artem
CIHR/MSFHR Strategic Training Program in Bioinformatics, Canada's Michael Smith Genome Sciences Centre, Vancouver, BC, Canada.
BMC Bioinformatics. 2007 Jun 28;8:227. doi: 10.1186/1471-2105-8-227.
In a previous study, we demonstrated that some essential proteins from pathogenic organisms contained sizable insertions/deletions (indels) when aligned to human proteins of high sequence similarity. Such indels may create enough spatial difference between the pathogen protein and its human counterparts to allow selective targeting. In one example, an indel difference was targeted through large-scale in silico screening. This yielded selective antibodies and small-molecule compounds that bound the deletion-bearing essential pathogen protein without any cross-reactivity with the highly similar human protein. The objective of the current study was to investigate whether indels occur more frequently in essential than in non-essential proteins.
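The abstract does not say how an indel is counted from an alignment. The following Python sketch is therefore only an illustration under stated assumptions. It treats each maximal run of gap characters ('-') in either sequence of a gapped pairwise alignment as one indel. The `min_len` cutoff stands in for "sizable" and is a hypothetical parameter, not a value from the study. The function name `count_indels` is also hypothetical.

```python
import re

def count_indels(aligned_a: str, aligned_b: str, min_len: int = 1) -> int:
    """Count indels in a gapped pairwise alignment (illustrative sketch).

    Assumption: each maximal run of '-' in either aligned sequence is one indel
    (a deletion in that sequence, or equivalently an insertion in the other).
    Runs shorter than min_len, a hypothetical "sizable" cutoff, are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    count = 0
    for seq in (aligned_a, aligned_b):
        # Each match of "-+" is one contiguous gap run.
        for gap_run in re.finditer(r"-+", seq):
            if len(gap_run.group()) >= min_len:
                count += 1
    return count
```

For the aligned pair MKT--LLAVG---SE / MKTAALLAVGQRASE, the sketch counts two indels with the default cutoff, or one indel with `min_len=3`. The abstract does not state how the study normalizes such counts into an indel frequency.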
We investigated three species, Bacillus subtilis, Escherichia coli, and Saccharomyces cerevisiae, for which high-quality protein essentiality data are available. Using these data, we showed with t-tests that the mean indel frequency of essential proteins was greater than that of non-essential proteins in all three proteomes. The abundance of indels in both types of proteins was also shown to be accurately modeled by a Weibull distribution. However, Receiver Operating Characteristic (ROC) curves showed that indel frequency alone could not be used as a marker to accurately discriminate between essential and non-essential proteins in the three proteomes. Finally, we analyzed the protein interaction data available for S. cerevisiae. We observed that indel-bearing proteins were involved in more interactions and had greater betweenness values within Protein Interaction Networks (PINs).
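The statistical and network measures named above can be sketched in plain Python using only the standard library. This is an illustration under assumptions, not the authors' pipeline:

- The abstract says only "t-test", so Welch's unequal-variance t statistic is used here as one reasonable choice.
- ROC performance is summarized by the area under the curve (AUC). It is computed as the probability that a randomly chosen essential protein has a higher indel frequency than a randomly chosen non-essential one.
- Betweenness is computed with Brandes' algorithm on an unweighted, undirected interaction network, given as an adjacency dictionary.

All function names are hypothetical.

```python
import math
from collections import deque
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic and degrees of freedom (unequal variances).

    Assumption: Welch's variant; each sample needs >= 2 values and nonzero spread.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

def roc_auc(pos_scores, neg_scores):
    """ROC AUC as P(random positive scores higher than random negative); ties count 0.5.

    Here positives would be essential proteins and scores their indel frequencies.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def betweenness(adj):
    """Unnormalized betweenness via Brandes' algorithm.

    Assumption: adj maps each protein to its interaction partners and is
    symmetric (undirected, unweighted PIN).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph each path is counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}
```

An AUC near 0.5 means a score barely separates the two classes. A difference in means can therefore be statistically significant while the score remains a weak classifier. A protein's number of interactions is its degree, `len(adj[v])`. For a p-value, SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test.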
Overall, our findings demonstrated that indels were not randomly distributed across the studied proteomes. They were likely to occur more often in essential proteins and in highly connected proteins. This suggests a possible role for sequence insertions and deletions in the regulation and modification of protein-protein interactions. These observations should provide new insights into indel-based drug design using bioinformatics and cheminformatics tools.