Lemos Rafael Pereira, Mariano Diego, Silveira Sabrina De Azevedo, de Melo-Minardi Raquel C
Laboratory of Bioinformatics and Systems, Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Brazil.
Laboratory of Bioinformatics, Visualization and Systems, Department of Informatics, Federal University of Viçosa, Viçosa, Brazil.
Front Bioinform. 2025 Sep 1;5:1630078. doi: 10.3389/fbinf.2025.1630078. eCollection 2025.
Protein interatomic contacts, defined by spatial proximity and physicochemical complementarity at atomic resolution, are fundamental to characterizing molecular interactions and bonding. Methods for calculating contacts are generally categorized as cutoff-dependent, which rely on Euclidean distances, or cutoff-independent, which use Delaunay and Voronoi tessellations. Although cutoff-dependent methods are valued for their simplicity, completeness, and reliability, traditional implementations remain computationally expensive, which poses significant scalability challenges in the current Big Data era of bioinformatics. Here, we introduce COCαDA (COntact search pruning by Cα Distance Analysis), a Python-based command-line tool that uses alpha-carbon (Cα) distance matrices to improve search pruning in large-scale interatomic protein contact analysis. COCαDA detects intra- and inter-chain contacts and classifies them into seven types: hydrogen and disulfide bonds; hydrophobic effects; attractive, repulsive, and salt-bridge interactions; and aromatic stacking. To evaluate our tool, we compared it with three traditional approaches from the literature: all-against-all atom distance calculation ("brute force"), a static Cα distance cutoff (SC), and Biopython's NeighborSearch class (NS). COCαDA outperformed the other methods, running on average 6x faster than advanced data structures such as the k-d trees used by NS, while also being simpler to implement and fully customizable. The tool enables exploratory and large-scale analyses of interatomic contacts in proteins in a simple and efficient manner and allows its results to be integrated with other tools and pipelines. COCαDA is freely available at https://github.com/LBS-UFMG/COCaDA.
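To illustrate the general idea of pruning an atom-level contact search with a Cα distance matrix, here is a minimal, standard-library-only sketch. It is not COCαDA's implementation: the residue tuple format, the function names, and the 4.0 Å cutoff are hypothetical, and the pruning rule shown (skip a residue pair when its Cα distance exceeds reach_i + cutoff + reach_j, where reach is the largest Cα-to-atom distance within a residue) is one lossless bound derived from the triangle inequality, assumed here for illustration rather than taken from the paper. The brute-force function corresponds to the all-against-all baseline described above.

```python
import math
from itertools import combinations
from typing import List, Tuple

Coord = Tuple[float, float, float]
# Hypothetical input format: (residue_id, ca_coord, [(atom_name, coord), ...])
Residue = Tuple[str, Coord, List[Tuple[str, Coord]]]


def reach(res: Residue) -> float:
    """Largest distance from the residue's C-alpha to any of its own atoms."""
    _, ca, atoms = res
    return max((math.dist(ca, xyz) for _, xyz in atoms), default=0.0)


def atom_hits(r1: Residue, r2: Residue, cutoff: float) -> set:
    """All atom pairs between two residues that lie within `cutoff`."""
    return {
        (r1[0], n1, r2[0], n2)
        for n1, x1 in r1[2]
        for n2, x2 in r2[2]
        if math.dist(x1, x2) <= cutoff
    }


def brute_force_contacts(residues: List[Residue], cutoff: float) -> set:
    """Baseline: compare every atom of every residue pair."""
    hits = set()
    for r1, r2 in combinations(residues, 2):
        hits |= atom_hits(r1, r2, cutoff)
    return hits


def pruned_contacts(residues: List[Residue], cutoff: float) -> set:
    """Use a C-alpha distance matrix to skip residue pairs that cannot touch.

    Triangle inequality: if atom a (residue i) and atom b (residue j) are
    within `cutoff`, then dist(CA_i, CA_j) <= reach_i + cutoff + reach_j.
    Pairs above that bound are skipped without losing any true contact.
    """
    ca_matrix = [[math.dist(a[1], b[1]) for b in residues] for a in residues]
    reaches = [reach(r) for r in residues]
    hits = set()
    for i, j in combinations(range(len(residues)), 2):
        if ca_matrix[i][j] > reaches[i] + cutoff + reaches[j]:
            continue  # pruned: no atom pair can be within the cutoff
        hits |= atom_hits(residues[i], residues[j], cutoff)
    return hits


# Toy usage: A1 and A2 are close; A3 is far away and is pruned outright.
toy = [
    ("A1", (0.0, 0.0, 0.0), [("CA", (0.0, 0.0, 0.0)), ("CB", (1.5, 0.0, 0.0))]),
    ("A2", (5.0, 0.0, 0.0), [("CA", (5.0, 0.0, 0.0)), ("CB", (3.5, 0.0, 0.0))]),
    ("A3", (40.0, 0.0, 0.0), [("CA", (40.0, 0.0, 0.0)), ("CB", (41.5, 0.0, 0.0))]),
]
print(sorted(pruned_contacts(toy, 4.0)))
```

Because the bound adapts to each residue's size, pruning here never changes the result, only the number of atom comparisons. A single fixed Cα threshold, as in the SC baseline mentioned above, is simpler but can miss contacts between large side chains if it is set too tight.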