Borges Rafael J, Salvador Guilherme H M, Pimenta Daniel C, Dos Santos Lucilene D, Fontes Marcos R M, Usón Isabel
Department of Biophysics and Pharmacology, Biosciences Institute, São Paulo State University (UNESP), Botucatu, São Paulo 18618-689, Brazil.
Crystallographic Methods, Institute of Molecular Biology of Barcelona (IBMB-CSIC), Barcelona 08028, Spain.
Nucleic Acids Res. 2022 May 20;50(9):e50. doi: 10.1093/nar/gkac029.
Proteins isolated from natural sources can be composed of a mixture of isoforms with similar physicochemical properties that coexist in the final steps of purification. Yet, even where unverified, the assumed sequence is enforced throughout the structural studies. Herein, we propose a novel perspective to address the usually neglected sequence heterogeneity of natural products by integrating biophysical, genetic and structural data in our program SEQUENCE SLIDER. The aim is to assess the evidence supporting chemical composition in structure determination. Locally, we interrogate the experimental map to establish which side chains are supported by the structural data, and genetic information on sequence conservation is integrated into this statistic. From these results, we build a constrained peptide database containing the most probable sequences, which is used to interpret mass spectrometry (MS) data. In parallel, we perform MS de novo sequencing with genomic-based algorithms to detect point mutations. We calibrated SLIDER with Gallus gallus lysozyme, whose sequence is unequivocally established and for which numerous natural isoforms are reported. We used SLIDER to characterize a metalloproteinase and a phospholipase A2-like protein from the venom of Bothrops moojeni and a crotoxin from Crotalus durissus collilineatus. This integrated approach offers a more realistic structural descriptor to characterize macromolecules isolated from natural sources.
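As a rough illustration of the constrained peptide database idea, the sketch below enumerates every sequence that can be formed from per-position residue hypotheses and ranks them by the product of the per-position probabilities. It is not SEQUENCE SLIDER's implementation: the function name `constrained_peptides`, the input layout, the independence assumption between positions and the probabilities in the example are all hypothetical and chosen only to show the principle.

```python
from itertools import product
from math import prod


def constrained_peptides(candidates, max_variants=1000):
    """Enumerate and rank candidate peptide sequences.

    candidates: one dict per residue position, mapping a one-letter
        residue code to a probability (hypothetical values that would
        combine map fit and sequence conservation).
    Returns a list of (sequence, score) tuples, best first, where the
    score is the product of the per-position probabilities
    (positions are assumed independent, which is a simplification).
    """
    ranked = []
    for combo in product(*(position.items() for position in candidates)):
        sequence = "".join(residue for residue, _ in combo)
        score = prod(probability for _, probability in combo)
        ranked.append((sequence, score))
    # Highest score first; ties are broken alphabetically for determinism.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked[:max_variants]


# Made-up hypotheses for a four-residue stretch: two positions are
# ambiguous, two are considered certain.
example = [
    {"K": 1.0},
    {"V": 0.7, "I": 0.3},
    {"F": 1.0},
    {"G": 0.6, "S": 0.4},
]
peptides = constrained_peptides(example)
for sequence, score in peptides:
    print(f"{sequence}\t{score:.2f}")
```

The number of combinations grows exponentially with the number of ambiguous positions, so a practical version would need to restrict enumeration to short peptide windows or prune low-probability residues before expanding; `max_variants` only truncates the final ranked list.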