Boatner Lisa M, Eberhardt Jerome, Shikwana Flowreen, Holcomb Matthew, Lee Peiyuan, Houk Kendall N, Forli Stefano, Backus Keriann M
Biological Chemistry Department, David Geffen School of Medicine, UCLA, Los Angeles, California 90095, United States.
Department of Chemistry and Biochemistry, UCLA, Los Angeles, California 90095, United States.
ACS Chem Biol. 2025 Jul 18;20(7):1669-1682. doi: 10.1021/acschembio.5c00225. Epub 2025 Jun 29.
Cysteine residues play key roles in protein structure and function and can serve as targets for chemical probes and even drugs. Chemoproteomic studies have revealed that heightened cysteine reactivity toward electrophilic probes, such as iodoacetamide alkyne (IAA), is indicative of likely residue functionality. However, while the cysteine coverage of chemoproteomic studies has increased substantially, these methods still provide only a partial assessment of proteome-wide cysteine reactivity, and cysteines from low-abundance proteins and hard-to-detect peptides remain largely refractory to chemoproteomic analysis. Here, we integrate cysteine chemoproteomic reactivity data sets with structure-guided computational analysis to delineate the key structural features of proteins that favor elevated cysteine reactivity toward IAA. We first generated and aggregated multiple descriptors of the cysteine microenvironment, including amino acid content, solvent accessibility, residue proximity, secondary structure, and predicted pKa. We find that no single feature is sufficient to accurately predict reactivity. Therefore, we developed the CIAA (Cysteine reactivity toward IodoAcetamide Alkyne) method, which uses a Random Forest model to assess cysteine reactivity by incorporating descriptors that characterize the three-dimensional (3D) structural properties of thiol microenvironments. We trained the CIAA model on existing and newly generated cysteine chemoproteomic reactivity data paired with high-resolution crystal structures from the Protein Data Bank (PDB), with cross-validation against an external data set. CIAA analysis reveals key features driving cysteine reactivity, such as backbone hydrogen bond donor atoms, and highlights still underserved needs in the computational prediction of cysteine reactivity, including challenges surrounding protein structure selection and data set curation. Thus, our work provides a strong foundation for deploying artificial intelligence (AI) on cysteine chemoproteomic data sets.
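The abstract does not state how CIAA computes its 3D descriptors. As a rough illustration of the kind of thiol-microenvironment feature involved, the sketch below counts backbone amide nitrogen atoms (hydrogen bond donors) within a distance cutoff of a cysteine's thiol sulfur (the PDB atom named SG). The `Atom` record, the function names, the 3.5 Å default cutoff, and the choices to skip proline (which has no amide H) and the cysteine's own residue are all hypothetical assumptions for illustration; this is not the CIAA implementation.

```python
import math
from typing import List, NamedTuple


class Atom(NamedTuple):
    """Hypothetical minimal stand-in for one parsed PDB ATOM record."""
    name: str      # PDB atom name, e.g. "N", "CA", "SG"
    res_seq: int   # residue sequence number
    res_name: str  # three-letter residue name, e.g. "CYS"
    x: float
    y: float
    z: float


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms (angstroms)."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


def backbone_donors_near_thiol(atoms: List[Atom], cys_res_seq: int,
                               cutoff: float = 3.5) -> int:
    """Count backbone amide N atoms within `cutoff` of a cysteine SG atom.

    Assumptions (illustrative only): the cutoff value is arbitrary,
    proline N is skipped because it carries no amide hydrogen, and the
    cysteine's own backbone N is skipped.
    """
    sg = next(a for a in atoms
              if a.res_seq == cys_res_seq and a.res_name == "CYS" and a.name == "SG")
    return sum(
        1
        for a in atoms
        if a.name == "N"
        and a.res_name != "PRO"
        and a.res_seq != cys_res_seq
        and distance(a, sg) <= cutoff
    )


# Toy coordinates (made up, not from any real structure).
example_atoms = [
    Atom("N", 10, "CYS", 0.0, 0.0, 0.0),   # own backbone N: skipped
    Atom("SG", 10, "CYS", 1.0, 1.0, 1.0),  # thiol sulfur
    Atom("N", 11, "GLY", 1.0, 1.0, 4.0),   # 3.0 Å from SG: counted
    Atom("O", 11, "GLY", 1.0, 2.0, 1.0),   # carbonyl O, not a donor
    Atom("N", 12, "PRO", 1.0, 3.0, 1.0),   # 2.0 Å but proline: skipped
    Atom("N", 13, "ALA", 6.0, 1.0, 1.0),   # 5.0 Å: outside default cutoff
]

print(backbone_donors_near_thiol(example_atoms, 10))       # 1
print(backbone_donors_near_thiol(example_atoms, 10, 6.0))  # 2
```

In a Random Forest setting, a count like this would be one column of a per-cysteine feature vector alongside descriptors such as solvent accessibility and predicted pKa; no single column is expected to be predictive on its own.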