Using AI-predicted protein structures as a reference to predict loss-of-function activity in tumor suppressor breast cancer genes.

Author information

Gnanaolivu Rohan, Hart Steven N

Affiliations

Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States.

Department of Laboratory Medicine and Pathology, Mayo Clinic, Rochester, MN, United States.

Publication information

Comput Struct Biotechnol J. 2024 Oct 5;23:3472-3480. doi: 10.1016/j.csbj.2024.10.008. eCollection 2024 Dec.

Abstract

BACKGROUND

Most missense variants in the tumor suppressor breast cancer genes BRCA1, BRCA2, PALB2, and RAD51C remain unclassified for loss of function (LOF), which confounds clinical actionability. Classifying these variants is challenging because they are rare, so clinicians rely on predictive methods. Changes in protein stability are associated with function, which makes stability predictors valuable. Predicting the stability change caused by a missense variant requires a high-resolution protein structure, but such structures are often unavailable. This study explores using generative AI to predict high-resolution protein structures, which can then be analyzed with protein stability prediction methods to assess LOF activity in ordered regions of the protein. The study also determines which protein stability methods and which dedicated missense prediction methods in the dbNSFP v4.7 database are appropriate for predicting LOF activity in ordered regions of these four genes. Functional classifications from homology-directed DNA repair (HDR) assays and variant classifications from the ClinVar database provide a reliable dataset for evaluating the performance of these prediction methods.
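As a rough illustration of how ClinVar classifications might be turned into a binary reference set for benchmarking, here is a minimal sketch. The grouping of terms into LOF-like and neutral-like classes, the function names, and the variant identifiers are assumptions made for this example; the abstract does not describe the study's actual curation rules.

```python
# Hypothetical label mapping: the study's real curation rules are not given
# in the abstract, so this grouping is an assumption for illustration only.
LOF_TERMS = {"pathogenic", "likely pathogenic"}
NEUTRAL_TERMS = {"benign", "likely benign"}


def clinvar_to_label(significance):
    """Return 1 for LOF-like, 0 for neutral-like, None for anything else."""
    term = significance.strip().lower()
    if term in LOF_TERMS:
        return 1
    if term in NEUTRAL_TERMS:
        return 0
    return None  # e.g. "Uncertain significance" is left unlabeled


def build_reference_set(records):
    """Map variant IDs to binary labels, dropping unlabeled variants.

    records: iterable of (variant_id, significance) pairs.
    """
    labeled = {}
    for variant_id, significance in records:
        label = clinvar_to_label(significance)
        if label is not None:
            labeled[variant_id] = label
    return labeled
```

Variants of uncertain significance are dropped rather than guessed, so the reference set contains only variants with a usable label.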

RESULTS

Complex AlphaFold2 structures of the BRCA1 C-terminal (BRCT) domain and the DNA-binding (DB) domain, analyzed with the protein stability tool FoldX, predict LOF activity from missense variants in ordered regions significantly better than experimentally derived structures. With AlphaFold2 structures, the BRCT domain achieved an area under the curve (AUC) = 0.861 (95% CI: 0.858-0.863) and AUC = 0.842 (95% CI: 0.840-0.845), while the DB domain achieved AUC = 0.836 (95% CI: 0.8322-0.841). With experimentally derived structures, the BRCT domain achieved AUC = 0.847 (95% CI: 0.844-0.850) and AUC = 0.835 (95% CI: 0.832-0.837), and the DB domain achieved AUC = 0.830 (95% CI: 0.821-0.8320). Protein stability does not predict LOF activity from missense variants better than dedicated missense predictors. Overall, AlphaMissense ranks highly among the missense predictors in the dbNSFP database, with an average AUC = 0.890 (95% CI: 0.886-0.895) in ordered regions across these four cancer genes.
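To make the reported metric concrete, the following is a minimal sketch of how an AUC and a 95% confidence interval could be computed for any predictor score against binary LOF labels. It assumes higher scores mean more LOF-like (for a stability predictor, a larger predicted destabilization) and uses a percentile bootstrap; the abstract does not state how the study derived its intervals, so the resampling scheme and the function names here are illustrative assumptions.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random LOF variant (label 1)
    outscores a random neutral variant (label 0); ties count as 0.5."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method,
    not necessarily the one used in the study)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        sample_labels = [labels[i] for i in idx]
        if len(set(sample_labels)) < 2:
            continue  # skip resamples that contain only one class
        aucs.append(roc_auc(sample_labels, [scores[i] for i in idx]))
    if not aucs:
        raise ValueError("no valid bootstrap resamples")
    aucs.sort()
    lo = aucs[int((alpha / 2) * len(aucs))]
    hi = aucs[min(len(aucs) - 1, int((1 - alpha / 2) * len(aucs)))]
    return lo, hi
```

The pairwise AUC is quadratic in the number of variants, which is acceptable for a per-domain variant set but would need a rank-based implementation for larger inputs.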

CONCLUSIONS

The study shows that protein structures predicted by generative AI can outperform experimentally derived structures when predicted protein stability is used to evaluate LOF activity in ordered regions of the genes BRCA1, BRCA2, PALB2, and RAD51C. It also highlights AlphaMissense as the leading missense predictor of LOF activity in these four tumor suppressor breast cancer genes. The code for this study is freely available on GitHub (https://github.com/rohandavidg/CarePred).

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2526/11490748/203ea2ad2d22/ga1.jpg
