
QAFI: a novel method for quantitative estimation of missense variant impact using protein-specific predictors and ensemble learning.

Authors

Ozkan Selen, Padilla Natàlia, de la Cruz Xavier

Affiliations

Research Unit in Clinical and Translational Bioinformatics, Vall d'Hebron Institute of Research (VHIR), Universitat Autònoma de Barcelona, Barcelona, Spain.

Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain.

Publication information

Hum Genet. 2025 Mar;144(2-3):191-208. doi: 10.1007/s00439-024-02692-z. Epub 2024 Jul 24.

DOI: 10.1007/s00439-024-02692-z
PMID: 39048855
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11976337/

Abstract

Next-generation sequencing (NGS) has revolutionized genetic diagnostics, yet its application in precision medicine remains incomplete, despite significant advances in computational tools for variant annotation. Many variants remain unannotated, and existing tools often fail to accurately predict the range of impacts that variants have on protein function. This limitation restricts their utility in relevant applications such as predicting disease severity and onset age. In response to these challenges, a new generation of computational models is emerging, aimed at producing quantitative predictions of genetic variant impacts. However, the field is still in its early stages, and several issues need to be addressed, including improved performance and better interpretability. This study introduces QAFI, a novel methodology that integrates protein-specific regression models within an ensemble learning framework, utilizing conservation-based and structure-related features derived from AlphaFold models. Our findings indicate that QAFI significantly enhances the accuracy of quantitative predictions across various proteins. The approach has been rigorously validated through its application in the CAGI6 contest, focusing on ARSA protein variants, and further tested on a comprehensive set of clinically labeled variants, demonstrating its generalizability and robust predictive power. The straightforward nature of our models may also contribute to better interpretability of the results.
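
The abstract describes combining protein-specific regression models in an ensemble that uses conservation-based and structure-related features. The sketch below illustrates that general idea only. It is not the authors' implementation: the protein names, the two features (`conservation`, `burial`), the synthetic training data, and the use of the median to combine predictions are all assumptions made for illustration.

```python
import random
import statistics


def fit_linear(X, y):
    """Ordinary least squares via the normal equations.

    Returns [intercept, w1, ..., wk] for feature rows X and targets y.
    """
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def predict(coef, x):
    """Apply one protein-specific linear model to a feature vector."""
    return coef[0] + sum(w * v for w, v in zip(coef[1:], x))


def ensemble_predict(models, x):
    """Combine all protein-specific predictions (assumed rule: median)."""
    return statistics.median(predict(c, x) for c in models.values())


def make_protein(a, b, n=50):
    """Synthetic stand-in for one protein's functional assay data (hypothetical)."""
    X = [(random.random(), random.random()) for _ in range(n)]
    y = [1.0 - a * c - b * s + random.gauss(0, 0.05) for c, s in X]
    return X, y


random.seed(0)
# Hypothetical training proteins, each with its own feature-to-impact relation.
training = {
    "protA": make_protein(0.5, 0.3),
    "protB": make_protein(0.4, 0.2),
    "protC": make_protein(0.6, 0.1),
}
# One regression model per protein.
models = {name: fit_linear(X, y) for name, (X, y) in training.items()}

# A variant from a new protein, described as (conservation, burial).
variant = (0.9, 0.8)
print(round(ensemble_predict(models, variant), 3))
```

Each model here is a plain linear regression, so its coefficients can be read directly as the weight given to each feature, which is one way simple models can support interpretability.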


Figures
Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3459/11976337/aeba08f3367f/439_2024_2692_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3459/11976337/507ad2530e4c/439_2024_2692_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3459/11976337/73010c634d17/439_2024_2692_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3459/11976337/687f29850b0b/439_2024_2692_Fig4_HTML.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3459/11976337/a3a536c5925d/439_2024_2692_Fig5_HTML.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3459/11976337/01404e2be806/439_2024_2692_Fig6_HTML.jpg
Figure 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/3459/11976337/6f817f7cc25c/439_2024_2692_Fig7_HTML.jpg

Similar articles

1
QAFI: a novel method for quantitative estimation of missense variant impact using protein-specific predictors and ensemble learning.
Hum Genet. 2025 Mar;144(2-3):191-208. doi: 10.1007/s00439-024-02692-z. Epub 2024 Jul 24.
2
Machine learning random forest for predicting oncosomatic variant NGS analysis.
Sci Rep. 2021 Nov 8;11(1):21820. doi: 10.1038/s41598-021-01253-y.
3
MISTIC: A prediction tool to reveal disease-relevant deleterious missense variants.
PLoS One. 2020 Jul 31;15(7):e0236962. doi: 10.1371/journal.pone.0236962. eCollection 2020.
4
MmisAT and MmisP: an efficient and accurate suite of variant analysis toolkit for primary mitochondrial diseases.
Hum Genomics. 2023 Nov 27;17(1):108. doi: 10.1186/s40246-023-00557-6.
5
Prediction of impacts of mutations on protein structure and interactions: SDM, a statistical approach, and mCSM, using machine learning.
Protein Sci. 2020 Jan;29(1):247-257. doi: 10.1002/pro.3774. Epub 2019 Nov 25.
6
Predicting mutant outcome by combining deep mutational scanning and machine learning.
Proteins. 2022 Jan;90(1):45-57. doi: 10.1002/prot.26184. Epub 2021 Jul 31.
7
Annotation of Human Exome Gene Variants with Consensus Pathogenicity.
Genes (Basel). 2020 Sep 14;11(9):1076. doi: 10.3390/genes11091076.
8
SNooPer: a machine learning-based method for somatic variant identification from low-pass next-generation sequencing.
BMC Genomics. 2016 Nov 14;17(1):912. doi: 10.1186/s12864-016-3281-2.
9
Comprehensive characterization of amino acid positions in protein structures reveals molecular effect of missense variants.
Proc Natl Acad Sci U S A. 2020 Nov 10;117(45):28201-28211. doi: 10.1073/pnas.2002660117. Epub 2020 Oct 26.
10
Cross-protein transfer learning substantially improves disease variant prediction.
Genome Biol. 2023 Aug 7;24(1):182. doi: 10.1186/s13059-023-03024-6.

Cited by

1
Evaluation of enzyme activity predictions for variants of unknown significance in Arylsulfatase A.
Hum Genet. 2025 Mar;144(2-3):295-308. doi: 10.1007/s00439-025-02731-3. Epub 2025 Mar 8.
