A random forest-based predictive model for classifying BRCA1 missense variants: a novel approach for evaluating the missense mutations effect.

Author information

Ka Hamed, Naghinejad Maryam, Amirfiroozy Akbar, Shamsir Mohd Shahir, Parvizpour Sepideh, Razmara Jafar

Affiliations

Department of Computer Science, Faculty of Mathematics, Statistics, and Computer Science, University of Tabriz, Tabriz, Iran.

Department of Medical Genetics, Faculty of Medicine, Tabriz University of Medical Sciences, Tabriz, Iran.

Publication information

J Hum Genet. 2025 Apr 18. doi: 10.1038/s10038-025-01341-1.

DOI:10.1038/s10038-025-01341-1

PMID:40251429

Abstract

Correct classification of variants is key to pre-symptomatic detection of disease and to taking preventive action. Since BRCA1 has a high incidence and penetrance in breast and ovarian cancers, a high-performance predictive tool can be employed to classify the clinical significance of its variants. Several tools have previously been developed for this purpose, but they classify significance poorly in specific cases, and they commonly assign a score without providing any interpretation behind it. To obtain an accurate predictive tool with interpretation abilities, in this study we propose BRCA1-Forest, which is based on random forest, a well-known machine learning technique, to make interpretable decisions with high specificity and sensitivity in variant classification. The method involves narrowing down the available options until a final decision is reached. To this end, a set of benign and pathogenic BRCA1 missense variants was first collected, and the dataset was then prepared based on the effect of each variant on the protein sequence. The dataset was enriched by adding physicochemical changes and the conservation score of the amino acid position as pathogenicity criteria. The proposed model was trained on this dataset to classify the clinical significance of variants. The performance of BRCA1-Forest was compared with four state-of-the-art methods, SIFT, PolyPhen2, CADD, and DANN, on several evaluation metrics: precision, recall, false positive rate (FPR), area under the receiver operating characteristic curve (AUC-ROC), area under the precision-recall curve (AUC-PR), and Matthews correlation coefficient (MCC). The results show that the proposed model outperforms these tools on all metrics except recall. The BRCA1-Forest software is available at https://github.com/HamedKAAC/BRCA1Forest .
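
For readers unfamiliar with the threshold-based metrics named in the abstract, the following is a minimal sketch of how precision, recall, FPR, and MCC follow from a binary confusion matrix. It is not taken from the BRCA1-Forest repository: the function names and the toy labels (1 = pathogenic, 0 = benign) are assumptions made for illustration, and the values are not results from the study. AUC-ROC and AUC-PR are omitted because they require continuous scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = pathogenic, 0 = benign)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Return precision, recall, FPR, and MCC; undefined ratios fall back to 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    # Matthews correlation coefficient: (TP*TN - FP*FN) / sqrt of the
    # product of the four marginal totals of the confusion matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "fpr": fpr, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical labels for eight variants (illustration only).
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    print(classification_metrics(y_true, y_pred))
```

On the toy labels above there are three true positives, one false positive, three true negatives, and one false negative, so precision and recall are both 0.75, FPR is 0.25, and MCC is 0.5. MCC uses all four cells of the confusion matrix, which is why it is often reported alongside precision and recall when the benign and pathogenic classes are imbalanced.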
