EARN: an ensemble machine learning algorithm to predict driver genes in metastatic breast cancer.

Affiliations

Department of Biology, Faculty of Science, Payame Noor University, Tehran, Iran.

Laboratory of Genomics and Epigenomics (LGE), Department of Biochemistry, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran.

Publication information

BMC Med Genomics. 2021 May 7;14(1):122. doi: 10.1186/s12920-021-00974-3.

Abstract

BACKGROUND

Many markers are available for the prognosis and diagnosis of complex diseases such as primary breast cancer. However, our understanding of the drivers that influence cancer aggressiveness remains limited.

METHODS

In this work, we study somatic mutation data consisting of 450 metastatic breast tumor samples from the cBio Cancer Genomics Portal. We use four software tools to extract features from these data. We then propose an ensemble classifier (EC) learning algorithm called EARN (Ensemble of Artificial Neural Network, Random Forest, and non-linear Support Vector Machine) to evaluate plausible driver genes for metastatic breast cancer (MBCA). The decision-making strategy of the proposed ensemble is to aggregate the predicted scores obtained from the individual classifiers and use them to prioritize Homo sapiens genes annotated as protein-coding in NCBI.
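The score aggregation described above can be illustrated with a minimal sketch. The abstract does not state the aggregation rule, so the unweighted mean used here is an assumption, and the function name `aggregate_scores`, the gene names, and the scores are all hypothetical placeholders rather than anything taken from the study.

```python
from statistics import mean


def aggregate_scores(per_model_scores):
    """Average each gene's driver score across models and rank genes.

    per_model_scores maps a model name to a dict of {gene: score in [0, 1]}.
    Only genes scored by every model are kept. The unweighted mean is an
    assumed aggregation rule, not necessarily the one used by EARN.
    """
    shared_genes = set.intersection(*(set(s) for s in per_model_scores.values()))
    combined = {
        gene: mean(scores[gene] for scores in per_model_scores.values())
        for gene in shared_genes
    }
    # Highest combined score first; ties are broken alphabetically by gene name.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Made-up scores for illustration only; these are not results from the study.
toy_scores = {
    "ann": {"GENE_A": 0.90, "GENE_B": 0.40, "GENE_C": 0.10},
    "rf": {"GENE_A": 0.80, "GENE_B": 0.60, "GENE_C": 0.20},
    "svm": {"GENE_A": 0.70, "GENE_B": 0.50, "GENE_C": 0.30},
}
ranking = aggregate_scores(toy_scores)
for gene, score in ranking:
    print(f"{gene}\t{score:.2f}")
```

In this toy setting, a gene that all three classifiers score highly ends up at the top of the ranking, while a single dissenting classifier lowers a gene's position without removing it.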

RESULTS

The findings address several aspects of MBCA prognosis and diagnosis. First, the drivers and passengers predicted by SVM, ANN, RF, and EARN are introduced. Second, the biological implications of the predictions are discussed on the basis of gene set enrichment analysis. Third, all learning methods are statistically validated and compared using several evaluation metrics. Finally, pathway enrichment analysis (PEA) of the top 100 genes predicted by EARN, performed with the ReactomeFIViz tool (FDR < 0.03), leads us to propose a new gene set panel for MBCA. It includes HDAC3, ABAT, GRIN1, PLCB1, and KPNA2 as well as NCOR1, TBL1XR1, SIRT4, KRAS, CACNA1E, PRKCG, GPS2, SIN3A, ACTB, KDM6B, and PRMT1. Furthermore, we compare the MBCA results with those for 983 primary tumor samples of breast invasive carcinoma (BRCA) obtained from The Cancer Genome Atlas (TCGA). With EARN, ROC-AUC reaches 99.24% for MBCA and 99.79% for BRCA. In each case, EARN outperforms the three individual classifiers.
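For readers unfamiliar with the ROC-AUC metric reported above, the following minimal sketch computes it from its pairwise definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name `roc_auc` and the toy labels and scores are hypothetical; this is not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a positive outranks a negative.

    labels are 1 (e.g. driver) or 0 (e.g. passenger); scores are the
    classifier outputs. Tied pairs contribute 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels and scores for illustration only.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_predictions = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(toy_labels, toy_predictions)
print(f"ROC-AUC = {auc:.4f}")
```

The quadratic pairwise loop is chosen for clarity; on large gene sets a rank-based computation gives the same value more efficiently.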

CONCLUSIONS

By using an integrative approach, this research helps precision oncologists design compact targeted panels that eliminate the need for whole-genome or whole-exome sequencing. A schematic representation of the proposed model is presented as the graphical abstract.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7b38/8105935/34892f6d7fb4/12920_2021_974_Fig1_HTML.jpg
