EARN: an ensemble machine learning algorithm to predict driver genes in metastatic breast cancer.

Affiliations

Department of Biology, Faculty of Science, Payame Noor University, Tehran, Iran.

Laboratory of Genomics and Epigenomics (LGE), Department of Biochemistry, Institute of Biochemistry and Biophysics (IBB), University of Tehran, Tehran, Iran.

Publication information

BMC Med Genomics. 2021 May 7;14(1):122. doi: 10.1186/s12920-021-00974-3.

DOI:10.1186/s12920-021-00974-3
PMID:33962648
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8105935/
Abstract

BACKGROUND

Many markers are now available for the prognosis and diagnosis of complex diseases such as primary breast cancer. However, our understanding of the drivers that influence cancer aggressiveness remains limited.

METHODS

In this work, we study somatic mutation data from 450 metastatic breast tumor samples obtained from the cBio Cancer Genomics Portal. We use four software tools to extract features from these data. We then propose an ensemble classifier (EC) learning algorithm called EARN (Ensemble of Artificial Neural Network, Random Forest, and non-linear Support Vector Machine) to evaluate plausible driver genes for metastatic breast cancer (MBCA). The ensemble's decision-making strategy aggregates the scores predicted by the individual classifiers in order to prioritize Homo sapiens genes annotated as protein-coding by NCBI.
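The score-aggregation step described above can be illustrated with a minimal sketch. The abstract does not state the exact aggregation rule, so the unweighted mean used here is an assumption, as are the function name `aggregate_scores`, the placeholder gene names, and the score values (none of them come from the paper).

```python
from statistics import mean


def aggregate_scores(scores_by_model):
    """Average per-gene scores across models and rank genes, highest first.

    scores_by_model maps a model name to a {gene: score} dict.
    Assumption: an unweighted mean; the paper's actual rule may differ.
    """
    # Only rank genes that every model has scored.
    genes = set.intersection(*(set(s) for s in scores_by_model.values()))
    combined = {g: mean(s[g] for s in scores_by_model.values()) for g in genes}
    # Sort by descending score, then by gene name for a stable order.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical scores in [0, 1]; illustrative values only.
scores = {
    "ann": {"GENE_A": 0.9, "GENE_B": 0.4, "GENE_C": 0.2},
    "rf": {"GENE_A": 0.8, "GENE_B": 0.6, "GENE_C": 0.1},
    "svm": {"GENE_A": 0.7, "GENE_B": 0.5, "GENE_C": 0.3},
}
ranking = aggregate_scores(scores)
for gene, score in ranking:
    print(f"{gene}\t{score:.3f}")
```

A weighted mean or a vote over thresholded predictions would fit the same interface; the mean is chosen here only because it is the simplest way to combine scores on a common scale.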

RESULTS

This study reports findings on several aspects of MBCA prognosis and diagnosis. First, the drivers and passengers predicted by SVM, ANN, RF, and EARN are introduced. Second, the biological inferences of the predictions are discussed on the basis of gene set enrichment analysis. Third, all learning methods are statistically validated and compared using several evaluation metrics. Finally, pathway enrichment analysis (PEA) with the ReactomeFIViz tool (FDR < 0.03) on the top 100 genes predicted by EARN leads us to propose a new gene set panel for MBCA. It includes HDAC3, ABAT, GRIN1, PLCB1, and KPNA2 as well as NCOR1, TBL1XR1, SIRT4, KRAS, CACNA1E, PRKCG, GPS2, SIN3A, ACTB, KDM6B, and PRMT1. Furthermore, we compare the MBCA results with those for 983 primary tumor samples of breast invasive carcinoma (BRCA) obtained from The Cancer Genome Atlas (TCGA). The comparison shows that EARN reaches a ROC-AUC of 99.24% for MBCA and 99.79% for BRCA, higher than each of the three individual classifiers in both cases.
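For readers unfamiliar with the ROC-AUC metric reported above, the following sketch computes it from its rank interpretation: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the labels and scores are illustrative assumptions and are unrelated to the paper's data.

```python
def roc_auc(labels, scores):
    """ROC-AUC via pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 class labels (1 = positive, e.g. driver gene).
    scores: iterable of classifier scores, higher meaning more positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    # Count positive/negative pairs ranked correctly; a tie counts as half.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels and scores; illustrative values only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(f"ROC-AUC = {auc:.4f}")
```

The pairwise form is quadratic in the number of examples, which is fine for a small illustration; library implementations typically sort the scores once instead.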

CONCLUSIONS

By using an integrative approach, this research helps precision oncologists design compact targeted panels that eliminate the need for whole-genome/exome sequencing. A schematic representation of the proposed model is presented as the graphical abstract.

Figures

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7b38/8105935/34892f6d7fb4/12920_2021_974_Fig1_HTML.jpg
Fig. 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7b38/8105935/8bba48221a88/12920_2021_974_Fig2_HTML.jpg
Fig. 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7b38/8105935/ba9aa8e432c7/12920_2021_974_Fig3_HTML.jpg
Fig. 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7b38/8105935/7c40648ea9c0/12920_2021_974_Fig4_HTML.jpg
Fig. 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7b38/8105935/ba110b2c6a1c/12920_2021_974_Fig5_HTML.jpg
Fig. 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7b38/8105935/8757c8e43e73/12920_2021_974_Fig6_HTML.jpg
Fig. 7: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7b38/8105935/126ad4f65810/12920_2021_974_Fig7_HTML.jpg
