Evaluating machine learning methodologies for identification of cancer driver genes.

Affiliations

Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, P.O. Box 344, Rabigh, 21911, Saudi Arabia.

Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan.

Publication information

Sci Rep. 2021 Jun 10;11(1):12281. doi: 10.1038/s41598-021-91656-8.

DOI:10.1038/s41598-021-91656-8
PMID:34112883
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8192921/
Abstract

Cancer is driven by distinctive sorts of changes and basic variations in genes. Recognizing cancer driver genes is basic for accurate oncological analysis. Numerous methodologies to distinguish and identify drivers presently exist, but efficient tools to combine and optimize them on huge datasets are few. Most strategies for prioritizing transformations depend basically on frequency-based criteria. Strategies are required to dependably prioritize organically dynamic driver changes over inert passengers in high-throughput sequencing cancer information sets. This study proposes a model, namely PCDG-Pred, which works as a utility capable of distinguishing cancer driver and passenger attributes of genes based on sequencing data. Keeping in view the significance of the cancer driver genes, an efficient method is proposed to identify the cancer driver genes. Further, various validation techniques are applied at different levels to establish the effectiveness of the model and to obtain metrics like accuracy, Matthews correlation coefficient, sensitivity, and specificity. The results of the study strongly indicate that the proposed strategy provides a fundamental functional advantage over other existing strategies for cancer driver genes identification. Subsequently, careful experiments exhibit that the accuracy metrics obtained for self-consistency, independent set, and cross-validation tests are 91.08%, 87.26%, and 92.48% respectively.
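
The four metrics named in the abstract are standard functions of a binary confusion matrix. The sketch below is only an illustration of those textbook definitions, not the PCDG-Pred implementation: the function name `binary_metrics` and the toy labels (1 = driver, 0 = passenger) are assumptions made up for this example, and the numbers it produces have no connection to the results reported in the paper.

```python
import math


def binary_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, specificity, and MCC for 0/1 labels."""
    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    # Denominator of the Matthews correlation coefficient.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Sensitivity: fraction of true drivers that were predicted as drivers.
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: fraction of true passengers predicted as passengers.
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        # MCC is conventionally reported as 0 when the denominator vanishes.
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


# Hypothetical toy labels, invented purely for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a self-consistency test the same function would be applied to predictions on the training data, in an independent-set test to held-out data, and in cross-validation to each fold in turn before averaging.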

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c991/8192921/d32da5d217a8/41598_2021_91656_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c991/8192921/47290a9afb95/41598_2021_91656_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c991/8192921/303c5a061d25/41598_2021_91656_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c991/8192921/b6c0144cdfcb/41598_2021_91656_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c991/8192921/061d6c4fca14/41598_2021_91656_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c991/8192921/d8c0f14fd140/41598_2021_91656_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c991/8192921/4c626b544c4f/41598_2021_91656_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c991/8192921/521efd5bd948/41598_2021_91656_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c991/8192921/3879459b232d/41598_2021_91656_Fig9_HTML.jpg

Similar articles

1
Evaluating machine learning methodologies for identification of cancer driver genes.
Sci Rep. 2021 Jun 10;11(1):12281. doi: 10.1038/s41598-021-91656-8.
2
Ontology-based prediction of cancer driver genes.
Sci Rep. 2019 Nov 22;9(1):17405. doi: 10.1038/s41598-019-53454-1.
3
The Integrative Method Based on the Module-Network for Identifying Driver Genes in Cancer Subtypes.
Molecules. 2018 Jan 24;23(2):183. doi: 10.3390/molecules23020183.
4
LOTUS: A single- and multitask machine learning algorithm for the prediction of cancer driver genes.
PLoS Comput Biol. 2019 Sep 30;15(9):e1007381. doi: 10.1371/journal.pcbi.1007381. eCollection 2019 Sep.
5
Evaluating the evaluation of cancer driver genes.
Proc Natl Acad Sci U S A. 2016 Dec 13;113(50):14330-14335. doi: 10.1073/pnas.1616440113. Epub 2016 Nov 22.
6
Discovering potential cancer driver genes by an integrated network-based approach.
Mol Biosyst. 2016 Aug 16;12(9):2921-31. doi: 10.1039/c6mb00274a.
7
Machine Learning Classification and Structure-Functional Analysis of Cancer Mutations Reveal Unique Dynamic and Network Signatures of Driver Sites in Oncogenes and Tumor Suppressor Genes.
J Chem Inf Model. 2018 Oct 22;58(10):2131-2150. doi: 10.1021/acs.jcim.8b00414. Epub 2018 Oct 3.
8
Identification of new driver and passenger mutations within APOBEC-induced hotspot mutations in bladder cancer.
Genome Med. 2020 Sep 28;12(1):85. doi: 10.1186/s13073-020-00781-y.
9
Machine learning random forest for predicting oncosomatic variant NGS analysis.
Sci Rep. 2021 Nov 8;11(1):21820. doi: 10.1038/s41598-021-01253-y.
10
DriverML: a machine learning algorithm for identifying driver genes in cancer sequencing studies.
Nucleic Acids Res. 2019 May 7;47(8):e45. doi: 10.1093/nar/gkz096.

Cited by

1
Targeting Regulatory Noncoding RNAs in Human Cancer: The State of the Art in Clinical Trials.
Pharmaceutics. 2025 Apr 4;17(4):471. doi: 10.3390/pharmaceutics17040471.
2
DEL-Thyroid: deep ensemble learning framework for detection of thyroid cancer progression through genomic mutation.
BMC Med Inform Decis Mak. 2024 Jul 22;24(1):198. doi: 10.1186/s12911-024-02604-1.
3
Gsw-fi: a GLM model incorporating shrinkage and double-weighted strategies for identifying cancer driver genes with functional impact.
BMC Bioinformatics. 2024 Mar 6;25(1):99. doi: 10.1186/s12859-024-05707-8.
4
TOP1 and R-loops facilitate transcriptional DSBs at hypertranscribed cancer driver genes.
iScience. 2024 Feb 1;27(3):109082. doi: 10.1016/j.isci.2024.109082. eCollection 2024 Mar 15.
5
m1A-Ensem: accurate identification of 1-methyladenosine sites through ensemble models.
BioData Min. 2024 Feb 15;17(1):4. doi: 10.1186/s13040-023-00353-x.
6
An ensemble-based deep learning model for detection of mutation causing cutaneous melanoma.
Sci Rep. 2023 Dec 14;13(1):22251. doi: 10.1038/s41598-023-49075-4.
7
BBB-PEP-prediction: improved computational model for identification of blood-brain barrier peptides using blending position relative composition specific features and ensemble modeling.
J Cheminform. 2023 Nov 18;15(1):110. doi: 10.1186/s13321-023-00773-1.
8
Application of Machine Learning in Predicting Hepatic Metastasis or Primary Site in Gastroenteropancreatic Neuroendocrine Tumors.
Curr Oncol. 2023 Oct 19;30(10):9244-9261. doi: 10.3390/curroncol30100668.
9
Hemolytic-Pred: A machine learning-based predictor for hemolytic proteins using position and composition-based features.
Digit Health. 2023 Jul 5;9:20552076231180739. doi: 10.1177/20552076231180739. eCollection 2023 Jan-Dec.
10
Ensemble Learning for Hormone Binding Protein Prediction: A Promising Approach for Early Diagnosis of Thyroid Hormone Disorders in Serum.
Diagnostics (Basel). 2023 Jun 1;13(11):1940. doi: 10.3390/diagnostics13111940.

References

1
iSulfoTyr-PseAAC: Identify Tyrosine Sulfation Sites by Incorporating Statistical Moments Chou's 5-steps Rule and Pseudo Components.
Curr Genomics. 2019 May;20(4):306-320. doi: 10.2174/1389202920666190819091609.
2
iMethylK_pseAAC: Improving Accuracy of Lysine Methylation Sites Identification by Incorporating Statistical Moments and Position Relative Features into General PseAAC Chou's 5-steps Rule.
Curr Genomics. 2019 May;20(4):275-292. doi: 10.2174/1389202920666190809095206.
3
Identification of cancer driver genes based on nucleotide context.
Nat Genet. 2020 Feb;52(2):208-218. doi: 10.1038/s41588-019-0572-y. Epub 2020 Feb 3.
4
Using CHOU'S 5-Steps Rule to Predict O-Linked Serine Glycosylation Sites by Blending Position Relative Features and Statistical Moment.
IEEE/ACM Trans Comput Biol Bioinform. 2021 Sep-Oct;18(5):2045-2056. doi: 10.1109/TCBB.2020.2968441. Epub 2021 Oct 11.
5
iProtease-PseAAC(2L): A two-layer predictor for identifying proteases and their types using Chou's 5-step-rule and general PseAAC.
Anal Biochem. 2020 Jan 1;588:113477. doi: 10.1016/j.ab.2019.113477. Epub 2019 Oct 22.
6
iHyd-PseAAC (EPSV): Identifying Hydroxylation Sites in Proteins by Extracting Enhanced Position and Sequence Variant Feature Chou's 5-Step Rule and General Pseudo Amino Acid Composition.
Curr Genomics. 2019 Feb;20(2):124-133. doi: 10.2174/1389202920666190325162307.
7
iPhosH-PseAAC: Identify Phosphohistidine Sites in Proteins by Blending Statistical Moments and Position Relative Features According to the Chou's 5-Step Rule and General Pseudo Amino Acid Composition.
IEEE/ACM Trans Comput Biol Bioinform. 2021 Mar-Apr;18(2):596-610. doi: 10.1109/TCBB.2019.2919025. Epub 2021 Apr 6.
8
Prediction of antioxidant proteins by incorporating statistical moments based features into Chou's PseAAC.
J Theor Biol. 2019 Jul 21;473:1-8. doi: 10.1016/j.jtbi.2019.04.019. Epub 2019 Apr 18.
9
SPrenylC-PseAAC: A sequence-based model developed via Chou's 5-steps rule and general PseAAC for identifying S-prenylation sites in proteins.
J Theor Biol. 2019 May 7;468:1-11. doi: 10.1016/j.jtbi.2019.02.007. Epub 2019 Feb 12.
10
iPhosY-PseAAC: identify phosphotyrosine sites by incorporating sequence statistical moments into PseAAC.
Mol Biol Rep. 2018 Dec;45(6):2501-2509. doi: 10.1007/s11033-018-4417-z. Epub 2018 Oct 11.