Hands-on training about overfitting.

Affiliations

Faculty of Computer and Information Science, University of Ljubljana, Ljubljana, Slovenia.

Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America.

Publication information

PLoS Comput Biol. 2021 Mar 4;17(3):e1008671. doi: 10.1371/journal.pcbi.1008671. eCollection 2021 Mar.

DOI:10.1371/journal.pcbi.1008671
PMID:33661899
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7932115/
Abstract

Overfitting is one of the critical problems in developing models by machine learning. With machine learning becoming an essential technology in computational biology, we must include training about overfitting in all courses that introduce this technology to students and practitioners. We here propose a hands-on training for overfitting that is suitable for introductory level courses and can be carried out on its own or embedded within any data science course. We use workflow-based design of machine learning pipelines, experimentation-based teaching, and hands-on approach that focuses on concepts rather than underlying mathematics. We here detail the data analysis workflows we use in training and motivate them from the viewpoint of teaching goals. Our proposed approach relies on Orange, an open-source data science toolbox that combines data visualization and machine learning, and that is tailored for education in machine learning and explorative data analysis.
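
One common way to make the concept concrete is to fit a model to data that contains no signal and compare its score on the training rows with its score on rows it has never seen. The sketch below is not the authors' Orange workflow; it is a minimal pure-Python illustration under stated assumptions: the data are synthetic, the labels are random, the model is a 1-nearest-neighbour classifier, and all function names (`nearest_label`, `accuracy`, `make_random_data`) are hypothetical.

```python
import random


def nearest_label(train, point):
    """Return the label of the training row closest to `point` (1-nearest-neighbour)."""
    best = min(train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)))
    return best[1]


def accuracy(train, rows):
    """Fraction of `rows` whose label the 1-NN model built on `train` predicts correctly."""
    hits = sum(nearest_label(train, x) == y for x, y in rows)
    return hits / len(rows)


def make_random_data(n_rows, n_features, rng):
    """Features and labels are drawn independently, so there is nothing to learn."""
    return [([rng.random() for _ in range(n_features)], rng.randint(0, 1))
            for _ in range(n_rows)]


rng = random.Random(0)                 # fixed seed so the run is repeatable
data = make_random_data(200, 5, rng)
train, test = data[:100], data[100:]   # simple hold-out split

train_acc = accuracy(train, train)     # scored on the rows the model memorised
test_acc = accuracy(train, test)       # scored on rows the model has never seen
print(f"training accuracy: {train_acc:.2f}")
print(f"held-out accuracy: {test_acc:.2f}")
```

Because each training row is its own nearest neighbour, the training accuracy is perfect by construction, while the held-out accuracy should sit near chance level (about 0.5 for two balanced random classes). The gap between the two numbers is the overfitting; a score measured on training data says nothing about how the model will behave on new data.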

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e644/7932115/e50248bcd6fe/pcbi.1008671.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e644/7932115/93a57866797a/pcbi.1008671.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e644/7932115/e22e792d5fb6/pcbi.1008671.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e644/7932115/f2b77f230366/pcbi.1008671.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e644/7932115/6224064780e8/pcbi.1008671.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e644/7932115/82e02a41be5e/pcbi.1008671.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e644/7932115/82b19660ee5f/pcbi.1008671.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e644/7932115/00ee48d7dc75/pcbi.1008671.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e644/7932115/6793ad4e821d/pcbi.1008671.g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e644/7932115/13f1b3890085/pcbi.1008671.g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e644/7932115/023d12a119c8/pcbi.1008671.g011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e644/7932115/e04e77fbec24/pcbi.1008671.g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e644/7932115/8fd05c8ed577/pcbi.1008671.g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e644/7932115/a9c488dbd67c/pcbi.1008671.g014.jpg

Similar articles

1
Hands-on training about overfitting.
PLoS Comput Biol. 2021 Mar 4;17(3):e1008671. doi: 10.1371/journal.pcbi.1008671. eCollection 2021 Mar.
2
scOrange-a tool for hands-on training of concepts from single-cell data analytics.
Bioinformatics. 2019 Jul 15;35(14):i4-i12. doi: 10.1093/bioinformatics/btz348.
3
Ten simple rules for starting (and sustaining) an academic data science initiative.
PLoS Comput Biol. 2021 Feb 18;17(2):e1008628. doi: 10.1371/journal.pcbi.1008628. eCollection 2021 Feb.
4
Data-Driven Investment Strategies for Peer-to-Peer Lending: A Case Study for Teaching Data Science.
Big Data. 2018 Sep 1;6(3):191-213. doi: 10.1089/big.2018.0092. Epub 2018 Sep 17.
5
Integration of bioinformatics into an undergraduate biology curriculum and the impact on development of mathematical skills.
Biochem Mol Biol Educ. 2012 Sep-Oct;40(5):310-9. doi: 10.1002/bmb.20637. Epub 2012 Aug 22.
6
A global perspective on evolving bioinformatics and data science training needs.
Brief Bioinform. 2019 Mar 22;20(2):398-404. doi: 10.1093/bib/bbx100.
7
Glycowork: A Python package for glycan data science and machine learning.
Glycobiology. 2021 Nov 18;31(10):1240-1244. doi: 10.1093/glycob/cwab067.
8
Supervised and unsupervised algorithms for bioinformatics and data science.
Prog Biophys Mol Biol. 2020 Mar;151:14-22. doi: 10.1016/j.pbiomolbio.2019.11.012. Epub 2019 Dec 6.
9
Structural biology meets data science: does anything change?
Curr Opin Struct Biol. 2018 Oct;52:95-102. doi: 10.1016/j.sbi.2018.09.003. Epub 2018 Sep 27.
10
An approachable, flexible and practical machine learning workshop for biologists.
Bioinformatics. 2022 Jun 24;38(Suppl 1):i10-i18. doi: 10.1093/bioinformatics/btac233.

Cited by

1
Integrated multiomics analysis and machine learning refine molecular subtypes and prognosis for thyroid cancer.
Discov Oncol. 2025 Jun 23;16(1):1186. doi: 10.1007/s12672-025-02918-0.
2
Network-based analyses of multiomics data in biomedicine.
BioData Min. 2025 May 27;18(1):37. doi: 10.1186/s13040-025-00452-x.
3
Machine Learning Approach and Bioinformatics Analysis Discovered Key Genomic Signatures for Hepatitis B Virus-Associated Hepatocyte Remodeling and Hepatocellular Carcinoma.
Cancer Inform. 2025 Apr 16;24:11769351251333847. doi: 10.1177/11769351251333847. eCollection 2025.
4
Applications of Machine Learning in the Diagnosis and Prognosis of Patients with Chiari Malformation Type I: A Scoping Review.
Children (Basel). 2025 Feb 18;12(2):244. doi: 10.3390/children12020244.
5
Development and Application of an In Vitro Drug Screening Assay for Schistosomula Using YOLOv5.
Biomedicines. 2024 Dec 19;12(12):2894. doi: 10.3390/biomedicines12122894.
6
Hands-on training about data clustering with Orange data mining toolbox.
PLoS Comput Biol. 2024 Dec 18;20(12):e1012574. doi: 10.1371/journal.pcbi.1012574. eCollection 2024 Dec.
7
Application of radiomics for preoperative prediction of lymph node metastasis in colorectal cancer: a systematic review and meta-analysis.
Int J Surg. 2024 Jun 1;110(6):3795-3813. doi: 10.1097/JS9.0000000000001239.
8
Investigation of the myopic outcomes of the newer intraocular lens power calculation formulas in Korean patients with long eyes.
Sci Rep. 2024 May 31;14(1):12558. doi: 10.1038/s41598-024-63334-y.
9
Deep learning in bioinformatics.
Turk J Biol. 2023 Dec 18;47(6):366-382. doi: 10.55730/1300-0152.2671. eCollection 2023.
10
Artificial-Intelligence-Enhanced Analysis of In Vivo Confocal Microscopy in Corneal Diseases: A Review.
Diagnostics (Basel). 2024 Mar 26;14(7):694. doi: 10.3390/diagnostics14070694.

References

1
Democratized image analytics by visual programming through integration of deep models and small-scale machine learning.
Nat Commun. 2019 Oct 7;10(1):4551. doi: 10.1038/s41467-019-12397-x.
2
scOrange-a tool for hands-on training of concepts from single-cell data analytics.
Bioinformatics. 2019 Jul 15;35(14):i4-i12. doi: 10.1093/bioinformatics/btz348.
3
Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities.
Inf Fusion. 2019 Oct;50:71-91. doi: 10.1016/j.inffus.2018.09.012. Epub 2018 Sep 21.
4
Ten quick tips for machine learning in computational biology.
BioData Min. 2017 Dec 8;10:35. doi: 10.1186/s13040-017-0155-3. eCollection 2017.
5
Machine learning in bioinformatics.
Brief Bioinform. 2006 Mar;7(1):86-112. doi: 10.1093/bib/bbk007.
6
Patterns of resistance and incomplete response to docetaxel by gene expression profiling in breast cancer patients.
J Clin Oncol. 2005 Feb 20;23(6):1169-77. doi: 10.1200/JCO.2005.03.156.
7
VizRank: finding informative data projections in functional genomics by machine learning.
Bioinformatics. 2005 Feb 1;21(3):413-4. doi: 10.1093/bioinformatics/bti016. Epub 2004 Sep 9.
8
Microarray data mining with visual programming.
Bioinformatics. 2005 Feb 1;21(3):396-8. doi: 10.1093/bioinformatics/bth474. Epub 2004 Aug 12.
9
Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification.
J Natl Cancer Inst. 2003 Jan 1;95(1):14-8. doi: 10.1093/jnci/95.1.14.
10
Knowledge-based analysis of microarray gene expression data by using support vector machines.
Proc Natl Acad Sci U S A. 2000 Jan 4;97(1):262-7. doi: 10.1073/pnas.97.1.262.