
Ten quick tips for machine learning in computational biology.

Author information

Chicco Davide

Affiliation

Princess Margaret Cancer Centre, PMCR Tower 11-401, 101 College Street, Toronto, Ontario, M5G 1L7 Canada.

Publication information

BioData Min. 2017 Dec 8;10:35. doi: 10.1186/s13040-017-0155-3. eCollection 2017.

DOI:10.1186/s13040-017-0155-3
PMID:29234465
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5721660/

Abstract

Machine learning has become a pivotal tool for many projects in computational biology, bioinformatics, and health informatics. Nevertheless, beginners and biomedical researchers often do not have enough experience to run a data mining project effectively, and therefore can follow incorrect practices, that may lead to common mistakes or over-optimistic results. With this review, we present ten quick tips to take advantage of machine learning in any computational biology context, by avoiding some common errors that we observed hundreds of times in multiple bioinformatics projects. We believe our ten suggestions can strongly help any machine learning practitioner to carry on a successful project in computational biology and related sciences.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e91a/5721660/97d642e34a87/13040_2017_155_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e91a/5721660/f18a91f66fe9/13040_2017_155_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e91a/5721660/35612b734f8e/13040_2017_155_Fig2_HTML.jpg

