A teaching proposal for a short course on biomedical data science.

Author information

Chicco Davide, Coelho Vasco

Affiliations

Dipartimento di Informatica Sistemistica e Comunicazione, Università di Milano-Bicocca, Milan, Italy.

Institute of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada.

Publication information

PLoS Comput Biol. 2025 Apr 14;21(4):e1012946. doi: 10.1371/journal.pcbi.1012946. eCollection 2025 Apr.

Abstract

As the availability of big biomedical data grows, there is an increasing need for university students who are professionally trained to analyze these data and to interpret their results correctly. Here we propose a study plan for a master's degree course on biomedical data science, drawing on our experience teaching it during the last academic year. In our university course, we explained how to find an open biomedical dataset, how to clean it correctly, and how to prepare it for a computational statistics or machine learning phase. Along the way, we introduced common health data science terms and explained how to avoid common mistakes in the process. Moreover, we clarified how to perform an exploratory data analysis (EDA) and how to interpret its results reasonably. We also described how to properly execute a supervised or unsupervised machine learning analysis, and how to understand and interpret its outcomes. Finally, we explained how to validate the findings obtained. We illustrated all these steps in the context of open science principles, encouraging the students to use only open source programming languages (R or Python in particular), open biomedical data (when available), and open access scientific articles (when possible). We believe our teaching proposal can be useful and of interest to anyone who wants to start preparing a course on biomedical data science.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d551/11996213/fa496c508b35/pcbi.1012946.g001.jpg