Preventing dataset shift from breaking machine-learning biomarkers.

Affiliations

McGill University, 845 Sherbrooke St W, Montreal, Quebec H3A 0G4, Canada.

INRIA.

Publication information

Gigascience. 2021 Sep 28;10(9). doi: 10.1093/gigascience/giab055.

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8478611/

Abstract

Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals. Dataset shifts are frequent in biomedical research, e.g., because of recruitment biases. When a dataset shift occurs, standard machine-learning techniques do not suffice to extract and validate biomarkers. This article provides an overview of when and how dataset shifts break machine-learning-extracted biomarkers, as well as detection and correction strategies.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc4b/8478611/c0b24a3a7dd7/giab055fig1.jpg
