Preventing dataset shift from breaking machine-learning biomarkers.

Affiliations

McGill University, 845 Sherbrooke St W, Montreal, Quebec H3A 0G4, Canada.

INRIA.

Publication information

Gigascience. 2021 Sep 28;10(9). doi: 10.1093/gigascience/giab055.

Abstract

Machine learning brings the hope of finding new biomarkers extracted from cohorts with rich biomedical measurements. A good biomarker is one that gives reliable detection of the corresponding condition. However, biomarkers are often extracted from a cohort that differs from the target population. Such a mismatch, known as a dataset shift, can undermine the application of the biomarker to new individuals. Dataset shifts are frequent in biomedical research, e.g., because of recruitment biases. When a dataset shift occurs, standard machine-learning techniques do not suffice to extract and validate biomarkers. This article provides an overview of when and how dataset shifts break machine-learning-extracted biomarkers, as well as detection and correction strategies.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bc4b/8478611/c0b24a3a7dd7/giab055fig1.jpg
