Orouji Seyedmehdi, Liu Martin C, Korem Tal, Peters Megan A K
Department of Cognitive Sciences, University of California Irvine, Irvine, CA, USA.
Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, NY, USA.
Sci Adv. 2024 Dec 20;10(51):eadp6040. doi: 10.1126/sciadv.adp6040.
Machine-learning models are key to modern biology, yet models trained on one dataset often fail to generalize to datasets from other cohorts or laboratories because of both technical and biological differences. Domain adaptation, a type of transfer learning, alleviates this problem by aligning different datasets so that models can be applied across them. However, most state-of-the-art domain adaptation methods were designed for large-scale data such as images, whereas biological datasets are smaller yet have more features, which are themselves complex and heterogeneous. This Review discusses domain adaptation methods in the context of such biological data to inform biologists and guide future domain adaptation research. We describe the benefits and challenges of domain adaptation in biological research and critically explore some of its objectives, strengths, and weaknesses. We argue for the incorporation of domain adaptation techniques into the computational biologist's toolkit, with further development of customized approaches.
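To make the idea of "aligning different datasets" concrete, the following is a minimal sketch of one well-known alignment technique, correlation alignment (CORAL). The abstract does not name a specific method, so CORAL is chosen here purely as an illustration. The function names, the `reg` regularization parameter, and the shift to the target mean are assumptions of this sketch and are not taken from the Review.

```python
import numpy as np


def _sym_matrix_power(mat, power):
    """Raise a symmetric positive-definite matrix to a real power
    using its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * eigvals ** power) @ eigvecs.T


def coral_align(source, target, reg=1.0):
    """Transform source samples so their mean and covariance
    approximately match those of the target samples.

    source, target: arrays of shape (n_samples, n_features).
    reg: ridge term added to both covariances (an assumption of this
         sketch); it keeps them invertible when a dataset has fewer
         samples than features.
    """
    n_features = source.shape[1]
    ridge = reg * np.eye(n_features)
    cov_s = np.atleast_2d(np.cov(source, rowvar=False)) + ridge
    cov_t = np.atleast_2d(np.cov(target, rowvar=False)) + ridge

    centered = source - source.mean(axis=0)
    whitened = centered @ _sym_matrix_power(cov_s, -0.5)  # remove source correlations
    recolored = whitened @ _sym_matrix_power(cov_t, 0.5)  # impose target correlations
    return recolored + target.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for two cohorts measured on the same 5 features.
    cohort_a = rng.normal(0.0, 1.0, size=(200, 5))
    cohort_b = rng.normal(3.0, 2.0, size=(300, 5))
    aligned_a = coral_align(cohort_a, cohort_b, reg=1e-6)
    print(aligned_a.shape)
```

A model trained on `aligned_a` would then see features whose first- and second-order statistics resemble those of `cohort_b`. This kind of alignment addresses only marginal feature distributions; it does not by itself correct for differences in label distributions or more complex shifts between cohorts.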