Fan Fan, Martinez Georgia, DeSilvio Thomas, Shin John, Chen Yijiang, Jacobs Jackson, Wang Bangchen, Ozeki Takaya, Lafarge Maxime W, Koelzer Viktor H, Barisoni Laura, Madabhushi Anant, Viswanath Satish E, Janowczyk Andrew
Emory University and Georgia Institute of Technology, Department of Biomedical Engineering, Atlanta, GA USA.
Case Western Reserve University, Department of Biomedical Engineering, Cleveland, OH USA.
npj Imaging. 2024;2(1):15. doi: 10.1038/s44303-024-00018-2. Epub 2024 Jul 1.
Batch effects (BEs) are systematic technical differences in data collection that are unrelated to biological variation, and the noise they introduce has been shown to reduce the generalizability of machine learning (ML) models. Here we release CohortFinder (http://cohortfinder.com), an open-source tool that mitigates BEs via data-driven cohort partitioning. We demonstrate that CohortFinder improves ML model performance in downstream digital pathology and medical image processing tasks. CohortFinder is freely available for download at cohortfinder.com.
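The abstract does not spell out how the partitioning works, so the following is only a minimal sketch of the general idea and not CohortFinder's actual interface. It assumes each image has already been assigned a batch-effect group label (for example, by clustering quality-control metrics; that step is an assumption here and is not shown). The function name `partition_by_group` and the slide identifiers are hypothetical. The sketch splits every group between training and testing cohorts so that no batch-effect group is confined to one side.

```python
import random
from collections import defaultdict


def partition_by_group(group_of, test_fraction=0.5, seed=0):
    """Split items into train/test so each batch-effect group spans both cohorts.

    group_of: dict mapping item id -> batch-effect group label
              (hypothetical input; how labels are derived is not shown here).
    """
    rng = random.Random(seed)

    # Collect the items belonging to each group.
    by_group = defaultdict(list)
    for item, group in group_of.items():
        by_group[group].append(item)

    train, test = [], []
    for group in sorted(by_group):
        items = sorted(by_group[group])
        rng.shuffle(items)
        # Take roughly test_fraction of the group for testing, but keep at
        # least one item on each side when the group has two or more members.
        n_test = round(len(items) * test_fraction)
        if len(items) >= 2:
            n_test = min(max(n_test, 1), len(items) - 1)
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Toy example: 12 slides spread over three hypothetical batch-effect groups.
group_of = {f"slide_{i:02d}": "ABC"[i % 3] for i in range(12)}
train, test = partition_by_group(group_of, test_fraction=0.25, seed=42)
```

Splitting within each group, rather than across the pooled data, is what keeps a single scanner, site, or staining batch from dominating one cohort; a purely random split gives no such guarantee on small datasets.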