Efficacy of MRI data harmonization in the age of machine learning: a multicenter study across 36 datasets.

Affiliations

Department of Statistics, Computer Science and Applications "Giuseppe Parenti", University of Florence, 50134, Florence, Italy.

"Nello Carrara" Institute of Applied Physics (IFAC), National Research Council (CNR), 50019, Sesto Fiorentino, Florence, Italy.

Publication information

Sci Data. 2024 Jan 23;11(1):115. doi: 10.1038/s41597-023-02421-7.

DOI:10.1038/s41597-023-02421-7
PMID:38263181
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10805868/
Abstract

Pooling publicly available MRI data from multiple sites makes it possible to assemble large groups of subjects, increase statistical power, and promote data reuse with machine learning techniques. Harmonizing multicenter data is necessary to reduce the confounding effect associated with non-biological sources of variability in the data. However, when harmonization is applied to the entire dataset before machine learning, it leads to data leakage: information from outside the training set may affect model building and cause performance to be overestimated. We propose 1) a measurement of the efficacy of data harmonization and 2) a harmonizer transformer, i.e., an implementation of the ComBat harmonization that can be encapsulated among the preprocessing steps of a machine learning pipeline, avoiding data leakage by design. We tested these tools using brain T1-weighted MRI data from 1740 healthy subjects acquired at 36 sites. After harmonization, the site effect was removed or reduced. We also showed the data leakage effect in predicting individual age from MRI data, highlighting that introducing the harmonizer transformer into a machine learning pipeline avoids data leakage by design.
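
The idea of placing harmonization inside the pipeline can be illustrated with a minimal sketch. This is not the authors' harmonizer transformer and it is not ComBat: the hypothetical class `SiteLocationScaleHarmonizer` below only aligns per-site means and standard deviations, as a simplified stand-in, and it does not preserve biological covariates the way ComBat does. It also assumes the site label is stored in one column of the feature matrix, and the data are synthetic. The point is only that the site parameters are estimated in `fit` on the training fold and reused in `transform` on the held-out fold.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline


class SiteLocationScaleHarmonizer(BaseEstimator, TransformerMixin):
    """Hypothetical, simplified stand-in for ComBat (per-site mean/std alignment).

    Assumption: column `site_col` of X holds an integer site label.
    """

    def __init__(self, site_col=0):
        self.site_col = site_col

    def _split(self, X):
        # Separate the site label column from the imaging features.
        X = np.asarray(X, dtype=float)
        sites = X[:, self.site_col].astype(int)
        feats = np.delete(X, self.site_col, axis=1)
        return sites, feats

    def fit(self, X, y=None):
        # All statistics are estimated from the data passed to fit only.
        sites, feats = self._split(X)
        self.grand_mean_ = feats.mean(axis=0)
        self.grand_std_ = feats.std(axis=0) + 1e-12
        self.site_mean_ = {}
        self.site_std_ = {}
        for s in np.unique(sites):
            rows = feats[sites == s]
            self.site_mean_[s] = rows.mean(axis=0)
            self.site_std_[s] = rows.std(axis=0) + 1e-12
        return self

    def transform(self, X):
        # Apply the stored (training) statistics; the site column is dropped.
        sites, feats = self._split(X)
        out = np.empty_like(feats)
        for s in np.unique(sites):
            mask = sites == s
            # A site unseen during fit falls back to pooled statistics.
            mu = self.site_mean_.get(s, self.grand_mean_)
            sd = self.site_std_.get(s, self.grand_std_)
            out[mask] = (feats[mask] - mu) / sd * self.grand_std_ + self.grand_mean_
        return out


# Synthetic example: 3 sites, 60 subjects each, 5 features with additive site offsets.
rng = np.random.default_rng(0)
n_per_site, n_sites, n_feat = 60, 3, 5
sites = np.repeat(np.arange(n_sites), n_per_site)
age = rng.uniform(20, 80, size=sites.size)
signal = 0.05 * np.outer(age, rng.normal(size=n_feat))
offset = rng.normal(scale=2.0, size=(n_sites, n_feat))[sites]
feats = signal + offset + rng.normal(size=(sites.size, n_feat))
X = np.column_stack([sites, feats])

# Harmonization is a pipeline step, so it is refit inside every training fold.
pipe = Pipeline([
    ("harmonize", SiteLocationScaleHarmonizer(site_col=0)),
    ("model", Ridge(alpha=1.0)),
])
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, age, cv=cv, scoring="neg_mean_absolute_error")
print("Mean absolute error per fold:", -scores)
```

By contrast, calling `fit_transform` on the full matrix before cross-validation would let held-out subjects influence the site statistics, which is the leakage scenario the abstract describes.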


Similar articles

1
Efficacy of MRI data harmonization in the age of machine learning: a multicenter study across 36 datasets.
Sci Data. 2024 Jan 23;11(1):115. doi: 10.1038/s41597-023-02421-7.
2
Performance comparison of modified ComBat for harmonization of radiomic features for multicenter studies.
Sci Rep. 2020 Jun 24;10(1):10248. doi: 10.1038/s41598-020-66110-w.
3
Effect of data harmonization of multicentric dataset in ASD/TD classification.
Brain Inform. 2023 Nov 25;10(1):32. doi: 10.1186/s40708-023-00210-x.
4
Comparison of traveling-subject and ComBat harmonization methods for assessing structural brain characteristics.
Hum Brain Mapp. 2021 Nov;42(16):5278-5287. doi: 10.1002/hbm.25615. Epub 2021 Aug 17.
5
A transfer learning approach to facilitate ComBat-based harmonization of multicentre radiomic features in new datasets.
PLoS One. 2021 Jul 1;16(7):e0253653. doi: 10.1371/journal.pone.0253653. eCollection 2021.
6
The impact of harmonization on radiomic features in Parkinson's disease and healthy controls: A multicenter study.
Front Neurosci. 2022 Oct 10;16:1012287. doi: 10.3389/fnins.2022.1012287. eCollection 2022.
7
Ensemble machine learning model trained on a new synthesized dataset generalizes well for stress prediction using wearable devices.
J Biomed Inform. 2023 Dec;148:104556. doi: 10.1016/j.jbi.2023.104556. Epub 2023 Dec 2.
8
ComBat Harmonization for MRI Radiomics: Impact on Nonbinary Tissue Classification by Machine Learning.
Invest Radiol. 2023 Sep 1;58(9):697-701. doi: 10.1097/RLI.0000000000000970.
9
Harmonization of diffusion MRI data sets with adaptive dictionary learning.
Hum Brain Mapp. 2020 Nov;41(16):4478-4499. doi: 10.1002/hbm.25117. Epub 2020 Aug 26.
10
A three-dimensional deep learning model for inter-site harmonization of structural MR images of the brain: Extensive validation with a multicenter dataset.
Heliyon. 2023 Nov 23;9(12):e22647. doi: 10.1016/j.heliyon.2023.e22647. eCollection 2023 Dec.

Cited by

1
Overcoming Site Variability in Multisite fMRI Studies: an Autoencoder Framework for Enhanced Generalizability of Machine Learning Models.
Neuroinformatics. 2025 Sep 2;23(3):46. doi: 10.1007/s12021-025-09746-1.
2
Unlocking the potential of radiomics in identifying fibrosing and inflammatory patterns in interstitial lung disease.
Radiol Med. 2025 Aug 22. doi: 10.1007/s11547-025-02067-y.
3
HeteroMRI: Robust white matter abnormality classification across multi-scanner MRI data.
Gigascience. 2025 Jan 6;14. doi: 10.1093/gigascience/giaf092.
4
Current challenges and future directions for brain age prediction in children and adolescents.
Nat Commun. 2025 Aug 20;16(1):7771. doi: 10.1038/s41467-025-63222-7.
5
An evaluation of image-based and statistical techniques for harmonizing brain volume measurements.
Imaging Neurosci (Camb). 2025 Jul 14;3. doi: 10.1162/IMAG.a.73. eCollection 2025.
6
Superpixel-ComBat modeling: A joint approach for harmonization and characterization of inter-scanner variability in T1-weighted images.
Imaging Neurosci (Camb). 2024 Oct 3;2. doi: 10.1162/imag_a_00306. eCollection 2024.
7
Age- and Sex-Specific Cerebral Blood Flow Atlases for Healthy Brain Across the Lifespan.
Sci Data. 2025 Jul 9;12(1):1169. doi: 10.1038/s41597-025-05406-w.
8
Lifespan reference curves for harmonizing multi-site regional brain white matter metrics from diffusion MRI.
Sci Data. 2025 May 6;12(1):748. doi: 10.1038/s41597-025-05028-2.
9
A critical assessment of artificial intelligence in magnetic resonance imaging of cancer.
Npj Imaging. 2025;3(1):15. doi: 10.1038/s44303-025-00076-0. Epub 2025 Apr 9.
10
Artificial Intelligence Is Brittle: We Need to Do Better.
Radiol Artif Intell. 2025 May;7(3):e250081. doi: 10.1148/ryai.250081.

References

1
ABCD_Harmonizer: An Open-source Tool for Mapping and Controlling for Scanner Induced Variance in the Adolescent Brain Cognitive Development Study.
Neuroinformatics. 2023 Apr;21(2):323-337. doi: 10.1007/s12021-023-09624-8. Epub 2023 Mar 20.
2
Feasibility of radiomic feature harmonization for pooling of [18F]FET or [18F]GE-180 PET images of gliomas.
Z Med Phys. 2023 Feb;33(1):91-102. doi: 10.1016/j.zemedi.2022.12.005. Epub 2023 Jan 27.
3
Age-associated sex and asymmetry differentiation in hemispheric and lobar cortical ribbon complexity across adulthood: A UK Biobank imaging study.
Hum Brain Mapp. 2023 Jan;44(1):49-65. doi: 10.1002/hbm.26076. Epub 2022 Sep 15.
4
Sample size requirement for achieving multisite harmonization using structural brain MRI features.
Neuroimage. 2022 Dec 1;264:119768. doi: 10.1016/j.neuroimage.2022.119768. Epub 2022 Nov 24.
5
The impact of harmonization on radiomic features in Parkinson's disease and healthy controls: A multicenter study.
Front Neurosci. 2022 Oct 10;16:1012287. doi: 10.3389/fnins.2022.1012287. eCollection 2022.
6
Unraveling schizophrenia replicable functional connectivity disruption patterns across sites.
Hum Brain Mapp. 2023 Jan;44(1):156-169. doi: 10.1002/hbm.26108. Epub 2022 Oct 12.
7
Inflation of test accuracy due to data leakage in deep learning-based classification of OCT images.
Sci Data. 2022 Sep 22;9(1):580. doi: 10.1038/s41597-022-01618-6.
8
The alterations of brain functional connectivity networks in major depressive disorder detected by machine learning through multisite rs-fMRI data.
Behav Brain Res. 2022 Oct 28;435:114058. doi: 10.1016/j.bbr.2022.114058. Epub 2022 Aug 20.
9
Sexual dimorphism in the relationship between brain complexity, volume and general intelligence (g): a cross-cohort study.
Sci Rep. 2022 Jun 30;12(1):11025. doi: 10.1038/s41598-022-15208-4.
10
Multi-site harmonization of MRI data uncovers machine-learning discrimination capability in barely separable populations: An example from the ABIDE dataset.
Neuroimage Clin. 2022;35:103082. doi: 10.1016/j.nicl.2022.103082. Epub 2022 Jun 8.