Chintapalli Sai Spandana, Wang Rongguang, Yang Zhijian, Tassopoulou Vasiliki, Yu Fanyang, Bashyam Vishnu, Erus Guray, Chaudhari Pratik, Shou Haochang, Davatzikos Christos
Center for AI and Data Science for Integrated Diagnostics (AI2D), Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
Department of Radiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
ArXiv. 2024 Oct 1:arXiv:2407.12897v2.
Privacy and data-sharing restrictions often limit the availability of large and diverse medical datasets, yet applying machine learning to disease diagnosis, prognosis, and precision medicine requires large amounts of data for model building and optimization. To help overcome these limitations in the context of brain MRI, we present GenMIND: a collection of generative models of normative regional volumetric features derived from structural brain imaging. GenMIND models are trained on real regional volumetric measures from the iSTAGING consortium, which encompasses over 40,000 MRI scans across 13 studies, incorporating covariates such as age, sex, and race. Using GenMIND, we generate and release 18,000 synthetic samples spanning the adult lifespan (ages 22-90 years), and the models can generate an unlimited number of additional samples. Experimental results indicate that samples generated from GenMIND agree with the distributions obtained from real data. Most importantly, the generated normative data significantly enhance the accuracy of downstream machine learning models on tasks such as disease classification. Data and models are available at: https://huggingface.co/spaces/rongguangw/GenMIND.