Chauhan Ekansh, Sharma Amit, Uppin Megha S, Kondamadugu Manasa, Jawahar C V, Vinod P K
Centre for Visual Information Technology, International Institute of Information Technology, Hyderabad, 500032, India.
Department of Pathology, Nizam's Institute of Medical Sciences, Hyderabad, 500082, India.
Sci Data. 2024 Dec 19;11(1):1403. doi: 10.1038/s41597-024-04225-9.
The effective management of brain tumors relies on precise typing, subtyping, and grading. We present the IPD-Brain Dataset, a resource for the neuropathological community comprising 547 high-resolution H&E-stained slides from 367 patients for the study of glioma subtypes and immunohistochemical (IHC) biomarkers. Scanned at 40x magnification, this dataset is one of the largest in Asia and focuses specifically on the Indian population. It includes detailed clinical annotations: patient age, sex, radiological findings, diagnosis, CNS WHO grade, and IHC biomarker status (IDH1 R132H, ATRX, and TP53, along with the proliferation index Ki67). The dataset is open for public access and is designed for applications ranging from machine learning model training to the exploration of regional and ethnic disease variations. Preliminary validations using Multiple Instance Learning for tasks such as glioma subtype classification and IHC biomarker identification indicate its potential to support global collaboration in brain tumor research, improving diagnostic precision and the understanding of glioma variability across populations.
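The abstract names Multiple Instance Learning but not a specific variant. As a rough illustration only, the sketch below shows attention-based pooling, one common MIL formulation, in which a slide is treated as a bag of tile embeddings and reduced to a single slide-level vector. The function name, dimensions, and random weights are hypothetical and are not taken from the paper; in practice the weights would be learned and the embeddings would come from a tile encoder.

```python
import math
import random


def attention_mil_pool(tiles, V, w):
    """Pool a bag of tile embeddings into one slide-level embedding.

    tiles: list of N tile embeddings, each a list of D floats
    V:     hidden projection, L rows of D floats (hypothetical weights)
    w:     attention vector of L floats (hypothetical weights)
    Returns (slide_embedding, attention_weights).
    """
    # Unnormalised attention score per tile: w . tanh(V h)
    scores = []
    for h in tiles:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(a * z for a, z in zip(w, hidden)))

    # Softmax over the tiles in the bag (max subtracted for stability)
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Slide embedding is the attention-weighted sum of tile embeddings
    dim = len(tiles[0])
    slide = [sum(a * h[d] for a, h in zip(weights, tiles)) for d in range(dim)]
    return slide, weights


# Toy bag: 5 tiles with 8-dimensional embeddings (assumed sizes)
rng = random.Random(0)
D, L, N = 8, 4, 5
tiles = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N)]
V = [[rng.gauss(0, 0.1) for _ in range(D)] for _ in range(L)]
w = [rng.gauss(0, 0.1) for _ in range(L)]
slide_embedding, attention = attention_mil_pool(tiles, V, w)
```

A classifier head for glioma subtype or biomarker status would then operate on `slide_embedding`, and `attention` indicates which tiles contributed most to the slide-level representation.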