Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, Georgia, USA.
J Am Med Inform Assoc. 2013 Nov-Dec;20(6):1091-8. doi: 10.1136/amiajnl-2012-001469. Epub 2013 Jul 25.
The integration and visualization of multimodal datasets is a common challenge in biomedical informatics. Several recent studies of The Cancer Genome Atlas (TCGA) data have illustrated important relationships between morphology observed in whole-slide images, outcome, and genetic events. The pairing of genomics and rich clinical descriptions with whole-slide imaging provided by TCGA presents a unique opportunity to perform these correlative studies. However, better tools are needed to integrate the vast and disparate data types.
To build an integrated web-based platform supporting whole-slide pathology image visualization and data integration.
All images and genomic data were directly obtained from the TCGA and National Cancer Institute (NCI) websites.
The resulting Cancer Digital Slide Archive (CDSA) is publicly accessible (http://cancer.digitalslidearchive.net) and currently hosts more than 20,000 whole-slide images from 22 cancer types.
The capabilities of CDSA are demonstrated using TCGA datasets to integrate pathology imaging with associated clinical, genomic, and MRI measurements in glioblastomas; the same approach can be extended to other tumor types. CDSA also allows URL-based sharing of whole-slide images and has preliminary support for directly sharing regions of interest and other annotations. Images can also be selected on the basis of metadata such as mutational profile, patient age, and other relevant characteristics.
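As a rough illustration of metadata-based image selection, the following minimal Python sketch filters a small in-memory list of slide records by tumor type, mutated gene, and patient age. It does not reflect the actual CDSA schema or interface: the `SlideRecord` fields, the `select_slides` function, the slide identifiers, and the sample values are all hypothetical and were invented for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SlideRecord:
    """Hypothetical metadata attached to one whole-slide image."""
    slide_id: str
    tumor_type: str
    patient_age: int
    mutated_genes: frozenset = field(default_factory=frozenset)


def select_slides(records, tumor_type=None, mutated_gene=None,
                  min_age=None, max_age=None):
    """Return the records that satisfy every criterion that is not None."""
    selected = []
    for record in records:
        if tumor_type is not None and record.tumor_type != tumor_type:
            continue
        if mutated_gene is not None and mutated_gene not in record.mutated_genes:
            continue
        if min_age is not None and record.patient_age < min_age:
            continue
        if max_age is not None and record.patient_age > max_age:
            continue
        selected.append(record)
    return selected


# Made-up sample records; identifiers and values are placeholders only.
records = [
    SlideRecord("slide-001", "GBM", 54, frozenset({"TP53"})),
    SlideRecord("slide-002", "GBM", 67, frozenset({"EGFR"})),
    SlideRecord("slide-003", "LGG", 38, frozenset({"TP53"})),
]

gbm_tp53 = select_slides(records, tumor_type="GBM", mutated_gene="TP53")
print([r.slide_id for r in gbm_tp53])
```

Criteria left as `None` are ignored, so the same function covers single-attribute queries (for example, age alone) and combined ones.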
With the increasing availability of whole-slide scanners, analysis of digitized pathology images will become increasingly important in linking morphologic observations with genomic and clinical endpoints.