Jeanquartier Fleur, Jean-Quartier Claire, Holzinger Andreas
1 Institute of Interactive Systems and Data Science, Graz University of Technology, Graz, Austria.
2 Holzinger Group HCI-KDD, Institute for Medical Informatics, Statistics and Documentation, Medical University Graz, Auenbruggerplatz 2/V, Graz, 8036 Austria.
BioData Min. 2019 Jan 15;12:2. doi: 10.1186/s13040-018-0190-8. eCollection 2019.
A plethora of Web resources offer information on clinical, pre-clinical, genomic and theoretical aspects of cancer, including not only comprehensive cancer projects such as ICGC and TCGA, but also less-known and more specialized projects on pediatric diseases such as PCGP. For childhood cancer, however, very little information is openly available. Several web-based resources and tools offer general biomedical data and are purpose-built for neither pediatric nor cancer analysis. Additionally, many Web resources on cancer focus on incidence data and statistical social characteristics as well as self-regulating communities.
We summarize those resources that are open and are considered to support fundamental scientific research, and focus our comparison on 11 identified pediatric cancer-specific resources (5 tools, 6 databases). The evaluation consists of 5 use cases, using brain tumor research as an example, and covers user-defined search scenarios as well as data mining tasks, while also examining interactive visual analysis features.
Web resources differ in the quantity of information they hold and in how they present it. Pedican lists an abundance of entries but offers few selection features. PeCan and PedcBioPortal include visual analysis tools; the latter also integrates published and new consortia-based data. UCSC Xena Browser offers an in-depth analysis of genomic data. The ICGC data portal provides various features for data analysis and an option to submit one's own data, but its focus lies on adult Pan-Cancer projects. Pediatric Pan-Cancer datasets are being integrated into PeCan and PedcBioPortal. Comparing information on prominent mutations within glioma reveals well-known, unknown, possible, as well as inapplicable biomarkers. This summary further emphasizes how unevenly data are allocated across resources. The tested tools show advantages and disadvantages depending on the respective use case scenario, and they differ in both the quantity of data and the specificity of information they provide.
Web resources on specific pediatric cancers are less abundant and less well known than those offering adult cancer research data. Meanwhile, ongoing pediatric data collection efforts and Pan-Cancer projects indicate future opportunities for childhood cancer research, which is greatly needed for both fundamental and clinical research.