

Methods for visual mining of genomic and proteomic data atlases.

Affiliation

Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98109, USA.

Publication information

BMC Bioinformatics. 2012 Apr 23;13:58. doi: 10.1186/1471-2105-13-58.

Abstract

BACKGROUND

As the volume, complexity and diversity of the information that scientists work with on a daily basis continue to rise, so too does the requirement for new analytic software. This software must resolve the tension between the need to support a high level of scientific reasoning and the need for an intuitive, easy-to-use tool that does not require specialist, and often arduous, training. Information visualization provides a solution to this problem, as it allows for direct manipulation of and interaction with diverse and complex data. The challenge facing bioinformatics researchers is how to apply this knowledge to data sets that are continually growing in a field that is rapidly changing.

RESULTS

This paper discusses an approach to developing visual mining tools capable of supporting the mining of massive data collections used in systems biology research, and also discusses lessons learned in providing tools for both local researchers and the wider community. Example tools were developed to enable the exploration and analysis of both proteomics-based and genomics-based atlases. These atlases are large repositories of raw and processed experimental data generated to support the identification of biomarkers through mass spectrometry (the PeptideAtlas) and the genomic characterization of cancer (The Cancer Genome Atlas). Specifically, the tools are designed to allow for the visual mining of thousands of mass spectrometry experiments, to assist in designing informed targeted protein assays, and the interactive analysis of hundreds of genomes, to explore the variations across different cancer genomes and cancer types.

CONCLUSIONS

The mining of massive repositories of biological data requires the development of new tools and techniques. Visual exploration of the large-scale atlas data sets allows researchers to mine data to find new meaning and make sense of it at scales from single samples to entire populations. Providing linked, task-specific views that allow a user to start from points of interest (from diseases to single genes) enables targeted exploration of thousands of spectra and genomes. As the composition of the atlases changes, and our understanding of the biology increases, new tasks will continually arise. It is therefore important to provide the means to make the data available in a suitable manner in as short a time as possible. We have done this through the use of common visualization workflows, into which we rapidly deploy visual tools. These visualizations follow common metaphors where possible to assist users in understanding the displayed data. Rapid development of tools and task-specific views allows researchers to mine large-scale data almost as quickly as it is produced. Ultimately, these visual tools enable new inferences, new analyses and further refinement of the large-scale data being provided in atlases such as PeptideAtlas and The Cancer Genome Atlas.


