Methods for visual mining of genomic and proteomic data atlases.

Institute for Systems Biology, 401 Terry Ave N, Seattle, WA 98092, USA.

BMC Bioinformatics. 2012 Apr 23;13:58. doi: 10.1186/1471-2105-13-58.

BACKGROUND

As the volume, complexity and diversity of the information that scientists work with on a daily basis continue to rise, so too does the requirement for new analytic software. Such software must resolve the tension between supporting a high level of scientific reasoning and remaining an intuitive, easy-to-use tool that does not require specialist, and often arduous, training. Information visualization provides a solution to this problem, as it allows for direct manipulation of, and interaction with, diverse and complex data. The challenge facing bioinformatics researchers is how to apply this knowledge to data sets that are continually growing in a field that is rapidly changing.

RESULTS

This paper discusses an approach to the development of visual mining tools capable of supporting the mining of massive data collections used in systems biology research, and also discusses lessons learned in providing tools for both local researchers and the wider community. Example tools were developed to enable the exploration and analysis of both proteomics- and genomics-based atlases. These atlases represent large repositories of raw and processed experimental data generated to support the identification of biomarkers through mass spectrometry (the PeptideAtlas) and the genomic characterization of cancer (The Cancer Genome Atlas). Specifically, the tools are designed to allow for the visual mining of thousands of mass spectrometry experiments, to assist in designing informed targeted protein assays, and the interactive analysis of hundreds of genomes, to explore the variations across different cancer genomes and cancer types.

CONCLUSIONS

The mining of massive repositories of biological data requires the development of new tools and techniques. Visual exploration of the large-scale atlas data sets allows researchers to mine data to find new meaning and make sense of it at scales from single samples to entire populations. Providing linked, task-specific views that allow a user to start from points of interest (from diseases to single genes) enables targeted exploration of thousands of spectra and genomes. As the composition of the atlases changes, and our understanding of the biology increases, new tasks will continually arise. It is therefore important to provide the means to make the data available in a suitable manner in as short a time as possible. We have done this through the use of common visualization workflows, into which we rapidly deploy visual tools. These visualizations follow common metaphors where possible to assist users in understanding the displayed data. Rapid development of tools and task-specific views allows researchers to mine large-scale data almost as quickly as it is produced. Ultimately, these visual tools enable new inferences, new analyses and further refinement of the large-scale data being provided in atlases such as PeptideAtlas and The Cancer Genome Atlas.

