Bayesian approach to transforming public gene expression repositories into disease diagnosis databases.

Affiliation

Department of Statistics, University of California, Berkeley, CA 94720, USA.

Publication information

Proc Natl Acad Sci U S A. 2010 Apr 13;107(15):6823-8. doi: 10.1073/pnas.0912043107. Epub 2010 Apr 1.

DOI: 10.1073/pnas.0912043107

PMID: 20360561

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2872390/

Abstract

The rapid accumulation of gene expression data has offered unprecedented opportunities to study human diseases. The National Center for Biotechnology Information Gene Expression Omnibus is currently the largest database that systematically documents the genome-wide molecular basis of diseases. However, thus far, this resource has been far from fully utilized. This paper describes the first study to transform public gene expression repositories into an automated disease diagnosis database. Particularly, we have developed a systematic framework, including a two-stage Bayesian learning approach, to achieve the diagnosis of one or multiple diseases for a query expression profile along a hierarchical disease taxonomy. Our approach, including standardizing cross-platform gene expression data and heterogeneous disease annotations, allows analyzing both sources of information in a unified probabilistic system. A high level of overall diagnostic accuracy was shown by cross validation. It was also demonstrated that the power of our method can increase significantly with the continued growth of public gene expression repositories. Finally, we showed how our disease diagnosis system can be used to characterize complex phenotypes and to construct a disease-drug connectivity map.
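The abstract does not spell out the two-stage procedure, so the following is only a minimal illustrative sketch of what diagnosis "along a hierarchical disease taxonomy" could look like, not the authors' algorithm. It assumes a first stage that scores each disease concept independently with Bayes' rule, and a second stage that makes those scores consistent with the hierarchy by giving every ancestor at least the score of its highest-scoring descendant. The taxonomy, priors, log-likelihood ratios, and function names are all hypothetical.

```python
import math

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "disease": None,
    "neoplasm": "disease",
    "leukemia": "neoplasm",
    "carcinoma": "neoplasm",
}

def posterior(prior, log_likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1.0 - prior)
    post_odds = prior_odds * math.exp(log_likelihood_ratio)
    return post_odds / (1.0 + post_odds)

def enforce_hierarchy(scores, parent):
    """Raise every ancestor's score to at least the score of each descendant."""
    out = dict(scores)
    for node, score in scores.items():
        anc = parent.get(node)
        while anc is not None:
            out[anc] = max(out.get(anc, 0.0), score)
            anc = parent.get(anc)
    return out

# Stage 1 (assumed): independent per-concept posteriors for one query profile.
# The priors and log-likelihood ratios are made-up illustrative numbers.
stage_one = {
    "leukemia": posterior(prior=0.05, log_likelihood_ratio=4.0),
    "carcinoma": posterior(prior=0.05, log_likelihood_ratio=-1.0),
    "neoplasm": posterior(prior=0.20, log_likelihood_ratio=0.5),
}

# Stage 2 (assumed): reconcile the scores with the taxonomy.
stage_two = enforce_hierarchy(stage_one, PARENT)

for name in sorted(stage_two, key=stage_two.get, reverse=True):
    print(f"{name}: {stage_two[name]:.3f}")
```

In this toy run the strong evidence for the leaf concept lifts its ancestors, so a query can be reported at several levels of the taxonomy at once. A full probabilistic treatment would replace the simple max rule with a proper joint model over the hierarchy.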
