Integrated Theory- and Data-driven Feature Selection in Gene Expression Data Analysis.

Authors

Raghu Vineet K, Ge Xiaoyu, Chrysanthis Panos K, Benos Panayiotis V

Affiliations

Department of Computer Science, University of Pittsburgh.

Department of Computational and Systems Biology, University of Pittsburgh.

Publication information

Proc Int Conf Data Eng. 2017 Apr;2017:1525-1532. doi: 10.1109/ICDE.2017.223. Epub 2017 May 18.

Abstract

The exponential growth of high-dimensional biological data has led to a rapid increase in demand for automated approaches to knowledge production. Existing methods rely on two general approaches to address this challenge: 1) the theory-driven approach, which utilizes previously accumulated knowledge, and 2) the data-driven approach, which uses only the data to deduce scientific knowledge. Used alone, each approach is biased toward either past or present knowledge, because neither incorporates all of the currently available knowledge when making new discoveries. In this paper, we show how an integrated method can effectively address the high dimensionality of big biological data, which is a major problem for purely data-driven analysis approaches. We realize our approach in a novel two-step analytical workflow: the first step applies a new feature selection paradigm to high-throughput gene expression data, and the second step uses graphical causal modeling to automatically extract causal relationships. Our results on real-world clinical datasets from The Cancer Genome Atlas (TCGA) demonstrate that our method is capable of intelligently selecting genes for learning effective causal networks.
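
The abstract does not spell out the feature selection algorithm, so the following is only a toy sketch of the general idea of combining a theory-driven signal with a data-driven one when ranking genes. It is not the authors' method. The function names, the `prior` scores, the weight `alpha`, and the use of absolute Pearson correlation as the data-driven score are all assumptions made for illustration, and the second step of the workflow (graphical causal modeling) is not covered.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_genes(expression, target, prior, k, alpha=0.5):
    """Rank genes by a weighted mix of prior knowledge and data evidence.

    expression: dict mapping gene name -> list of expression values per sample
    target:     list of target values per sample (e.g. a clinical variable)
    prior:      dict mapping gene name -> hypothetical prior relevance in [0, 1]
    alpha:      assumed weight on the theory-driven score (0 = data only)
    """
    scores = {}
    for gene, values in expression.items():
        data_score = abs(pearson(values, target))   # data-driven evidence
        theory_score = prior.get(gene, 0.0)         # theory-driven evidence
        scores[gene] = alpha * theory_score + (1 - alpha) * data_score
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Made-up example data: three genes measured in five samples.
expression = {
    "GENE_A": [1, 2, 3, 4, 5],   # tracks the target closely
    "GENE_B": [5, 1, 4, 2, 3],   # weakly related to the target
    "GENE_C": [2, 2, 2, 2, 2],   # constant, carries no information
}
target = [1, 2, 3, 4, 5]
prior = {"GENE_B": 1.0}          # hypothetical prior knowledge favouring GENE_B

print(select_genes(expression, target, prior, k=2, alpha=0.5))
print(select_genes(expression, target, prior, k=2, alpha=0.0))
```

With `alpha=0.5` the gene favoured by prior knowledge is ranked first even though its correlation with the target is weak; with `alpha=0.0` the ranking depends on the data alone. A linear mix is just one simple way to combine the two sources; the paper may integrate them quite differently.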


