Han Sung Won, Chen Gong, Cheon Myun-Seok, Zhong Hua
Division of Biostatistics, Departments of Population Health, New York University, New York, NY, USA, 10016.
Pharmaceutical Sciences, Pharma Early Research and Development, Roche Innovation Center New York, New York, NY, USA.
J Am Stat Assoc. 2016;111(515):1004-1019. doi: 10.1080/01621459.2016.1142880. Epub 2016 Oct 18.
Graphical models are a popular approach for finding dependence and conditional independence relationships among gene expression levels. Directed acyclic graphs (DAGs) are a special class of directed graphical models in which every edge is directed and no directed cycles occur. DAGs are well-known models for discovering causal relationships between genes in gene regulatory networks. However, estimating a DAG without assuming a known ordering of the variables is challenging because of high dimensionality, the acyclicity constraint, and the existence of equivalence classes of DAGs that observational data cannot distinguish. To overcome these challenges, we propose a two-stage adaptive Lasso approach, called NS-DIST, which performs neighborhood selection (NS) in stage 1 and then, in stage 2, estimates the DAG with the Discrete Improving Search with Tabu (DIST) algorithm restricted to the selected neighborhoods. Simulation studies demonstrate the effectiveness and computational efficiency of the method. Two real data examples illustrate its practical use for gene regulatory network inference.
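The idea of searching for a DAG only within preselected neighborhoods can be illustrated with a small sketch. This is not the authors' DIST algorithm: it is a generic tabu search written for illustration, and every name in it (`has_path`, `legal_moves`, `tabu_search`, `toy_score`, `TARGET`) is hypothetical. The neighborhoods are specified by hand rather than estimated by an adaptive Lasso, and the toy score simply counts edge mismatches against a known target graph in place of a penalized likelihood.

```python
def has_path(parents, src, dst):
    """Return True if a directed path src -> ... -> dst exists.

    parents maps each node to the set of its parents, so the search
    walks backwards from dst through parent links.
    """
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        if node == src:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False


def legal_moves(parents, neighborhood):
    """List single-edge moves that stay inside the given neighborhoods."""
    moves = []
    for child in sorted(neighborhood):
        for parent in sorted(neighborhood[child]):
            if parent in parents[child]:
                moves.append(("remove", parent, child))
            elif not has_path(parents, child, parent):
                # Adding parent -> child is acyclic only if no path
                # child -> ... -> parent already exists.
                moves.append(("add", parent, child))
    return moves


def apply_move(parents, move):
    """Return a copy of the graph with one edge added or removed."""
    kind, parent, child = move
    new = {node: set(ps) for node, ps in parents.items()}
    if kind == "add":
        new[child].add(parent)
    else:
        new[child].discard(parent)
    return new


def tabu_search(nodes, neighborhood, score, n_iter=50, tabu_len=5):
    """Generic tabu search over DAGs (lower score is better)."""
    parents = {node: set() for node in nodes}
    best, best_score = parents, score(parents)
    tabu = []  # recently modified edges that may not be touched again yet
    for _ in range(n_iter):
        moves = [m for m in legal_moves(parents, neighborhood)
                 if (m[1], m[2]) not in tabu]
        if not moves:
            break
        # Take the best non-tabu move, even if it worsens the score.
        move = min(moves, key=lambda m: score(apply_move(parents, m)))
        parents = apply_move(parents, move)
        tabu = (tabu + [(move[1], move[2])])[-tabu_len:]
        current = score(parents)
        if current < best_score:
            best, best_score = parents, current
    return best, best_score


# Toy stand-in for a penalized likelihood: mismatches against a known DAG.
TARGET = {("a", "b"), ("b", "c")}


def toy_score(parents):
    edges = {(p, c) for c, ps in parents.items() for p in ps}
    return len(edges ^ TARGET)


# Hand-specified (assumed) stage-1 output: candidate neighbors per node.
neighborhood = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
best, best_score = tabu_search(["a", "b", "c"], neighborhood, toy_score)
print(best, best_score)
```

Restricting the moves to the candidate neighborhoods shrinks the search space from all ordered node pairs to the pairs retained in stage 1, which is the intuition behind a two-stage design; the acyclicity check is applied to each candidate edge addition before it is scored.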