Department of Statistics, University of California, Berkeley, CA 94720.
Children's Hospital Oakland Research Institute, Oakland, CA 94609.
Proc Natl Acad Sci U S A. 2019 Sep 17;116(38):18943-18950. doi: 10.1073/pnas.1820340116. Epub 2019 Sep 4.
Rapid advances in genomic technologies have led to a wealth of diverse data, from which novel discoveries can be gleaned through the application of robust statistical and computational methods. Here, we describe GeneFishing, a semisupervised computational approach to reconstruct context-specific portraits of biological processes by leveraging gene-gene coexpression information. GeneFishing incorporates multiple high-dimensional statistical ideas, including dimensionality reduction, clustering, subsampling, and results aggregation, to produce robust results. To illustrate the power of our method, we applied it using 21 genes involved in cholesterol metabolism as "bait" to "fish out" (or identify) genes not previously identified as being connected to cholesterol metabolism. Using simulation and real datasets, we found that the results obtained through GeneFishing were more interesting for our study than those provided by related gene prioritization methods. In particular, application of GeneFishing to the GTEx liver RNA sequencing (RNAseq) data not only reidentified many known cholesterol-related genes, but also pointed to glyoxalase I (GLO1) as a gene implicated in cholesterol metabolism. In a follow-up experiment, we found that GLO1 knockdown in human hepatoma cell lines increased levels of cellular cholesterol ester, validating a role for GLO1 in cholesterol metabolism. In addition, we performed pantissue analysis by applying GeneFishing on various tissues and identified many potential tissue-specific cholesterol metabolism-related genes. GeneFishing appears to be a powerful tool for identifying related components of complex biological systems and may be used across a wide range of applications.
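The four ingredients named in the abstract (dimensionality reduction, clustering, subsampling, and results aggregation) can be illustrated with a minimal sketch on simulated data. This is not the authors' implementation. The function names `split_in_two` and `capture_frequency`, the use of absolute Pearson correlation, the Fiedler-vector sign split standing in for the clustering step, the chunk size of 20, and the 0.9 bait-tightness threshold are all assumptions made for illustration.

```python
import numpy as np


def split_in_two(expr):
    """Spectral bipartition of genes (rows of expr) from absolute coexpression."""
    sim = np.abs(np.corrcoef(expr))            # gene-gene coexpression similarity
    np.fill_diagonal(sim, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(sim.sum(axis=1))
    # Symmetric normalized graph Laplacian: I - D^{-1/2} W D^{-1/2}
    lap = np.eye(len(sim)) - d_inv_sqrt[:, None] * sim * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)              # eigenvalues in ascending order
    # Reduce to one dimension (Fiedler vector) and cluster by its sign.
    return vecs[:, 1] > 0


def capture_frequency(bait, candidates, n_rounds=30, chunk=20, tight=0.9, seed=0):
    """Fraction of rounds in which each candidate gene clusters with the bait."""
    rng = np.random.default_rng(seed)
    n_bait, n_cand = len(bait), len(candidates)
    captured = np.zeros(n_cand)
    for _ in range(n_rounds):
        # Subsampling: randomly partition the candidates into small chunks.
        order = rng.permutation(n_cand)
        for idx in np.array_split(order, max(1, n_cand // chunk)):
            labels = split_in_two(np.vstack([bait, candidates[idx]]))
            bait_label = labels[:n_bait].mean() >= 0.5   # majority bait cluster
            share = np.mean(labels[:n_bait] == bait_label)
            # Assumed rule: only count a chunk if the bait genes stay together.
            if share >= tight:
                captured[idx] += labels[n_bait:] == bait_label
    # Aggregation: each candidate is seen exactly once per round.
    return captured / n_rounds


# Simulated data (hypothetical): 10 bait genes and 5 hidden related genes
# share one latent factor; 95 unrelated genes are pure noise.
rng = np.random.default_rng(1)
n_samples = 60
factor = rng.normal(size=n_samples)
bait = factor + 0.3 * rng.normal(size=(10, n_samples))
hidden = factor + 0.3 * rng.normal(size=(5, n_samples))
noise = rng.normal(size=(95, n_samples))
candidates = np.vstack([hidden, noise])

cfr = capture_frequency(bait, candidates)
print("mean capture frequency, hidden related genes:", cfr[:5].mean())
print("mean capture frequency, unrelated genes:", cfr[5:].mean())
```

Repeating the clustering over many small random subsets, rather than clustering all candidates at once, keeps each spectral problem low-dimensional, and averaging over rounds damps the effect of any single unlucky partition. In this toy setting the hidden related genes would be expected to receive a much higher capture frequency than the unrelated ones.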