Splunk Inc., San Francisco, California, USA.
Department of Biostatistics, University of Michigan, Ann Arbor, Michigan, USA.
Stat Med. 2022 Mar 30;41(7):1242-1262. doi: 10.1002/sim.9267. Epub 2021 Nov 23.
Jointly analyzing transcriptomic data and existing biological networks can yield more robust and informative feature selection results, as well as a better understanding of the underlying biological mechanisms. Selecting and classifying node features over genome-scale networks has become increasingly important in genomic biology and genomic medicine. Existing methods have two critical drawbacks. First, they do not allow flexible modeling of different subtypes of selected nodes. Second, they ignore nodes with missing values, which is likely to bias estimation. To address these limitations, we propose a general modeling framework for Bayesian node classification (BNC) with missing values. We develop a new prior model for the class indicators that incorporates the network structure. For posterior computation, we use the Swendsen-Wang algorithm to update the class indicators efficiently. BNC naturally handles missing values within the Bayesian modeling framework, which improves node classification accuracy and reduces bias in estimating gene effects. We demonstrate the advantages of our methods through extensive simulation studies and an analysis of the cutaneous melanoma dataset from The Cancer Genome Atlas.
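The abstract does not spell out the network prior. A common choice for class indicators on a graph is a Potts-type prior, P(z) ∝ exp(β Σ_{(i,j)∈E} 1[z_i = z_j]). Under this kind of prior, Swendsen-Wang updates whole clusters of connected nodes at once rather than one node at a time.

The code below is a minimal sketch of one such sweep. It is not the authors' implementation. It assumes a Potts prior with interaction strength `beta` and a hypothetical table `log_lik[i][k]` holding the node-level log-likelihood of node i under class k. Both names are illustrative.

```python
import math
import random


def _find(parent, i):
    # Union-find lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def swendsen_wang_sweep(labels, edges, beta, log_lik, rng=random):
    """One Swendsen-Wang update of node class indicators (sketch).

    labels  : list[int], current class of each node (0..K-1)
    edges   : list[tuple[int, int]], undirected network edges
    beta    : float >= 0, assumed Potts interaction strength
    log_lik : list[list[float]], hypothetical log_lik[i][k] = log-likelihood
              of node i's data under class k
    """
    n = len(labels)
    num_classes = len(log_lik[0])
    p_bond = 1.0 - math.exp(-beta)

    # Step 1: open a bond on each edge whose endpoints share a class,
    # independently with probability 1 - exp(-beta).
    parent = list(range(n))
    for i, j in edges:
        if labels[i] == labels[j] and rng.random() < p_bond:
            ri, rj = _find(parent, i), _find(parent, j)
            if ri != rj:
                parent[ri] = rj

    # Step 2: group nodes into bond-connected clusters.
    clusters = {}
    for i in range(n):
        clusters.setdefault(_find(parent, i), []).append(i)

    # Step 3: relabel each cluster jointly. Given the bonds, the cluster's
    # class is drawn with probability proportional to
    # exp(sum of its members' log-likelihoods under that class).
    new_labels = list(labels)
    for members in clusters.values():
        scores = [sum(log_lik[i][k] for i in members) for k in range(num_classes)]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        k_new = rng.choices(range(num_classes), weights=weights)[0]
        for i in members:
            new_labels[i] = k_new
    return new_labels
```

For example, `swendsen_wang_sweep([0, 0, 0, 1], [(0, 1), (1, 2), (2, 3)], 50.0, [[-1000.0, 0.0]] * 4, random.Random(0))` bonds nodes 0 to 2 into a single cluster and moves every node to class 1.

The design choice is that flipping a whole cluster in one step avoids the slow mixing of single-site Gibbs updates when β is large and neighboring indicators are strongly coupled.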