Chen Li-Pang, Tsao Hui-Shan
Department of Statistics, National Chengchi University, Taipei 116, Taiwan (R.O.C.).
Bioinformatics. 2024 Nov 28;40(12). doi: 10.1093/bioinformatics/btae731.
In bioinformatics, understanding the network structure of gene expression variables is a central interest. Graphical models are widely used in data science to characterize the dependence structure among multivariate random variables. However, gene expression data are often ultrahigh-dimensional and subject to measurement error, which makes detecting the network structure challenging. Another important use of gene expression variables is to classify subjects by tumor type or disease. In supervised learning, linear discriminant analysis is a commonly used approach, but its conventional implementation is limited to precisely measured variables and depends on computing the inverse of their covariance matrix, known as the precision matrix. To address these challenges and provide a reliable estimation procedure for public use, we develop the R package GUEST (Graphical models for Ultrahigh-dimensional and Error-prone data by the booSTing algorithm). The package handles measurement error effects in high-dimensional variables under various distributions and then applies a boosting algorithm to identify the network structure and estimate the precision matrix. The estimated precision matrix can then be used to construct the linear discriminant function and improve classification accuracy.
The R package is available at https://cran.r-project.org/web/packages/GUEST/index.html.
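The last step described in the abstract, building a linear discriminant function from an estimated precision matrix, can be illustrated with a minimal sketch. It is written in Python with NumPy and does not use or reproduce the GUEST package's interface. The function names `lda_scores` and `lda_predict`, the toy class means, and the hand-specified precision matrix are all assumptions made for illustration. The sketch uses the standard discriminant score, x'Θμ_k - (1/2)μ_k'Θμ_k + log π_k, where Θ stands in for whatever precision matrix estimate is available.

```python
import numpy as np


def lda_scores(X, class_means, precision, priors):
    """Linear discriminant scores for each row of X and each class k:
    x' Theta mu_k - 0.5 * mu_k' Theta mu_k + log(pi_k)."""
    X = np.asarray(X, dtype=float)              # shape (n, p)
    M = np.asarray(class_means, dtype=float)    # shape (K, p)
    Theta = np.asarray(precision, dtype=float)  # shape (p, p)
    priors = np.asarray(priors, dtype=float)    # shape (K,)

    linear = X @ Theta @ M.T                    # shape (n, K)
    quadratic = np.einsum("kp,pq,kq->k", M, Theta, M)  # mu_k' Theta mu_k
    return linear - 0.5 * quadratic + np.log(priors)


def lda_predict(X, class_means, precision, priors):
    """Assign each row of X to the class with the largest score."""
    return np.argmax(lda_scores(X, class_means, precision, priors), axis=1)


# Toy inputs (hypothetical). Theta is the inverse of [[1, 0.5], [0.5, 1]]
# and stands in for a precision matrix estimated elsewhere.
class_means = np.array([[0.0, 0.0], [2.0, 2.0]])
Theta = np.array([[4.0, -2.0], [-2.0, 4.0]]) / 3.0
priors = np.array([0.5, 0.5])
X_new = np.array([[0.1, -0.2], [1.9, 2.3]])

labels = lda_predict(X_new, class_means, Theta, priors)
print(labels)
```

Because the scores take the precision matrix as an input, no covariance matrix is inverted at classification time, which is where an estimate obtained under high dimensionality and measurement error would be plugged in.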