GUEST: an R package for handling estimation of graphical structure and multiclassification for error-prone gene expression data.

Author information

Chen Li-Pang, Tsao Hui-Shan

Affiliations

Department of Statistics, National Chengchi University, Taipei 116, Taiwan (R.O.C.).

Publication information

Bioinformatics. 2024 Nov 28;40(12). doi: 10.1093/bioinformatics/btae731.

Abstract

SUMMARY

In bioinformatics studies, understanding the network structure of gene expression variables is one of the main interests. In data science, graphical models are widely used to characterize the dependence structure among multivariate random variables. However, gene expression data may suffer from ultrahigh dimensionality and measurement error, which make detecting the network structure challenging. Another important application of gene expression variables is to provide information for classifying subjects into various tumors or diseases. In supervised learning, linear discriminant analysis is a commonly used approach, but its conventional implementation is limited: it assumes precisely measured variables and requires computing their inverse covariance matrix, known as the precision matrix. To tackle these challenges and provide a reliable estimation procedure for public use, we develop the R package GUEST, which stands for Graphical models for Ultrahigh-dimensional and Error-prone data by the booSTing algorithm. The package aims to handle measurement error effects in high-dimensional variables under various distributions and then applies the boosting algorithm to identify the network structure and estimate the precision matrix. Once the precision matrix is estimated, it can be used to construct the linear discriminant function and improve classification accuracy.
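To illustrate the last step, here is a minimal sketch of how a precision matrix can feed a linear discriminant function for multiclass classification. It uses the textbook rule delta_k(x) = x' Theta mu_k - 0.5 mu_k' Theta mu_k + log(pi_k) and is written in plain Python for illustration only. It is not the GUEST API: the function names, the toy class means, the priors, and the identity precision matrix are all assumptions, and in practice Theta would be the estimate obtained after the measurement error correction and boosting steps.

```python
import math

# Illustrative sketch only: these helpers are hypothetical and are not part of GUEST.

def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def dot(a, b):
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

def lda_score(x, mean, precision, prior):
    """Linear discriminant score: x' Theta mu - 0.5 mu' Theta mu + log(prior)."""
    theta_mu = mat_vec(precision, mean)
    return dot(x, theta_mu) - 0.5 * dot(mean, theta_mu) + math.log(prior)

def classify(x, means, precision, priors):
    """Return the index of the class with the largest discriminant score."""
    scores = [lda_score(x, m, precision, p) for m, p in zip(means, priors)]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy inputs (assumed): three classes, two variables, identity precision matrix.
precision = [[1.0, 0.0], [0.0, 1.0]]
means = [[0.0, 0.0], [2.0, 2.0], [-2.0, 2.0]]
priors = [1 / 3, 1 / 3, 1 / 3]

label = classify([1.8, 2.1], means, precision, priors)
```

With an identity precision matrix and equal priors, the rule reduces to assigning each observation to the nearest class mean. A sparse estimated precision matrix changes the scores only through the product of Theta and each class mean, so the covariance matrix itself never has to be inverted at classification time.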

AVAILABILITY AND IMPLEMENTATION

The R package is available at https://cran.r-project.org/web/packages/GUEST/index.html.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0b9d/11655624/5484f00360a2/btae731f1.jpg
