GUEST: an R package for handling estimation of graphical structure and multiclassification for error-prone gene expression data.

Author information

Chen Li-Pang, Tsao Hui-Shan

Affiliation

Department of Statistics, National Chengchi University, Taipei 116, Taiwan (R.O.C.).

Publication information

Bioinformatics. 2024 Nov 28;40(12). doi: 10.1093/bioinformatics/btae731.

DOI:10.1093/bioinformatics/btae731

PMID:39660781

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11655624/

Abstract

SUMMARY

In bioinformatics, understanding the network structure of gene expression variables is a central interest. Graphical models are widely used in data science to characterize the dependence structure among multivariate random variables. However, gene expression data may be ultrahigh-dimensional and subject to measurement error, which makes detecting the network structure challenging. Another important application of gene expression variables is classifying subjects into tumor or disease types. In supervised learning, linear discriminant analysis is a commonly used approach, but its conventional implementation is limited to precisely measured variables and depends on computing their inverse covariance matrix, known as the precision matrix. To tackle these challenges and provide a reliable estimation procedure for public use, we develop the R package GUEST, short for Graphical models for Ultrahigh-dimensional and Error-prone data by the booSTing algorithm. The package aims to handle measurement error effects in high-dimensional variables under various distributions and then applies the boosting algorithm to identify the network structure and estimate the precision matrix. Once estimated, the precision matrix can be used to construct the linear discriminant function and improve classification accuracy.
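To make the last two steps concrete, here is a minimal Python sketch of how an already-estimated precision matrix could be read as a graph and plugged into a linear discriminant function. It is an illustration under stated assumptions, not the GUEST implementation: the function names (`graph_edges`, `lda_scores`, `classify`), the tolerance `tol`, and the toy numbers are all hypothetical, and the measurement-error correction and boosting steps that produce the precision matrix are not shown. The score used is the standard one for linear discriminant analysis, x'Θμ_k − ½ μ_k'Θμ_k + log π_k, where Θ is the precision matrix, μ_k the class mean, and π_k the class prior.

```python
import math


def graph_edges(precision, tol=1e-8):
    """Return undirected edges (i, j), i < j, where the off-diagonal
    precision entry is non-zero (beyond a small tolerance)."""
    p = len(precision)
    return [(i, j)
            for i in range(p)
            for j in range(i + 1, p)
            if abs(precision[i][j]) > tol]


def lda_scores(x, means, priors, precision):
    """Linear discriminant score of observation x for each class,
    using a shared precision matrix (inverse covariance)."""
    p = len(x)
    scores = []
    for mu, prior in zip(means, priors):
        # theta_mu = precision @ mu, written out without external libraries
        theta_mu = [sum(precision[i][j] * mu[j] for j in range(p))
                    for i in range(p)]
        linear = sum(x[i] * theta_mu[i] for i in range(p))
        quadratic = sum(mu[i] * theta_mu[i] for i in range(p))
        scores.append(linear - 0.5 * quadratic + math.log(prior))
    return scores


def classify(x, means, priors, precision):
    """Assign x to the class index with the largest discriminant score."""
    scores = lda_scores(x, means, priors, precision)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy inputs (hypothetical): three variables, three classes.
precision = [[2.0, -0.5, 0.0],
             [-0.5, 2.0, 0.0],
             [0.0, 0.0, 1.0]]
means = [[0.0, 0.0, 0.0],
         [2.0, 2.0, 0.0],
         [0.0, 0.0, 3.0]]
priors = [1 / 3, 1 / 3, 1 / 3]

edges = graph_edges(precision)                        # variables 0 and 1 are linked
label = classify([1.9, 2.1, 0.1], means, priors, precision)
```

In this toy setup only variables 0 and 1 share a non-zero off-diagonal entry, so the implied graph has a single edge, and the observation near the second class mean is assigned to class index 1. A sparse precision matrix keeps both steps cheap, which is one reason a good estimate of it matters in high dimensions.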

AVAILABILITY AND IMPLEMENTATION

The R package is available on https://cran.r-project.org/web/packages/GUEST/index.html.
