Assessment of data transformations for model-based clustering of RNA-Seq data.

Authors

Noel-MacDonnell Janelle R, Usset Joseph, Goode Ellen L, Fridley Brooke L

Affiliations

Department of Biostatistics, University of Kansas Medical Center, Kansas City, KS, United States of America.

Department of Health Services and Outcomes Research, Children's Mercy Hospital, Kansas City, MO, United States of America.

Publication

PLoS One. 2018 Feb 27;13(2):e0191758. doi: 10.1371/journal.pone.0191758. eCollection 2018.

DOI:10.1371/journal.pone.0191758
PMID:29485993
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5828440/
Abstract

Quality control, global biases, normalization, and analysis methods for RNA-Seq data are quite different from those for microarray-based studies. The assumption of normality is reasonable for microarray-based gene expression data; however, RNA-Seq data tend to follow an over-dispersed Poisson or negative binomial distribution. Little research has assessed how data transformations affect Gaussian model-based clustering of RNA-Seq data, in terms of both clustering performance and accuracy in estimating the correct number of clusters. In this article, we investigate both by applying four data transformations (naïve, logarithmic, Blom, and variance-stabilizing) to simulated RNA-Seq data. An extensive simulation study was carried out in which the scenarios varied in how genes were selected for inclusion in the clustering analyses, the size of the clusters, and the number of clusters. After each transformation was applied to the simulated data, Gaussian model-based clustering was carried out. Clustering performance for each transformation was assessed with the adjusted Rand index, clustering error rate, and concordance index. As expected, clustering performance improved in scenarios where the transformations made the data appear "more" Gaussian in distribution.
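As a rough illustration of two of the transformations and one of the metrics named in the abstract, the following minimal sketch uses only the Python standard library. It is not the authors' code. The function names are hypothetical, the pseudocount of 1 in the log transform is an assumption (the abstract does not give the exact form), and the variance-stabilizing transformation is omitted because it depends on a fitted dispersion model. The Blom transform uses the standard offset of 3/8, mapping the rank r of n values to the normal quantile of (r - 3/8) / (n + 1/4).

```python
import math
from collections import Counter
from statistics import NormalDist


def log_transform(counts, pseudocount=1.0):
    """log2(count + pseudocount); the pseudocount value is an assumption."""
    return [math.log2(c + pseudocount) for c in counts]


def blom_transform(values):
    """Rank-based inverse normal transform with Blom's offset of 3/8."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied values starting at sorted position i.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank across the ties
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same samples."""
    n = len(labels_a)
    if n < 2:
        return 1.0  # fewer than two samples: no pairs to disagree on

    def pairs(m):
        return m * (m - 1) / 2

    sum_joint = sum(pairs(v) for v in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(pairs(v) for v in Counter(labels_a).values())
    sum_b = sum(pairs(v) for v in Counter(labels_b).values())
    expected = sum_a * sum_b / pairs(n)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_joint - expected) / (max_index - expected)
```

In a workflow like the one described, each gene's counts across samples would be transformed first, a Gaussian mixture model would then be fitted to the transformed data, and the resulting cluster labels would be compared with the simulated truth using `adjusted_rand_index`. The index is invariant to relabeling of the clusters, so a perfect clustering scores 1 regardless of which label each cluster receives.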

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/21ca/5828440/957f64641ee4/pone.0191758.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/21ca/5828440/907374d58377/pone.0191758.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/21ca/5828440/52bda5adb3c2/pone.0191758.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/21ca/5828440/15db37c1a73f/pone.0191758.g004.jpg
