
The Analysis of Gene Expression Data Incorporating Tumor Purity Information.

Authors

Ahn Seungjun, Grimes Tyler, Datta Somnath

Affiliation

Department of Biostatistics, University of Florida, Gainesville, FL, United States.

Publication information

Front Genet. 2021 Aug 23;12:642759. doi: 10.3389/fgene.2021.642759. eCollection 2021.

Abstract

The tumor microenvironment is composed of tumor cells, stromal cells, immune cells, blood vessels, and other associated non-cancerous cells. Gene expression measurements on tumor samples are an average over the cells in the microenvironment. However, research questions often concern the tumor cells rather than the surrounding non-tumor tissue. Previous studies have suggested that tumor purity (TP), the proportion of tumor cells in a solid tumor sample, has a confounding effect on differential expression (DE) analysis of high vs. low survival groups. We investigate three ways of incorporating TP information into two statistical methods used for analyzing gene expression data, namely differential network (DN) analysis and DE analysis. Analysis 1 ignores the TP information completely, Analysis 2 uses a truncated sample obtained by removing the low-TP samples, and Analysis 3 uses TP as a covariate in the underlying statistical models. We use three gene expression datasets related to three different cancers from The Cancer Genome Atlas (TCGA) for our investigation. In all three cancer datasets, the networks from Analysis 2 show a greater amount of differential connectivity between the two networks than those from Analysis 1. Similarly, Analysis 1 identified more differentially expressed genes than Analysis 2. Results of the DN and DE analyses using Analysis 3 were mostly consistent with those of Analysis 1 across the three cancers. However, Analysis 3 identified additional cancer-related genes in both the DN and DE analyses. Our findings suggest that using TP as a covariate in a linear model is appropriate for DE analysis, but that a more robust model is needed for DN analysis. Because the true DN or DE patterns are not known for the empirical datasets, simulated datasets can be used in future studies to examine the statistical properties of these methods.
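The three ways of handling TP in a DE analysis can be illustrated for a single gene. The following is a minimal sketch, not the authors' pipeline: it assumes an ordinary least squares model fitted per gene, and the abstract specifies neither the exact model nor the purity cutoff. The helper `group_effect`, the cutoff of 0.6, and all data are hypothetical and synthetic, generated so that expression depends on purity only and purity differs between the survival groups.

```python
import numpy as np

def group_effect(expr, group, purity=None):
    """OLS estimate and t-statistic of the group coefficient.

    Hypothetical helper: fits expr ~ 1 + group (+ purity) by least squares.
    """
    cols = [np.ones_like(expr), group.astype(float)]
    if purity is not None:
        cols.append(purity)  # Analysis 3: purity enters as a covariate
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
    resid = expr - X @ beta
    dof = len(expr) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], beta[1] / np.sqrt(cov[1, 1])

# Synthetic data (assumption): one gene, two survival groups.
rng = np.random.default_rng(0)
n = 200
group = rng.integers(0, 2, size=n)  # 0 = low survival, 1 = high survival
# Purity is made to differ by group, which creates the confounding.
purity = np.clip(rng.normal(0.6 + 0.15 * group, 0.15), 0.05, 1.0)
# Expression depends on purity only; there is no true group effect.
expr = 5.0 + 2.0 * purity + rng.normal(0.0, 0.5, size=n)

# Analysis 1: ignore purity.
a1 = group_effect(expr, group)

# Analysis 2: drop low-purity samples (0.6 is an arbitrary, hypothetical cutoff).
keep = purity >= 0.6
a2 = group_effect(expr[keep], group[keep])

# Analysis 3: keep all samples and adjust for purity.
a3 = group_effect(expr, group, purity)

for name, (est, t) in [("Analysis 1", a1), ("Analysis 2", a2), ("Analysis 3", a3)]:
    print(f"{name}: group effect = {est:.3f}, t = {t:.2f}")
```

In this toy setup, any group effect reported by Analysis 1 reflects the purity difference between groups rather than a difference in the tumor cells themselves. Analysis 2 reduces the purity range at the cost of sample size, while Analysis 3 retains every sample and estimates the group effect conditional on purity.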

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8118/8419469/5129f3de8f07/fgene-12-642759-g001.jpg
