Nonlinear ridge regression improves cell-type-specific differential expression analysis.

Authors

Takeuchi Fumihiko, Kato Norihiro

Affiliation

Department of Gene Diagnostics and Therapeutics, Research Institute, National Center for Global Health and Medicine (NCGM), 1-21-1 Toyama, Shinjuku-ku, Tokyo, 162-8655, Japan.

Publication

BMC Bioinformatics. 2021 Mar 22;22(1):141. doi: 10.1186/s12859-021-03982-3.

Abstract

BACKGROUND

Epigenome-wide association studies (EWAS) and differential gene expression analyses are generally performed on tissue samples, which consist of multiple cell types. Cell-type-specific effects of a trait, such as disease, on omics expression are of interest but are difficult or costly to measure experimentally. From omics data measured on the bulk tissue, the cell type composition of a sample can be inferred statistically. Subsequently, cell-type-specific effects are estimated by linear regression that includes terms representing the interaction between the cell type proportions and the trait. This approach involves two issues: scaling and multicollinearity.
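The interaction-term regression described above can be illustrated with a minimal sketch. It assumes a bulk value that is a proportion-weighted sum of per-cell-type values, each depending linearly on the trait. The names `interaction_design` and `bulk_mean`, and the toy numbers, are hypothetical and are not taken from the paper or from the omicwas package.

```python
def interaction_design(W, x):
    """Build design rows [W_i1..W_iK, W_i1*x_i..W_iK*x_i] for each sample i.

    W: list of per-sample cell type proportions (K values per sample).
    x: list of per-sample trait values.
    """
    return [list(w) + [wk * xi for wk in w] for w, xi in zip(W, x)]


def bulk_mean(W, x, alpha, beta):
    """Expected bulk value: sum over cell types k of W_ik * (alpha_k + beta_k * x_i)."""
    return [
        sum(wk * (a + b * xi) for wk, a, b in zip(w, alpha, beta))
        for w, xi in zip(W, x)
    ]


# Toy data (assumed for illustration): 3 samples, 2 cell types.
W = [[0.7, 0.3], [0.6, 0.4], [0.8, 0.2]]
x = [0, 1, 1]            # trait, e.g. 0 = control, 1 = case
alpha = [1.0, 2.0]       # baseline level per cell type
beta = [0.5, -1.0]       # cell-type-specific trait effect

design = interaction_design(W, x)
coef = alpha + beta      # coefficient order matches the design columns
fitted = [sum(d * c for d, c in zip(row, coef)) for row in design]
```

In this layout, the coefficients on the last K columns (the proportion-by-trait interactions) are the cell-type-specific effects that the regression tries to estimate.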

RESULTS

First, although cell composition is analyzed on the linear scale, differential methylation/expression is more suitably analyzed on the logit/log scale. To analyze the two scales simultaneously, we applied nonlinear regression. Second, we show that the interaction terms are highly collinear, which obstructs ordinary regression. To cope with the multicollinearity, we applied ridge regularization. In simulated data, nonlinear ridge regression attained well-balanced sensitivity, specificity and precision. The marginal model attained the lowest precision and highest sensitivity and was the only algorithm to detect weak signals in real data.
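The collinearity problem and the effect of a ridge penalty can be seen on simulated toy data. The sketch below is a deliberate simplification: it uses ordinary linear ridge regression in closed form with every coefficient penalized, not the nonlinear ridge regression of the paper, and all data, parameter values and function names are assumptions made for illustration.

```python
import random


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z


def ridge(X, y, lam):
    """Closed-form linear ridge: solve (X'X + lam*I) b = X'y (all terms penalized)."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


random.seed(0)
n = 200
# Two cell types whose proportions vary only slightly across samples.
w1 = [0.6 + random.uniform(-0.05, 0.05) for _ in range(n)]
W = [[a, 1.0 - a] for a in w1]
x = [int(random.random() < 0.5) for _ in range(n)]  # binary trait

# Design columns: W_1, W_2, W_1*x, W_2*x.
X = [w + [wk * xi for wk in w] for w, xi in zip(W, x)]
alpha, beta = [1.0, 2.0], [0.5, 0.0]  # only cell type 1 responds to the trait
y = [sum(wk * (a + b * xi) for wk, a, b in zip(w, alpha, beta))
     + random.gauss(0.0, 0.1) for w, xi in zip(W, x)]

# The two interaction columns are nearly proportional to x, hence to each other.
corr = pearson([r[2] for r in X], [r[3] for r in X])

b_ols = ridge(X, y, 0.0)     # lam = 0 reduces to ordinary least squares
b_ridge = ridge(X, y, 1.0)   # penalty shrinks the coefficient vector
norm = lambda b: sum(v * v for v in b) ** 0.5
```

Because the interaction columns are almost proportional to the trait itself, their correlation is close to 1 and the unpenalized estimates of the two effects become unstable; the penalty trades a small bias for lower variance. The penalty value `lam = 1.0` is arbitrary here, and in practice it would need to be chosen from the data.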

CONCLUSION

Nonlinear ridge regression performed cell-type-specific association tests on bulk omics data with well-balanced performance. The omicwas package for R implements nonlinear ridge regression for cell-type-specific EWAS, differential gene expression and QTL analyses. The software is freely available from https://github.com/fumi-github/omicwas.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e68d/7986289/f429403c4884/12859_2021_3982_Fig1_HTML.jpg
