Division of Statistics and Data Science, Department of Mathematical Sciences, University of Cincinnati, Cincinnati, OH, USA.
Division of Biostatistics and Epidemiology, Department of Pediatrics, Cincinnati Children's Hospital Medical Center, University of Cincinnati, Cincinnati, OH, USA.
Sci Rep. 2022 May 23;12(1):8643. doi: 10.1038/s41598-022-12199-0.
Recent progress in RNA sequencing (RNA-seq) allows us to explore whole-genome gene expression profiles and to develop predictive models for disease risk. The objective of this study was to develop and validate an RNA-seq-based transcriptomic risk score (RSRS) for disease risk prediction that can simultaneously accommodate demographic information. We analyzed RNA-seq gene expression data from 441 asthmatic and 254 non-asthmatic samples. Logistic least absolute shrinkage and selection operator (Lasso) regression analysis in the training set identified 73 differentially expressed genes (DEGs), which were combined into a weighted RSRS. After adjustment for age and gender, this score discriminated asthmatics from healthy subjects with an area under the curve (AUC) of 0.80 in the testing set. The 73-gene RSRS was validated in three independent RNA-seq datasets, where it achieved AUCs of 0.70, 0.77 and 0.60, respectively. To explore the biological and molecular functions of the 73 genes in the asthma phenotype, we performed pathway enrichment analysis and found that the genes were significantly (p < 0.0001) enriched for DNA replication, recombination, and repair; cell-to-cell signaling and interaction; and eumelanin biosynthesis and developmental disorder. Further in silico analysis of the 73 genes using the Connectivity Map identified drugs (mepacrine, dactolisib) and genetic perturbagens (PAK1, GSR, RBM15 and TNFRSF12A) that could potentially be repurposed for treating asthma. These findings show the promise of RNA-seq risk scores for stratifying and predicting disease risk.
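The general idea of a Lasso-derived weighted risk score can be illustrated with a minimal sketch. This is not the authors' pipeline: the data below are synthetic, and the gene count, the penalty strength `C`, the 70/30 split, and the second-stage logistic model used for the age and sex adjustment are all assumptions made for illustration only, since the abstract does not specify these details.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study data): 695 samples, 500 "genes".
n_samples, n_genes = 695, 500
X = rng.normal(size=(n_samples, n_genes))
true_w = np.zeros(n_genes)
true_w[:20] = np.where(np.arange(20) % 2 == 0, 0.8, -0.8)  # 20 informative genes
y = (rng.random(n_samples) < 1 / (1 + np.exp(-(X @ true_w)))).astype(int)

# Hypothetical demographic covariates, unrelated to the outcome here.
age_decades = rng.uniform(0.5, 7.0, size=n_samples)
sex = rng.integers(0, 2, size=n_samples)

# Hold out a testing set (the 70/30 split is an assumption).
train, test = train_test_split(
    np.arange(n_samples), test_size=0.3, stratify=y, random_state=0
)

# L1-penalized (Lasso) logistic regression selects genes and assigns weights.
scaler = StandardScaler().fit(X[train])
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
lasso.fit(scaler.transform(X[train]), y[train])

weights = lasso.coef_.ravel()
selected = np.flatnonzero(weights)  # genes with non-zero coefficients


def risk_score(expr):
    """Weighted sum of standardized expression over the selected genes."""
    return scaler.transform(expr)[:, selected] @ weights[selected]


score = risk_score(X)
auc = roc_auc_score(y[test], score[test])

# One possible way to adjust for demographics: a second logistic model
# on the score plus age and sex (assumed, not taken from the paper).
Z = np.column_stack([score, age_decades, sex])
adjusted = LogisticRegression().fit(Z[train], y[train])
adj_auc = roc_auc_score(y[test], adjusted.predict_proba(Z[test])[:, 1])

print(f"{selected.size} genes selected; test AUC {auc:.2f}; adjusted AUC {adj_auc:.2f}")
```

In this sketch the score is built only from training-set weights and then evaluated on held-out samples, which mirrors the training/testing design described in the abstract. In practice the penalty strength would normally be chosen by cross-validation rather than fixed.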