

BLASSO: integration of biological knowledge into a regularized linear model.

Author information

Urda Daniel, Aragón Francisco, Bautista Rocío, Franco Leonardo, Veredas Francisco J, Claros Manuel Gonzalo, Jerez José Manuel

Affiliations

Universidad de Cádiz, Departamento de Ingeniería Informática, Avda. de la Universidad de Cádiz n°10, Puerto Real, Cádiz, 11519, Spain.

Universidad de Málaga, Departamento de Lenguajes y Ciencias de la Computación, Bulevar Louis Pasteur, 35. Campus de Teatinos, Málaga, 29071, Spain.

Publication information

BMC Syst Biol. 2018 Nov 20;12(Suppl 5):94. doi: 10.1186/s12918-018-0612-8.

Abstract

BACKGROUND

In RNA-Seq gene expression analysis, a genetic signature or biomarker is defined as a subset of genes that is probably involved in a given complex human trait and usually provides predictive capabilities for that trait. The discovery of new genetic signatures is challenging, as it entails the analysis of complex information encoded at the gene level. Moreover, biomarker selection tends to be unstable: the thousands of genes measured in each sample are usually highly correlated, so the genetic signatures proposed by different authors show very low overlap. In this context, this paper proposes BLASSO, a simple and highly interpretable linear model with ℓ1-regularization that incorporates prior biological knowledge into the prediction of breast cancer outcomes. Two approaches to integrating biological knowledge into BLASSO, Gene-specific and Gene-disease, are proposed, and their predictive performance and biomarker stability are tested on a public RNA-Seq gene expression dataset for breast cancer. The relevance of the genetic signature for the model is inspected through a functional analysis.
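The abstract does not spell out how BLASSO injects prior knowledge into the ℓ1 penalty. One common mechanism, assumed here purely for illustration, is a per-gene penalty factor w_j in a weighted LASSO objective for a binary outcome: minimise (1/n) Σ_i logloss(y_i, β0 + x_i·β) + λ Σ_j w_j |β_j|. A gene that prior knowledge (for instance a Gene-specific or Gene-disease score) marks as relevant gets a smaller w_j and is shrunk less. The sketch below fits that objective with proximal gradient descent (ISTA) in plain Python; `weighted_l1_logistic` and `penalty_factor` are hypothetical names, not the authors' code.

```python
import math


def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrink z toward zero by t, exact zero inside [-t, t]."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_l1_logistic(X, y, lam, penalty_factor, n_iter=500):
    """Sketch of a prior-weighted l1 (LASSO-type) logistic regression.

    Minimises (1/n) * sum_i logloss(y_i, b0 + x_i . b) + lam * sum_j w_j * |b_j|
    by proximal gradient descent (ISTA). penalty_factor[j] = w_j is a
    HYPOTHETICAL prior-knowledge weight: w_j < 1 shrinks gene j less than an
    uninformed gene, w_j = 0 leaves it unpenalised. b0 is never penalised.
    """
    n, p = len(X), len(X[0])
    # Conservative step size: the gradient of the averaged log-loss is Lipschitz
    # with constant <= ||[X, 1]||_F^2 / (4n), because the Frobenius norm
    # bounds the spectral norm and the logistic curvature is at most 1/4.
    lipschitz = (sum(v * v for row in X for v in row) + n) / (4 * n)
    step = 1.0 / lipschitz
    b, b0 = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Residuals sigmoid(eta_i) - y_i drive the log-loss gradient.
        resid = []
        for row, yi in zip(X, y):
            eta = b0 + sum(bj * xj for bj, xj in zip(b, row))
            resid.append(1.0 / (1.0 + math.exp(-eta)) - yi)
        grads = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        # Gradient step on the smooth loss, then per-gene soft-thresholding
        # whose threshold scales with that gene's prior weight w_j.
        b = [soft_threshold(bj - step * gj, step * lam * wj)
             for bj, gj, wj in zip(b, grads, penalty_factor)]
        b0 -= step * sum(resid) / n
    return b, b0
```

Setting every w_j = 1 recovers the plain LASSO baseline. ISTA is used only for brevity; a real analysis would more likely rely on an established solver, such as the `penalty.factor` argument of the R package glmnet.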

RESULTS

BLASSO was compared with a baseline LASSO model. Using 10-fold cross-validation with 100 repetitions for model assessment, average AUC values of 0.7 and 0.69 were obtained for the Gene-specific and Gene-disease approaches, respectively, both above the average AUC of 0.65 obtained with LASSO. With respect to the stability of the genetic signatures found, BLASSO also outperformed the baseline model in terms of the robustness index (RI): the Gene-specific approach gave an RI of 0.15 ± 0.03, compared with 0.09 ± 0.03 for LASSO, making it about 66% more robust. The functional analysis of the genetic signature obtained with the Gene-disease approach showed a significant presence of cancer-related genes, as well as one gene (IFNK) and one pseudogene (PCNAP1) that had not previously been described as related to cancer.
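The evaluation protocol described above (10-fold cross-validation repeated 100 times, scored by AUC) can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the folds are stratified by class, an assumption the abstract does not state, and `fit_predict` is a hypothetical callable standing in for any model (LASSO, BLASSO or otherwise) that returns scores for the test samples.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic: the fraction of
    (positive, negative) pairs in which the positive sample scores higher,
    counting ties as one half. O(P*N), fine for a sketch."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def repeated_stratified_kfold(labels, k=10, repeats=100, seed=0):
    """Yield (train_idx, test_idx) pairs: for each repetition, shuffle each class
    separately and deal its indices round-robin into k folds, so every fold keeps
    both classes (given at least k samples per class)."""
    rng = random.Random(seed)
    n = len(labels)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for cls in (0, 1):
            idx = [i for i in range(n) if labels[i] == cls]
            rng.shuffle(idx)
            for pos, i in enumerate(idx):
                folds[pos % k].append(i)
        for f in range(k):
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield train, folds[f]


def mean_cv_auc(fit_predict, X, y, k=10, repeats=100, seed=0):
    """Average test-fold AUC over all k * repeats splits.
    fit_predict(X_train, y_train, X_test) -> scores is a hypothetical callable."""
    aucs = []
    for train, test in repeated_stratified_kfold(y, k, repeats, seed):
        scores = fit_predict([X[i] for i in train], [y[i] for i in train],
                             [X[i] for i in test])
        aucs.append(auc(scores, [y[i] for i in test]))
    return sum(aucs) / len(aucs)
```

The relative robustness quoted above is simple arithmetic on the reported means: (0.15 − 0.09) / 0.09 ≈ 0.67, roughly two thirds, which matches the stated 66% given that the RI values are rounded to two decimals. The RI itself is not defined in the abstract, so it is not sketched here.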

CONCLUSIONS

BLASSO has proven to be a good choice in terms of both predictive efficacy and biomarker stability when compared with other similar approaches. Further functional analyses of the genetic signatures obtained with BLASSO revealed not only genes with important roles in cancer, but also genes that may play an unknown or collateral role in the studied disease.


Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/387f/6245593/3c0278e9b4da/12918_2018_612_Fig1_HTML.jpg
