Predictive modelling using pathway scores: robustness and significance of pathway collections.

Author information

Computational and Systems Medicine, Department of Surgery and Cancer, Sir Alexander Fleming Building, Imperial College London, London, SW7 2AZ, UK.

Division of Cancer, Department of Surgery and Cancer, Imperial College London, Hammersmith Hospital Campus, W12 0NN, London, UK.

Publication information

BMC Bioinformatics. 2019 Nov 4;20(1):543. doi: 10.1186/s12859-019-3163-0.

Abstract

BACKGROUND

Transcriptomic data are often used to build statistical models that predict a given phenotype, such as disease status. Genes work together in pathways, and it is widely thought that pathway representations will be more robust to noise in gene expression levels. We aimed to test this hypothesis by constructing models based either on genes alone or on sample-specific scores for each pathway, thus transforming the data to a 'pathway space'. We progressively degraded the raw data by adding noise and examined the ability of the models to maintain predictivity.
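The transformation to pathway space and the noise degradation can be illustrated with a minimal sketch. It is not the authors' workflow: the abstract does not name a scoring method or noise model, so the mean of member genes and additive Gaussian noise are assumptions here, and the names `pathway_scores`, `add_noise`, and the toy data are hypothetical.

```python
import random
from statistics import mean


def pathway_scores(sample, pathways):
    """Map one sample (gene -> expression) to pathway space.

    Assumption: a pathway's score is the mean expression of its measured
    member genes; the abstract does not specify the scoring method.
    """
    scores = {}
    for name, genes in pathways.items():
        values = [sample[g] for g in genes if g in sample]
        if values:  # skip pathways with no measured genes
            scores[name] = mean(values)
    return scores


def add_noise(sample, sd, rng):
    """Return a degraded copy of the sample with Gaussian noise per gene.

    Assumption: additive zero-mean noise with standard deviation `sd`.
    """
    return {gene: value + rng.gauss(0.0, sd) for gene, value in sample.items()}


# Toy data (hypothetical): four genes grouped into two pathways.
sample = {"g1": 1.0, "g2": 3.0, "g3": 5.0, "g4": 7.0}
pathways = {"P_a": ["g1", "g2"], "P_b": ["g3", "g4"]}

rng = random.Random(0)  # fixed seed so the degradation is reproducible
clean = pathway_scores(sample, pathways)
noisy = pathway_scores(add_noise(sample, sd=0.5, rng=rng), pathways)
```

Increasing `sd` step by step and refitting a classifier on the gene-level and pathway-level features at each step would mirror the progressive degradation described above.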

RESULTS

Models in the pathway space indeed had higher predictive robustness than models in the gene space. This result was independent of the workflow, parameters, classifier and data set used. Surprisingly, randomised pathway mappings produced models of similar accuracy and robustness to true mappings, suggesting that the success of pathway space models is not conferred by the specific definitions of the pathway. Instead, predictive models built on the true pathway mappings led to prediction rules with fewer influential pathways than those built on randomised pathways. The extent of this effect was used to differentiate pathway collections coming from a variety of widely used pathway databases.
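One way to build a randomised pathway mapping for such a comparison is sketched below. The abstract does not state how the mappings were randomised, so the scheme here is an assumption: gene labels are permuted across the gene universe, which keeps every pathway's size and the overlap structure between pathways while destroying biological membership. The function name `randomise_pathways` and the toy data are hypothetical.

```python
import random


def randomise_pathways(pathways, gene_universe, rng):
    """Relabel genes with a random permutation of the gene universe.

    Every pathway keeps its size, and overlaps between pathways are
    preserved, but membership no longer reflects biology. All pathway
    genes are assumed to be present in `gene_universe`.
    """
    genes = sorted(gene_universe)
    shuffled = genes[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(genes, shuffled))
    return {
        name: [relabel[g] for g in members]
        for name, members in pathways.items()
    }


# Toy data (hypothetical): six genes, two pathways sharing gene "g3".
universe = {"g1", "g2", "g3", "g4", "g5", "g6"}
true_pathways = {"P_a": ["g1", "g2", "g3"], "P_b": ["g3", "g4"]}

random_pathways = randomise_pathways(true_pathways, universe, random.Random(42))
```

Models fitted on scores from `true_pathways` and from `random_pathways` can then be compared on accuracy, robustness, and the number of pathways that carry weight in the prediction rule.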

CONCLUSIONS

Prediction models based on pathway scores are more robust to degradation of gene expression information than the equivalent models based on ungrouped genes. While models based on true pathway scores are not more robust or accurate than those based on randomised pathways, true pathways produced simpler prediction rules, emphasizing a smaller number of pathways.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/17ae/6827178/3a1bbec10715/12859_2019_3163_Fig1_HTML.jpg
