Lim Sangsoo, Lee Sangseon, Jung Inuk, Rhee Sungmin, Kim Sun
Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Korea.
Department of Computer Science and Engineering, Seoul National University, Seoul, Korea.
Brief Bioinform. 2020 Jan 17;21(1):36-46. doi: 10.1093/bib/bby097.
Biological pathways are extensively used in the analysis of transcriptome data to characterize the biological mechanisms underlying various phenotypes. A number of computational tools summarize transcriptome data at the pathway level. However, no comparative study has assessed how well these tools produce useful information at the cohort level, where many samples or patients are compared.
In this study, we systematically compared and evaluated 13 pathway activity inference tools on five comparison criteria using a pan-cancer data set. This study makes two major contributions. First, it provides a comprehensive survey of the computational techniques used by existing pathway activity inference tools. The tools use different strategies and place different requirements on the data: input transformation, use of labels, necessity of cohort-level input data, use of gene relations and scoring metric. Second, we performed extensive evaluations of the performance of these tools. Because different tools use different methods to map samples to the pathway dimension, the tools were evaluated at the pathway level using the five comparison criteria. We first measured how well each tool maintains the characteristics of the original gene expression values, and then investigated robustness by adding noise to the gene expression data. Classification tasks on three clinical variables (tumor versus normal, survival and cancer subtypes) were performed to evaluate the utility of the tools for clinical applications. In addition, the inferred activity values were compared between tools to see how similar they are in relation to the scoring schemes they use.
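The robustness criterion described above can be illustrated with a minimal sketch. It assumes a simple mean z-score scoring scheme as a generic stand-in for a pathway activity inference tool; it is not the method of any of the 13 tools evaluated, and the function names, gene sets, noise level and simulated data are all hypothetical. The idea is to map a genes-by-samples matrix to a pathways-by-samples matrix, add Gaussian noise to the expression values, and check how well the inferred activities are preserved.

```python
import numpy as np

def pathway_activity(expr, gene_sets):
    """Map a genes x samples matrix to a pathways x samples matrix.

    Assumed scoring scheme: mean of per-gene z-scores over member genes.
    gene_sets maps a pathway name to a list of row indices in expr.
    """
    mu = expr.mean(axis=1, keepdims=True)
    sd = expr.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # avoid division by zero for constant genes
    z = (expr - mu) / sd
    return np.vstack([z[idx].mean(axis=0) for idx in gene_sets.values()])

def robustness(expr, gene_sets, noise_sd, rng):
    """Per-pathway Pearson correlation between activities inferred from
    the original data and from data with added Gaussian noise."""
    base = pathway_activity(expr, gene_sets)
    noise = rng.normal(0.0, noise_sd, size=expr.shape)
    noisy = pathway_activity(expr + noise, gene_sets)
    return np.array([np.corrcoef(b, n)[0, 1] for b, n in zip(base, noisy)])

# Simulated data (hypothetical): 50 genes, 30 samples, two gene sets.
rng = np.random.default_rng(0)
expr = rng.normal(size=(50, 30))
gene_sets = {
    "pathway_A": list(range(0, 10)),
    "pathway_B": list(range(10, 25)),
}

activity = pathway_activity(expr, gene_sets)
scores = robustness(expr, gene_sets, noise_sd=0.1, rng=rng)
print(activity.shape)
print(scores)
```

In a comparison of this kind, the same noisy inputs would be passed to every tool, so that differences in the resulting correlations reflect the scoring schemes rather than the noise draw.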