SCIA: A Novel Gene Set Analysis Applicable to Data With Different Characteristics.

Authors

Li Yiqun, Wu Ying, Zhang Xiaohan, Bai Yunfan, Akthar Luqman Muhammad, Lu Xin, Shi Ming, Zhao Jianxiang, Jiang Qinghua, Li Yu

Affiliations

Department of Laboratory of Cancer Biology, School of Life Science and Technology, Harbin Institute of Technology, Harbin, China.

Department of Biostatistics, School of Public Health, Southern Medical University, Guangzhou, China.

Publication information

Front Genet. 2019 Jun 25;10:598. doi: 10.3389/fgene.2019.00598. eCollection 2019.

Abstract

Gene set analysis is commonly used in functional enrichment and molecular pathway analyses. Most current methods are competitive tests, which assume that genes are independent of one another. However, the false discovery rates of competitive methods are inflated when they are applied to datasets with high inter-gene correlations. Self-contained tests can solve this problem, but they impose other restrictions on data characteristics. A statistically rigorous test applicable to datasets with varied and complex characteristics is therefore needed to obtain unbiased and comparable results. We propose a self-contained and competitive incorporated analysis (SCIA) to alleviate the bias caused by the limited application scope of existing gene set analysis methods. SCIA achieves this through a novel permutation strategy that uses biological networks to selectively permute gene labels with different probabilities. In simulation studies, SCIA was compared with four representative analysis methods (GSEA, CAMERA, ROAST, and NES) and produced the best performance in both false discovery rate and sensitivity under most conditions and parameter settings. Further, KEGG pathway analysis of two real lung cancer datasets showed that the results found by SCIA in both datasets were far more numerous than those found by GSEA, and most of them could be supported by the literature. Overall, SCIA promises to offer researchers more reliable and comparable results across different datasets.
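The distinction between competitive and self-contained tests that the abstract relies on can be illustrated with two permutation schemes. The sketch below is a minimal illustration of those two generic test families, not the SCIA algorithm: the abstract does not specify how SCIA weights gene-label permutations by network structure, so that step is not reproduced. The function names, the per-gene score (absolute difference in group means), and the set statistic (mean score) are all assumptions chosen for brevity.

```python
import random
from statistics import mean


def gene_scores(expr, labels):
    """Per-gene score (assumed for illustration): |mean(group 1) - mean(group 0)|."""
    scores = []
    for row in expr:
        g1 = [x for x, lab in zip(row, labels) if lab == 1]
        g0 = [x for x, lab in zip(row, labels) if lab == 0]
        scores.append(abs(mean(g1) - mean(g0)))
    return scores


def set_stat(scores, gene_set):
    """Gene set statistic (assumed): mean score of the genes in the set."""
    return mean(scores[i] for i in gene_set)


def competitive_p(expr, labels, gene_set, n_perm=1000, seed=0):
    """Competitive test: permute gene labels.

    Asks whether the set scores higher than random gene sets of the same
    size. Drawing random sets treats genes as exchangeable, which is the
    independence assumption the abstract says breaks down under high
    inter-gene correlation.
    """
    rng = random.Random(seed)
    scores = gene_scores(expr, labels)
    observed = set_stat(scores, gene_set)
    hits = 0
    for _ in range(n_perm):
        random_set = rng.sample(range(len(scores)), len(gene_set))
        if set_stat(scores, random_set) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def self_contained_p(expr, labels, gene_set, n_perm=1000, seed=0):
    """Self-contained test: permute sample labels.

    Asks whether the genes in the set are associated with the phenotype at
    all, using only the set's own genes. Shuffling samples keeps the
    correlation between genes intact.
    """
    rng = random.Random(seed)
    subset = [expr[i] for i in gene_set]
    observed = mean(gene_scores(subset, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean(gene_scores(subset, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Here `expr` is a list of per-gene expression rows (one value per sample), `labels` is a 0/1 phenotype vector, and `gene_set` is a list of row indices. Both functions return a permutation p-value with the usual add-one correction.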

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d37/6603225/dc8ecfe1edab/fgene-10-00598-g0001.jpg
