Wang Dong, Li Jie, Liu Rui, Wang Yadong
School of Computer Science and Technology, Harbin Institute of Technology, West Da-Zhi Street, Harbin, China.
BMC Syst Biol. 2018 Dec 31;12(Suppl 9):133. doi: 10.1186/s12918-018-0659-6.
With the rapid accumulation of genomic data, annotating and interpreting these data has become a challenging issue. Gene set enrichment analysis, a representative approach, has been widely used to interpret large molecular datasets generated by biological experiments. Its results rely heavily on the quality and integrity of gene set annotations. Although several methods have been developed to annotate gene sets, high-quality annotation methods are still lacking. Here, we propose a novel method that improves annotation accuracy by combining the Gene Ontology (GO) structure and gene expression data.
We propose a novel approach for optimizing gene set annotations to obtain more accurate annotation results. The method filters inconsistent annotations using GO structure information together with probabilistic gene set clusters, which are computed over a range of cluster sizes on multiple bootstrap-resampled datasets. The proposed method is applied to gene expression data from p53 cell lines, colon cancer and breast cancer. The experimental results show that the method filters out a number of annotations unrelated to the experimental data, increases gene set enrichment power and reduces the inconsistency of annotations.
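The abstract only names the ingredients of the filtering step, so the following is a minimal sketch of one plausible reading of "probabilistic gene set clusters calculated by a range of cluster sizes over multiple bootstrap resampled datasets", not the authors' implementation. Assumptions (none taken from the paper): genes are clustered with plain k-means (best of several restarts), bootstrapping resamples the samples (columns), the probability that two genes belong together is their co-clustering frequency across all (bootstrap, k) runs, and a gene is kept in a term only if its mean co-clustering probability with the term's other genes reaches a threshold of 0.5. The GO-structure part of the method is not modeled. The function names `kmeans`, `coclustering_probability` and `filter_annotations` and the term ID `TERM:1` are hypothetical.

```python
import numpy as np


def kmeans(x, k, rng, n_init=10, n_iter=100):
    """Lloyd's k-means on the rows of x; keeps the restart with the lowest SSE."""
    best_labels, best_sse = None, np.inf
    for _ in range(n_init):
        centers = x[rng.choice(len(x), size=k, replace=False)]
        for _ in range(n_iter):
            # Squared Euclidean distance from every row (gene) to every center.
            dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            new_centers = np.array([
                x[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                for c in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        sse = dist.min(axis=1).sum()
        if sse < best_sse:
            best_labels, best_sse = labels, sse
    return best_labels


def coclustering_probability(expr, ks, n_boot=20, seed=0):
    """P[i, j] = fraction of (bootstrap, k) runs in which genes i and j share a cluster."""
    rng = np.random.default_rng(seed)
    n_genes, n_samples = expr.shape
    counts = np.zeros((n_genes, n_genes))
    runs = 0
    for _ in range(n_boot):
        # Assumption: bootstrap by resampling samples (columns) with replacement.
        cols = rng.integers(0, n_samples, size=n_samples)
        x = expr[:, cols]
        for k in ks:  # a range of cluster sizes
            labels = kmeans(x, k, rng)
            counts += labels[:, None] == labels[None, :]
            runs += 1
    return counts / runs


def filter_annotations(annotations, genes, prob, threshold=0.5):
    """Hypothetical consistency rule: drop a gene from a term when its mean
    co-clustering probability with the term's other genes is below threshold."""
    index = {g: i for i, g in enumerate(genes)}
    filtered = {}
    for term, members in annotations.items():
        idx = [index[g] for g in members if g in index]
        kept = []
        for g in members:
            if g not in index:
                continue  # no expression data for this gene
            others = [j for j in idx if j != index[g]]
            if not others or prob[index[g], others].mean() >= threshold:
                kept.append(g)
        filtered[term] = kept
    return filtered


# Toy data: genes a*, b*, c* follow three distinct expression levels.
rng = np.random.default_rng(1)
genes = [f"{m}{i}" for m in "abc" for i in range(5)]
levels = {"a": 10.0, "b": 0.0, "c": 3.0}
expr = np.array([levels[g[0]] + rng.normal(0, 0.1, 12) for g in genes])
prob = coclustering_probability(expr, ks=(2, 3))
annotations = {"TERM:1": ["a0", "a1", "a2", "a3", "a4", "b2"]}
print(filter_annotations(annotations, genes, prob))
# → {'TERM:1': ['a0', 'a1', 'a2', 'a3', 'a4']}
```

In this toy run, `b2` is removed from `TERM:1` because it never clusters with the `a` genes, which mirrors, under the assumptions above, what filtering an annotation "inconsistent" with the expression data could look like. Averaging over several cluster sizes rather than fixing one `k` keeps the probabilities from depending on a single arbitrary partition.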
A novel gene set annotation optimization approach is proposed to improve the quality of gene annotations. Experimental results indicate that the proposed method effectively improves gene set annotation quality based on the GO structure and gene expression data.