GSAn: an alternative to enrichment analysis for annotating gene sets.

Authors

Ayllon-Benitez Aaron, Bourqui Romain, Thébault Patricia, Mougin Fleur

Affiliations

University of Bordeaux, Inserm UMR 1219, Bordeaux Population Health Research Center, team ERIAS, Bordeaux 33000, France.

University of Bordeaux, CNRS UMR 5800, LaBRI, Bordeaux 33400, France.

Publication information

NAR Genom Bioinform. 2020 Mar 14;2(2):lqaa017. doi: 10.1093/nargab/lqaa017. eCollection 2020 Jun.

Abstract

New sequencing technologies are leading to a new understanding of the relationship between genotype and phenotype. To interpret and analyze data grouped according to a phenotype of interest, methods based on statistical enrichment have become standard in biology. However, these methods synthesize the biological information by selecting over-represented terms, and they may focus on the most-studied genes, which cover only a limited share of the annotated genes within a gene set. Semantic similarity measures have shown strong results in pairwise gene comparison by taking advantage of the underlying structure of the Gene Ontology. We developed GSAn, a novel gene set annotation method that uses semantic similarity measures to synthesize Gene Ontology annotation terms. The originality of our approach is to identify the best compromise between the number of retained annotation terms, which has to be drastically reduced, and the number of related genes, which has to be as large as possible. Moreover, GSAn offers interactive visualization facilities dedicated to the multi-scale analysis of gene set annotations. Compared to enrichment analysis tools, GSAn has shown excellent results in terms of maximizing gene coverage while minimizing the number of terms.
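The trade-off between few annotation terms and high gene coverage can be illustrated with a toy example. The sketch below is not GSAn's algorithm: it assumes a plain greedy set-cover heuristic and ignores semantic similarity entirely. The function name `greedy_term_cover`, the term labels, and the gene identifiers are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple


def greedy_term_cover(
    term_to_genes: Dict[str, Set[str]], gene_set: Set[str]
) -> Tuple[List[str], Set[str]]:
    """Pick few terms that together annotate as many genes as possible.

    Illustrative greedy set-cover heuristic (an assumption, not GSAn itself):
    at each step, keep the term that annotates the most still-uncovered genes.
    """
    uncovered = set(gene_set)
    chosen: List[str] = []
    while uncovered:
        # Sort term names first so that ties are broken deterministically.
        best_term = max(
            sorted(term_to_genes),
            key=lambda term: len(term_to_genes[term] & uncovered),
            default=None,
        )
        if best_term is None:
            break  # no terms available at all
        gained = term_to_genes[best_term] & uncovered
        if not gained:
            break  # remaining genes are not annotated by any term
        chosen.append(best_term)
        uncovered -= gained
    return chosen, gene_set - uncovered


# Hypothetical toy data: four terms, six genes, one gene without annotation.
annotations = {
    "term_A": {"g1", "g2", "g3"},
    "term_B": {"g3", "g4"},
    "term_C": {"g4", "g5"},
    "term_D": {"g1"},
}
genes = {"g1", "g2", "g3", "g4", "g5", "g6"}

terms, covered = greedy_term_cover(annotations, genes)
print(terms, f"coverage = {len(covered)}/{len(genes)}")
```

In this toy case two terms are enough to cover every annotated gene, while `g6` stays uncovered because no term annotates it. A method such as GSAn would additionally use the Gene Ontology structure to decide which terms are informative enough to keep, which this sketch does not attempt.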

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/650e/7671311/31f36a47fc38/lqaa017fig1.jpg
