Finding semantic patterns in omics data using concept rule learning with an ontology-based refinement operator.

Authors

Malinka František, Železný Filip, Kléma Jiří

Affiliations

Department of Computer Science, Czech Technical University in Prague, Karlovo náměstí 13, Prague, 121 35 Czech Republic.

Czech Centre for Phenogenomics, Institute of Molecular Genetics of the Czech Academy of Sciences, Prague, Czech Republic.

Publication information

BioData Min. 2020 Sep 1;13:13. doi: 10.1186/s13040-020-00219-6. eCollection 2020.

DOI:10.1186/s13040-020-00219-6
PMID:32905086
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7466824/
Abstract

BACKGROUND

Identifying non-trivial and meaningful patterns in omics data is one of the most important biological tasks. Such patterns help to better understand biological systems and interpret experimental outcomes. A well-established method for explaining such biological data is Gene Set Enrichment Analysis. However, this type of analysis is restricted to a specific type of evaluation. Abstracting from details, the analyst provides a sorted list of genes and ontological annotations of the individual genes; the method outputs the subset of ontological terms enriched in the gene list. In contrast to enrichment analysis, we introduce a new tool/framework that allows for the induction of more complex patterns in 2-dimensional binary omics data. This extension makes it possible to discover and describe semantically coherent biclusters.
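To make the enrichment step concrete, the following is a minimal sketch of a plain over-representation test based on the hypergeometric tail probability. This is a simpler stand-in, not the ranked running-sum statistic that Gene Set Enrichment Analysis itself uses, and the gene names, term names, and the helper functions `hypergeom_tail` and `enriched_terms` are hypothetical illustrations rather than anything taken from the paper.

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def enriched_terms(selected, annotations, universe, alpha=0.05):
    """Return {term: p-value} for terms over-represented in `selected`."""
    results = {}
    for term, genes in annotations.items():
        genes = genes & universe              # ignore genes outside the universe
        k = len(genes & selected)             # selected genes carrying the term
        p = hypergeom_tail(k, len(genes), len(selected), len(universe))
        if p <= alpha:
            results[term] = p
    return results


# Hypothetical toy data: 10 genes, 3 of them selected by the analyst.
universe = {f"g{i}" for i in range(1, 11)}
selected = {"g1", "g2", "g3"}
annotations = {
    "termA": {"g1", "g2", "g3"},                      # exactly the selected genes
    "termB": {"g3", "g4", "g5", "g6", "g7", "g8"},    # broad term, weak overlap
}

result = enriched_terms(selected, annotations, universe)
print(result)
```

In this toy setting only `termA` passes the threshold. The output is a flat list of terms, which is the limitation the abstract points to: the test says nothing about which samples a pattern applies to.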

RESULTS

We present a new rapid method called sem1R that reveals interpretable hidden rules in omics data. These rules capture semantic differences between two classes: a target class, given as a collection of positive examples, and a non-target class containing negative examples. The method is inspired by the CN2 rule learner and introduces a new refinement operator that exploits prior knowledge in the form of ontologies. In our work, this knowledge serves to create accurate and interpretable rules. The novel refinement operator uses two reduction procedures, Redundant Generalization and Redundant Non-potential, both of which dramatically prune the rule space and consequently speed up the entire rule induction process compared with the traditional refinement operator presented in CN2.
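The idea of an ontology-aware refinement operator inside a beam search can be sketched as follows. The abstract names the two reduction procedures but does not define them, so the two pruning checks below are only one plausible reading, not the sem1R definitions. The toy ontology, the example annotations, the `score` function (covered positives minus covered negatives, not CN2's heuristic), and all function names are assumptions made for illustration.

```python
from itertools import chain

# Hypothetical toy ontology: term -> set of direct parents (is-a links).
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "signaling": {"root"},
    "transport": {"root"},
    "lipid_metabolism": {"metabolism"},
    "kinase_signaling": {"signaling"},
}


def ancestors(term):
    """All strict ancestors of a term in the toy ontology."""
    seen, stack = set(), list(PARENTS[term])
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS[t])
    return seen


def closure(terms):
    """Propagate annotations upward: a term implies all of its ancestors."""
    return set(terms) | set(chain.from_iterable(ancestors(t) for t in terms))


def covers(rule, example_terms):
    """A conjunctive rule covers an example that carries every rule term."""
    return rule <= example_terms


def refine(rule, positives):
    """Yield specializations of `rule` (a frozenset), adding one term each."""
    implied = closure(rule)
    for term in PARENTS:
        # Assumed reading of "redundant generalization": a term already
        # implied by the rule cannot change its coverage, so skip it.
        if term in implied:
            continue
        anc = ancestors(term)
        candidate = frozenset(t for t in rule if t not in anc) | {term}
        # Assumed reading of "non-potential": a candidate covering no
        # positive example cannot lead to a useful rule, so skip it.
        if not any(covers(candidate, ex) for ex in positives):
            continue
        yield candidate


def score(rule, positives, negatives):
    """Covered positives minus covered negatives (a simple stand-in)."""
    tp = sum(covers(rule, ex) for ex in positives)
    fp = sum(covers(rule, ex) for ex in negatives)
    return tp - fp


def learn_rule(positives, negatives, beam_width=2, max_len=2):
    """Beam search over rule refinements, in the spirit of CN2."""
    beam, best = [frozenset()], frozenset()
    for _ in range(max_len):
        candidates = {c for r in beam for c in refine(r, positives)}
        if not candidates:
            break
        ranked = sorted(
            candidates,
            key=lambda r: (-score(r, positives, negatives), sorted(r)),
        )
        beam = ranked[:beam_width]
        if score(beam[0], positives, negatives) > score(best, positives, negatives):
            best = beam[0]
    return best


# Hypothetical examples, each described by its upward-closed annotation set.
positives = [closure({"lipid_metabolism"}),
             closure({"lipid_metabolism", "kinase_signaling"})]
negatives = [closure({"kinase_signaling"})]

best_rule = learn_rule(positives, negatives)
print(sorted(best_rule))
```

Both checks discard candidates before they are scored, which is where a speed-up over an ontology-blind operator would come from in this sketch: terms implied by the current rule and terms that match no positive example never enter the beam.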

CONCLUSIONS

The efficiency and effectiveness of the novel refinement operator were tested on three different real gene expression datasets: the Dresden Ovary Dataset, DISC, and m2816. The experiments show that the ontology-based refinement operator speeds up pattern induction drastically. The algorithm is written in C++ and is published as an R package available at http://github.com/fmalinka/sem1r.
