Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

Authors

Maulik Ujjwal, Mallik Saurav, Mukhopadhyay Anirban, Bandyopadhyay Sanghamitra

Affiliations

Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India.

Machine Intelligence Unit, Indian Statistical Institute, Kolkata, West Bengal, India.

Publication information

PLoS One. 2015 Apr 1;10(4):e0119448. doi: 10.1371/journal.pone.0119448. eCollection 2015.

DOI:10.1371/journal.pone.0119448
PMID:25830807
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4382191/
Abstract

Microarrays and BeadChips are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (statistical biclustering-based rule mining), to identify a special type of rule and potential biomarkers in biological datasets by integrating statistical techniques with binary inclusion-maximal biclustering. First, a novel statistical strategy eliminates insignificant, low-significance, and redundant genes in such a way that the significance level satisfies the data distribution property (viz., normal or non-normal distribution). The data are then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets, and the corresponding special rules are extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms because it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets; it therefore saves elapsed time and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we classify the data to determine how accurately the evolved rules describe the remaining test (unknown) data, and we compare the average classification accuracy and other related factors with those of other rule-based classifiers. Statistical significance tests are also performed to verify the statistical relevance of the comparative results. Each of the other rule mining methods and rule-based classifiers also starts from the same post-discretized data matrix. Finally, we include an integrated analysis of gene expression and methylation to determine the epigenetic effect (viz., the effect of methylation) on gene expression level.
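
The abstract attributes the time saving to mining maximal frequent closed itemsets rather than every frequent itemset. The following is a minimal sketch of that distinction on a toy post-discretized matrix, not the StatBicRM implementation: the sample data, the item labels such as `G1+` (gene 1 up-regulated), the function names, and the brute-force enumeration are all assumptions made for illustration. The "homogeneous" constraint and the biclustering step itself are not reproduced here.

```python
from itertools import combinations

def support(itemset, transactions):
    """Count the samples that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration of frequent itemsets (toy inputs only)."""
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result

def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support."""
    return {a: s for a, s in frequent.items()
            if not any(a < b and s == sb for b, sb in frequent.items())}

def maximal_itemsets(frequent):
    """Keep itemsets that have no frequent proper superset at all."""
    return {a: s for a, s in frequent.items()
            if not any(a < b for b in frequent)}

# Hypothetical discretized data: one set of gene states per sample.
transactions = [
    {"G1+", "G2+", "G3-"},
    {"G1+", "G2+", "G3-"},
    {"G1+", "G2+"},
    {"G1+", "G4-"},
]

frequent = frequent_itemsets(transactions, min_support=2)
closed = closed_itemsets(frequent)
maximal = maximal_itemsets(frequent)

print(len(frequent), len(closed), len(maximal))
```

On this toy input the closed and maximal sets are much smaller than the full set of frequent itemsets, which is the kind of reduction the abstract credits for the shorter elapsed time. A rule miner that only expands closed or maximal itemsets has correspondingly fewer candidates to turn into rules.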

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/962e/4382191/a71af619d922/pone.0119448.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/962e/4382191/0e6cd27e4d68/pone.0119448.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/962e/4382191/a3dd80a15ffd/pone.0119448.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/962e/4382191/d428d87b463f/pone.0119448.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/962e/4382191/b252596e9127/pone.0119448.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/962e/4382191/e6052e4571ff/pone.0119448.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/962e/4382191/2e08bc587316/pone.0119448.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/962e/4382191/b5a7fe6b3247/pone.0119448.g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/962e/4382191/dc03b09e1cf2/pone.0119448.g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/962e/4382191/fe800b271de5/pone.0119448.g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/962e/4382191/595897dc667d/pone.0119448.g011.jpg
