Identifying Epigenetic Biomarkers using Maximal Relevance and Minimal Redundancy Based Feature Selection for Multi-Omics Data.

Author information

Mallik Saurav, Bhadra Tapas, Maulik Ujjwal

Publication information

IEEE Trans Nanobioscience. 2017 Jan;16(1):3-10. doi: 10.1109/TNB.2017.2650217. Epub 2017 Jan 9.

DOI: 10.1109/TNB.2017.2650217
PMID: 28092570
Abstract

Epigenetic biomarker discovery is an important task in bioinformatics. In this article, we develop a new framework for identifying statistically significant epigenetic biomarkers using maximal-relevance and minimal-redundancy criterion-based feature (gene) selection for multi-omics datasets. First, we determine the genes that have both expression and methylation values and follow a normal distribution. Similarly, we identify the genes that have both expression and methylation values but do not follow a normal distribution. For each case, we use a gene-selection method that returns maximally relevant, but variable-weighted minimally redundant, genes as the top-ranked genes. For statistical validation, we apply a t-test to both the expression and the methylation data restricted to the normally distributed top-ranked genes, to determine how many of them are both differentially expressed and differentially methylated. Similarly, we use the Limma package to perform a non-parametric empirical Bayes test on both the expression and the methylation data restricted to the non-normally distributed top-ranked genes, to identify how many of them are both differentially expressed and differentially methylated. Finally, we report the top-ranking significant gene markers with biological validation. Moreover, our framework improves the positive predictive rate and reduces the false positive rate in marker identification. In addition, we provide a comparative analysis of our gene-selection method and other methods based on the classification performance obtained with several well-known classifiers.
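The gene-selection step in the abstract can be sketched as a greedy search over genes. The abstract does not specify the paper's variable weighting of the redundancy term. This sketch therefore substitutes the standard mRMR difference criterion, score(g) = I(g; class) − (1/|S|) Σ_{s∈S} I(g; s), where S is the set of genes selected so far. It also assumes each gene's expression or methylation values have already been discretized into bins. The names `mutual_information` and `mrmr` are hypothetical illustrations, not the authors' code.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), c in joint.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), rewritten in terms of raw counts
        mi += (c / n) * log(c * n / (px[a] * py[b]))
    return mi


def mrmr(features, labels, k):
    """Greedy mRMR selection using the difference criterion (hypothetical helper).

    features: dict mapping gene name -> list of discretised values, one per sample
    labels:   list of class labels, one per sample (e.g. 0 = normal, 1 = tumour)
    k:        number of genes to return, in the order they were selected
    """
    relevance = {g: mutual_information(v, labels) for g, v in features.items()}
    selected = []
    remaining = set(features)

    def score(g):
        # First pick: relevance only. Later picks: relevance minus the mean
        # mutual information with the genes already selected (redundancy).
        if not selected:
            return relevance[g]
        redundancy = sum(mutual_information(features[g], features[s])
                         for s in selected) / len(selected)
        return relevance[g] - redundancy

    while remaining and len(selected) < k:
        best = max(sorted(remaining), key=score)  # sorted() breaks ties by name
        selected.append(best)
        remaining.remove(best)
    return selected
```

Consider two identical binned profiles, `geneA` and `geneA_copy`, plus a weaker but largely independent `geneB`. Here `mrmr(features, labels, 2)` picks `geneA` and then `geneB`, because the copy's redundancy with `geneA` outweighs its relevance. The variable-weighted variant described in the paper would change only the redundancy term inside `score`.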


Similar articles

1. Identifying Epigenetic Biomarkers using Maximal Relevance and Minimal Redundancy Based Feature Selection for Multi-Omics Data.
   IEEE Trans Nanobioscience. 2017 Jan;16(1):3-10. doi: 10.1109/TNB.2017.2650217. Epub 2017 Jan 9.
2. A Linear Regression and Deep Learning Approach for Detecting Reliable Genetic Alterations in Cancer Using DNA Methylation and Gene Expression Data.
   Genes (Basel). 2020 Aug 12;11(8):931. doi: 10.3390/genes11080931.
3. High-specificity bioinformatics framework for epigenomic profiling of discordant twins reveals specific and shared markers for ACPA and ACPA-positive rheumatoid arthritis.
   Genome Med. 2016 Nov 22;8(1):124. doi: 10.1186/s13073-016-0374-0.
4. An efficient statistical feature selection approach for classification of gene expression data.
   J Biomed Inform. 2011 Aug;44(4):529-35. doi: 10.1016/j.jbi.2011.01.001. Epub 2011 Jan 15.
5. EBADIMEX: an empirical Bayes approach to detect joint differential expression and methylation and to classify samples.
   Stat Appl Genet Mol Biol. 2019 Nov 16;18(6). doi: 10.1515/sagmb-2018-0050.
6. Identifying marker genes in transcription profiling data using a mixture of feature relevance experts.
   Physiol Genomics. 2001 Mar 8;5(2):99-111. doi: 10.1152/physiolgenomics.2001.5.2.99.
7. Integrative analysis of gene expression and DNA methylation using unsupervised feature extraction for detecting candidate cancer biomarkers.
   J Bioinform Comput Biol. 2018 Apr;16(2):1850006. doi: 10.1142/S0219720018500063. Epub 2018 Feb 22.
8. Min-redundancy and max-relevance multi-view feature selection for predicting ovarian cancer survival using multi-omics data.
   BMC Med Genomics. 2018 Sep 14;11(Suppl 3):71. doi: 10.1186/s12920-018-0388-0.
9. Multi-network approach to identify differentially methylated gene communities in cancer.
   Gene. 2019 May 20;697:227-237. doi: 10.1016/j.gene.2019.02.007. Epub 2019 Feb 22.
10. MiRNA-TF-gene network analysis through ranking of biomolecules for multi-informative uterine leiomyoma dataset.
    J Biomed Inform. 2015 Oct;57:308-19. doi: 10.1016/j.jbi.2015.08.014. Epub 2015 Aug 19.

Articles citing this paper

1. DOMSCNet: a deep learning model for the classification of stomach cancer using multi-layer omics data.
   Brief Bioinform. 2025 Mar 4;26(2). doi: 10.1093/bib/bbaf115.
2. Utility of Machine Learning Models to Predict Lymph Node Metastasis of Japanese Localized Prostate Cancer.
   Cancers (Basel). 2024 Dec 5;16(23):4073. doi: 10.3390/cancers16234073.
3. A review on advancements in feature selection and feature extraction for high-dimensional NGS data analysis.
   Funct Integr Genomics. 2024 Aug 19;24(5):139. doi: 10.1007/s10142-024-01415-x.
4. 3PNMF-MKL: A non-negative matrix factorization-based multiple kernel learning method for multi-modal data integration and its application to gene signature detection.
   Front Genet. 2023 Feb 14;14:1095330. doi: 10.3389/fgene.2023.1095330. eCollection 2023.
5. DNA methylation loci identification for pan-cancer early-stage diagnosis and prognosis using a new distributed parallel partial least squares method.
   Front Genet. 2022 Oct 19;13:940214. doi: 10.3389/fgene.2022.940214. eCollection 2022.
6. Comparison of five supervised feature selection algorithms leading to top features and gene signatures from multi-omics data in cancer.
   BMC Bioinformatics. 2022 Apr 28;23(Suppl 3):153. doi: 10.1186/s12859-022-04678-y.
7. A Novel Biomarker Identification Approach for Gastric Cancer Using Gene Expression and DNA Methylation Dataset.
   Front Genet. 2021 Mar 25;12:644378. doi: 10.3389/fgene.2021.644378. eCollection 2021.
8. Computational learning of features for automated colonic polyp classification.
   Sci Rep. 2021 Feb 23;11(1):4347. doi: 10.1038/s41598-021-83788-8.
9. In silico ranking of phenolics for therapeutic effectiveness on cancer stem cells.
   BMC Bioinformatics. 2020 Dec 28;21(Suppl 21):499. doi: 10.1186/s12859-020-03849-z.
10. Detecting methylation signatures in neurodegenerative disease by density-based clustering of applications with reducing noise.
    Sci Rep. 2020 Dec 17;10(1):22164. doi: 10.1038/s41598-020-78463-3.