Leveraging domain information to restructure biological prediction.

Affiliation

Department of Computer and Information Science, University of Mississippi, USA.

Publication information

BMC Bioinformatics. 2011 Oct 18;12 Suppl 10(Suppl 10):S22. doi: 10.1186/1471-2105-12-S10-S22.

DOI: 10.1186/1471-2105-12-S10-S22

PMID: 22166097

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3236845/

Abstract

BACKGROUND

It is commonly believed that including domain knowledge in a prediction model is desirable. However, representing and incorporating domain information in the learning process is, in general, a challenging problem. In this research, we consider domain information encoded by discrete or categorical attributes. A discrete or categorical attribute provides a natural partition of the problem domain, and hence divides the original problem into several non-overlapping sub-problems. In this sense, the domain information is useful if the partition simplifies the learning task. The goal of this research is to develop an algorithm to identify discrete or categorical attributes that maximally simplify the learning task.
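
To make the idea of a partition concrete, here is a minimal sketch. It assumes the data are held as a list of numeric feature vectors, a parallel list of class labels, and a parallel list of values for one categorical attribute. The function name `partition_by_attribute` and the tissue-type attribute in the example are hypothetical and do not come from the paper. Each distinct attribute value yields one sub-problem, and every sample lands in exactly one of them, so the sub-problems do not overlap.

```python
from collections import defaultdict


def partition_by_attribute(rows, labels, attr_values):
    """Split a labelled data set into non-overlapping sub-problems,
    one per distinct value of a discrete/categorical attribute.

    Returns {attribute value: (feature vectors, labels)}.
    """
    # Each new attribute value gets its own fresh pair of lists.
    subproblems = defaultdict(lambda: ([], []))
    for x, y, a in zip(rows, labels, attr_values):
        xs, ys = subproblems[a]
        xs.append(x)
        ys.append(y)
    return dict(subproblems)


if __name__ == "__main__":
    # Hypothetical toy data: one numeric feature, a binary label, and a
    # tissue-type attribute playing the role of domain information.
    subs = partition_by_attribute(
        [[0.1], [0.4], [0.9], [0.3]],
        [0, 1, 1, 0],
        ["liver", "liver", "kidney", "kidney"],
    )
    for value, (xs, ys) in subs.items():
        print(value, xs, ys)
```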

RESULTS

We consider restructuring a supervised learning problem via a partition of the problem space using a discrete or categorical attribute. A naive approach exhaustively searches all possible restructured problems, which is computationally prohibitive when the number of discrete or categorical attributes is large. We propose a metric that ranks attributes according to their potential to reduce the uncertainty of a classification task. The metric is quantified as the conditional entropy achieved by a set of optimal classifiers, one built for each sub-problem defined by the attribute under consideration. To avoid high computational cost, we approximate this quantity by the expected minimum conditional entropy with respect to random projections. This approach is tested on three artificial data sets, three cheminformatics data sets, and two leukemia gene expression data sets. Empirical results demonstrate that our method is capable of selecting a proper discrete or categorical attribute to simplify the problem: on every data set tested, the classifier built for the restructured problem outperformed the classifier built for the original problem.
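
The ranking idea above can be sketched as follows. This is a minimal, non-authoritative illustration under assumptions of our own: the "optimal classifier" for each sub-problem is approximated by the best single threshold on a random one-dimensional projection of the numeric features, the minimum conditional entropy of that split is averaged over a fixed number of Gaussian random directions, and sub-problem entropies are weighted by sub-problem size. The function names (`entropy`, `min_split_entropy`, `random_direction`, `attribute_score`, `rank_attributes`) and the parameters `n_proj` and `seed` are hypothetical; the paper's exact classifier family and estimator may differ.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def min_split_entropy(values, labels):
    """Lowest conditional entropy of the labels given one threshold on the
    1-D projected values (i.e. the best 1-D threshold classifier)."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    best = entropy(labels)  # baseline: no split at all
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # a threshold cannot separate tied projected values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        best = min(best, h)
    return best


def random_direction(dim, rng):
    """Random unit vector: a normalised draw of i.i.d. Gaussians."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def attribute_score(X, y, attr, n_proj=50, seed=0):
    """Monte Carlo estimate of the expected minimum conditional entropy after
    partitioning (X, y) by the categorical attribute `attr`; lower is better."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for xi, yi, ai in zip(X, y, attr):
        groups[ai].append((xi, yi))
    n, dim = len(y), len(X[0])
    total = 0.0
    for _ in range(n_proj):
        w = random_direction(dim, rng)
        h = 0.0
        for rows in groups.values():
            proj = [sum(wj * xj for wj, xj in zip(w, xi)) for xi, _ in rows]
            labs = [yi for _, yi in rows]
            # Weight each sub-problem by its share of the samples.
            h += len(rows) / n * min_split_entropy(proj, labs)
        total += h
    return total / n_proj


def rank_attributes(X, y, candidates, n_proj=50, seed=0):
    """Names of candidate attributes, most promising (lowest score) first."""
    scores = {name: attribute_score(X, y, vals, n_proj, seed)
              for name, vals in candidates.items()}
    return sorted(scores, key=scores.get)
```

In this sketch, scoring a constant attribute (all rows in a single group) gives the entropy of the original, unpartitioned problem under the same approximation, so a candidate attribute looks useful only if its score falls below that baseline. Sharing one random direction across all sub-problems within a draw is a simplification; by linearity of expectation it does not change the expected score.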

CONCLUSIONS

The proposed conditional entropy based metric is effective in identifying good partitions of a classification problem, hence enhancing the prediction performance.

Similar articles

R-Ensembler: A greedy rough set based ensemble attribute selection algorithm with kNN imputation for classification of medical data.

Comput Methods Programs Biomed. 2020 Feb;184:105122. doi: 10.1016/j.cmpb.2019.105122. Epub 2019 Oct 8.

Rough set based information theoretic approach for clustering uncertain categorical data.

PLoS One. 2022 May 13;17(5):e0265190. doi: 10.1371/journal.pone.0265190. eCollection 2022.

Information-theoretic semi-supervised metric learning via entropy regularization.

Neural Comput. 2014 Aug;26(8):1717-62. doi: 10.1162/NECO_a_00614. Epub 2014 May 30.

An Attribute Reduction Method Using Neighborhood Entropy Measures in Neighborhood Rough Sets.

Entropy (Basel). 2019 Feb 7;21(2):155. doi: 10.3390/e21020155.

A conditional entropy minimization criterion for dimensionality reduction and multiple kernel learning.

Neural Comput. 2010 Nov;22(11):2887-923. doi: 10.1162/NECO_a_00027.

A Framework for Optimal Attribute Evaluation and Selection in Hesitant Fuzzy Environment Based on Enhanced Ordered Weighted Entropy Approach for Medical Dataset.

J Biomed Phys Eng. 2019 Jun 1;9(3):327-334. doi: 10.31661/jbpe.v0i0.1033. eCollection 2019 Jun.

A Unified Entropy-Based Distance Metric for Ordinal-and-Nominal-Attribute Data Clustering.

IEEE Trans Neural Netw Learn Syst. 2020 Jan;31(1):39-52. doi: 10.1109/TNNLS.2019.2899381. Epub 2019 Mar 19.

A Neighborhood Rough Sets-Based Attribute Reduction Method Using Lebesgue and Entropy Measures.

Entropy (Basel). 2019 Feb 1;21(2):138. doi: 10.3390/e21020138.

Coupled attribute similarity learning on categorical data.

IEEE Trans Neural Netw Learn Syst. 2015 Apr;26(4):781-97. doi: 10.1109/TNNLS.2014.2325872.

Cited by

Proceedings of the 2012 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) conference. Introduction.

BMC Bioinformatics. 2012;13 Suppl 15(Suppl 15):S1. doi: 10.1186/1471-2105-13-S15-S1. Epub 2012 Sep 11.

Proceedings of the 2011 MidSouth Computational Biology and Bioinformatics Society (MCBIOS) conference. Introduction.

BMC Bioinformatics. 2011 Oct 18;12 Suppl 10(Suppl 10):S1. doi: 10.1186/1471-2105-12-S10-S1.
