Comprehensive decision tree models in bioinformatics.

Affiliation

Faculty of Health Sciences, University of Maribor, Maribor, Slovenia.

Publication information

PLoS One. 2012;7(3):e33812. doi: 10.1371/journal.pone.0033812. Epub 2012 Mar 30.

DOI:10.1371/journal.pone.0033812
PMID:22479449
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3316502/
Abstract

PURPOSE

Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible.

METHODS

This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models using a so-called one-button data mining approach, in which no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during tuning; the model is constrained exclusively by the dimensions of the produced decision tree.
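
The idea of tuning a tree by its dimensions alone can be illustrated with a minimal sketch. This is not the authors' implementation or their machine learning environment: the toy tree learner, the `min_leaf` parameter, the `max_depth` and `max_leaves` bounds, and the sample data are all hypothetical, chosen only to show a tuning loop that never looks at accuracy.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, min_leaf):
    """Grow a binary tree on numeric features (toy learner, assumption).

    Leaves are class labels; internal nodes are
    (feature, threshold, left, right) tuples.
    """
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return majority
    best = None  # (weighted impurity, feature index, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows})[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no split satisfies min_leaf
        return majority
    _, f, t = best
    lo = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    hi = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in lo], [y for _, y in lo], min_leaf),
            build_tree([r for r, _ in hi], [y for _, y in hi], min_leaf))


def dimensions(tree):
    """Return (depth, leaves); the leaf count stands in for drawn width."""
    if not isinstance(tree, tuple):
        return 0, 1
    d_left, w_left = dimensions(tree[2])
    d_right, w_right = dimensions(tree[3])
    return 1 + max(d_left, d_right), w_left + w_right


def tune_to_fit(rows, labels, max_depth, max_leaves):
    """Raise min_leaf until the tree fits the box; accuracy is never measured."""
    for min_leaf in range(1, len(rows) + 1):
        tree = build_tree(rows, labels, min_leaf)
        depth, leaves = dimensions(tree)
        if depth <= max_depth and leaves <= max_leaves:
            return tree, min_leaf
    raise ValueError("bounds are too small for even a single leaf")


# Hypothetical one-feature dataset with noisy class boundaries.
rows = [(x,) for x in range(12)]
labels = [0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1]

tree, min_leaf = tune_to_fit(rows, labels, max_depth=2, max_leaves=3)
depth, leaves = dimensions(tree)
print(f"min_leaf={min_leaf}, depth={depth}, leaves={leaves}")
```

The loop accepts the first, least restrictive setting whose tree fits the predefined box, so the only stopping criterion is tree size. A real tool would measure rendered width and height rather than depth and leaf count.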

RESULTS

The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase in accuracy for the less complex, visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree.

CONCLUSIONS

The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm. In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9547/3316502/667b2c76b8c8/pone.0033812.g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9547/3316502/aa5b31d61521/pone.0033812.g002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9547/3316502/3f81fc1f6ae4/pone.0033812.g003.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9547/3316502/3aaa71a0e77c/pone.0033812.g004.jpg

Similar articles

1
Comprehensive decision tree models in bioinformatics.
PLoS One. 2012;7(3):e33812. doi: 10.1371/journal.pone.0033812. Epub 2012 Mar 30.
2
Multi-test decision tree and its application to microarray data classification.
Artif Intell Med. 2014 May;61(1):35-44. doi: 10.1016/j.artmed.2014.01.005. Epub 2014 Feb 10.
3
Decision tree and ensemble learning algorithms with their applications in bioinformatics.
Adv Exp Med Biol. 2011;696:191-9. doi: 10.1007/978-1-4419-7046-6_19.
4
Examining the significance of fingerprint-based classifiers.
BMC Bioinformatics. 2008 Dec 17;9:545. doi: 10.1186/1471-2105-9-545.
5
Decision forest for classification of gene expression data.
Comput Biol Med. 2010 Aug;40(8):698-704. doi: 10.1016/j.compbiomed.2010.06.004. Epub 2010 Jun 29.
6
Gene selection for cancer identification: a decision tree model empowered by particle swarm optimization algorithm.
BMC Bioinformatics. 2014 Feb 20;15:49. doi: 10.1186/1471-2105-15-49.
7
On optimal settings of classification tree ensembles for medical decision support.
Health Informatics J. 2013 Mar;19(1):3-15. doi: 10.1177/1460458212446096.
8
Machine learning algorithms for outcome prediction in (chemo)radiotherapy: An empirical comparison of classifiers.
Med Phys. 2018 Jul;45(7):3449-3459. doi: 10.1002/mp.12967. Epub 2018 Jun 13.
9
Reviewing ensemble classification methods in breast cancer.
Comput Methods Programs Biomed. 2019 Aug;177:89-112. doi: 10.1016/j.cmpb.2019.05.019. Epub 2019 May 20.
10
Kernel-imbedded Gaussian processes for disease classification using microarray gene expression data.
BMC Bioinformatics. 2007 Feb 28;8:67. doi: 10.1186/1471-2105-8-67.

Cited by

1
Machine learning approaches for predicting the link of the global trade network of liquefied natural gas.
PLoS One. 2025 Jul 30;20(7):e0326952. doi: 10.1371/journal.pone.0326952. eCollection 2025.
2
Machine Learning Approaches for Assessing Risk Factors of Adrenal Insufficiency in Patients Undergoing Immune Checkpoint Inhibitor Therapy.
Pharmaceuticals (Basel). 2023 Aug 3;16(8):1097. doi: 10.3390/ph16081097.
3
Descriptive forest: experiments on a novel tree-structure-generalization method for describing cardiovascular diseases.
BMC Med Inform Decis Mak. 2023 Jul 28;23(1):141. doi: 10.1186/s12911-023-02228-x.
4
Improved downstream functional analysis of single-cell RNA-sequence data using DGAN.
Sci Rep. 2023 Jan 28;13(1):1618. doi: 10.1038/s41598-023-28952-y.
5
Diagnosis of temporomandibular disorders using artificial intelligence technologies: A systematic review and meta-analysis.
PLoS One. 2022 Aug 18;17(8):e0272715. doi: 10.1371/journal.pone.0272715. eCollection 2022.
6
Artificial Intelligence in Bariatric Surgery: Current Status and Future Perspectives.
Obes Surg. 2022 Aug;32(8):2772-2783. doi: 10.1007/s11695-022-06146-1. Epub 2022 Jun 17.
7
Extracting New Temporal Features to Improve the Interpretability of Undiagnosed Type 2 Diabetes Mellitus Prediction Models.
J Pers Med. 2022 Feb 28;12(3):368. doi: 10.3390/jpm12030368.
8
Unbox the black-box for the medical explainable AI via multi-modal and multi-centre data fusion: A mini-review, two showcases and beyond.
Inf Fusion. 2022 Jan;77:29-52. doi: 10.1016/j.inffus.2021.07.016.
9
An Adaptive Rank Aggregation-Based Ensemble Multi-Filter Feature Selection Method in Software Defect Prediction.
Entropy (Basel). 2021 Sep 29;23(10):1274. doi: 10.3390/e23101274.
10
Explainable artificial intelligence models using real-world electronic health record data: a systematic scoping review.
J Am Med Inform Assoc. 2020 Jul 1;27(7):1173-1185. doi: 10.1093/jamia/ocaa053.

References

1
Development and validation of a computer-aided diagnostic tool to screen for age-related macular degeneration by optical coherence tomography.
Br J Ophthalmol. 2012 Apr;96(4):503-7. doi: 10.1136/bjophthalmol-2011-300660. Epub 2011 Aug 26.
2
Stability of ranked gene lists in large microarray analysis studies.
J Biomed Biotechnol. 2010;2010:616358. doi: 10.1155/2010/616358. Epub 2010 Jun 27.
3
Bimodal protein solubility distribution revealed by an aggregation analysis of the entire ensemble of Escherichia coli proteins.
Proc Natl Acad Sci U S A. 2009 Mar 17;106(11):4201-6. doi: 10.1073/pnas.0811922106. Epub 2009 Feb 27.
4
Protein solubility: sequence based prediction and experimental verification.
Bioinformatics. 2007 Oct 1;23(19):2536-42. doi: 10.1093/bioinformatics/btl623. Epub 2006 Dec 6.
5
Rotation forest: A new classifier ensemble method.
IEEE Trans Pattern Anal Mach Intell. 2006 Oct;28(10):1619-30. doi: 10.1109/TPAMI.2006.211.
6
REFOLD: an analytical database of protein refolding methods.
Protein Expr Purif. 2006 Mar;46(1):166-71. doi: 10.1016/j.pep.2005.07.022. Epub 2005 Aug 15.