MSclassifier: median-supplement model-based classification tool for automated knowledge discovery.

Authors

Adabor Emmanuel S, Acquaah-Mensah George K, Mazandu Gaston K

Affiliations

School of Technology, Ghana Institute of Management and Public Administration, Accra, Ghana.

Pharmaceutical Sciences Department, Massachusetts College of Pharmacy and Health Sciences, Worcester, MA, USA.

Publication information

F1000Res. 2020 Sep 10;9:1114. doi: 10.12688/f1000research.25501.1. eCollection 2020.

DOI: 10.12688/f1000research.25501.1
PMID: 33456763
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7788522/

Abstract

High-throughput technologies have driven exponential growth in publicly available datasets for biomedical research. Efficient computational models, algorithms and tools are required to exploit these datasets for knowledge discovery that supports medical decisions. Here, we introduce a new tool, MSclassifier, which uses a median-supplement approach to machine learning to enable automated and effective binary classification. The MSclassifier package estimates the medians of features (attributes) to derive supplementary data, which are then added to the training set to balance it and build better classification models. To test the approach, we used it to determine HER2 receptor expression status phenotypes in breast cancer and to predict protein subcellular localization (plasma membrane and nucleus). Using independent-sample and cross-validation tests, we evaluated the performance of MSclassifier and compared it with well-established tools that can perform such tasks. In the HER2 receptor expression status phenotype identification tasks, MSclassifier achieved a statistically significantly higher classification rate than the best-performing existing tool (90.30% versus 89.83%, p=8.62e-3). In the subcellular localization prediction tasks, MSclassifier and one other existing tool achieved comparably high performance (93.42% versus 93.19%, p=0.06), and both outperformed tools based on Naive Bayes classifiers. Overall, the application and evaluation of MSclassifier show its potential for a variety of binary classification problems. The MSclassifier package provides an R-portable, user-friendly application for a broad audience, enabling experienced end-users as well as non-programmers to perform effective classification in biomedical and other fields of study.
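The abstract does not spell out how the supplementary data are derived from the feature medians, so the following is only a minimal sketch of one plausible reading, not the MSclassifier algorithm itself. It assumes that each supplementary row is the per-feature median of a small random subsample of the minority class, and that rows are added until the two classes are the same size. The function name `median_supplement` and the parameters `subsample_size` and `seed` are hypothetical and do not come from the package.

```python
import random
from statistics import median


def median_supplement(rows, labels, subsample_size=3, seed=0):
    """Balance a binary training set with median-derived supplementary rows.

    Hypothetical illustration: each supplementary row is the per-feature
    median of a random subsample of the minority class. This is an assumed
    reading of "median-supplement", not the published method.
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("expected exactly two classes")

    # Group the training rows by class label.
    by_class = {c: [r for r, y in zip(rows, labels) if y == c] for c in classes}
    minority, majority = sorted(classes, key=lambda c: len(by_class[c]))
    deficit = len(by_class[majority]) - len(by_class[minority])

    new_rows, new_labels = list(rows), list(labels)
    pool = by_class[minority]
    k = min(subsample_size, len(pool))
    for _ in range(deficit):
        sample = rng.sample(pool, k)
        # Column-wise median across the sampled minority rows.
        supplement = [median(column) for column in zip(*sample)]
        new_rows.append(supplement)
        new_labels.append(minority)
    return new_rows, new_labels


# Toy data: five "neg" rows and two "pos" rows with two features each.
X = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [2.5, 9.0], [1.5, 13.0],
     [9.0, 1.0], [7.0, 3.0]]
y = ["neg", "neg", "neg", "neg", "neg", "pos", "pos"]

X_bal, y_bal = median_supplement(X, y)
print(y_bal.count("neg"), y_bal.count("pos"))  # prints: 5 5
```

The balanced set `X_bal`, `y_bal` could then be passed to any binary classifier. Which classifier MSclassifier uses, and how it is tuned, is not stated in the abstract and is left open here.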


