Predicting the Gene Expression Score by a Machine Learning Classifier.

Author information

Pawłowski Piotr H, Zielenkiewicz Piotr

Affiliations

Institute of Biochemistry and Biophysics, Polish Academy of Sciences, 02-093 Warsaw, Poland.

Laboratory of Systems Biology, Institute of Experimental Plant Biology and Biotechnology, Faculty of Biology, University of Warsaw, 02-096 Warsaw, Poland.

Publication information

Life (Basel). 2025 Apr 29;15(5):723. doi: 10.3390/life15050723.

PMID:40430151

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12113619/

Abstract

This work uses machine learning techniques to analyze, on a global scale, how the expression score (ES) of genes depends on various factors. The ES of a gene characterizes its activity and, thus, its importance for cellular processes, and it may depend on many different factors (attributes). To find the best-performing classifier, a machine learning classifier (random forest) was selected, trained, and optimized on the Waikato Environment for Knowledge Analysis (WEKA) platform, yielding the most accurate attribute-dependent prediction of gene ES. Data from the Saccharomyces Genome Database (SGD), presenting ES values corresponding to a wide spectrum of attributes, were used, revised, classified, and balanced, and the significance of each considered attribute was evaluated. The resulting random forest model identifies the attributes most important for determining the low, moderate, and high ES classes. These attributes cover both the experimental conditions and the genetic, physical, statistical, and logistic features. During validation, the model correctly classified 84.1% of the instances in a previously unseen test set.
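The abstract outlines a pipeline in which ES values are grouped into low, moderate, and high classes, the classes are balanced, a random forest is trained, and the model is scored by the share of correctly classified test instances. The sketch below illustrates only the preparation and scoring steps in plain Python; it is not the authors' code. The abstract does not say how the class boundaries or the balancing were chosen, so the tertile cut points (`tertile_thresholds`) and random undersampling (`undersample`) are assumptions, all function names are hypothetical, and the scores are made-up toy values rather than SGD data. The random forest itself was trained in WEKA and is not reproduced here.

```python
import random
from collections import Counter, defaultdict


def tertile_thresholds(scores):
    """Return two cut points that split the sorted scores into three roughly equal groups.

    Assumption: the abstract does not state how low/moderate/high ES were defined.
    """
    ordered = sorted(scores)
    n = len(ordered)
    return ordered[n // 3], ordered[(2 * n) // 3]


def to_class(score, low_cut, high_cut):
    """Map a numeric expression score to a low/moderate/high label."""
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "moderate"
    return "high"


def undersample(records, label_key="label", seed=0):
    """Randomly drop instances so every class keeps as many as the rarest class.

    Assumption: random undersampling stands in for whatever balancing the authors used.
    """
    by_class = defaultdict(list)
    for rec in records:
        by_class[rec[label_key]].append(rec)
    smallest = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


def accuracy(y_true, y_pred):
    """Percentage of correctly classified instances, as WEKA reports it."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)


if __name__ == "__main__":
    # Toy scores for illustration only, not taken from the paper.
    scores = [0.1, 0.4, 0.5, 0.9, 1.2, 2.0, 3.5, 4.0, 7.5, 8.0]
    low_cut, high_cut = tertile_thresholds(scores)
    records = [{"score": s, "label": to_class(s, low_cut, high_cut)} for s in scores]
    print("class counts:", Counter(r["label"] for r in records))
    balanced = undersample(records)
    print("balanced counts:", Counter(r["label"] for r in balanced))
```

In WEKA itself, balancing is commonly done with a supervised instance filter such as `SpreadSubsample` before training the `RandomForest` classifier; the abstract does not say which filter or settings the authors actually used.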

[Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c687/12113619/80076590b038/life-15-00723-g001.jpg]

Similar articles

An Ensemble Approach for the Prediction of Diabetes Mellitus Using a Soft Voting Classifier with an Explainable AI.

Sensors (Basel). 2022 Sep 25;22(19):7268. doi: 10.3390/s22197268.

Random Forest Classifier for Zero-Shot Learning Based on Relative Attribute.

IEEE Trans Neural Netw Learn Syst. 2018 May;29(5):1662-1674. doi: 10.1109/TNNLS.2017.2677441. Epub 2017 Mar 21.

Evaluation of Machine Learning Techniques for Traffic Flow-Based Intrusion Detection.

Sensors (Basel). 2022 Nov 30;22(23):9326. doi: 10.3390/s22239326.

Development and validation of a machine learning-based predictive model for assessing the 90-day prognostic outcome of patients with spontaneous intracerebral hemorrhage.

J Transl Med. 2024 Mar 4;22(1):236. doi: 10.1186/s12967-024-04896-3.

Predictive modeling and optimization in dermatology: Machine learning for skin disease classification.

Comput Biol Med. 2025 May;189:109946. doi: 10.1016/j.compbiomed.2025.109946. Epub 2025 Mar 3.

Classifying changes in LN-18 glial cell morphology: a supervised machine learning approach to analyzing cell microscopy data via FIJI and WEKA.

Med Biol Eng Comput. 2020 Jul;58(7):1419-1430. doi: 10.1007/s11517-020-02177-x. Epub 2020 Apr 21.

Novel Biomarker Prediction for Lung Cancer Using Random Forest Classifiers.

Cancer Inform. 2023 Apr 21;22:11769351231167992. doi: 10.1177/11769351231167992. eCollection 2023.

Predicting vitamin D deficiency using optimized random forest classifier.

Clin Nutr ESPEN. 2024 Apr;60:1-10. doi: 10.1016/j.clnesp.2023.12.146. Epub 2023 Dec 28.

Machine Learning Hybrid Model for the Prediction of Chronic Kidney Disease.

Comput Intell Neurosci. 2023 Mar 14;2023:9266889. doi: 10.1155/2023/9266889. eCollection 2023.
