Using Machine Learning and miRNA for the Diagnosis of Esophageal Cancer.

Affiliations

REHS program, San Diego Supercomputer Center, UC San Diego, San Diego, CA, United States.

San Diego Supercomputer Center, UC San Diego, San Diego, CA, United States.

Publication information

J Appl Lab Med. 2024 Jul 1;9(4):684-695. doi: 10.1093/jalm/jfae037.

PMID:38721901

Abstract

BACKGROUND

Esophageal cancer (EC) remains a global health challenge, often diagnosed at advanced stages, leading to high mortality rates. Current diagnostic tools for EC are limited in their efficacy. This study aims to harness the potential of microRNAs (miRNAs) as novel, noninvasive diagnostic biomarkers for EC. Our objective was to determine the diagnostic accuracy of miRNAs, particularly in distinguishing miRNAs associated with EC from control miRNAs.

METHODS

We applied machine learning (ML) techniques in WEKA (Waikato Environment for Knowledge Analysis) and TensorFlow Keras to a dataset of miRNA sequences and gene targets, assessing the predictive power of several classifiers: naïve Bayes, multilayer perceptron, Hoeffding tree, random forest, and random tree. The data were further subjected to InfoGain feature selection to identify the most informative miRNA sequence and gene target descriptors. The ability of the ML models to distinguish miRNAs implicated in EC from control-group miRNAs was then tested.
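InfoGain ranks each attribute by how much it reduces uncertainty about the class label, InfoGain(Class, A) = H(Class) - H(Class | A), which is the quantity WEKA's InfoGainAttributeEval scores. The Python sketch below computes that score for discrete attributes and sorts them. The descriptor names `seed_gc` and `target_bin` and the four labeled rows are hypothetical toy values, not the study's dataset, and numeric descriptors would need to be discretized before this simple version applies.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Class) in bits for a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def info_gain(values, labels):
    """InfoGain(Class, A) = H(Class) - H(Class | A) for one discrete attribute A."""
    total = len(labels)
    conditional = 0.0
    for v in set(values):
        # Class labels of the rows where attribute A takes value v
        subset = [y for x, y in zip(values, labels) if x == v]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional

def rank_features(rows, labels):
    """Score every attribute and sort from most to least informative."""
    scores = {name: info_gain([r[name] for r in rows], labels) for name in rows[0]}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical, already-discretized descriptors for four toy miRNAs
rows = [
    {"seed_gc": "high", "target_bin": "many"},
    {"seed_gc": "high", "target_bin": "few"},
    {"seed_gc": "low", "target_bin": "many"},
    {"seed_gc": "low", "target_bin": "few"},
]
labels = ["EC", "EC", "control", "control"]

ranking = rank_features(rows, labels)
print(ranking)  # [('seed_gc', 1.0), ('target_bin', 0.0)]
```

In this toy set `seed_gc` separates the two classes perfectly (gain of 1 bit), while `target_bin` carries no class information (gain of 0), so a feature-selection step would keep the former and could drop the latter.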

RESULTS

Of the tested WEKA classifiers, the top 3 performers were random forest, Hoeffding tree, and naïve Bayes. The TensorFlow Keras neural network model was subsequently trained and tested, and its predictive power was further validated using an independent dataset. The TensorFlow Keras model achieved an accuracy of 0.91, and the best-performing WEKA algorithm (naïve Bayes) achieved an accuracy of 0.94.
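Accuracy here is the fraction of held-out miRNAs whose predicted class matches the true label. To illustrate how such a figure arises for a naïve Bayes model, the sketch below trains a categorical naïve Bayes classifier with Laplace smoothing on one toy set and scores it on a separate one. It is a minimal standard-library stand-in, not WEKA's NaiveBayes implementation, and the descriptors and rows are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class priors and per-class value frequencies for each attribute."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, attribute) -> Counter of values
    domains = defaultdict(set)           # attribute -> set of observed values
    for row, y in zip(rows, labels):
        for attr, value in row.items():
            value_counts[(y, attr)][value] += 1
            domains[attr].add(value)
    return class_counts, value_counts, domains

def predict(model, row):
    """Return the class with the highest log posterior (Laplace-smoothed likelihoods)."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior P(c)
        for attr, value in row.items():
            k = len(domains[attr])
            score += math.log((value_counts[(c, attr)][value] + 1) / (n_c + k))
        if score > best_score:
            best_class, best_score = c, score
    return best_class

def accuracy(model, rows, labels):
    """Fraction of rows whose predicted class equals the true label."""
    correct = sum(predict(model, r) == y for r, y in zip(rows, labels))
    return correct / len(labels)

# Hypothetical training set and a separate hypothetical validation set
train_rows = [
    {"seed_gc": "high", "target_bin": "many"},
    {"seed_gc": "high", "target_bin": "few"},
    {"seed_gc": "low", "target_bin": "many"},
    {"seed_gc": "low", "target_bin": "few"},
]
train_labels = ["EC", "EC", "control", "control"]
test_rows = [
    {"seed_gc": "high", "target_bin": "few"},
    {"seed_gc": "low", "target_bin": "many"},
    {"seed_gc": "high", "target_bin": "many"},
    {"seed_gc": "low", "target_bin": "few"},
]
test_labels = ["EC", "control", "control", "control"]

model = train_naive_bayes(train_rows, train_labels)
acc = accuracy(model, test_rows, test_labels)
print(acc)  # 0.75
```

On this toy split the model misclassifies one of the four held-out rows, giving 0.75; the value is illustrative only and unrelated to the accuracies reported in the study.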

CONCLUSIONS

The results demonstrate the potential of ML-based miRNA classifiers in diagnosing EC. However, further studies are necessary to validate these findings and explore the full clinical potential of this approach.
