
Finding the combination of multiple biomarkers to diagnose oral squamous cell carcinoma - A data mining approach.

Authors

da Costa Nattane Luíza, de Sá Alves Mariana, de Sá Rodrigues Nayara, Bandeira Celso Muller, Oliveira Alves Mônica Ghislaine, Mendes Maria Anita, Cesar Alves Levy Anderson, Almeida Janete Dias, Barbosa Rommel

Affiliations

Informatics Nucleo, Goiano Federal Institute of Education, Science and Technology, Campus Urutaí, Urutaí-GO, Brazil.

Department of Biosciences and Oral Diagnosis, Institute of Science and Technology, São Paulo State University (Unesp), São José dos Campos, Brazil.

Publication information

Comput Biol Med. 2022 Apr;143:105296. doi: 10.1016/j.compbiomed.2022.105296. Epub 2022 Feb 6.

Abstract

Data mining has proven to be a reliable method for analyzing and discovering useful knowledge about various diseases, including cancer. In particular, the use of data mining and machine learning algorithms to study oral squamous cell carcinoma (OSCC), the most common form of oral cancer, is a new area of research. This malignant neoplasm can be studied using saliva samples. Saliva is an important biofluid for verifying potential biomarkers associated with oral cancer. In this study, we first provide an overview of OSCC diagnosis based on machine learning and salivary metabolites. To our knowledge, this is the first study to apply advanced data mining techniques to diagnose OSCC. We then present new results from classification and feature selection algorithms used to identify potential salivary biomarkers of OSCC. To accomplish this task, we used the random forest importance filter for feature selection and a wrapper methodology to evaluate the importance of metabolites obtained from gas chromatography-mass spectrometry (GC-MS) in differentiating the OSCC group from the control group. Salivary samples (n = 68) were collected for the control group, and the OSCC group samples were from patients matched for gender, age, and smoking habit. Classification was performed with the Random Forest (RF) algorithm and 10-fold cross-validation. The results showed that glucuronic acid, maleic acid, and batyl alcohol can classify the samples with an area under the curve (AUC) of 0.91, versus an AUC of 0.76 when all 51 analyzed metabolites are used. The methodology used in this study can assist healthcare professionals and can be adopted to discover diagnostic biomarkers for other diseases.
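The abstract compares models by AUC (0.91 for three metabolites versus 0.76 for all 51). As a reading aid, the sketch below shows one standard way to interpret that number: the AUC equals the fraction of (OSCC, control) sample pairs in which the classifier gives the OSCC sample the higher score, with ties counted as half. The function name `auc` and the score lists are hypothetical illustrations, not the authors' code or data, and the study's Random Forest pipeline is not reproduced here.

```python
def auc(pos_scores, neg_scores):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly.

    A pair counts 1 if the positive sample scores higher, 0.5 on a tie,
    and 0 otherwise. This is an O(n*m) illustration, not an optimized routine.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one score in each group")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier scores (assumed values, not data from the study).
oscc_scores = [0.9, 0.8, 0.6, 0.4]
control_scores = [0.7, 0.5, 0.3, 0.2]

example_auc = auc(oscc_scores, control_scores)
print(example_auc)  # 13 of the 16 pairs are ranked correctly -> 0.8125
```

Under this reading, an AUC of 0.91 would mean that a randomly chosen OSCC sample outscores a randomly chosen control sample in about 91% of pairs, while 0.5 corresponds to chance-level ranking.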

