Type 2 diabetes mellitus prediction using data mining algorithms based on the long-noncoding RNAs expression: a comparison of four data mining approaches.

Author affiliations

Department of Laboratory Medicine, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

Publication information

BMC Bioinformatics. 2020 Aug 27;21(1):372. doi: 10.1186/s12859-020-03719-8.

DOI:10.1186/s12859-020-03719-8

PMID:32854616

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7451240/

Abstract

BACKGROUND

About 90% of patients with diabetes have type 2 diabetes mellitus (T2DM). Many studies suggest that long non-coding RNAs (lncRNAs) play a significant role in T2DM and could be used to improve its diagnosis. Machine learning and data mining techniques are tools that can improve the analysis and interpretation of data and the extraction of knowledge from it. These techniques may improve the diagnosis and prognosis of diseases such as T2DM. We applied four classification models, K-nearest neighbor (KNN), support vector machine (SVM), logistic regression, and artificial neural networks (ANN), to diagnose T2DM, and we compared their diagnostic power. We ran the algorithms on six lncRNA variables (LINC00523, LINC00995, HCG27_201, TPT1-AS1, LY86-AS1, DKFZP) and demographic data.
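As an illustration of the simplest of the four classifiers, the following is a minimal K-nearest neighbor sketch in plain Python. It is not the authors' implementation: the toy feature values, the two-feature layout, the helper names (`zscore_fit`, `zscore_apply`, `knn_predict`), the z-score scaling step, and the choice of `k=3` are all assumptions made for the example.

```python
import math
from collections import Counter

def zscore_fit(rows):
    """Return per-column (mean, std) computed from the training rows."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        stats.append((mean, math.sqrt(var) or 1.0))  # avoid dividing by zero
    return stats

def zscore_apply(row, stats):
    """Scale one sample with the training-set mean and std."""
    return [(v - m) / s for v, (m, s) in zip(row, stats)]

def knn_predict(train_X, train_y, query, k=3):
    """Majority vote among the k training samples closest to query (Euclidean)."""
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two made-up expression features per subject,
# label 1 = T2DM, label 0 = control. These are not values from the study.
X = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.2], [3.0, 0.5], [3.2, 0.4], [2.8, 0.7]]
y = [0, 0, 0, 1, 1, 1]

stats = zscore_fit(X)
X_scaled = [zscore_apply(row, stats) for row in X]
new_subject = zscore_apply([2.9, 0.6], stats)
print(knn_predict(X_scaled, y, new_subject, k=3))
```

Scaling the features before measuring distances is a common choice for KNN, since otherwise the feature with the largest numeric range dominates the distance; whether the study scaled its inputs is not stated in the abstract.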

RESULTS

To select the best-performing model, we considered the AUC, sensitivity, and specificity, plotted the ROC curves, and showed the mean curve and its range. For KNN, the mean AUC was 91% with a standard deviation (SD) of 0.09; the mean sensitivity and specificity were 96% and 85%, respectively. For SVM, the mean AUC after stratified 10-fold cross-validation was 95% with an SD of 0.05; the mean sensitivity and specificity were 95% and 86%, respectively. For ANN, the mean AUC was 93% with an SD of 0.03; the mean sensitivity and specificity were 78% and 85%, respectively. Finally, for logistic regression, the mean AUC was 95% with an SD of 0.05; the mean sensitivity and specificity were 92% and 85%, respectively. According to the ROC curves, logistic regression and SVM had a larger area under the curve than the other two models.
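The evaluation quantities reported above can be sketched in plain Python as follows. This is only an illustrative sketch: the function names, the random seed, and the use of the sample standard deviation are assumptions, since the abstract does not say how the folds were drawn or which SD formula was used.

```python
import random
import statistics

def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds that preserve the class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin
    return folds

def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

def summarize(fold_values):
    """Mean and sample standard deviation across folds (assumed formula)."""
    return statistics.mean(fold_values), statistics.stdev(fold_values)

# Hypothetical labels: 20 controls (0) and 30 patients (1), split into 10 folds.
labels = [0] * 20 + [1] * 30
folds = stratified_folds(labels, k=10)
print([len(f) for f in folds])
```

In a full run, each fold would serve once as the test set while a model is trained on the other nine; `auc` and `sensitivity_specificity` would be computed per fold and `summarize` would give the mean and SD of the kind reported in the abstract.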

CONCLUSION

We aimed to find the best data mining approach for predicting T2DM from the expression of six lncRNAs. According to the findings, SVM and logistic regression had the highest mean AUC. KNN and ANN also had high mean AUCs, with small standard deviations of the AUC scores. KNN had the highest mean sensitivity, and SVM had the highest mean specificity. These results could improve our knowledge of the early detection and diagnosis of T2DM using lncRNAs as biomarkers.
