Type2 diabetes mellitus prediction using data mining algorithms based on the long-noncoding RNAs expression: a comparison of four data mining approaches.

Affiliations

Department of Laboratory Medicine, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran.

Publication information

BMC Bioinformatics. 2020 Aug 27;21(1):372. doi: 10.1186/s12859-020-03719-8.

Abstract

BACKGROUND

About 90% of patients with diabetes have type 2 diabetes mellitus (T2DM). Many studies suggest that long non-coding RNAs (lncRNAs) play a significant role in T2DM and could be used to improve its diagnosis. Machine learning and data mining techniques are tools that can improve the analysis and interpretation of data and the extraction of knowledge from it. These techniques may enhance the prognosis and diagnosis of diseases such as T2DM. We applied four classification models, K-nearest neighbor (KNN), support vector machine (SVM), logistic regression, and artificial neural networks (ANN), to diagnose T2DM, and we compared their diagnostic power. We ran the algorithms on six lncRNA variables (LINC00523, LINC00995, HCG27_201, TPT1-AS1, LY86-AS1, DKFZP) and demographic data.
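As an illustration of the simplest of the four classifiers, the following is a minimal KNN sketch in plain Python. It is not the authors' implementation: the function name `knn_predict`, the choice of Euclidean distance, the value `k=3`, and the two-feature toy data are all assumptions made for this example, and the numbers are not taken from the study.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training sample.
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_X, train_y)
    ]
    # Keep the k closest samples and count their labels.
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature expression values (illustrative only, not study data).
train_X = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]]
train_y = [0, 0, 0, 1, 1, 1]  # assumed coding: 0 = control, 1 = T2DM

prediction = knn_predict(train_X, train_y, [2.9, 3.1], k=3)
```

In practice the feature vector would hold the six lncRNA expression values plus the demographic variables, and features on different scales would usually be standardized before computing distances.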

RESULTS

To select the best-performing model, we considered the AUC, sensitivity, and specificity, plotted the ROC curves, and showed the mean curve and its range. For the KNN algorithm, the mean AUC was 91% with a standard deviation (SD) of 0.09; the mean sensitivity and specificity were 96% and 85%, respectively. For the SVM algorithm, the mean AUC after stratified 10-fold cross-validation was 95% with an SD of 0.05; the mean sensitivity and specificity were 95% and 86%, respectively. For ANN, the mean AUC was 93% with an SD of 0.03; the mean sensitivity and specificity were 78% and 85%, respectively. Finally, for the logistic regression algorithm, the mean AUC was 95% with an SD of 0.05; the mean sensitivity and specificity were 92% and 85%, respectively. According to the ROC curves, logistic regression and SVM had a larger area under the curve than the other two algorithms.
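The metrics reported above can be sketched in plain Python as follows. This is a hedged illustration rather than the study's code: the helper names (`sensitivity_specificity`, `auc`, `stratified_folds`, `summarize`) are hypothetical, the AUC is computed with the pairwise ranking definition, the fold assignment is a simple round-robin within each class, and the sample SD is used because the abstract does not say which SD variant was reported.

```python
import statistics


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    # Each positive/negative pair counts 1 for a win and 0.5 for a tie.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y, n_splits):
    """Assign sample indices to folds so each fold keeps similar class proportions."""
    folds = [[] for _ in range(n_splits)]
    for label in sorted(set(y)):
        indices = [i for i, t in enumerate(y) if t == label]
        # Deal the indices of this class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds


def summarize(fold_values):
    """Mean and sample standard deviation of a per-fold metric."""
    return statistics.mean(fold_values), statistics.stdev(fold_values)


# Toy labels and classifier scores (illustrative only, not study data).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
toy_auc = auc(y_true, scores)
```

With 10 folds, each fold serves once as the test set; the per-fold AUC values are then passed to `summarize` to obtain a mean and SD of the kind quoted in the results.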

CONCLUSION

We aimed to find the best data mining approach for predicting T2DM from the expression of six lncRNAs. According to our findings, SVM and logistic regression had the highest mean AUC. KNN and ANN also had high mean AUCs and small standard deviations of the AUC scores. Among the approaches, KNN had the highest mean sensitivity, and SVM had the highest specificity. These results could improve our knowledge of the early detection and diagnosis of T2DM using lncRNAs as biomarkers.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7a59/7451240/02bc0f57ca53/12859_2020_3719_Fig1_HTML.jpg