A novel kernel based approach to arbitrary length symbolic data with application to type 2 diabetes risk.

Affiliations

Department of Computer Science, School of Science and Technology, Middlesex University, London, NW4 4BT, UK.

Centre for Vision Speech and Signal Processing Alan Turing Building (BB), University of Surrey, Guildford, Surrey, GU2 7XH, UK.

Publication information

Sci Rep. 2022 Mar 23;12(1):4985. doi: 10.1038/s41598-022-08757-1.

DOI:10.1038/s41598-022-08757-1

PMID:35322076

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC8943170/

Abstract

Predictive modeling of clinical data is fraught with challenges arising from the manner in which events are recorded. Patients typically fall ill at irregular intervals and experience dissimilar intervention trajectories. This results in irregularly sampled data of uneven length, which poses a problem for standard multivariate tools. The alternative, feature extraction into equal-length vectors via methods such as Bag-of-Words (BoW), potentially discards useful information. We propose an approach based on a kernel framework in which data is maintained in its native form: discrete sequences of symbols. Kernel functions derived from the edit distance between pairs of sequences may then be used in conjunction with support vector machines (SVMs) to classify the data. Our method is evaluated on the task of predicting which patients are likely to develop type 2 diabetes following an earlier episode of elevated blood pressure of 130/80 mmHg. Kernels combined via multiple kernel learning (MKL) achieved an F1-score of 0.96, outperforming SVM (0.63), logistic regression (0.63), Long Short-Term Memory (0.61) and Multi-Layer Perceptron (0.54) classifiers applied to a BoW representation of the data. MKL achieved an F1-score of 0.97 on an external dataset. The proposed approach is consequently able to overcome limitations associated with feature-based classification in the context of clinical data.
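The idea of comparing patients through an edit-distance kernel rather than a Bag-of-Words vector can be illustrated with a minimal sketch. It is not the authors' implementation: the event codes, the exponential kernel form `exp(-gamma * d)`, the `gamma` values and the fixed combination weights are all assumptions chosen for illustration (in multiple kernel learning the weights would be learned from data).

```python
from collections import Counter
import math


def edit_distance(a, b):
    """Levenshtein distance between two symbol sequences of any length."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution or match
        prev = curr
    return prev[-1]


def edit_kernel(a, b, gamma=0.5):
    """Similarity derived from edit distance (assumed exponential form)."""
    return math.exp(-gamma * edit_distance(a, b))


def gram_matrix(seqs, kernel):
    """Pairwise kernel values for a list of sequences."""
    return [[kernel(s, t) for t in seqs] for s in seqs]


def combine(grams, weights):
    """Weighted sum of Gram matrices; weights are fixed here, not learned."""
    n = len(grams[0])
    return [[sum(w * g[i][j] for w, g in zip(weights, grams))
             for j in range(n)] for i in range(n)]


# Hypothetical event codes, invented for this example.
patients = [
    ["BP_HIGH", "STATIN", "HBA1C_HIGH"],
    ["HBA1C_HIGH", "STATIN", "BP_HIGH"],  # same events, reversed order
    ["BP_HIGH", "STATIN", "HBA1C_HIGH", "METFORMIN", "GP_VISIT"],
]

# Bag-of-Words ignores order: the first two patients look identical.
same_bow = Counter(patients[0]) == Counter(patients[1])

# The edit distance keeps order information and handles unequal lengths.
d01 = edit_distance(patients[0], patients[1])

k_sharp = gram_matrix(patients, lambda s, t: edit_kernel(s, t, gamma=1.0))
k_smooth = gram_matrix(patients, lambda s, t: edit_kernel(s, t, gamma=0.1))
k_combined = combine([k_sharp, k_smooth], [0.5, 0.5])
```

A combined Gram matrix such as `k_combined` could then be passed to any SVM implementation that accepts precomputed kernels. One caveat: kernels built from edit distance are not guaranteed to be positive semi-definite, so a practical pipeline may need to check or correct the matrix before training.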

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/73e0/8943170/3abd80d85cb9/41598_2022_8757_Fig1_HTML.jpg
