Li Ang, Wang Yunxin, Chen Hongxu
Graduate School of Arts and Science, Dongshin University, Naju, 58245, South Korea.
Department of Basic Teaching and Research, Jilin Sport University, Changchun, Jilin, 130022, China.
SLAS Technol. 2025 Jun;32:100286. doi: 10.1016/j.slast.2025.100286. Epub 2025 Apr 10.
Athletes' performance and long-term health depend heavily on their cardiovascular resilience and associated risk factors. This study explores applications of Natural Language Processing (NLP) and Large Language Models (LLMs) in biomedical diagnostics, particularly for AI-driven arrhythmia detection, hypertrophic cardiomyopathy (HCM) in athletes, and personalized medicine. Diverse biomedical datasets, such as electrocardiograms (ECG), clinical records, genetic screening reports, and imaging results, are complex to analyse, which makes precise early diagnosis difficult. To address this, we introduce a hybrid machine learning (ML) framework that integrates the Wolf Pack Search Algorithm Dynamic Random Forest (WPSA-DRF) with a RoBERTa-based LLM to improve the accuracy of cardiovascular disease prediction. Using NLP techniques including biomedical text mining, entity recognition, and feature extraction, the system processes structured and unstructured clinical data to detect abnormalities associated with sudden cardiac arrest (SCA), arrhythmias, and genetic cardiomyopathies. The proposed system achieves a diagnostic accuracy of 92.5%, precision of 92.7%, recall of 99.23%, and F1-score of 95.6%, outperforming traditional diagnostic methodologies. The research also underscores the role of LLMs in personalized medicine, where they identify patient-specific risk factors and help optimize treatment pathways for cardiac patients. This work highlights how NLP-driven AI solutions are transforming biomedical research, accelerating early disease detection, and improving clinical decision-making for both athletes and the general population.
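For readers less familiar with the four reported metrics, the following minimal Python sketch shows how accuracy, precision, recall, and F1-score are conventionally derived from a binary confusion matrix. The `ConfusionCounts` class, the `classification_metrics` function, and the example counts are hypothetical and purely illustrative; they are not the study's data or its evaluation code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical container for illustration)."""
    tp: int  # abnormal cases correctly flagged
    fp: int  # normal cases incorrectly flagged
    fn: int  # abnormal cases missed
    tn: int  # normal cases correctly cleared


def classification_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, precision, recall and F1 using the standard definitions."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Illustrative counts only; NOT taken from the study.
    example = ConfusionCounts(tp=90, fp=10, fn=5, tn=95)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

In a screening setting such as the one described, recall is the share of truly abnormal cases the system flags, so a high recall means few missed cases, while precision is the share of flagged cases that are truly abnormal.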