Xia Yilin, He Mengqiao, Basang Sijia, Sha Leihao, Huang Zijie, Jin Ling, Duan Yifei, Tang Yusha, Li Hua, Lai Wanlin, Chen Lei
Department of Neurology, West China Hospital, Sichuan University, #37 Guoxue Alley, Wuhou District, Chengdu, China, 86 18980605819.
Sichuan Provincial Engineering Research Center of Brain-Machine Interface, and Sichuan Provincial Engineering Research Center of Neuromodulation, West China Hospital, Sichuan University, Chengdu, China.
JMIR Med Inform. 2024 Oct 17;12:e57727. doi: 10.2196/57727.
Obtaining and describing seizure semiology efficiently and classifying seizure types correctly are crucial for the diagnosis and treatment of epilepsy. However, related informatics resources and decision support tools remain inadequate.
In this study, we developed a symptom entity extraction tool and an epilepsy semiology ontology (ESO), and used machine learning to achieve automated binary classification of epilepsy.
Using history of present illness data from electronic health records at the Southwest Epilepsy Center in China, we constructed an ESO and a symptom entity extraction tool that extracts seizure duration, seizure symptoms, and seizure frequency from unstructured text by combining manual annotation with natural language processing techniques. In addition, we used multiple machine learning methods to automatically classify patients in the study cohort based on the extracted seizure feature data.
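The extraction step described above can be illustrated with a minimal sketch. It is not the authors' tool: the lexicon entries, the concept labels, the regular expressions, and the function name `extract_seizure_features` are all hypothetical, and English text is assumed for readability even though the source records are Chinese clinical notes.

```python
import re

# Hypothetical mini-lexicon mapping surface phrases to concept labels.
# These entries are illustrative only and are not taken from the ESO.
SYMPTOM_LEXICON = {
    "loss of consciousness": "impaired awareness",
    "limb jerking": "clonic movement",
    "limb stiffening": "tonic posture",
    "staring": "behavior arrest",
}

# Assumed patterns for duration ("2 minutes") and frequency ("3 times per month").
DURATION_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(second|minute|hour)s?", re.IGNORECASE)
FREQUENCY_RE = re.compile(
    r"(\d+)\s*times?\s*(?:per|a|/)\s*(day|week|month|year)", re.IGNORECASE
)


def extract_seizure_features(text):
    """Return symptoms, duration, and frequency found in free text."""
    lowered = text.lower()
    # Dictionary lookup: keep the concept for every lexicon phrase present.
    symptoms = sorted(
        {concept for phrase, concept in SYMPTOM_LEXICON.items() if phrase in lowered}
    )
    duration = DURATION_RE.search(text)
    frequency = FREQUENCY_RE.search(text)
    return {
        "symptoms": symptoms,
        "duration": (float(duration.group(1)), duration.group(2).lower())
        if duration
        else None,
        "frequency": (int(frequency.group(1)), frequency.group(2).lower())
        if frequency
        else None,
    }


note = (
    "Sudden loss of consciousness with limb jerking lasting 2 minutes, "
    "about 3 times per month."
)
features = extract_seizure_features(note)
print(features)
```

A plain substring lookup like this ignores negation, synonyms outside the lexicon, and word segmentation, which is where an ontology built from manual annotation would be expected to help.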
The data comprised history of present illness records from 10,925 cases between 2010 and 2020. Six annotators labeled a total of 2500 texts, yielding 5844 semiology words, from which we constructed an ESO with 702 terms. Based on the ontology, the extraction tool achieved an accuracy of 85% in symptom extraction. Furthermore, we trained a stacking ensemble learning model combining XGBoost and random forest, which achieved an F1-score of 75.03%. The random forest model had the highest area under the curve (0.985).
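For readers comparing the two reported metrics, the sketch below computes an F1-score and an area under the ROC curve for a binary classifier in plain Python. The labels, scores, and the 0.5 decision threshold are made-up toy values, not study data, and the function names are illustrative. The F1-score depends on the chosen threshold, whereas the area under the curve summarizes ranking quality across all thresholds.

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy values for illustration only (not from the study).
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(round(f1, 4), round(auc, 4))
```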
This work demonstrated the feasibility of natural language processing-assisted structured extraction from epilepsy medical record texts and of downstream tasks, and it provides open ontology resources for subsequent related work.