Jackson Richard G, Patel Rashmi, Jayatilleke Nishamali, Kolliakou Anna, Ball Michael, Gorrell Genevieve, Roberts Angus, Dobson Richard J, Stewart Robert
Institute of Psychiatry, Psychology & Neuroscience, King's College London, London, UK.
Department of Computer Science, University of Sheffield, Sheffield, UK.
BMJ Open. 2017 Jan 17;7(1):e012012. doi: 10.1136/bmjopen-2016-012012.
We sought to use natural language processing to develop a suite of language models to capture key symptoms of severe mental illness (SMI) from clinical text, to facilitate the secondary use of mental healthcare data in research.
Development and validation of information extraction applications for ascertaining symptoms of SMI in routine mental health records using the Clinical Record Interactive Search (CRIS) data resource; description of their distribution in a corpus of discharge summaries.
Electronic records from a large mental healthcare provider serving a geographic catchment of 1.2 million residents in four boroughs of south London, UK.
The distribution of derived symptoms was described in 23 128 discharge summaries from 7962 patients who had received an SMI diagnosis, and 13 496 discharge summaries from 7575 patients who had received a non-SMI diagnosis.
Fifty SMI symptoms were identified by a team of psychiatrists for extraction based on salience and linguistic consistency in records, broadly categorised under positive, negative, disorganisation, manic and catatonic subgroups. Text models for each symptom were generated using the TextHunter tool and the CRIS database.
We extracted data for 46 symptoms with a median F1 score of 0.88. Four symptom models performed poorly and were excluded. From the corpus of discharge summaries, it was possible to extract symptomatology in 87% of patients with an SMI diagnosis and 60% of patients with a non-SMI diagnosis.
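As a rough illustration of how a median F1 score across symptom models can be summarised, the sketch below computes F1 from raw true-positive, false-positive and false-negative counts and takes the median over the retained models. The symptom names, the counts and the exclusion threshold `MIN_F1` are all hypothetical; the abstract does not state the criterion used to exclude the four poorly performing models, and this is not the TextHunter implementation.

```python
from statistics import median


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall, from raw counts."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical (tp, fp, fn) counts per symptom model; not data from the study.
counts = {
    "symptom_a": (90, 10, 10),
    "symptom_b": (80, 20, 20),
    "symptom_c": (50, 50, 50),
    "symptom_d": (95, 5, 5),
}

# Hypothetical cut-off below which a model is dropped (an assumption).
MIN_F1 = 0.6

scores = {name: f1_score(*c) for name, c in counts.items()}
retained = {name: s for name, s in scores.items() if s >= MIN_F1}
median_f1 = median(retained.values())

print(f"retained {len(retained)} of {len(scores)} models, median F1 = {median_f1:.2f}")
```

The median is used here rather than the mean because it is less sensitive to one or two unusually weak or strong models.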
This work demonstrates that a broad range of SMI symptoms can be extracted automatically from English-language discharge summaries for patients with an SMI diagnosis. Descriptive data also indicated that most symptoms cut across diagnoses, rather than being restricted to particular groups.