McInerney Ciarán D, Oliver Phillip, Achinanya Ada, Horspool Michelle, Huddy Vyv, Burton Christopher
School of Medicine & Population Health, University of Sheffield, Sheffield, England.
Sheffield Health and Social Care NHS Foundation Trust, Sheffield, United Kingdom.
PLoS One. 2025 May 8;20(5):e0322771. doi: 10.1371/journal.pone.0322771. eCollection 2025.
The coded prevalence of complex mental health difficulties in electronic health records, such as personality disorder and dysthymia, is much lower than expected from population surveys. We aimed to identify features in primary care records that might be useful in promoting greater recognition of complex mental health difficulties.
We analysed Connected Bradford, an anonymised primary care database of approximately 1.15M citizens. We used multiple approaches to generate a large set of features representing multi-level collections of patient attributes across time and dimensions of healthcare. Feature sets included antecedent and concurrent problems (psychiatric, social and medical), patterns of prescription and service use, and temporal stability of attendance. These were tested individually and in combination. We analysed the relationship between features and diagnostic codes using scaled mutual information. We identified 3,040 records satisfying our definition of complex mental health difficulties. This was 0.3% of the population, compared to an expected prevalence of 3-5%. We generated >500,000 features. The most informative feature was the count of unique psychiatric diagnoses. Other informative features included binary features (e.g., presence or absence of a prescription for antipsychotic medication), continuous features (e.g., entropy of non-attendance) and counts of features (e.g., concerning behaviours such as self-harm and substance misuse). Several of these showed odds ratios >=5 or <=0.2 but low positive predictive value. We suggest this is due to the large number of "cases" being uncoded and thus appearing as "controls".
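As an illustration of the information-theoretic step, the following is a minimal sketch of one way a scaled mutual information between a discrete feature and a diagnostic code could be computed. The abstract does not state the scaling used; dividing by the entropy of the outcome is an assumption made here, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(feature, outcome):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), in bits."""
    joint = list(zip(feature, outcome))
    return entropy(feature) + entropy(outcome) - entropy(joint)


def scaled_mutual_information(feature, outcome):
    """Mutual information divided by the outcome entropy.

    ASSUMPTION: this scaling is illustrative only; the source does not
    define how its mutual information was scaled.
    """
    h_outcome = entropy(outcome)
    if h_outcome == 0:
        return 0.0  # a constant outcome carries no information to explain
    return mutual_information(feature, outcome) / h_outcome


# Toy data (not from the study): 1 = code present, 0 = code absent.
outcome = [1, 1, 0, 0]
perfect_feature = [1, 1, 0, 0]      # identical to the outcome
unrelated_feature = [1, 0, 1, 0]    # independent of the outcome

smi_perfect = scaled_mutual_information(perfect_feature, outcome)
smi_unrelated = scaled_mutual_information(unrelated_feature, outcome)
```

Under this scaling, a value near 1 means the feature removes almost all uncertainty about the code, and a value near 0 means it removes almost none.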
Complex mental health difficulties are poorly coded. We demonstrated the feasibility of using information theoretic approaches to develop a large set of novel features in electronic health records. While these are currently insufficient for diagnosis, several can act as prompts to consider further diagnostic assessment.
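The coexistence of large odds ratios with low positive predictive value can be sketched with a small worked example. All numbers below are hypothetical and chosen only to mirror the scale described above (roughly 4% true prevalence against 0.3% coded prevalence); the feature's sensitivity and false-positive rate are assumptions, as is the simplification that coding is unrelated to the feature.

```python
def odds_ratio(tp, fp, fn, tn):
    """Odds ratio from a 2x2 table of feature (rows) by label (columns)."""
    return (tp * tn) / (fp * fn)


def ppv(tp, fp):
    """Positive predictive value: share of feature-positives that are cases."""
    return tp / (tp + fp)


# HYPOTHETICAL population, not the study data.
population = 100_000
true_cases = 4_000         # assumed 4% true prevalence
coded_cases = 300          # assumed 0.3% coded prevalence
sensitivity = 0.5          # assumed share of true cases with the feature
false_positive_rate = 0.05 # assumed share of non-cases with the feature

# Feature-positive counts among true cases and true non-cases.
feat_in_true_cases = round(true_cases * sensitivity)
feat_in_non_cases = round((population - true_cases) * false_positive_rate)
feat_total = feat_in_true_cases + feat_in_non_cases

# 2x2 table against the CODED label. Uncoded true cases fall into "controls".
tp_coded = round(coded_cases * sensitivity)  # coding assumed independent of feature
fn_coded = coded_cases - tp_coded
fp_coded = feat_total - tp_coded
tn_coded = population - coded_cases - fp_coded

or_coded = odds_ratio(tp_coded, fp_coded, fn_coded, tn_coded)
ppv_coded = ppv(tp_coded, fp_coded)

# PPV against the TRUE (unobserved) label, for comparison.
ppv_true = ppv(feat_in_true_cases, feat_in_non_cases)

print(f"OR vs coded label:  {or_coded:.1f}")
print(f"PPV vs coded label: {ppv_coded:.3f}")
print(f"PPV vs true label:  {ppv_true:.3f}")
```

In this toy setting the feature remains strongly associated with the coded label, yet most feature-positive "controls" are either uncoded cases or genuine non-cases, so the positive predictive value measured against codes stays low.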