Biomedical Informatics & Data Science Section, Division of General Internal Medicine, The Johns Hopkins University School of Medicine, Baltimore, Maryland, USA.
Department of Health Policy and Management, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA.
J Am Med Inform Assoc. 2022 Aug 16;29(9):1607-1617. doi: 10.1093/jamia/ocac092.
Electronic consultation (eConsult) content reflects important information about referring clinician needs across an organization, but this information is challenging to extract. The objective of this work was to develop machine learning models that classify eConsult questions by question type and question content. A second objective was to investigate whether this task can be solved when expert time is a constrained resource.
Our data source is the San Francisco Health Network eConsult system, with over 700 000 deidentified questions from the years 2008-2017, from the gastroenterology, urology, and neurology specialties. We develop classifiers based on Bidirectional Encoder Representations from Transformers (BERT), experimenting with multitask learning to determine when information can be shared across classifiers. We produce learning curves to understand when the amount of human labeling required can be reduced.
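The learning-curve procedure mentioned above can be illustrated with a minimal sketch: train on growing fractions of the labeled data, score each model on a fixed test set, and report the smallest fraction that comes within a tolerance of the best score. Everything in this sketch is an assumption made for illustration. A toy bag-of-words scorer stands in for the BERT-based classifiers, and the example questions, the labels `procedural` and `cognitive`, and all function names are hypothetical rather than taken from the study.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy questions; the real eConsult data are not reproduced here.
TOY_TRAIN = [
    ("schedule colonoscopy procedure", "procedural"),
    ("request biopsy procedure", "procedural"),
    ("interpret lab results", "cognitive"),
    ("advise medication dose", "cognitive"),
]
TOY_TEST = [
    ("procedure needed", "procedural"),
    ("interpret results", "cognitive"),
]

def train_token_counts(examples):
    """Count tokens per label (a toy stand-in for fine-tuning a classifier)."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts

def predict(counts, text):
    """Pick the label whose training tokens best match the question."""
    tokens = text.lower().split()

    def score(label):
        total = sum(counts[label].values()) or 1
        return sum(counts[label][t] for t in tokens) / total

    # Sorting the labels makes tie-breaking deterministic.
    return max(sorted(counts), key=score)

def accuracy(counts, examples):
    hits = sum(predict(counts, text) == label for text, label in examples)
    return hits / len(examples)

def learning_curve(train, test, fractions, seed=0):
    """Return (fraction, accuracy) pairs for growing training subsets."""
    shuffled = list(train)
    random.Random(seed).shuffle(shuffled)
    curve = []
    for frac in fractions:
        n = max(1, round(frac * len(shuffled)))
        model = train_token_counts(shuffled[:n])
        curve.append((frac, accuracy(model, test)))
    return curve

def smallest_near_peak_fraction(curve, tolerance=0.01):
    """Smallest fraction scoring within `tolerance` of the best score.

    Assumes `curve` is sorted by increasing fraction.
    """
    peak = max(score for _, score in curve)
    for frac, score in curve:
        if score >= peak - tolerance:
            return frac

if __name__ == "__main__":
    curve = learning_curve(TOY_TRAIN, TOY_TEST, [0.25, 0.5, 1.0])
    print(curve, smallest_near_peak_fraction(curve))
```

In practice the toy scorer would be replaced by the actual model and accuracy by the study's evaluation metric; the loop over fractions and the near-peak check are the only parts meant to carry over.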
Multitask learning shows benefits only for the neurology-urology pair, whose specialties share substantial similarities in the distribution of question types. Continued pretraining of models in new domains is highly effective. In the neurology-urology pair, near-peak performance is achieved with only 10% of the urology training data when all of the neurology data are used.
Sharing information across classifier types shows little benefit, whereas sharing classifier components across specialties can help if the specialties are similar in the balance of procedural versus cognitive patient care.
We can accurately classify eConsult content with enough labeled data, but only in special cases do methods for reducing labeling effort apply. Future work should explore new learning paradigms to further reduce labeling effort.