HiTZ Basque Center for Language Technologies - Ixa NLP Group, University of the Basque Country (UPV/EHU), Spain(1).
Artif Intell Med. 2023 Sep;143:102622. doi: 10.1016/j.artmed.2023.102622. Epub 2023 Jul 9.
Civil registration and vital statistics systems capture birth and death events to compile vital statistics and to provide legal rights to citizens. Vital statistics are a key factor in promoting public health policies and the health of the population. Medical certification of cause of death (CoD) is the preferred source of CoD information. However, two thirds of all deaths worldwide are not captured in routine mortality information systems, and their CoD is unknown. Verbal autopsy (VA) is an interim solution for estimating the CoD distribution at the population level in the absence of medical certification. A VA consists of an interview with a relative or caregiver of the deceased. It includes both Closed Questions (CQs) with structured answer options and an Open Response (OR): a free narrative of the events, expressed in natural language without any predetermined structure. Several automated systems analyze the CQs to obtain cause-specific mortality fractions, but their performance is limited. We hypothesize that incorporating the text of the OR may convey information relevant to discerning the CoD. The experimental layout compares existing Computer Coding Verbal Autopsy methods, such as Tariff 2.0, with other approaches well suited to processing structured inputs, as is the case with the CQs. Next, alternative approaches based on language models are employed to analyze the OR. Finally, we propose a new method with a bi-modal input that combines the CQs and the OR. Empirical results corroborate that our method, by taking into account the valuable information conveyed by the OR, outperforms the CoD prediction capability of the Tariff 2.0 algorithm. As an added value, we make the software available to enable reproducibility of the results, including a version implemented in R that makes the comparison with Tariff 2.0 evident.
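To make the idea of a bi-modal input concrete, the following is a minimal sketch of one generic way to combine the two modalities, assuming simple early fusion: each record's CQ answers become a binary vector, the OR narrative becomes a bag-of-words vector, and the two are concatenated before classification. This is not the authors' method (the abstract states they use language models for the OR); the bag-of-words encoder and the nearest-centroid classifier are swapped-in stand-ins, and all names (`CQ_ITEMS`, `encode_cqs`, `fuse`, `NearestCentroid`, `csmf`) and the toy records are hypothetical. The `csmf` helper shows how per-death predictions aggregate into cause-specific mortality fractions.

```python
import math
import re
from collections import Counter

# Hypothetical closed-question (CQ) items; a real VA instrument has far more.
CQ_ITEMS = ["fever", "cough", "chest_pain", "injury"]


def encode_cqs(answers: dict) -> list:
    """Binary vector: 1.0 where the CQ was answered 'yes', else 0.0."""
    return [1.0 if answers.get(item) == "yes" else 0.0 for item in CQ_ITEMS]


def tokenize(text: str) -> list:
    """Lower-case alphabetic tokens of the open response (OR)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(texts) -> list:
    """Sorted vocabulary of all tokens seen in the training ORs."""
    return sorted({tok for text in texts for tok in tokenize(text)})


def encode_or(text: str, vocab: list) -> list:
    """Relative term frequencies over the vocabulary (bag of words)."""
    counts = Counter(tokenize(text))
    total = sum(counts.values()) or 1
    return [counts[word] / total for word in vocab]


def fuse(answers: dict, text: str, vocab: list) -> list:
    """Early fusion (assumed here): concatenate the CQ and OR vectors."""
    return encode_cqs(answers) + encode_or(text, vocab)


class NearestCentroid:
    """Assigns each record the cause whose mean feature vector is closest."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for x, cause in zip(X, y):
            acc = sums.setdefault(cause, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
        self.centroids = {
            cause: [v / counts[cause] for v in acc] for cause, acc in sums.items()
        }
        return self

    def predict(self, X):
        # Euclidean distance to each cause centroid; pick the smallest.
        return [
            min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))
            for x in X
        ]


def csmf(predicted_causes) -> dict:
    """Cause-specific mortality fraction: share of deaths assigned to each cause."""
    n = len(predicted_causes)
    return {cause: k / n for cause, k in Counter(predicted_causes).items()}


# Toy, invented records (CQ answers, OR narrative, cause); not study data.
train = [
    ({"fever": "yes", "cough": "yes"}, "high fever and a bad cough for two weeks", "pneumonia"),
    ({"chest_pain": "yes"}, "sudden chest pain then collapsed", "cardiac"),
    ({"injury": "yes"}, "hit by a car on the road", "road_traffic"),
]
vocab = build_vocab(text for _, text, _ in train)
X_train = [fuse(cqs, text, vocab) for cqs, text, _ in train]
y_train = [cause for _, _, cause in train]
model = NearestCentroid().fit(X_train, y_train)

print(model.predict([fuse({"fever": "yes", "cough": "yes"}, "fever and cough", vocab)]))
# → ['pneumonia']
print(csmf(["pneumonia", "pneumonia", "cardiac", "road_traffic"]))
# → {'pneumonia': 0.5, 'cardiac': 0.25, 'road_traffic': 0.25}
```

Concatenation keeps the sketch simple; a language-model encoder for the OR, as the abstract describes, would replace `encode_or` while the fusion step stays the same.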