Zhu Xudong, Plasek Joseph M, Tang Chunlei, Al-Assad Wasim, Zhang Zhikun, Xiong Yun, Wang Liqin, Yerneni Sharmitha, Ortega Carlos, Kang Min-Jeoung, Zhou Li, Bates David W, Dykes Patricia C
Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, Shanghai, China.
Division of General Internal Medicine and Primary Care, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA.
BMC Res Notes. 2021 Apr 14;14(1):136. doi: 10.1186/s13104-021-05529-4.
Our goal was to research and develop exploratory analysis tools for clinical notes, which are currently underrepresented, limiting the diversity of data insights for medically relevant applications.
We characterize how exploratory analysis can affect representation learning on clinical narratives and present several self-developed tools to explore sepsis. Our experiments focus on patients with sepsis in the MIMIC-III Clinical Database or in our institution's research patient data repository. First, we found that global embeddings assist in learning local representations of clinical notes. Second, aligning notes at any specific time facilitates the use of learning models by pooling more of the available clinical notes into a training set. Third, reconstructing the timeline enhances downstream processing techniques by emphasizing temporal expressions and temporal relationships in clinical documentation. Finally, we demonstrate that clustering helps plot various types of clinical notes against a scale, which conveys the range or spread of the data and is useful for understanding data correlations. Appropriate exploratory analysis tools provide keen insights into preprocessing clinical notes, thereby enhancing downstream analysis capabilities and making data-driven medicine possible. Our examples can help generate better data representations of clinical documentation for models with improved performance and interpretability.
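The time-alignment idea mentioned above can be illustrated with a minimal sketch. It is not the authors' tool: it assumes that "aligning at a specific time" means re-expressing each note's timestamp as an offset from a per-patient anchor event (for example, a suspected sepsis onset time) and pooling notes that fall within a window around that anchor. The function name `align_notes`, the tuple layout, the 24-hour window, and the sample records are all hypothetical.

```python
from datetime import datetime


def align_notes(notes, anchor_times, window_hours=24):
    """Pool notes across patients on a shared clock relative to each anchor.

    notes        -- iterable of (patient_id, charted_at, text) tuples (assumed layout)
    anchor_times -- dict mapping patient_id to that patient's anchor datetime
    window_hours -- keep only notes within +/- this many hours of the anchor
    """
    aligned = []
    for patient_id, charted_at, text in notes:
        anchor = anchor_times.get(patient_id)
        if anchor is None:
            continue  # no anchor event for this patient, so the note cannot be aligned
        offset_hours = (charted_at - anchor).total_seconds() / 3600
        if abs(offset_hours) <= window_hours:
            aligned.append((patient_id, round(offset_hours, 2), text))
    # Sort by offset so notes from different patients share one relative timeline.
    aligned.sort(key=lambda row: (row[1], row[0]))
    return aligned


# Hypothetical records, for illustration only.
notes = [
    ("p1", datetime(2020, 1, 1, 8, 0), "Fever, tachycardia noted."),
    ("p1", datetime(2020, 1, 3, 9, 0), "Discharge planning."),
    ("p2", datetime(2020, 5, 10, 22, 0), "Started broad-spectrum antibiotics."),
    ("p3", datetime(2020, 7, 1, 12, 0), "Routine check."),
]
anchors = {
    "p1": datetime(2020, 1, 1, 12, 0),
    "p2": datetime(2020, 5, 10, 20, 0),
}

pooled = align_notes(notes, anchors, window_hours=24)
for row in pooled:
    print(row)
```

Because every retained note carries an offset rather than an absolute date, notes from admissions that occurred months apart can be pooled into a single training set, which is one plausible reading of the pooling benefit described in the abstract.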