Estiri Hossein, Strasser Zachary H, Klann Jeffery G, McCoy Thomas H, Wagholikar Kavishwar B, Vasey Sebastien, Castro Victor M, Murphy MaryKate E, Murphy Shawn N
Laboratory of Computer Science, Massachusetts General Hospital, Boston, MA 02144, USA.
Research Information Science and Computing, Mass General Brigham, Somerville, MA 02145, USA.
Patterns (N Y). 2020 Jul 10;1(4):100051. doi: 10.1016/j.patter.2020.100051. Epub 2020 Jun 18.
Electronic health records (EHRs) contain important temporal information about the progression of disease and treatment outcomes. This paper proposes a transitive sequencing approach for constructing temporal representations from EHR observations for downstream machine learning. Using clinical data from a cohort of patients with congestive heart failure, we mined temporal representations by transitive sequencing of EHR medication and diagnosis records for classification and prediction tasks. We compared the classification and prediction performance of the transitive sequential representations (bag-of-sequences approach) with that of the conventional approach of using aggregated vectors of EHR data (aggregated vector representation) across different classifiers. We found that the transitive sequential representations are better phenotype "differentiators" and predictors than the "atemporal" EHR data. Our results also demonstrated that data representations obtained from transitive sequencing of EHR observations can present novel insights about the progression of the disease that are difficult to discern when clinical data are treated independently of the patient's history.
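To make the contrast between the two representations concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that a transitive sequence is an ordered pair of observations "A->B" in which A is first recorded strictly before B, with non-adjacent observations paired as well. Using only the first occurrence of each observation is an assumption made here for illustration; the abstract does not specify it. The function names and the diagnosis and medication labels are hypothetical.

```python
from collections import Counter
from datetime import date
from itertools import combinations


def first_occurrences(records):
    """Map each observation code to the earliest date it was recorded."""
    first = {}
    for code, when in records:
        if code not in first or when < first[code]:
            first[code] = when
    return first


def transitive_sequences(records):
    """Bag-of-sequences sketch: every ordered pair "A->B" where A's first
    occurrence is strictly earlier than B's (assumption: first occurrences
    only; same-day observations are not paired)."""
    ordered = sorted(first_occurrences(records).items(),
                     key=lambda item: (item[1], item[0]))
    pairs = set()
    # combinations() preserves the sorted order, so `a` never follows `b`.
    for (a, time_a), (b, time_b) in combinations(ordered, 2):
        if time_a < time_b:
            pairs.add(f"{a}->{b}")
    return pairs


def aggregated_vector(records):
    """Aggregated vector sketch: per-code counts with all timing discarded."""
    return Counter(code for code, _ in records)


# Hypothetical patient history; the codes are illustrative labels only.
patient = [
    ("dx:hypertension", date(2015, 3, 1)),
    ("rx:lisinopril", date(2015, 3, 15)),
    ("dx:heart_failure", date(2017, 6, 2)),
    ("rx:lisinopril", date(2017, 6, 2)),
]

sequences = transitive_sequences(patient)
counts = aggregated_vector(patient)
```

In this sketch the aggregated vector records only how often each code appears, while the sequence set retains the order in which observations first appeared, including the non-adjacent pair from hypertension to heart failure. Either output could then be encoded as a feature vector for a classifier.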