Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark.
PLoS Comput Biol. 2011 Aug;7(8):e1002141. doi: 10.1371/journal.pcbi.1002141. Epub 2011 Aug 25.
Electronic patient records remain a largely unexplored but potentially rich data source for discovering correlations between diseases. We describe a general approach for gathering phenotypic descriptions of patients from medical records in a systematic, cohort-independent manner. By extracting phenotype information from the free text in such records, we demonstrate that we can extend the information contained in the structured record data and use it to produce fine-grained patient stratification and disease co-occurrence statistics. The approach uses a dictionary based on the International Classification of Diseases (ICD) ontology and is therefore, in principle, language independent. As a use case, we show how records from a Danish psychiatric hospital lead to the identification of disease correlations, which can subsequently be mapped to systems biology frameworks.
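The idea of dictionary-based phenotype extraction followed by co-occurrence counting can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the dictionary `TERM_TO_CODE`, the function names `extract_codes` and `cooccurrence_counts`, and the example notes are all hypothetical and use English terms for readability. The sketch assumes plain case-insensitive term matching and does not handle negation, spelling variants, or family history, which a real system would need to address.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy dictionary: surface term -> ICD-10 code (illustrative only).
TERM_TO_CODE = {
    "schizophrenia": "F20",
    "type 2 diabetes": "E11",
    "depressive episode": "F32",
    "hypertension": "I10",
}


def extract_codes(note, term_to_code=TERM_TO_CODE):
    """Return the set of codes whose dictionary term occurs in the note."""
    text = note.lower()
    found = set()
    for term, code in term_to_code.items():
        # Word-boundary match so a term is not matched inside a longer word.
        if re.search(r"\b" + re.escape(term) + r"\b", text):
            found.add(code)
    return found


def cooccurrence_counts(patient_notes):
    """Count, for each unordered pair of codes, the patients carrying both."""
    pair_counts = Counter()
    for notes in patient_notes.values():
        codes = set()
        for note in notes:
            codes |= extract_codes(note)
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(codes), 2):
            pair_counts[pair] += 1
    return pair_counts


# Made-up example records, not real patient data.
EXAMPLE_NOTES = {
    "patient_1": ["Known schizophrenia.", "Started treatment for hypertension."],
    "patient_2": ["Schizophrenia; also hypertension and type 2 diabetes."],
    "patient_3": ["Depressive episode, no other findings."],
}

if __name__ == "__main__":
    for pair, n in sorted(cooccurrence_counts(EXAMPLE_NOTES).items()):
        print(pair, n)
```

Because the matching step depends only on the dictionary, swapping in terms from another language's edition of the classification would leave the counting logic unchanged, which is the sense in which such an approach is language independent in principle.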