Mwogi Thomas S, Biondich Paul G, Grannis Shaun J
Regenstrief Institute, Indianapolis, IN; Indiana University Purdue University (IUPUI), Indianapolis, IN.
AMIA Annu Symp Proc. 2014 Nov 14;2014:1855-63. eCollection 2014.
Motivated by the need for readily available data for testing an open-source health information exchange platform, we developed and evaluated two methods for generating synthetic messages. Both methods used HL7 version 2 messages obtained from the Indiana Network for Patient Care as source data. We analyzed the output of each method to assess how well it reflected the original 'real-world' data. The Markov Chain method (MCM) used an algorithm based on a transition probability matrix, while the Music Box model (MBM) randomly selected messages of a particular trigger type from the original data to generate new messages. The MBM was faster, generated shorter messages, and exhibited less variation in message length. The MCM required more computational power and generated longer messages with greater variability in message length. Both methods exhibited adequate coverage, producing a high proportion of messages consistent with the original messages, and both yielded similar rates of valid messages.
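The two generation strategies can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say what the states of the Markov chain are, so the sketch assumes they are HL7 segment identifiers (MSH, PID, and so on), and it assumes the Music Box step simply draws one stored message whose MSH-9 field matches the requested type. The names build_transitions, markov_generate, music_box_generate, and the toy SAMPLE messages are all hypothetical.

```python
import random
from collections import Counter, defaultdict

START, END = "<START>", "<END>"  # artificial boundary states (assumption)

# Toy HL7 v2-like messages, invented for illustration only.
SAMPLE = [
    "MSH|^~\\&|APP|FAC|||20140101||ADT^A01|1|P|2.3\rPID|1||123\rPV1|1|I",
    "MSH|^~\\&|APP|FAC|||20140102||ADT^A01|2|P|2.3\rPID|1||456\rNK1|1\rPV1|1|I",
    "MSH|^~\\&|APP|FAC|||20140103||ORU^R01|3|P|2.3\rPID|1||789\rOBR|1\rOBX|1\rOBX|2",
]

def segment_ids(message):
    """Return the segment identifiers of a message, in order."""
    return [seg.split("|", 1)[0] for seg in message.strip().split("\r") if seg]

def trigger_type(message):
    """Return MSH-9 (message type and trigger event), e.g. 'ADT^A01'."""
    return message.strip().split("\r")[0].split("|")[8]

def build_transitions(messages):
    """Estimate a transition probability matrix over segment identifiers."""
    counts = defaultdict(Counter)
    for msg in messages:
        path = [START] + segment_ids(msg) + [END]
        for cur, nxt in zip(path, path[1:]):
            counts[cur][nxt] += 1
    return {
        state: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for state, c in counts.items()
    }

def markov_generate(transitions, rng, max_len=50):
    """Walk the chain from START until END (or max_len segments)."""
    state, out = START, []
    while len(out) < max_len:
        candidates = list(transitions[state])
        weights = [transitions[state][s] for s in candidates]
        state = rng.choices(candidates, weights=weights, k=1)[0]
        if state == END:
            break
        out.append(state)
    return out

def music_box_generate(messages, trigger, rng):
    """Pick one original message of the requested trigger type at random."""
    pool = [m for m in messages if trigger_type(m) == trigger]
    return rng.choice(pool)

if __name__ == "__main__":
    rng = random.Random(0)
    matrix = build_transitions(SAMPLE)
    print(markov_generate(matrix, rng))
    print(music_box_generate(SAMPLE, "ADT^A01", rng).replace("\r", "\n"))
```

Under these assumptions the sketch mirrors the contrast reported above: the Markov walk can loop on repeating segments such as OBX, which would tend to produce longer and more variable output, whereas the sampling approach can only return lengths already present in the source pool.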