Hripcsak George, Elhadad Noémie, Chen Yueh-Hsia, Zhou Li, Morrison Frances P
Department of Biomedical Informatics, Columbia University, College of Physicians and Surgeons, 622 West 168 Street, VC-5, New York, NY 10032, USA.
J Am Med Inform Assoc. 2009 Mar-Apr;16(2):220-7. doi: 10.1197/jamia.M3007. Epub 2008 Dec 11.
To measure the uncertainty of temporal assertions like "3 weeks ago" in clinical texts.
Temporal assertions extracted from narrative clinical reports were compared to facts extracted from a structured clinical database for the same patients.
The authors correlated the assertions and the facts to determine the dependence of the uncertainty of the assertions on the semantic and lexical properties of the assertions.
The observed deviation between the stated duration and the actual duration averaged about 20% of the stated duration. Linear regression revealed that assertions about events further in the past tend to be more uncertain, smaller numeric values tend to be more uncertain (1 mo versus 30 d), and round numbers tend to be more uncertain (10 versus 11 yrs).
The authors empirically derived semantics behind statements of duration using "ago," and verified intuitions about how numbers are used.
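The deviation measure reported above can be illustrated with a minimal sketch. It assumes the deviation is the absolute difference between stated and actual duration, divided by the stated duration. The abstract does not give the exact formula, so this definition, the function names, and the sample values are all hypothetical.

```python
from typing import Iterable, Tuple


def relative_deviation(stated_days: float, actual_days: float) -> float:
    """Absolute deviation as a fraction of the stated duration (assumed definition)."""
    if stated_days <= 0:
        raise ValueError("stated duration must be positive")
    return abs(actual_days - stated_days) / stated_days


def mean_relative_deviation(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average relative deviation over (stated_days, actual_days) pairs."""
    values = [relative_deviation(stated, actual) for stated, actual in pairs]
    if not values:
        raise ValueError("at least one pair is required")
    return sum(values) / len(values)


# Hypothetical illustration only, not data from the study:
# "3 weeks ago" (21 days stated) matched to an event 25 days before the note, etc.
sample_pairs = [(21, 25), (30, 33), (3650, 4400)]

if __name__ == "__main__":
    for stated, actual in sample_pairs:
        print(stated, actual, round(relative_deviation(stated, actual), 3))
    print("mean:", round(mean_relative_deviation(sample_pairs), 3))
```

Normalizing by the stated duration makes a 4-day error on "3 weeks ago" comparable to a 2-year error on "10 years ago", which is what allows a single average such as the reported 20% to summarize assertions across very different time scales.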