Department of Psychiatry and Psychiatric Institute, University of Illinois College of Medicine, Chicago, IL, USA.
School of Information Management, Wuhan University, Wuhan, Hubei, China.
Database (Oxford). 2019 Jan 1;2019:bay143. doi: 10.1093/database/bay143.
Clinical case reports are the 'eyewitness reports' of medicine and provide a valuable and unique, albeit noisy and underutilized, type of evidence. Generally, a case report has a single main finding that represents the reason for writing up the report in the first place. In the present study, we present the results of manual annotation carried out by two individuals on 500 randomly sampled case reports. The corpus contains main finding sentences extracted from the title, abstract and full text of each article; sentences drawn from the same article can be regarded as semantically related and are often paraphrases. The final reconciled corpus of 416 articles constitutes an open resource for further study. This is the first step in establishing text mining models and tools that can identify main finding sentences in an automated fashion, and in measuring quantitatively how similar any two main findings are. We envision that case reports in PubMed may be automatically indexed by main finding, so that users can carry out information queries for specific main findings (rather than general topics) and, given one case report, retrieve those having the most similar main findings. The metric of main finding similarity may also be relevant to the modeling of paraphrasing, summarization and entailment within the biomedical literature.
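As a rough illustration of what "measuring quantitatively how similar any two main findings are" could look like, the sketch below ranks candidate sentences against a query sentence using cosine similarity over plain word counts. This is not the method of the study, which does not specify a similarity metric in this abstract; the function names (`tokenize`, `cosine_similarity`, `rank_by_similarity`) and the example sentences are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine_similarity(sentence_a, sentence_b):
    """Cosine similarity between the word-count vectors of two sentences."""
    counts_a = Counter(tokenize(sentence_a))
    counts_b = Counter(tokenize(sentence_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[t] * counts_b[t] for t in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty sentence is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Return (score, sentence) pairs sorted from most to least similar."""
    scored = [(cosine_similarity(query, c), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    # Made-up sentences for illustration only; not taken from the corpus.
    query = "We report a rash after drug exposure."
    candidates = [
        "A case of hip fracture in an elderly patient.",
        "This case describes a drug induced rash.",
    ]
    for score, sentence in rank_by_similarity(query, candidates):
        print(f"{score:.3f}  {sentence}")
```

Word-count overlap is a deliberately simple baseline: it cannot recognize paraphrases that use different vocabulary, which is the kind of gap an annotated corpus of related main finding sentences could help measure.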