The NIH BD2K Center of Excellence in Biomedical Computing, University of California at Los Angeles, Los Angeles, CA 90095, USA.
Department of Physiology, University of California at Los Angeles, Los Angeles, CA 90095, USA.
Sci Data. 2018 Nov 20;5:180258. doi: 10.1038/sdata.2018.258.
Clinical case reports (CCRs) provide an important means of sharing clinical experiences about atypical disease phenotypes and new therapies. However, published case reports contain largely unstructured and heterogeneous clinical data, posing a challenge to mining relevant information. Current indexing approaches generally concern document-level features and have not been specifically designed for CCRs. To address this disparity, we developed a standardized metadata template and identified text corresponding to medical concepts within 3,100 curated CCRs spanning 15 disease groups and more than 750 reports of rare diseases. We also prepared a subset of metadata on reports on selected mitochondrial diseases and assigned ICD-10 diagnostic codes to each. The resulting resource, Metadata Acquired from Clinical Case Reports (MACCRs), contains text associated with high-level clinical concepts, including demographics, disease presentation, treatments, and outcomes for each report. Our template and MACCR set render CCRs more findable, accessible, interoperable, and reusable (FAIR) while serving as valuable resources for key user groups, including researchers, physician investigators, clinicians, data scientists, and those shaping government policies for clinical trials.