Department of Biomedical Informatics, College of Medicine, The Ohio State University, 1585 Neil Ave, Columbus, OH 43210, USA.
Genentech Inc., 1 DNA Way, South San Francisco, CA 94080, USA.
Database (Oxford). 2022 May 18;2022. doi: 10.1093/database/baac031.
The discovery of drug-drug interactions (DDIs) that have a translational impact across in vitro pharmacokinetics (PK), in vivo PK and clinical outcomes depends largely on the quality of the annotated corpus available for text mining. We have developed a new DDI corpus based on an annotation scheme that builds upon and extends previous ones: each abstract is split into fragments, and each fragment is then annotated along eight dimensions, namely focus, polarity, certainty, evidence, directionality, study type, interaction type and mechanism. The guidelines defining these dimensions were refined during the annotation process. Our DDI corpus comprises 900 positive DDI abstracts and 750 that are not directly relevant to DDI. The abstracts in the corpus are separated into eight categories of DDI or non-DDI evidence: DDI with pharmacokinetic (PK) mechanism, in vivo DDI PK, DDI clinical, drug-nutrition interaction, single drug, not drug related, in vitro pharmacodynamic (PD) and case report. Seven annotators, three with drug-interaction research experience and four with less such experience, annotated the DDI corpus, with each abstract annotated independently by two of them. After two rounds of annotation with additional training in between, agreement improved from (0.79, 0.96, 0.86, 0.70, 0.91, 0.65, 0.78, 0.90) to (0.93, 0.99, 0.96, 0.94, 0.95, 0.93, 0.96, 0.97) for focus, certainty, evidence, study type, interaction type, mechanism, polarity and directionality, respectively. Agreement for the novice-level annotators improved from 0.83 to 0.96, while the expert-level annotators maintained high agreement with some improvement, from 0.90 to 0.96. In summary, we achieved 96% agreement between each pair of annotators across the eight dimensions. The annotated corpus is now available to the community for inclusion in their text-mining pipelines.
Database URL https://github.com/zha204/DDI-Corpus-Database/tree/master/DDI%20corpus.
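The per-dimension agreement figures reported above can be illustrated with a minimal sketch. It assumes agreement is the simple proportion of fragments on which two annotators assigned the same label; the abstract does not state the exact metric, and the function name, data layout and label values below are hypothetical rather than taken from the corpus files.

```python
from typing import Dict, List, Sequence

# The eight annotation dimensions named in the abstract.
DIMENSIONS = (
    "focus", "polarity", "certainty", "evidence",
    "directionality", "study_type", "interaction_type", "mechanism",
)

Annotation = Dict[str, str]  # one fragment: dimension name -> assigned label


def agreement_by_dimension(
    first: List[Annotation],
    second: List[Annotation],
    dimensions: Sequence[str] = DIMENSIONS,
) -> Dict[str, float]:
    """Return, per dimension, the fraction of fragments labelled identically.

    Assumption: plain proportion agreement, not a chance-corrected statistic.
    """
    if len(first) != len(second):
        raise ValueError("both annotators must label the same fragments")
    if not first:
        raise ValueError("at least one fragment is required")
    total = len(first)
    return {
        dim: sum(a[dim] == b[dim] for a, b in zip(first, second)) / total
        for dim in dimensions
    }


def make_fragment(**overrides: str) -> Annotation:
    """Build a toy fragment; 'label_a' and 'label_b' are placeholder labels."""
    fragment = {dim: "label_a" for dim in DIMENSIONS}
    fragment.update(overrides)
    return fragment


# Toy data: four fragments labelled by two hypothetical annotators.
annotator_1 = [make_fragment() for _ in range(4)]
annotator_2 = [
    make_fragment(),
    make_fragment(polarity="label_b"),
    make_fragment(),
    make_fragment(polarity="label_b", mechanism="label_b"),
]

scores = agreement_by_dimension(annotator_1, annotator_2)
for dim, score in scores.items():
    print(f"{dim}: {score:.2f}")
```

In this toy data the annotators disagree on polarity for two of four fragments and on mechanism for one, so those dimensions score lower than the rest. A chance-corrected statistic such as Cohen's kappa could be substituted if that is closer to what the study used.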