Erlanson Nils, China Joana Félix, Taavola Henric, Norén G Niklas
Uppsala Monitoring Centre, Uppsala, Sweden.
Drug Saf. 2025 Apr;48(4):401-413. doi: 10.1007/s40264-024-01509-2. Epub 2025 Jan 20.
Individual case reports are essential to identify and assess previously unknown adverse effects of medicines. In these reports, information on adverse events (AEs) and drugs is encoded in hierarchical terminologies. Encoding differences may hinder the retrieval and analysis of clinically related reports relevant to a topic of interest. Recent studies have explored the use of data-driven semantic vector representations to support analysis of pharmacovigilance data.
This study aims to evaluate the stability and clinical relatedness of vigiVec, a semantic vector representation for codes of AEs and drugs.
vigiVec is a published adaptation to pharmacovigilance of the publicly available Word2Vec model, applied to structured data instead of free text. It provides vector representations for MedDRA Preferred Terms and WHODrug Global active ingredients, learned from reporting patterns in VigiBase, the WHO global database of adverse event reports for medicines and vaccines. For this study, a 20-dimensional Skip-gram architecture with window size 250 was used. Our evaluation focused on nearest neighbors identified by the cosine similarity of vigiVec vector representations. Clinical relatedness was measured through term intruder detection, whereby a medical doctor was tasked with identifying a randomly selected term (the intruder) included among the four nearest neighbors to a specific AE or drug. Stability was measured as the average overlap in the ten nearest neighbors for each AE or drug, in repeated fittings of vigiVec.
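The nearest-neighbor and stability measures described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy random vectors, and the choice to compare exactly two fittings pairwise are assumptions made for illustration, and the abstract does not specify how overlap is aggregated across more than two fittings.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_neighbors(vectors, code, k=10):
    """Return the k codes most similar to `code` by cosine similarity."""
    target = vectors[code]
    others = [c for c in vectors if c != code]
    others.sort(key=lambda c: cosine(target, vectors[c]), reverse=True)
    return others[:k]


def stability(fit_a, fit_b, k=10):
    """Average fraction of shared k nearest neighbors between two fittings.

    Assumption: both fittings cover the same set of codes.
    """
    shared = []
    for code in fit_a:
        neighbors_a = set(nearest_neighbors(fit_a, code, k))
        neighbors_b = set(nearest_neighbors(fit_b, code, k))
        shared.append(len(neighbors_a & neighbors_b) / k)
    return sum(shared) / len(shared)


# Toy data standing in for two repeated fittings (hypothetical codes,
# random 20-dimensional vectors; the second fitting is a noisy copy).
rng = random.Random(0)
codes = [f"code_{i}" for i in range(30)]
fit_a = {c: [rng.gauss(0, 1) for _ in range(20)] for c in codes}
fit_b = {c: [x + rng.gauss(0, 0.05) for x in v] for c, v in fit_a.items()}

print(nearest_neighbors(fit_a, "code_0", k=10))
print(stability(fit_a, fit_b, k=10))
```

Comparing a fitting with itself gives a stability of 1.0 under this definition, so values such as the reported 80% and 64% can be read as the average share of the ten nearest neighbors that recur across fittings.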
Among the ten nearest neighbors, 1.8 AEs on average belonged to the same MedDRA High Level Term (HLT; e.g., coagulopathies), and 1.3 drugs belonged to the same Anatomical Therapeutic Chemical level 3 (ATC-3; e.g., opioids). In the intruder detection task, when neighbors and intruders were both chosen from the same HLT, the intruder detection rate was 46%. When selected from different HLTs, it was 79%. By random chance, we should expect 20% (1 in 5). Corresponding rates for drugs were 42% in same ATC-3 and 65% in different ATC-3. The stability of nearest neighbors was 80% for AEs and 64% for drugs.
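The 20% chance level follows from the task design: four nearest neighbors plus one intruder give five options, so a random guess is correct 1 time in 5. A minimal sketch of how such a task could be assembled and scored is shown here; the helper names, the term labels, and the candidate pool are hypothetical and are not taken from the study.

```python
import random


def build_intruder_task(neighbors, candidate_pool, rng):
    """Mix one randomly chosen intruder into a list of nearest neighbors."""
    eligible = [c for c in candidate_pool if c not in neighbors]
    intruder = rng.choice(eligible)
    options = list(neighbors) + [intruder]
    rng.shuffle(options)  # hide the intruder's position from the assessor
    return options, intruder


def detection_rate(answers):
    """Fraction of tasks in which the picked term was the true intruder.

    `answers` is a list of (picked_term, true_intruder) pairs.
    """
    return sum(picked == intruder for picked, intruder in answers) / len(answers)


rng = random.Random(42)
neighbors = ["term_a", "term_b", "term_c", "term_d"]   # four nearest neighbors
pool = neighbors + ["term_e", "term_f", "term_g"]      # hypothetical candidates
options, intruder = build_intruder_task(neighbors, pool, rng)

chance_level = 1 / len(options)  # one intruder among five options
print(options, intruder, chance_level)
```

Restricting `candidate_pool` to terms from the same HLT or ATC-3 group as the neighbors, or to terms from other groups, would correspond to the two conditions reported above.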
Nearest neighbors identified with vigiVec are stable and show a high level of clinical relatedness. They often come from different parts of the existing hierarchies and thus complement them.