Han Weiru, Morris Robert, Bu Kun, Zhu Tianrui, Huang Hong, Cheng Feng
Department of Mathematics & Statistics, College of Art and Science, University of South Florida, Tampa, FL 33620, USA.
Department of Pharmaceutical Science, Taneja College of Pharmacy, University of South Florida, Tampa, FL 33613, USA.
Can J Physiol Pharmacol. 2025 Feb 1;103(2):56-69. doi: 10.1139/cjpp-2024-0078. Epub 2024 Dec 4.
The FDA Adverse Event Reporting System (FAERS) is a large-scale repository of adverse drug event (ADE) reports. The same published clinical study or report may be reviewed by multiple companies or healthcare professionals and reported separately to the FDA, which produces a substantial number of duplicate reports in FAERS. These duplicate records can lead to false associations between a given drug and an ADE. In this study, we first assessed the consistency of drug and ADE information in FAERS reports from Alzheimer's disease patients. Drug-related information was more congruent than ADE-related information, likely because the terms and phrases used to describe ADEs are more heterogeneous and varied. We then demonstrated that text comparison methods are effective in identifying duplicate records based on literature citations, testing 10 comparison functions for their overall efficacy. Token-based methods (such as COSINE, QGRAM, and JACCARD), edit-based approaches (including OSA, LV, and DL), and sequence-based techniques such as LCS detected identical publications within free text with both high sensitivity and high specificity. These results offer valuable insights for identifying duplicate FAERS reports and improving the reliability of detected associations between drugs and ADEs.
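As a rough illustration of the kind of text comparison described above, the following Python sketch scores two free-text literature citations with a q-gram Jaccard similarity, a q-gram cosine similarity, and a normalized Levenshtein similarity. It is not the authors' implementation: the function names, the bigram size, the 0.85 threshold, and the sample citations are all hypothetical choices made for this example, and the study's own tooling and cutoffs are not stated in the abstract.

```python
from collections import Counter
from math import sqrt


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def qgrams(text: str, q: int = 2) -> Counter:
    """Count the overlapping character q-grams in text."""
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))


def jaccard_similarity(a: str, b: str, q: int = 2) -> float:
    """Shared distinct q-grams divided by all distinct q-grams."""
    sa, sb = set(qgrams(a, q)), set(qgrams(b, q))
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cosine_similarity(a: str, b: str, q: int = 2) -> float:
    """Cosine of the angle between the two q-gram count vectors."""
    ca, cb = qgrams(a, q), qgrams(b, q)
    dot = sum(ca[g] * cb[g] for g in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1], where 1 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def is_duplicate_citation(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag two citations as the same publication (threshold is an assumed value)."""
    return levenshtein_similarity(normalize(a), normalize(b)) >= threshold


# Hypothetical citations, invented for this example only.
cite_a = "Smith J, Doe A. Drug X and confusion in dementia. J Example Med. 2020;12(3):45-50."
cite_b = "SMITH J, DOE A. Drug X and confusion in dementia. J Example Med 2020; 12(3): 45-50"
cite_c = "Lee K. Statin use and muscle pain in older adults. Another Journal. 2018;5(1):1-9."

if __name__ == "__main__":
    for other in (cite_b, cite_c):
        x, y = normalize(cite_a), normalize(other)
        print(
            round(jaccard_similarity(x, y), 3),
            round(cosine_similarity(x, y), 3),
            round(levenshtein_similarity(x, y), 3),
            is_duplicate_citation(cite_a, other),
        )
```

In practice the threshold would need to be tuned against labeled pairs, since sensitivity and specificity both depend on where the cutoff is placed for each comparison function.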