Eshleman Ryan, Singh Rahul
Department of Computer Science, San Francisco State University, San Francisco, CA, 94132, USA.
Center for Discovery and Innovation in Parasitic Diseases, University of California, San Diego, USA.
BMC Bioinformatics. 2016 Oct 6;17(Suppl 13):335. doi: 10.1186/s12859-016-1220-5.
Adverse drug events (ADEs) are one of the leading causes of post-therapeutic death, and their identification is an important challenge of modern precision medicine. Unfortunately, the onset and effects of ADEs are often underreported, which complicates timely intervention. At over 500 million posts per day, Twitter is a commonly used social media platform. The ubiquity of day-to-day personal information exchange on Twitter makes it a promising target for data mining for ADE identification and intervention. Three technical challenges are central to this problem: (1) identification of salient medical keywords in (noisy) tweets, (2) mapping drug-effect relationships, and (3) classification of such relationships as adverse or non-adverse.
We use a bipartite graph-theoretic representation called a drug-effect graph (DEG) to model drug and side effect relationships, with drugs and side effects represented as vertices. We construct individual DEGs on two data sources. The first DEG is constructed from the drug-effect relationships found in FDA package inserts as recorded in the SIDER database. The second DEG is constructed by mining the history of Twitter users. We use dictionary-based information extraction to identify medically relevant concepts in tweets. Drugs and co-occurring symptoms are connected by edges weighted by temporal distance and frequency. Finally, information from the SIDER DEG is integrated with the Twitter DEG, and edges are classified as either adverse or non-adverse using supervised machine learning.
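The construction described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy lexicons `DRUGS` and `EFFECTS`, the 48-hour co-occurrence window, the helper names, and the `SIDER_EDGES` set are all assumptions made for illustration, since the abstract does not specify them. The sketch extracts dictionary terms from a user's timeline, links each drug mention to effect mentions that fall within the window, records frequency and minimum temporal gap on each edge, and then flags whether the edge also appears in a (hypothetical) SIDER-derived edge set.

```python
from collections import defaultdict

# Hypothetical toy lexicons; the real dictionaries are not given in the abstract.
DRUGS = {"ibuprofen", "prozac"}
EFFECTS = {"nausea", "headache", "insomnia"}

# Hypothetical SIDER-derived edges, for illustration only.
SIDER_EDGES = {("prozac", "nausea")}


def extract_terms(text, lexicon):
    """Dictionary-based extraction: return lexicon terms found in the text."""
    tokens = {tok.strip(".,!?#@").lower() for tok in text.split()}
    return tokens & lexicon


def build_deg(timeline, window_hours=48.0):
    """Build a bipartite drug-effect graph from one user's timeline.

    timeline: list of (timestamp_in_hours, tweet_text).
    Returns {(drug, effect): {"count": int, "min_gap": float}}.
    """
    drug_mentions, effect_mentions = [], []
    for t, text in timeline:
        drug_mentions += [(t, d) for d in extract_terms(text, DRUGS)]
        effect_mentions += [(t, e) for e in extract_terms(text, EFFECTS)]

    edges = defaultdict(lambda: {"count": 0, "min_gap": float("inf")})
    for t_drug, drug in drug_mentions:
        for t_eff, effect in effect_mentions:
            gap = abs(t_eff - t_drug)
            if gap <= window_hours:  # assumed co-occurrence window
                edge = edges[(drug, effect)]
                edge["count"] += 1                          # frequency weight
                edge["min_gap"] = min(edge["min_gap"], gap)  # temporal weight
    return dict(edges)


def integrate_with_sider(deg, sider_edges):
    """Attach a feature saying whether each Twitter edge is also in SIDER."""
    return {
        pair: {**weights, "in_sider": pair in sider_edges}
        for pair, weights in deg.items()
    }


timeline = [
    (0.0, "Started Prozac today"),
    (5.0, "ugh, nausea all day"),
    (30.0, "can't sleep. insomnia again"),
    (100.0, "headache"),
]
deg = integrate_with_sider(build_deg(timeline), SIDER_EDGES)
for pair, features in sorted(deg.items()):
    print(pair, features)
```

The resulting per-edge feature dictionaries are the kind of input a supervised classifier could consume; the choice of classifier and of the full feature set is left open here because the abstract does not name them.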
We examine both graph-theoretic and semantic features for the classification task. The proposed approach identifies adverse drug effects with precision exceeding 85% and F1 exceeding 81%. Compared with state-of-the-art methods that employ un-enriched graph-theoretic analysis alone, our method improves these measures by 5 to 8%. Additionally, we employ our method to discover several ADEs which, though present in the medical literature and in Twitter streams, are not represented in the SIDER database.
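For readers less familiar with the reported measures, the short Python sketch below shows how precision, recall, and F1 are computed from edge-classification counts. The counts are made up for illustration and are not results from the paper; the function name `precision_recall_f1` is likewise an assumption.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts.

    tp: edges correctly classified as adverse
    fp: non-adverse edges wrongly classified as adverse
    fn: adverse edges wrongly classified as non-adverse
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Illustrative counts only (not taken from the paper).
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, an F1 lower than the precision implies that recall is lower than precision.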
We present a DEG integration model as a powerful formalism for the analysis of drug-effect relationships that is general enough to accommodate diverse data sources, yet rigorous enough to provide a strong mechanism for ADE identification.