Haslam Bryan, Perez-Breva Luis
Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
J Am Med Inform Assoc. 2017 Jan;24(1):13-23. doi: 10.1093/jamia/ocw003. Epub 2016 May 17.
Our objective is to test the limits of the assumption that better learning from data in medicine requires more granular data. We hypothesize that clinical trial metadata contains latent scientific, clinical, and regulatory expert knowledge that can be accessed to draw conclusions about the underlying biology of diseases. We seek to demonstrate that this latent information can be uncovered from the whole body of clinical trials.
We extract free-text metadata from 93 654 clinical drug trials and introduce a representation that allows us to compare different trials. We then construct a network of diseases using only the trial metadata. We view each trial as the summation of expert knowledge of biological mechanisms and medical evidence linking a disease to a drug believed to modulate the pathways of that disease. Our network representation allows us to visualize disease relationships based on this underlying information.
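As a rough illustration of the idea described above, the following minimal sketch builds a small disease network from free-text trial metadata. It is not the representation used in the study, which the abstract does not specify. It assumes a simple bag-of-words vector per disease, cosine similarity as the comparison measure, and an arbitrary similarity threshold. The records in `TRIALS`, the function names, and the threshold value are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy records: (disease label, free-text trial metadata).
# Real trial metadata would be far longer and noisier.
TRIALS = [
    ("asthma", "inhaled corticosteroid reduces airway inflammation"),
    ("copd", "inhaled bronchodilator and corticosteroid for airway obstruction"),
    ("type 2 diabetes", "metformin improves insulin sensitivity and glucose control"),
    ("obesity", "metformin and diet effects on insulin and glucose"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def disease_vectors(trials):
    """Pool the token counts of all trials that share a disease label."""
    vectors = defaultdict(Counter)
    for disease, text in trials:
        vectors[disease].update(tokenize(text))
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def disease_network(trials, threshold=0.3):
    """Return {(disease_a, disease_b): similarity} for pairs at or above threshold.

    The threshold is an assumed, arbitrary cutoff chosen for this toy data.
    """
    vectors = disease_vectors(trials)
    edges = {}
    for d1, d2 in combinations(sorted(vectors), 2):
        sim = cosine(vectors[d1], vectors[d2])
        if sim >= threshold:
            edges[(d1, d2)] = sim
    return edges


if __name__ == "__main__":
    for (d1, d2), sim in disease_network(TRIALS).items():
        print(f"{d1} -- {d2}: {sim:.3f}")
```

On this toy input, the sketch links the two airway diseases to each other and the two metabolic conditions to each other. It uses only the wording of the trial descriptions, which mirrors in miniature the premise that trial metadata encodes expert knowledge about disease relationships.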
Our disease network shows surprising agreement with another disease network based on genetic data and on the Medical Subject Headings (MeSH) taxonomy, yet also contains unique disease similarities.
The agreement of our results with other sources indicates that our premise regarding latent expert knowledge holds. The disease relationships unique to our network may be used to generate hypotheses for future biological and clinical research as well as drug repurposing and design. Our results provide an example of using experimental data on humans to generate biologically useful information and point to a set of new and promising strategies to link clinical outcomes data back to biological research.