The Harker School, San Jose, CA, United States of America.
PLoS One. 2020 May 27;15(5):e0233438. doi: 10.1371/journal.pone.0233438. eCollection 2020.
Researchers and clinicians face a significant challenge in keeping up to date with the rapid pace at which new associations between genetic mutations and diseases are reported. To address this problem, this research mined the ClinicalTrials.gov corpus to extract relevant biological insights, produce unique reports that summarize the findings, and make the metadata available via APIs. An automated text-analysis pipeline performed the following steps: parsing the ClinicalTrials.gov files, extracting and analyzing mutations from the corpus, mapping clinical trials to the Human Phenotype Ontology (HPO), and finding associations between clinical trials and HPO nodes. Unique reports were created for each mutation (SNPs and protein mutations) mentioned in the corpus, as well as for each clinical trial that references a mutation. These reports, which have been generated at multiple time points, along with APIs to access the metadata, are freely available at http://snpminertrials.com. Additionally, HPO was used to normalize disease terms and associate clinical trials with relevant genes. The pipeline and reports, the association of clinical trials with HPO terms, and the resulting insights, public repository, and APIs are all novel to this work. The freely available resources present relevant biological information and novel insights into the relationships between biomedical entities in a robust and accessible manner, easing the challenge of staying informed about new associations between mutations, genes, and diseases.
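The abstract does not specify how mutations are recognized in trial text, so the following is only a minimal sketch of the extraction step, assuming simple regular expressions: dbSNP-style `rs` identifiers for SNPs and one-letter amino-acid substitutions (such as V600E) for protein mutations. The function names `extract_mutations` and `index_trials`, and the trial identifiers in the usage note, are hypothetical and are not taken from the published pipeline.

```python
import re
from collections import defaultdict

# Assumption: SNPs appear as dbSNP reference identifiers, e.g. "rs1801133".
RSID_RE = re.compile(r"\brs\d+\b", re.IGNORECASE)

# Assumption: protein mutations appear as one-letter substitutions, e.g. "V600E"
# (reference residue, position, variant residue).
AA = "ACDEFGHIKLMNPQRSTVWY"
PROTEIN_RE = re.compile(rf"\b([{AA}])(\d+)([{AA}])\b")


def extract_mutations(text):
    """Return the distinct SNP IDs and protein substitutions found in text."""
    snps = {m.group(0).lower() for m in RSID_RE.finditer(text)}
    proteins = {m.group(0) for m in PROTEIN_RE.finditer(text)}
    return {"snps": sorted(snps), "protein_mutations": sorted(proteins)}


def index_trials(trials):
    """Build a mutation -> sorted trial IDs mapping from {trial_id: text}."""
    index = defaultdict(set)
    for trial_id, text in trials.items():
        found = extract_mutations(text)
        for mutation in found["snps"] + found["protein_mutations"]:
            index[mutation].add(trial_id)
    return {mutation: sorted(ids) for mutation, ids in index.items()}
```

For example, calling `index_trials({"TRIAL-A": "BRAF V600E melanoma", "TRIAL-B": "V600E or rs1801133"})` with these made-up trial identifiers groups both trials under `V600E` and only the second under `rs1801133`, which is the kind of per-mutation grouping a mutation report would need. Patterns this naive will produce false positives (the one-letter pattern also matches strings such as "A1C"), so a production pipeline would likely need gene context or a dedicated mutation tagger.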