Information Center, Beijing University of Chinese Medicine, Beijing, China.
Lester and Sue Smith Breast Center, Baylor College of Medicine, Houston, TX, USA.
BMC Bioinformatics. 2018 Dec 28;19(Suppl 17):495. doi: 10.1186/s12859-018-2463-0.
BACKGROUND: Owing to recent technological advances, disease-related knowledge is growing rapidly, and it has become nontrivial to go through all published literature to identify associations between human diseases and genetic, environmental, and lifestyle factors, disease symptoms, and treatment strategies. Here we report DLAD4U (Disease List Automatically Derived For You), an efficient, accurate, and easy-to-use disease search engine based on PubMed literature.
RESULTS: DLAD4U uses the eSearch and eFetch APIs from the National Center for Biotechnology Information (NCBI) to find publications related to a query and to identify diseases from the retrieved publications. A hypergeometric test is used to prioritize the identified diseases for display to users. DLAD4U accepts any valid PubMed query, and its output includes a ranked disease list, information associated with each disease, chronologically ordered supporting publications, a summary of the run, and links for file export. DLAD4U outperformed other disease search engines in our comparative evaluation, which used selected genes and drugs as query terms and manually curated data as the "gold standard". For 100 genes that are associated with only one disease in the gold standard, the Mean Average Precision (MAP) of DLAD4U was 0.77, clearly higher than that of the other tools. For 10 genes that are associated with multiple diseases in the gold standard, the mean precision, recall, and F-measure scores from DLAD4U were consistently higher than those from the other tools. The superior performance of DLAD4U was further confirmed using 100 drugs as queries, with a MAP of 0.90.
CONCLUSIONS: DLAD4U is a new, intuitive disease search engine that takes advantage of existing resources at NCBI to provide computational efficiency and uses statistical analyses to ensure accuracy. DLAD4U is publicly available at http://dlad4u.zhang-lab.org.
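The hypergeometric prioritization mentioned in the RESULTS can be illustrated with a minimal sketch. The abstract does not specify how the test is parameterized, so the framing here is an assumption: a disease is scored by the probability of seeing at least k disease-annotated publications among the N retrieved ones, given that n of the M publications in the whole corpus carry that annotation. The function names `hypergeom_sf` and `rank_diseases` and all counts are hypothetical and chosen only for illustration; they are not taken from DLAD4U.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """Upper tail P(X >= k) for X ~ Hypergeometric(M, n, N).

    M: total publications in the corpus
    n: corpus publications annotated with the disease
    N: publications retrieved by the query
    k: retrieved publications annotated with the disease
    """
    upper = min(n, N)
    # Count the ways to draw at least k annotated publications in N draws.
    favourable = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return favourable / comb(M, N)


def rank_diseases(counts, M, N):
    """Return (disease, p_value) pairs sorted from most to least enriched."""
    scored = [(name, hypergeom_sf(k, M, n, N)) for name, (k, n) in counts.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Toy numbers, invented for illustration only: disease -> (k, n).
corpus_size = 1000
retrieved = 50
toy_counts = {
    "disease A": (10, 20),   # rare overall, frequent among retrieved papers
    "disease B": (5, 400),   # common overall, so 5 hits is unremarkable
    "disease C": (2, 10),    # rare overall, a couple of hits
}

ranking = rank_diseases(toy_counts, corpus_size, retrieved)
for name, p in ranking:
    print(f"{name}: p = {p:.3g}")
```

In this toy setup the ranking puts disease A first and disease B last: a disease that is frequent across the whole corpus needs many more hits before it counts as enriched. Exact integer arithmetic with `math.comb` is fine at this scale; for corpus sizes on the order of PubMed, a log-space or library implementation would be the more practical choice.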