Department of Computer Science, University of Arizona, Tucson, AZ, USA.
School of Medicine, Oregon Health & Science University, Portland, OR, USA.
Database (Oxford). 2018 Jan 1;2018:bay098. doi: 10.1093/database/bay098.
PubMed, a repository and search engine for biomedical literature, now indexes >1 million articles each year. This exceeds the processing capacity of human domain experts, limiting our ability to truly understand many diseases. We present Reach, a system for automated, large-scale machine reading of biomedical papers that can extract mechanistic descriptions of biological processes with relatively high precision at high throughput. We demonstrate that combining the extracted pathway fragments with existing biological data analysis algorithms that rely on curated models helps identify and explain a large number of previously unidentified mutually exclusive altered signaling pathways in seven different cancer types. This work shows that combining human-curated 'big mechanisms' with extracted 'big data' can lead to a causal, predictive understanding of cellular processes and unlock important downstream applications.