Chen Yang, Li Li, Zhang Guo-Qiang, Xu Rong
Department of Electrical Engineering and Computer Science, Department of Epidemiology and Biostatistics and Department of Family Medicine and Community Health, Case Western Reserve University, Cleveland, OH 44106, USA.
Bioinformatics. 2015 Jun 15;31(12):i276-83. doi: 10.1093/bioinformatics/btv245.
Discerning genetic contributions to diseases not only enhances our understanding of disease mechanisms, but also leads to translational opportunities for drug discovery. Recent computational approaches incorporate disease phenotypic similarities to improve the predictive power of disease gene discovery. However, most current studies use only a single source of human disease phenotype data. We present an innovative and generic strategy for combining multiple sources of human disease phenotype data and predicting disease-associated genes from the integrated phenotypic and genomic data.
To demonstrate our approach, we explored a new phenotype database derived from biomedical ontologies and constructed the Disease Manifestation Network (DMN). We combined DMN with mimMiner, a phenotype database widely used in disease gene prediction studies. Our approach significantly outperformed a baseline method that used only one phenotype data source. In the leave-one-out cross-validation and de novo gene prediction analyses, our approach achieved areas under the curve of 90.7% and 90.3%, respectively, which are significantly higher than the 84.2% (P < e^-4) and 81.3% (P < e^-12) achieved by the baseline approach. We further demonstrated that our predicted genes have translational potential in drug discovery. Using Crohn's disease as an example, we ranked candidate drugs based on the ranks of their drug targets. Our gene prediction approach prioritized druggable genes that are likely to be associated with Crohn's disease pathogenesis, and our ranking of candidate drugs successfully prioritized the Food and Drug Administration-approved drugs for Crohn's disease. We also found literature evidence supporting a number of drugs among the top 200 candidates. In summary, we demonstrated that a novel strategy combining unique disease phenotype data with systems approaches can lead to rapid drug discovery.
nlp.
edu/public/data/DMN
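The abstract states that candidate drugs were ranked based on the rank of their drug targets, but it does not say how the ranks of multiple targets were combined. The following is a minimal sketch of one possible reading, assuming each drug is scored by the best (lowest) rank among its target genes. The function name, the gene and drug identifiers, and the tie-breaking rule are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def rank_drugs_by_targets(
    gene_rank: Dict[str, int],
    drug_targets: Dict[str, List[str]],
) -> List[Tuple[str, int]]:
    """Order drugs by the best (lowest) rank among their target genes.

    Assumption: a drug inherits the rank of its top-ranked target.
    Targets missing from the gene ranking are ignored, and drugs with
    no ranked target are dropped.
    """
    scored = []
    for drug, targets in drug_targets.items():
        ranks = [gene_rank[g] for g in targets if g in gene_rank]
        if ranks:
            scored.append((drug, min(ranks)))
    # Sort by rank, then by drug name so that ties are deterministic.
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))


# Hypothetical toy data: a predicted gene ranking (1 = most likely
# disease-associated) and a drug-to-target mapping.
gene_rank = {"GENE_A": 1, "GENE_B": 2, "GENE_C": 3, "GENE_D": 4}
drug_targets = {
    "drug_x": ["GENE_C", "GENE_D"],
    "drug_y": ["GENE_A"],
    "drug_z": ["GENE_B", "GENE_Q"],  # GENE_Q is not in the ranking
}

ranked = rank_drugs_by_targets(gene_rank, drug_targets)
print(ranked)
```

Other aggregation rules, such as the mean rank over all targets, would fit the same description; taking the minimum is used here only because it is the simplest choice.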