Ono Toshihide, Kuhara Satoru
Department of Genetic Resources Technology, Faculty of Agriculture, Kyushu University, 6-10-1 Hakozaki Higashi-ku, Fukuoka 812-8581, Japan.
BMC Bioinformatics. 2014 Jun 10;15:179. doi: 10.1186/1471-2105-15-179.
Understanding the molecular mechanisms involved in disease is critical for the development of more effective and individualized strategies for prevention and treatment. The amount of disease-related literature, including new genetic information on the molecular mechanisms of disease, is rapidly increasing. Extracting beneficial information from literature can be facilitated by computational methods such as the knowledge-discovery approach. Several computational methods for mining gene-disease relationships have been developed; however, there has been a lack of research evaluating candidate genes for specific diseases.
We present a novel method for gathering and prioritizing specific disease candidate genes. Our approach involved the construction of a set of Medical Subject Headings (MeSH) terms for the effective retrieval of publications related to a disease candidate gene. Information regarding the relationships between genes and publications was obtained from the gene2pubmed database. The set of genes was prioritized using a "weighted literature score" based on the number of publications and weighted by the number of genes occurring in a publication. Using our method for the disease states of pain and Alzheimer's disease, a total of 1101 pain candidate genes and 2810 Alzheimer's disease candidate genes were gathered and prioritized. The precision was 0.30 and the recall was 0.89 in the case study of pain. The precision was 0.04 and the recall was 0.6 in the case study of Alzheimer's disease. The precision-recall curve indicated that the performance of our method was superior to that of other publicly available tools.
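The scoring and evaluation steps described above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the weighting used here (each publication contributes 1 divided by the number of genes it mentions) is an assumption, as are the function names and the input layout: (gene, PMID) pairs in the style of gene2pubmed, and a set of PMIDs assumed to have been retrieved with the disease MeSH terms.

```python
from collections import defaultdict


def weighted_literature_scores(gene_pubs, disease_pmids):
    """Score genes from (gene_id, pmid) pairs restricted to disease publications.

    Assumed weighting (not stated in the abstract): a publication that
    mentions n genes adds 1/n to the score of each of those genes.
    """
    # Collect the distinct genes linked to each disease-related publication.
    genes_per_pub = defaultdict(set)
    for gene, pmid in gene_pubs:
        if pmid in disease_pmids:
            genes_per_pub[pmid].add(gene)

    # Sum the per-publication weights for every gene.
    scores = defaultdict(float)
    for genes in genes_per_pub.values():
        weight = 1.0 / len(genes)
        for gene in genes:
            scores[gene] += weight
    return dict(scores)


def rank_genes(scores):
    """Return genes ordered by descending score, ties broken by gene id."""
    return sorted(scores, key=lambda g: (-scores[g], g))


def precision_recall(predicted, known):
    """Precision and recall of a predicted gene set against a reference set."""
    predicted, known = set(predicted), set(known)
    true_positives = len(predicted & known)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(known) if known else 0.0
    return precision, recall


# Toy data, invented purely for illustration.
pairs = [("A", 1), ("B", 1), ("A", 2), ("C", 3), ("D", 4)]
disease_pmids = {1, 2, 3}  # publication 4 is not disease-related

scores = weighted_literature_scores(pairs, disease_pmids)
ranking = rank_genes(scores)
precision, recall = precision_recall(ranking, {"A", "C", "X"})
```

Under this assumed weighting, a gene discussed on its own in a paper gains more from that paper than a gene listed among many others, which is one plausible reading of "weighted by the number of genes occurring in a publication".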
Our method, which involved the use of a set of MeSH terms related to disease candidate genes and a novel weighted literature score, improved the accuracy of gathering and prioritizing candidate genes by focusing on a specific disease.