Department of Medical Informatics, Tzu Chi University, Hualien, Taiwan.
BMC Bioinformatics. 2014 Aug 25;15(1):286. doi: 10.1186/1471-2105-15-286.
Curation of gene-disease associations published in the literature should be based on a careful and frequent survey of the references that are highly related to specific gene-disease associations. Retrieval of these references is thus essential for timely and complete curation.
We present CRFref (Conclusive, Rich, and Focused References), a technique that, given a gene-disease pair <g, d>, ranks highly those biomedical references that are likely to provide conclusive, rich, and focused results about g and d. Such references are expected to be highly related to the association between g and d. CRFref ranks candidate references by their scores. To estimate the score of a reference r, CRFref estimates and integrates three measures: the degree of conclusiveness, the degree of richness, and the degree of focus of r with respect to <g, d>. To evaluate CRFref, experiments were conducted on over one hundred thousand references for over one thousand gene-disease pairs. Experimental results show that CRFref performs significantly better than several typical types of baselines in ranking highly those references that expert curators select to develop the summaries for specific gene-disease associations.
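The score-then-rank idea described above can be illustrated with a minimal sketch. It is not the published CRFref implementation: the abstract does not say how the three measures are estimated or integrated, so the sketch assumes each measure is already available as a number in [0, 1] and, purely for illustration, combines them by multiplication. The names `Reference`, `combined_score`, and `rank_references`, and all numeric values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    """A candidate reference with three pre-computed measures for a pair <g, d>.

    The measure values are hypothetical placeholders; how CRFref actually
    estimates them is not described in the abstract.
    """
    ref_id: str
    conclusiveness: float
    richness: float
    focus: float


def combined_score(ref: Reference) -> float:
    # Assumption: integrate the three measures by simple multiplication.
    # The real integration used by CRFref may differ.
    return ref.conclusiveness * ref.richness * ref.focus


def rank_references(refs: list[Reference]) -> list[Reference]:
    # Higher combined score ranks first.
    return sorted(refs, key=combined_score, reverse=True)


# Made-up candidates for a single gene-disease pair.
candidates = [
    Reference("ref_a", conclusiveness=0.9, richness=0.5, focus=0.8),
    Reference("ref_b", conclusiveness=0.4, richness=0.9, focus=0.3),
    Reference("ref_c", conclusiveness=0.7, richness=0.7, focus=0.9),
]

ranked_ids = [r.ref_id for r in rank_references(candidates)]
print(ranked_ids)
```

Under a multiplicative combination, a reference that is weak on any one measure is pushed down the list even if it is strong on the others. A weighted sum would be a more forgiving alternative; which of these, if either, matches CRFref is not stated in the abstract.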
CRFref is an effective technique for ranking highly those references that are closely related to specific gene-disease associations. It can be incorporated into existing search engines to prioritize biomedical references for curators and researchers, as well as for text mining systems that study gene-disease associations.