Chen Guocai, Zhao Jieyi, Cohen Trevor, Tao Cui, Sun Jingchun, Xu Hua, Bernstam Elmer V, Lawson Andrew, Zeng Jia, Johnson Amber M, Holla Vijaykumar, Bailey Ann M, Lara-Guerra Humberto, Litzenburger Beate, Meric-Bernstam Funda, Jim Zheng W
Center for Computational Biomedicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA, Department of Public Health Science, Medical University of South Carolina, 135 Cannon Street, Suite 303, Charleston, SC 29425, USA and Department of Investigational Cancer Therapeutics, Institute for Personalized Cancer Therapy, UT-MD Anderson Cancer Center, 1400 Holcombe Blvd., FC8.3044, Houston, TX 77030, USA.
Database (Oxford). 2015 Apr 8;2015:bav034. doi: 10.1093/database/bav034. Print 2015.
Ambiguous gene names in the biomedical literature are a barrier to accurate information extraction. To overcome this hurdle, we generated Ontology Fingerprints for selected genes that are relevant to personalized cancer therapy. These Ontology Fingerprints were used to evaluate the association between genes and biomedical literature in order to disambiguate gene names. We obtained a precision of 93.6% for the test gene set and an area under the receiver operating characteristic (ROC) curve of 80.4% for gene-article association. The core algorithm was implemented using a graphics processing unit (GPU)-based MapReduce framework to handle big data and to improve performance. We conclude that Ontology Fingerprints can help disambiguate gene names mentioned in text and analyse the association between genes and articles. Database URL: http://www.ontologyfingerprint.org
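The abstract does not say how a fingerprint is scored against an article. The following is a minimal sketch under two assumptions: a gene's fingerprint is a mapping from ontology terms to weights, and an article is reduced to the set of ontology terms it mentions. Under those assumptions, it shows how a gene-article association score and the two reported metrics (precision and ROC AUC) could be computed. The function names, the summed-weight scoring rule, the threshold, and the toy data are hypothetical illustrations, not the authors' method.

```python
from typing import Dict, List, Set

# Hypothetical representation: a gene's Ontology Fingerprint maps ontology
# term IDs to weights (higher = more characteristic of that gene).
Fingerprint = Dict[str, float]


def fingerprint_score(fingerprint: Fingerprint, article_terms: Set[str]) -> float:
    """Assumed scoring rule: sum the fingerprint weights of terms found in an article."""
    return sum(fingerprint.get(term, 0.0) for term in article_terms)


def precision(predicted: List[bool], actual: List[bool]) -> float:
    """Fraction of predicted-positive gene-article pairs that are truly associated."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    return tp / (tp + fp) if tp + fp else 0.0


def roc_auc(scores: List[float], labels: List[bool]) -> float:
    """ROC AUC via the Mann-Whitney formulation: the probability that a random
    positive pair outscores a random negative pair (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy fingerprint and articles (hypothetical terms and labels).
    fp = {"term_a": 2.0, "term_b": 1.5, "term_c": 0.5}
    articles = [
        ({"term_a", "term_b"}, True),   # truly about the gene
        ({"term_c"}, False),            # mentions an ambiguous alias only
        ({"term_b", "term_x"}, True),
        ({"term_x"}, False),
    ]
    scores = [fingerprint_score(fp, terms) for terms, _ in articles]
    labels = [y for _, y in articles]
    predicted = [s >= 1.0 for s in scores]  # hypothetical decision threshold
    print(scores)                        # [3.5, 0.5, 1.5, 0.0]
    print(precision(predicted, labels))  # 1.0
    print(roc_auc(scores, labels))       # 1.0
```

Each gene-article pair is scored independently, so the scoring step parallelizes naturally. That independence is one reason a MapReduce-style framework can suit this workload at literature scale.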