

OSIRISv1.2: a named entity recognition system for sequence variants of genes in biomedical literature.

Author information

Furlong Laura I, Dach Holger, Hofmann-Apitius Martin, Sanz Ferran

Affiliation

Research Unit on Biomedical Informatics (GRIB), IMIM, UPF, PRBB, c/ Dr. Aiguader 88, E-08003 Barcelona, Spain.

Publication information

BMC Bioinformatics. 2008 Feb 5;9:84. doi: 10.1186/1471-2105-9-84.

Abstract

BACKGROUND

Single Nucleotide Polymorphisms, among other types of sequence variants, constitute key elements in genetic epidemiology and pharmacogenomics. While sequence data about genetic variation are found in databases such as dbSNP, clues about the functional and phenotypic consequences of the variations are generally found in the biomedical literature. The identification of the relevant documents and the extraction of information from them are hampered by the large size of literature databases and the lack of a widely accepted standard notation for biomedical entities. Thus, automatic systems for identifying citations of allelic variants of genes in biomedical texts are required.

RESULTS

Our group has previously reported the development of OSIRIS, a system aimed at the retrieval of literature about allelic variants of genes (http://ibi.imim.es/osirisform.html). Here we describe the development of a new version, OSIRISv1.2 (http://ibi.imim.es/OSIRISv1.2.html), which incorporates a new entity recognition module and is built on top of a local mirror of the MEDLINE collection and HgenetInfoDB, a database that collects data on human gene sequence variations. The new entity recognition module uses a pattern-based search algorithm to identify variation terms in the texts and map them to dbSNP identifiers. The performance of OSIRISv1.2 was evaluated on a manually annotated corpus, yielding 99% precision, 82% recall, and an F-score of 0.89. As an example, we present the application of the system to collecting literature citations for allelic variants of genes related to intracranial aneurysm and breast cancer.
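To make the idea of pattern-based recognition plus dbSNP mapping concrete, here is a minimal sketch under stated assumptions. It is not the OSIRISv1.2 implementation: the three regular expressions, the lookup table `VARIANT_TO_DBSNP` (a hypothetical stand-in for HgenetInfoDB), and the functions `find_variant_terms`, `map_to_dbsnp`, `precision_recall_f` and `f_from_pr` are all illustrative names introduced here. The last two show how precision, recall and F-score are computed when predicted identifiers are compared with a manually annotated gold standard.

```python
import re

# Hypothetical, illustrative patterns; the real OSIRISv1.2 pattern set is not reproduced here.
AA3 = "Ala|Arg|Asn|Asp|Cys|Gln|Glu|Gly|His|Ile|Leu|Lys|Met|Phe|Pro|Ser|Thr|Trp|Tyr|Val"
PATTERNS = [
    ("rsid", re.compile(r"\brs\d+\b")),                      # dbSNP IDs, e.g. rs1801133
    ("nucleotide", re.compile(r"\b[ACGT]\d+[ACGT]\b")),      # nucleotide changes, e.g. C677T
    ("protein", re.compile(rf"\b(?:{AA3})\d+(?:{AA3})\b")),  # amino-acid changes, e.g. Ala222Val
]

# Hypothetical lookup table standing in for HgenetInfoDB: (gene, variant term) -> dbSNP ID.
VARIANT_TO_DBSNP = {
    ("MTHFR", "C677T"): "rs1801133",
    ("MTHFR", "Ala222Val"): "rs1801133",
}


def find_variant_terms(text):
    """Return (kind, term) pairs for every pattern match, ordered by position in the text."""
    hits = []
    for kind, pattern in PATTERNS:
        for match in pattern.finditer(text):
            hits.append((match.start(), kind, match.group(0)))
    return [(kind, term) for _, kind, term in sorted(hits)]


def map_to_dbsnp(gene, text):
    """Map recognized variant terms to dbSNP IDs; explicit rs IDs are kept as they are."""
    ids = set()
    for kind, term in find_variant_terms(text):
        if kind == "rsid":
            ids.add(term)
        else:
            rs_id = VARIANT_TO_DBSNP.get((gene, term))
            if rs_id is not None:
                ids.add(rs_id)
    return ids


def f_from_pr(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(predicted, gold):
    """Compare predicted IDs with gold-standard annotations (both given as sets)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall, f_from_pr(precision, recall)
```

For example, `map_to_dbsnp("MTHFR", "The MTHFR C677T (Ala222Val) variant was genotyped.")` returns `{"rs1801133"}`. With the rounded figures from the abstract, `f_from_pr(0.99, 0.82)` gives about 0.897, close to the reported 0.89; the small gap is consistent with the published precision and recall being rounded.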

CONCLUSION

OSIRISv1.2 can be used to link literature references to dbSNP database entries with high accuracy, and is therefore suitable for collecting current knowledge on gene sequence variations and supporting the functional annotation of variation databases. Applying OSIRISv1.2 in combination with controlled vocabularies such as MeSH provides a way to identify associations of biomedical interest, such as those relating SNPs to diseases.
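One simple way to turn per-citation annotations into candidate SNP-disease associations is co-occurrence counting, sketched below. This is an assumption for illustration, not necessarily how the authors derived associations: each citation is modeled as a pair (dbSNP IDs recognized in the text, MeSH descriptors assigned to the record), and the function name `snp_disease_cooccurrence` and the IDs `rsA`/`rsB` in the usage note are hypothetical.

```python
from collections import Counter
from itertools import product


def snp_disease_cooccurrence(citations):
    """Count, for each (dbSNP ID, MeSH descriptor) pair, how many citations mention both.

    `citations` is an iterable of (snp_ids, mesh_terms) pairs, one per MEDLINE record.
    Duplicates within a record are ignored, so each citation counts at most once per pair.
    """
    counts = Counter()
    for snp_ids, mesh_terms in citations:
        for pair in product(set(snp_ids), set(mesh_terms)):
            counts[pair] += 1
    return counts
```

With three hypothetical records, `[({"rsA"}, {"Breast Neoplasms"}), ({"rsA", "rsB"}, {"Breast Neoplasms", "Intracranial Aneurysm"}), ({"rsB"}, {"Intracranial Aneurysm"})]`, the pair `("rsA", "Breast Neoplasms")` is counted twice. `Counter.most_common()` then ranks candidate associations; raw counts favor frequently studied SNPs, so a real analysis would normally add a statistical test or normalization on top.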


