Pérez-Pérez Martin, Pérez-Rodríguez Gael, Rabal Obdulia, Vazquez Miguel, Oyarzabal Julen, Fdez-Riverola Florentino, Valencia Alfonso, Krallinger Martin, Lourenço Anália
ESEI - Department of Computer Science, University of Vigo, Ourense, Spain.
Small Molecule Discovery Platform, Molecular Therapeutics Program, Center for Applied Medical Research (CIMA), University of Navarra, Pamplona, Spain.
Database (Oxford). 2016 Aug 19;2016. doi: 10.1093/database/baw120. Print 2016.
Biomedical text mining methods and technologies have improved significantly in the last decade. Considerable effort has been invested in understanding the main challenges of biomedical literature retrieval and extraction and in proposing solutions to problems of practical interest. Most notably, community-oriented initiatives such as the BioCreative challenge have provided controlled environments for comparing automatic systems on practical biomedical tasks. In this context, the present work describes Markyt, a Web-based document curation platform implemented to support the visualisation, prediction and benchmarking of chemical and gene mention annotations in the BioCreative/CHEMDNER challenge. Creating this platform is an important step towards the systematic and public evaluation of automatic prediction systems and the reuse of the knowledge compiled for the challenge. Markyt was not only critical in supporting the manual annotation and annotation revision process, but also facilitated the visual comparison of automated results against the manually generated Gold Standard annotations and the comparative assessment of those results. We expect that future biomedical text mining challenges, and the text mining community at large, may benefit from the Markyt platform to better explore and interpret annotations and to improve automatic system predictions.
Database URL: http://www.markyt.org, https://github.com/sing-group/Markyt.
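To make the benchmarking step more concrete, the following is a minimal Python sketch of how predicted mention annotations might be scored against a Gold Standard. It assumes strict exact matching on document identifier, character offsets and entity label, with micro-averaged precision, recall and F1. The `Mention` type and the `evaluate` function are hypothetical illustrations; they are not Markyt's API, and the platform's actual matching criteria and metrics may differ.

```python
from typing import Dict, Iterable, NamedTuple, Union


class Mention(NamedTuple):
    """Hypothetical mention annotation: document id, character offsets, entity type."""
    doc_id: str
    start: int
    end: int
    label: str  # e.g. "CHEMICAL" or "GENE" (assumed label names)


def evaluate(predicted: Iterable[Mention],
             gold: Iterable[Mention]) -> Dict[str, Union[int, float]]:
    """Micro-averaged precision, recall and F1 under strict exact matching (assumption).

    A prediction is a true positive only if an identical (doc_id, start, end, label)
    tuple exists in the gold standard. Duplicate predictions are collapsed by the sets.
    """
    pred_set, gold_set = set(predicted), set(gold)
    tp = len(pred_set & gold_set)   # predicted and present in the gold standard
    fp = len(pred_set - gold_set)   # predicted but absent from the gold standard
    fn = len(gold_set - pred_set)   # gold mentions the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the CHEMDNER corpus.
    gold = [Mention("d1", 0, 7, "CHEMICAL"),
            Mention("d1", 20, 28, "GENE"),
            Mention("d2", 5, 12, "CHEMICAL")]
    pred = [Mention("d1", 0, 7, "CHEMICAL"),
            Mention("d1", 20, 27, "GENE"),      # boundary off by one: FP and FN
            Mention("d2", 5, 12, "CHEMICAL"),
            Mention("d2", 30, 35, "CHEMICAL")]  # spurious mention: FP
    print(evaluate(pred, gold))
```

In the toy example, the off-by-one boundary on the gene mention counts as both a false positive and a false negative, which is the characteristic strictness of exact matching; a lenient (overlap-based) criterion would instead accept it as a match.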