Chaki Jyotismita
Department of Computational Intelligence, School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India.
PeerJ Comput Sci. 2023 Jul 26;9:e1452. doi: 10.7717/peerj-cs.1452. eCollection 2023.
Figures and captions in medical documentation contain important information. As a result, researchers are becoming more interested in obtaining published medical figures from medical papers and utilizing the captions as a knowledge source.
This work introduces a novel and effective six-fold methodology for extracting figure-caption pairs. First, the A-torus wavelet transform is used to extract edges from the scanned page. Then, using the maximally stable extremal regions (MSER) connected component feature, text and graphical contents are isolated from the edge document, and a multi-layer perceptron is used to detect and retrieve figures and captions from medical records. The figure-caption pair is then extracted using the bounding box approach. The files that contain the figures and captions are saved separately and supplied to the end user as the output of any investigation. The proposed approach is evaluated using a self-created database based on pages collected from five open access books: "Brain and Human Body Modelling 2021" by Sergey Makarov, Gregory Noetscher and Aapo Nummenmaa, "Healthcare and Disease Burden in Africa" by Ilha Niohuru, "All-Optical Methods to Study Neuronal Function" by Eirini Papagiakoumou, "RNA, the Epicenter of Genetic Information" by John Mattick and Paulo Amaral, and "Illustrated Manual of Pediatric Dermatology" by Susan Bayliss Mallory, Alanna Bree and Peggy Chern.
Experiments comparing the new method with earlier systems show a significant increase in efficiency, demonstrating the robustness of the proposed technique.
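The abstract does not spell out how the bounding box step pairs a detected figure with its caption. The following is only a minimal sketch of one plausible pairing rule, assuming that a caption sits directly below its figure and overlaps it horizontally. The names `Box`, `pair_figures_with_captions`, `union` and the `max_gap` threshold are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in page pixels; y grows downward."""
    x0: int
    y0: int
    x1: int
    y1: int


def horizontal_overlap(a: Box, b: Box) -> int:
    """Width in pixels of the horizontal span shared by two boxes (0 if none)."""
    return max(0, min(a.x1, b.x1) - max(a.x0, b.x0))


def pair_figures_with_captions(
    figures: List[Box], captions: List[Box], max_gap: int = 80
) -> List[Tuple[Box, Optional[Box]]]:
    """Pair each figure with the nearest unused caption that starts below it.

    Assumption (not from the paper): a caption begins at most `max_gap`
    pixels below the bottom edge of its figure and overlaps it horizontally.
    Figures with no such caption are paired with None.
    """
    pairs: List[Tuple[Box, Optional[Box]]] = []
    unused = list(captions)
    # Process figures in reading order: top to bottom, then left to right.
    for fig in sorted(figures, key=lambda b: (b.y0, b.x0)):
        best: Optional[Box] = None
        best_gap: Optional[int] = None
        for cap in unused:
            gap = cap.y0 - fig.y1  # vertical distance from figure to caption
            if gap < 0 or gap > max_gap:
                continue
            if horizontal_overlap(fig, cap) == 0:
                continue
            if best_gap is None or gap < best_gap:
                best, best_gap = cap, gap
        if best is not None:
            unused.remove(best)  # each caption belongs to at most one figure
        pairs.append((fig, best))
    return pairs


def union(a: Box, b: Box) -> Box:
    """Smallest box enclosing both inputs, e.g. the crop for a figure-caption pair."""
    return Box(min(a.x0, b.x0), min(a.y0, b.y0), max(a.x1, b.x1), max(a.y1, b.y1))


# Illustrative coordinates only; these are not data from the paper.
figures = [Box(50, 100, 550, 400), Box(50, 600, 300, 800)]
captions = [Box(60, 410, 540, 450), Box(320, 650, 550, 700)]
pairs = pair_figures_with_captions(figures, captions)
```

In this example the first figure is paired with the caption just below it, while the second figure stays unpaired because its caption lies beside it. A fuller rule would also need to handle captions placed to the side of or above a figure.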