Ontology-Based AI Design Patterns and Constraints in Cancer Registry Data Validation.

Authors

Nicholson Nicholas, Giusti Francesco, Martos Carmen

Affiliations

European Commission, Joint Research Centre (JRC), 21027 Ispra, Italy.

Belgian Cancer Registry, 1210 Brussels, Belgium.

Publication information

Cancers (Basel). 2023 Dec 12;15(24):5812. doi: 10.3390/cancers15245812.

Abstract

Data validation in cancer registration is a critical operation but is resource-intensive and has traditionally depended on proprietary software. Ontology-based AI is a novel approach utilising machine reasoning based on axioms formally described in description logic. This is a different approach from deep learning AI techniques but not exclusive of them. The advantage of the ontology approach lies in its ability to address a number of challenges concurrently. The disadvantages relate to computational costs, which increase with language expressivity and the size of data sets, and class containment restrictions imposed by description logics. Both these aspects would benefit from the availability of design patterns, which is the motivation behind this study. We modelled the European cancer registry data validation rules in description logic using a number of design patterns and showed the viability of the approach. Reasoning speeds are a limiting factor for large cancer registry data sets comprising many hundreds of thousands of records, but these can be offset to a certain extent by developing the ontology in a modular way. Data validation is also a highly parallelisable process. Important potential future work in this domain would be to identify and optimise reusable design patterns, paying particular attention to avoiding any unintended reasoning efficiency hotspots.
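The remark that validation is highly parallelisable can be illustrated with a minimal Python sketch. It is not the authors' implementation and uses no ontology reasoner: the `Record` fields, the two rule names, and the helpers `validate_chunk` and `validate_parallel` are all hypothetical, and each rule is a plain predicate standing in for an error class that a description-logic axiom might define. It assumes every rule looks at a single record, so chunks of records can be checked independently.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """Hypothetical, heavily simplified cancer registry record."""
    record_id: str
    sex: str          # "M" or "F" (assumed coding)
    topography: str   # ICD-O-3 topography code, e.g. "C61" (prostate)
    birth: date
    incidence: date


# Illustrative rules only: each maps an error-class name to a membership test,
# loosely mirroring classification of a record into an error class.
RULES = {
    "IncidenceBeforeBirth": lambda r: r.incidence < r.birth,
    "ProstateTumourInFemale": lambda r: r.topography == "C61" and r.sex == "F",
}


def validate_chunk(chunk):
    """Return {record_id: [violated rule names]} for one chunk of records."""
    errors = {}
    for record in chunk:
        violated = [name for name, test in RULES.items() if test(record)]
        if violated:
            errors[record.record_id] = violated
    return errors


def validate_parallel(records, chunk_size=2, workers=4):
    """Split records into independent chunks and validate them concurrently."""
    chunks = [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]
    errors = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Chunks share no state, so their results can be merged in any order.
        for partial in pool.map(validate_chunk, chunks):
            errors.update(partial)
    return errors


if __name__ == "__main__":
    sample = [
        Record("r1", "M", "C61", date(1950, 1, 1), date(2020, 5, 1)),
        Record("r2", "F", "C50", date(1960, 1, 1), date(1959, 1, 1)),
        Record("r3", "F", "C61", date(1970, 1, 1), date(2021, 3, 1)),
    ]
    print(validate_parallel(sample))
```

The thread pool here only demonstrates the partitioning structure; a real deployment would more plausibly run separate reasoner instances or processes per chunk. Checks that compare records with one another, such as duplicate detection, would not partition this simply.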
