Haidar Ali, Field Matthew, Batumalai Vikneswary, Cloak Kirrily, Al Mouiee Daniel, Chlap Phillip, Huang Xiaoshui, Chin Vicky, Aly Farhannah, Carolan Martin, Sykes Jonathan, Vinod Shalini K, Delaney Geoffrey P, Holloway Lois
Ingham Institute for Applied Medical Research, Liverpool, NSW 2170, Australia.
Liverpool and Macarthur Cancer Therapy Centres, Liverpool, NSW 2170, Australia.
Cancers (Basel). 2023 Jan 17;15(3):564. doi: 10.3390/cancers15030564.
To progress the use of big data in health systems, standardised nomenclature is required to enable data pooling and analysis. In many radiotherapy planning systems and their data archives, the nomenclature of target volume (TV) and organ-at-risk (OAR) structures has not been standardised. Machine learning (ML) has been used to standardise volume nomenclature in retrospective datasets; however, only subsets of the structures have been targeted. In this paper, we propose a new approach for standardising the nomenclature of all structures using multi-modal artificial neural networks. A cohort of 1613 breast cancer patients treated with radiotherapy was identified from Liverpool & Macarthur Cancer Therapy Centres, NSW, Australia. Four types of volume characteristics were generated to represent each target and OAR volume: textual features, geometric features, dosimetry features, and imaging data. Five datasets were created from the original cohort: the first four represented different subsets of volumes, and the last represented the whole list of volumes. For each dataset, 15 feature combinations were generated to investigate the effect of using different characteristics on standardisation performance. The best model achieved 99.416% classification accuracy on the hold-out sample when used to standardise all the nomenclature in a breast cancer radiotherapy plan into 21 classes. Our results show that ML-based automation methods can be used to standardise naming conventions in a radiotherapy plan, particularly when multiple modalities are included to better represent each volume.
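The abstract does not state how the 15 feature combinations were formed. One reading consistent with that count is that they are all non-empty subsets of the four feature types, since 2^4 - 1 = 15. The following is a minimal sketch under that assumption; the names `FEATURE_TYPES` and `feature_combinations` are hypothetical and are not taken from the paper.

```python
from itertools import combinations

# The four feature types named in the abstract.
FEATURE_TYPES = ("textual", "geometric", "dosimetry", "imaging")


def feature_combinations(features):
    """Return every non-empty subset of `features`, smallest subsets first.

    Assumption: the paper's 15 combinations are the non-empty subsets of
    the four feature types. The abstract gives only the count.
    """
    return [
        combo
        for size in range(1, len(features) + 1)
        for combo in combinations(features, size)
    ]


combos = feature_combinations(FEATURE_TYPES)
print(len(combos))  # 15 = 4 singles + 6 pairs + 4 triples + 1 full set
for combo in combos:
    print(" + ".join(combo))
```

Under this reading, each of the five datasets would be evaluated once per combination, from a single feature type alone up to all four together.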