Eftekhar Shayan, Eftekhar Behzad
The University of Queensland, Brisbane, Australia.
Department of Neurosurgery, Nepean Clinical School, Sydney Medical School, Faculty of Medicine and Health, The University of Sydney, Sydney, Australia.
Heliyon. 2024 Feb 21;10(5):e26831. doi: 10.1016/j.heliyon.2024.e26831. eCollection 2024 Mar 15.
Automated supervised text classification methods require preclassified training data, which makes them difficult to apply where a large amount of preclassified data is not available. Classification of the neurosurgical literature into subspecialties is one such case. We introduce an automated similarity-based text classification method, evaluate it alongside two other automated methods, and apply it to the classification of the neurosurgical literature.
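As an illustration of what similarity-based classification without training data can look like, the sketch below assigns a text to the subspecialty whose keyword list it most resembles. It is only a minimal sketch under stated assumptions: the abstract does not specify the similarity measure or text representation used in the study, so cosine similarity over raw term counts is an assumption here, and the keyword lists, `KEYWORDS`, `tokenize`, `cosine` and `classify` are hypothetical names and values chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical keyword lists for illustration only. The study extracted its
# keywords from the subspecialty sections of a neurosurgery textbook.
KEYWORDS = {
    "Oncology": ["tumour", "glioma", "meningioma", "resection", "metastasis"],
    "Vascular": ["aneurysm", "subarachnoid", "haemorrhage", "clipping", "stroke"],
    "Spine": ["spine", "lumbar", "cervical", "disc", "fusion"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, keywords=KEYWORDS):
    """Return (best_label, scores); best_label is None if nothing matches."""
    doc = Counter(tokenize(text))
    scores = {label: cosine(doc, Counter(words)) for label, words in keywords.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0.0:
        return None, scores
    return best, scores


label, _ = classify("Clipping of a ruptured aneurysm after subarachnoid haemorrhage")
print(label)
```

No labelled examples are needed: each category is described only by its keywords, which is what makes this family of methods usable when preclassified data are scarce.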
The performance of the introduced similarity-based text classification method and two other automated methods (Lbl2Vec and a keyword counting-based method) was compared with that of two senior neurosurgery registrars in classifying neurosurgical literature into 5 subspecialties. We calculated the Kappa statistic of interrater agreement, overall marginal homogeneity using the Stuart-Maxwell test, marginal homogeneity for individual categories using McNemar tests, and the sensitivity and specificity of each of the three methods. The introduced method was then used to classify 211617 neurosurgical publications indexed in PubMed into subspecialties, based on keywords extracted from the subspecialty sections of a neurosurgery textbook.
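Two of the agreement measures named above can be sketched in a few lines. The example computes Cohen's Kappa between a human rater and an automated method, and the one-vs-rest sensitivity and specificity for a single category, treating the human labels as the reference. The label lists are invented toy data, not results from the study, and the function names `cohen_kappa` and `sensitivity_specificity` are hypothetical; the Stuart-Maxwell and McNemar tests are not covered here.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's Kappa: chance-corrected agreement between two label lists."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("label lists must be non-empty and of equal length")
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each rater's marginal frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a.keys() | count_b.keys()) / n**2
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


def sensitivity_specificity(reference, predicted, category):
    """One-vs-rest sensitivity and specificity for a single category."""
    tp = fn = tn = fp = 0
    for ref, pred in zip(reference, predicted):
        if ref == category:
            if pred == category:
                tp += 1
            else:
                fn += 1
        else:
            if pred == category:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels for illustration only (not data from the study).
human = ["Oncology", "Oncology", "Vascular", "Spine", "Vascular", "Oncology"]
method = ["Oncology", "Vascular", "Vascular", "Spine", "Vascular", "Oncology"]

print(round(cohen_kappa(human, method), 3))
print(sensitivity_specificity(human, method, "Oncology"))
```

Raw agreement on the toy data is 5/6, but Kappa is lower because part of that agreement would be expected by chance given how often each rater uses each category.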
The introduced similarity-based method showed the highest agreement with the registrars (raw agreement and Kappa value), followed by Lbl2Vec and the counting-based method. Its classifications of the English-language neurosurgical publications indexed in PubMed into the Oncology, Vascular, Spine and Functional categories were more reliable (closer to the registrars' classifications) than its classifications into Cranial trauma. The classifications and the forecast both showed the highest number of publications in Oncology, followed by Cranial trauma, Vascular, Spine and Functional neurosurgery.
Classification of the English-language neurosurgical publications indexed in PubMed into subspecialties using the introduced method shows that Oncology (tumour) has been the main battleground for neurosurgeons over the years and will probably remain so in the near future. The performance of the introduced method relative to human performance indicates its potential for use in situations where sufficient preclassified data are not available for automated text classification.