Evaluation of standard and semantically-augmented distance metrics for neurology patients.

Affiliations

Department of Neurology and Rehabilitation, University of Illinois at Chicago, Chicago, IL, 60612, USA.

Department of Internal Medicine, Texas Tech University Health Sciences Center, Lubbock, TX, USA.

Publication information

BMC Med Inform Decis Mak. 2020 Aug 26;20(1):203. doi: 10.1186/s12911-020-01217-8.

Abstract

BACKGROUND

Patient distances can be calculated based on signs and symptoms derived from an ontological hierarchy. There is controversy as to whether patient distance metrics that consider the semantic similarity between concepts can outperform standard patient distance metrics that are agnostic to concept similarity. The choice of distance metric can dominate the performance of classification or clustering algorithms. Our objective was to determine if semantically augmented distance metrics would outperform standard metrics on machine learning tasks.
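To make "semantic similarity between concepts" concrete, here is a minimal sketch of one common way to score similarity in a hierarchy, a Wu-Palmer-style measure based on the depth of the lowest common ancestor. The toy hierarchy, the concept names, and the choice of measure are assumptions for illustration only; the abstract does not state which similarity measure the authors used.

```python
# Toy hierarchy (child -> parent). Concept names are illustrative assumptions,
# not taken from the neuro-ontology used in the study.
PARENT = {
    "weakness": "motor finding",
    "hyperreflexia": "reflex finding",
    "hyporeflexia": "reflex finding",
    "motor finding": "neurological finding",
    "reflex finding": "neurological finding",
    "neurological finding": None,  # root
}


def ancestors(concept):
    """Return the path from a concept up to the root, concept first."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1."""
    return len(ancestors(concept))


def similarity(a, b):
    """Wu-Palmer-style similarity: 2 * depth(LCA) / (depth(a) + depth(b))."""
    ancestors_of_b = set(ancestors(b))
    # The first ancestor of a that is also an ancestor of b is the lowest common one.
    lca = next(c for c in ancestors(a) if c in ancestors_of_b)
    return 2 * depth(lca) / (depth(a) + depth(b))


if __name__ == "__main__":
    print(similarity("hyperreflexia", "hyporeflexia"))  # siblings under "reflex finding"
    print(similarity("hyperreflexia", "weakness"))      # related only through the root
```

A metric that is agnostic to concept similarity treats hyperreflexia and hyporeflexia as simply different; a similarity function like this one gives them partial credit because they share a close parent.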

METHODS

We converted the neurological findings from 382 published neurology cases into sets of concepts with corresponding machine-readable codes. We calculated patient distances by four different metrics (cosine distance, a semantically augmented cosine distance, Jaccard distance, and a semantically augmented bipartite distance). Semantic augmentation for two of the metrics depended on concept similarities from a hierarchical neuro-ontology. For machine learning algorithms, we used the patient diagnosis as the ground truth label and patient findings as machine learning features. We assessed classification accuracy for four classifiers and cluster quality for two clustering algorithms for each of the distance metrics.
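The two standard metrics named above can be sketched for patients represented as sets of concept codes. The third function is a hypothetical soft-matching distance (average best-match similarity in both directions) included only to show how concept similarity can enter a patient distance; it is not the authors' semantically augmented cosine or bipartite metric, whose definitions the abstract does not give. The codes and the `toy_sim` function are likewise assumptions.

```python
import math


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two sets of concept codes."""
    union = a | b
    if not union:
        return 0.0  # two empty patients are treated as identical
    return 1.0 - len(a & b) / len(union)


def cosine_distance(a, b):
    """Cosine distance for binary (present/absent) feature vectors."""
    if not a or not b:
        return 0.0 if a == b else 1.0
    return 1.0 - len(a & b) / math.sqrt(len(a) * len(b))


def soft_match_distance(a, b, sim):
    """Hypothetical augmented distance: 1 - mean best-match similarity."""
    if not a or not b:
        return 0.0 if a == b else 1.0
    a_to_b = sum(max(sim(x, y) for y in b) for x in a) / len(a)
    b_to_a = sum(max(sim(x, y) for x in a) for y in b) / len(b)
    return 1.0 - (a_to_b + b_to_a) / 2


def toy_sim(x, y):
    """Assumed similarity: 1.0 if identical, 0.5 if same first letter, else 0.0."""
    if x == y:
        return 1.0
    return 0.5 if x[0] == y[0] else 0.0


# Two made-up patients sharing one finding (M1) and differing in a related one (R1 vs R2).
patient_1 = {"R1", "M1"}
patient_2 = {"R2", "M1"}

if __name__ == "__main__":
    print(jaccard_distance(patient_1, patient_2))
    print(cosine_distance(patient_1, patient_2))
    print(soft_match_distance(patient_1, patient_2, toy_sim))
```

In this toy case the soft-matching distance is smaller than either standard distance, because the related but non-identical findings receive partial credit instead of counting as a complete mismatch.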

RESULTS

Inter-patient distances were smaller when the distance metric was semantically augmented. Classification accuracy and cluster quality were not significantly different by distance metric.

CONCLUSION

Although semantic augmentation reduced inter-patient distances, we did not find improved classification accuracy or improved cluster quality with semantically augmented patient distance metrics when applied to a dataset of neurology patients. Further work is needed to assess the utility of semantically augmented patient distances.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/633a/7448345/2a94b3ec9c17/12911_2020_1217_Fig1_HTML.jpg
