Validation of 3 Computer-Aided Facial Phenotyping Tools (DeepGestalt, GestaltMatcher, and D-Score): Comparative Diagnostic Accuracy Study.

Author Affiliations

Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany.

Institute for Digitalization and General Medicine, University Hospital Aachen, Aachen, Germany.

Publication Information

J Med Internet Res. 2024 Mar 13;26:e42904. doi: 10.2196/42904.

Abstract

BACKGROUND

While characteristic facial features provide important clues for finding the correct diagnosis in genetic syndromes, valid assessment can be challenging. The next-generation phenotyping algorithm DeepGestalt analyzes patient images and provides syndrome suggestions. GestaltMatcher matches patient images with similar facial features. The new D-Score provides a score for the degree of facial dysmorphism.

OBJECTIVE

We aimed to test state-of-the-art facial phenotyping tools by benchmarking GestaltMatcher and D-Score and comparing them to DeepGestalt.

METHODS

Using a retrospective sample of 4796 images of patients with 486 different genetic syndromes (London Medical Database, GestaltMatcher Database, and literature images) and 323 inconspicuous control images, we determined the clinical use of D-Score, GestaltMatcher, and DeepGestalt, evaluating sensitivity; specificity; accuracy; the number of supported diagnoses; and potential biases such as age, sex, and ethnicity.
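For readers less familiar with the metrics named above, the following is a minimal sketch of how sensitivity, specificity, and accuracy can be derived from a score and a decision threshold. It is an illustration only, not the authors' evaluation code: the function name `binary_metrics`, the threshold of 0.5, and the toy scores are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    sensitivity: float  # share of patient images flagged as patients
    specificity: float  # share of control images flagged as controls
    accuracy: float     # share of all images classified correctly


def binary_metrics(scores, labels, threshold):
    """Compute metrics for a score-based classifier (hypothetical helper).

    labels: 1 = patient image, 0 = control image.
    An image is predicted "patient" when its score is >= threshold.
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_patient = score >= threshold
        if predicted_patient and label == 1:
            tp += 1
        elif predicted_patient and label == 0:
            fp += 1
        elif not predicted_patient and label == 0:
            tn += 1
        else:
            fn += 1
    return BinaryMetrics(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        accuracy=(tp + tn) / (tp + fp + tn + fn),
    )


# Toy data (invented for illustration): three patient and three control images.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0]
toy_metrics = binary_metrics(toy_scores, toy_labels, threshold=0.5)
```

With these toy values, one patient image falls below the threshold and one control image falls above it, so all three metrics come out at two thirds. Moving the threshold trades sensitivity against specificity.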

RESULTS

DeepGestalt suggested 340 distinct syndromes and GestaltMatcher suggested 1128 syndromes. The top-30 sensitivity was higher for DeepGestalt (88%, SD 18%) than for GestaltMatcher (76%, SD 26%). DeepGestalt generally assigned lower scores but provided higher scores for patient images than for inconspicuous control images, thus allowing the 2 cohorts to be separated with an area under the receiver operating characteristic curve (AUROC) of 0.73. GestaltMatcher could not separate the 2 classes (AUROC 0.55). Trained for this purpose, D-Score achieved the highest discriminatory power (AUROC 0.86). D-Score's levels increased with the age of the depicted individuals. Male individuals yielded higher D-scores than female individuals. Ethnicity did not appear to influence D-scores.
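The two headline measures in these results, top-k sensitivity and AUROC, can be sketched in a few lines. The code below is a hedged illustration under stated assumptions, not the procedure used in the study: `top_k_sensitivity` and `auroc` are hypothetical helper names, the syndrome labels are placeholders, and the scores are invented. The AUROC is computed in its pairwise form, that is, the probability that a randomly chosen patient image scores higher than a randomly chosen control image, with ties counted as one half.

```python
def top_k_sensitivity(ranked_suggestions, true_labels, k):
    """Fraction of cases whose true diagnosis appears among the first k suggestions."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_suggestions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


def auroc(patient_scores, control_scores):
    """Pairwise AUROC: P(patient score > control score), ties count 0.5."""
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


# Toy ranked suggestion lists (placeholder labels, invented for illustration).
toy_rankings = [
    ["syndrome_A", "syndrome_B", "syndrome_C"],
    ["syndrome_B", "syndrome_C", "syndrome_A"],
    ["syndrome_C", "syndrome_A", "syndrome_B"],
]
toy_truth = ["syndrome_A", "syndrome_A", "syndrome_D"]
toy_top2 = top_k_sensitivity(toy_rankings, toy_truth, k=2)
toy_top3 = top_k_sensitivity(toy_rankings, toy_truth, k=3)

# Toy dysmorphism-style scores (invented for illustration).
toy_patient_scores = [0.9, 0.6, 0.4]
toy_control_scores = [0.5, 0.3, 0.4]
toy_auroc = auroc(toy_patient_scores, toy_control_scores)
```

An AUROC of 0.5 corresponds to chance-level separation, which is why the reported value of 0.55 is read as a failure to separate the two cohorts, while 0.86 indicates substantially better discrimination. The third toy case shows that a diagnosis absent from a tool's suggestion list can never count as a hit, whatever the value of k.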

CONCLUSIONS

If used with caution, algorithms such as D-Score could help clinicians with constrained resources or limited experience in syndromology to decide whether a patient needs further genetic evaluation. Algorithms such as DeepGestalt could support diagnosing rather common genetic syndromes with facial abnormalities, whereas algorithms such as GestaltMatcher could suggest rare diagnoses that are unknown to the clinician in patients with a characteristic, dysmorphic face.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/64c0/10973953/284e455a3dbd/jmir_v26i1e42904_fig1.jpg
