Lesmann Hellen, Hustinx Alexander, Moosa Shahida, Klinkhammer Hannah, Marchi Elaine, Caro Pilar, Abdelrazek Ibrahim M, Pantel Jean Tori, Hagen Merle Ten, Thong Meow-Keong, Binti Mazlan Rifhan Azwani, Tae Sok Kun, Kamphans Tom, Meiswinkel Wolfgang, Li Jing-Mei, Javanmardi Behnam, Knaus Alexej, Uwineza Annette, Knopp Cordula, Tkemaladze Tinatin, Elbracht Miriam, Mattern Larissa, Jamra Rami Abou, Velmans Clara, Strehlow Vincent, Jacob Maureen, Peron Angela, Dias Cristina, Nunes Beatriz Carvalho, Vilella Thainá, Pinheiro Isabel Furquim, Kim Chong Ae, Melaragno Maria Isabel, Weiland Hannah, Kaptain Sophia, Chwiałkowska Karolina, Kwasniewski Miroslaw, Saad Ramy, Wiethoff Sarah, Goel Himanshu, Tang Clara, Hau Anna, Barakat Tahsin Stefan, Panek Przemysław, Nabil Amira, Suh Julia, Braun Frederik, Gomy Israel, Averdunk Luisa, Ekure Ekanem, Bergant Gaber, Peterlin Borut, Graziano Claudio, Gaboon Nagwa, Fiesco-Roa Moisés, Spinelli Alessandro Mauro, Wilpert Nina-Maria, Phowthongkum Prasit, Güzel Nergis, Haack Tobias B, Bitar Rana, Tzschach Andreas, Rodriguez-Palmero Agusti, Brunet Theresa, Rudnik-Schöneborn Sabine, Contreras-Capetillo Silvina Noemi, Oberlack Ava, Samango-Sprouse Carole, Sadeghin Teresa, Olaya Margaret, Platzer Konrad, Borovikov Artem, Schnabel Franziska, Heuft Lara, Herrmann Vera, Oegema Renske, Elkhateeb Nour, Kumar Sheetal, Komlosi Katalin, Mohamed Khoushoua, Kalantari Silvia, Sirchia Fabio, Martinez-Monseny Antonio F, Höller Matthias, Toutouna Louiza, Mohamed Amal, Lasa-Aranzasti Amaia, Sayer John A, Ehmke Nadja, Danyel Magdalena, Sczakiel Henrike, Schwartzmann Sarina, Boschann Felix, Zhao Max, Adam Ronja, Einicke Lara, Horn Denise, Chew Kee Seang, Kam Choy Chen, Karakoyun Miray, Pode-Shakked Ben, Eliyahu Aviva, Rock Rachel, Carrion Teresa, Chorin Odelia, Zarate Yuri A, Conti Marcelo Martinez, Karakaya Mert, Tung Moon Ley, Chandra Bharatendu, Bouman Arjan, Lumaka Aime, Wasif Naveed, Shinawi Marwan, Blackburn Patrick R, Wang Tianyun, Niehues Tim, Schmidt 
Axel, Roth Regina Rita, Wieczorek Dagmar, Hu Ping, Waikel Rebekah L, Ledgister Hanchard Suzanna E, Elmakkawy Gehad, Safwat Sylvia, Ebstein Frédéric, Krüger Elke, Küry Sébastien, Bézieau Stéphane, Arlt Annabelle, Olinger Eric, Marbach Felix, Li Dong, Dupuis Lucie, Mendoza-Londono Roberto, Houge Sofia Douzgou, Weis Denisa, Chung Brian Hon-Yin, Mak Christopher C Y, Kayserili Hülya, Elcioglu Nursel, Aykut Ayca, Şimşek-Kiper Peli Özlem, Bögershausen Nina, Wollnik Bernd, Bentzen Heidi Beate, Kurth Ingo, Netzer Christian, Jezela-Stanek Aleksandra, Devriendt Koen, Gripp Karen W, Mücke Martin, Verloes Alain, Schaaf Christian P, Nellåker Christoffer, Solomon Benjamin D, Nöthen Markus M, Abdalla Ebtesam, Lyon Gholson J, Krawitz Peter M, Hsieh Tzung-Chien
Institute of Human Genetics, University of Bonn, Bonn, NRW, Germany.
Institute for Genomic Statistics and Bioinformatics, University of Bonn, Bonn, NRW, Germany.
Res Sq. 2024 Jun 10:rs.3.rs-4438861. doi: 10.21203/rs.3.rs-4438861/v1.
The significant phenotypic variability of the human face is the most important factor complicating the work of dysmorphologists. Next-Generation Phenotyping (NGP) tools that assist clinicians in recognizing characteristic syndromic patterns are particularly challenged when confronted with patients from populations that differ from their training data. We therefore systematically analyzed the impact of genetic ancestry on facial dysmorphism. For this purpose, we established the GestaltMatcher Database (GMDB) as a reference dataset of medical images of patients with rare genetic disorders from around the world. We collected 10,980 frontal facial images, more than a quarter of them previously unpublished, from 8,346 patients representing 581 rare disorders. Although the predominant ancestry is still European (67%), data from underrepresented populations have increased considerably through global collaborations (19% Asian and 7% African). This includes previously unpublished reports for more than 40% of the African patients. The NGP analysis of this diverse dataset revealed characteristic performance differences depending on the composition of the training and test sets, corresponding to genetic relatedness. For clinical use of NGP, incorporating non-European patients substantially improved GestaltMatcher performance: the top-5 accuracy increased by 11.29%. Importantly, this improvement in identifying the correct disorder from a facial portrait was achieved without decreasing performance on European patients. By design, GMDB complies with the FAIR principles by rendering the curated medical data findable, accessible, interoperable, and reusable, so it can also serve as data for training and benchmarking. In summary, our study of facial dysmorphism in a global sample revealed considerable cross-ancestral phenotypic variability that confounds NGP and should be counteracted by international efforts to increase data diversity. GMDB will serve as a vital reference database for clinicians and a transparent training set for advancing NGP technology.
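For readers unfamiliar with the top-5 accuracy metric reported above, the following is a minimal sketch of how a top-k accuracy could be computed from ranked disorder predictions. It is not the GestaltMatcher evaluation code: the function name `top_k_accuracy`, the input layout (one ranked list of disorder labels per patient), and the toy labels are all assumptions made for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    true_labels: Sequence[str],
    k: int = 5,
) -> float:
    """Return the fraction of cases whose true label is among the first k predictions.

    ranked_predictions[i] is a hypothetical ranked list of disorder labels for
    case i, ordered from most to least likely; true_labels[i] is the confirmed
    diagnosis for that case.
    """
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("ranked_predictions and true_labels must have equal length")
    if not true_labels:
        raise ValueError("at least one case is required")
    if k < 1:
        raise ValueError("k must be a positive integer")

    # A case counts as a hit when its true label appears in the top-k slice.
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


if __name__ == "__main__":
    # Toy data with made-up placeholder labels, not taken from GMDB.
    predictions = [
        ["disorder_A", "disorder_B", "disorder_C"],
        ["disorder_C", "disorder_A", "disorder_B"],
        ["disorder_B", "disorder_C", "disorder_A"],
    ]
    truths = ["disorder_B", "disorder_C", "disorder_A"]
    print(top_k_accuracy(predictions, truths, k=2))
```

Under this definition, a change in top-5 accuracy between two training configurations would be the difference between the two values computed on the same test set.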