Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Biomedical Research Building (BRB), Miami, FL, 33136, USA.
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02115, USA.
Genome Biol. 2024 Jan 31;25(1):39. doi: 10.1186/s13059-024-03171-4.
Expansions of tandem repeats (TRs) cause approximately 60 monogenic diseases. We expect that the discovery of additional pathogenic repeat expansions will narrow the diagnostic gap in many diseases. A growing number of TR expansions are being identified, and interpreting them is a challenge. We present RExPRT (Repeat EXpansion Pathogenicity pRediction Tool), a machine learning tool for distinguishing pathogenic from benign TR expansions. Our results demonstrate that an ensemble approach classifies TRs with an average precision of 93% and recall of 83%. RExPRT's high precision will be valuable in large-scale discovery studies, which require prioritization of candidate loci for follow-up studies.
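The precision and recall figures quoted above follow the standard definitions: precision is the fraction of expansions predicted pathogenic that truly are pathogenic, and recall is the fraction of truly pathogenic expansions that the classifier recovers. The following is a minimal sketch of that calculation, not RExPRT's own evaluation code. The function name `precision_recall` and the toy labels are assumptions made for illustration and are not taken from the paper.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 = pathogenic.

    Hypothetical helper for illustration; not part of RExPRT.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    # Guard against division by zero when nothing is predicted or nothing is positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels invented for this example only (not data from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
```

High precision means that few of the loci flagged as pathogenic are false positives, which is the property that matters when candidates are being prioritized for follow-up.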