RExPRT: a machine learning tool to predict pathogenicity of tandem repeat loci.

Author affiliations

Dr. John T. Macdonald Foundation Department of Human Genetics and John P. Hussman Institute for Human Genetics, University of Miami Miller School of Medicine, Biomedical Research Building (BRB), Miami, FL, 33136, USA.

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, 02155, USA.

Publication information

Genome Biol. 2024 Jan 31;25(1):39. doi: 10.1186/s13059-024-03171-4.

DOI:10.1186/s13059-024-03171-4

PMID:38297326

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10832122/

Abstract

Expansions of tandem repeats (TRs) cause approximately 60 monogenic diseases. We expect that the discovery of additional pathogenic repeat expansions will narrow the diagnostic gap in many diseases. A growing number of TR expansions are being identified, and interpreting them is a challenge. We present RExPRT (Repeat EXpansion Pathogenicity pRediction Tool), a machine learning tool for distinguishing pathogenic from benign TR expansions. Our results demonstrate that an ensemble approach classifies TRs with an average precision of 93% and recall of 83%. RExPRT's high precision will be valuable in large-scale discovery studies, which require prioritization of candidate loci for follow-up studies.
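The precision and recall figures quoted in the abstract can be made concrete with a small sketch. The code below is illustrative only: the abstract does not say which models make up the ensemble or how their outputs are combined, so the score-averaging rule, the function names `ensemble_predict` and `precision_recall`, the 0.5 threshold, and the toy scores are all assumptions rather than RExPRT's actual method or data.

```python
from typing import List, Sequence, Tuple


def ensemble_predict(scores_a: Sequence[float],
                     scores_b: Sequence[float],
                     threshold: float = 0.5) -> List[int]:
    """Hypothetical ensemble rule: average two model scores per locus
    and call the locus pathogenic (1) if the mean reaches the threshold."""
    return [1 if (a + b) / 2 >= threshold else 0
            for a, b in zip(scores_a, scores_b)]


def precision_recall(y_true: Sequence[int],
                     y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (precision, recall) for binary labels, where 1 = pathogenic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy, made-up data (not from the paper): six loci, three truly pathogenic.
y_true = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.3, 0.2, 0.6, 0.1]
model_b = [0.8, 0.6, 0.4, 0.1, 0.5, 0.2]

y_pred = ensemble_predict(model_a, model_b)
precision, recall = precision_recall(y_true, y_pred)
```

Precision is the fraction of loci called pathogenic that truly are, which is why a high-precision classifier suits prioritization: few follow-up experiments are spent on false positives, at the cost of missing some true positives (lower recall).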


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e188/10832122/8c587e25aff8/13059_2024_3171_Fig1_HTML.jpg
