seGMM: A New Tool for Gender Determination From Massively Parallel Sequencing Data.

Author information

Liu Sihan, Zeng Yuanyuan, Wang Chao, Zhang Qian, Chen Meilin, Wang Xiaolu, Wang Lanchen, Lu Yu, Guo Hui, Bu Fengxiao

Affiliations

Institute of Rare Diseases, West China Hospital of Sichuan University, Chengdu, China.

School of Medicine, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China.

Publication information

Front Genet. 2022 Mar 3;13:850804. doi: 10.3389/fgene.2022.850804. eCollection 2022.

Abstract

In clinical genetic testing, checking the concordance between self-reported gender and the gender inferred from genomic data is an important quality control measure, because a gender mismatch caused by a sex chromosome abnormality or by misregistered clinical information can significantly affect molecular diagnosis and treatment decisions. Targeted gene sequencing (TGS) is widely recommended as a first-tier diagnostic step in clinical genetic testing. However, existing gender-inference tools are optimized for whole-genome and whole-exome data and are neither adequate nor accurate for analyzing TGS data. In this study, we validated a new gender-inference tool, seGMM, which uses unsupervised clustering (a Gaussian mixture model) to determine the gender of a sample. seGMM can also identify sex chromosome abnormalities in samples by aligning the sequencing reads from the genotype data. seGMM consistently achieved >99% gender-inference accuracy on a publicly available 1,000-gene panel dataset from the 1000 Genomes Project, an in-house 785-gene hearing loss panel dataset of 16,387 samples, and a 187-gene autism risk panel dataset from the Autism Clinical and Genetic Resources in China (ACGC) database. Compared with other existing gender-inference tools such as PLINK, seXY, and XYalign, seGMM showed significantly higher performance and accuracy on TGS, whole exome sequencing (WES), and whole genome sequencing (WGS) datasets. The results of seGMM were confirmed by short tandem repeat (STR) analysis of the sex chromosome marker gene amelogenin. Furthermore, our data showed that seGMM accurately identified sex chromosome abnormalities in the samples. In conclusion, seGMM shows great potential in clinical genetics for determining the sex chromosome karyotypes of samples from massively parallel sequencing data with high accuracy.
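
The abstract says seGMM assigns gender by unsupervised clustering with a Gaussian mixture model. The sketch below illustrates that general technique only. It is not seGMM's implementation. The two per-sample features are hypothetical stand-ins: `y_frac` is a chrY read fraction normalized to the autosomal rate, and `x_het` is a heterozygosity rate over X-chromosome variants. The sketch fits a two-component, diagonal-covariance mixture by expectation-maximization (EM) in plain Python, then names the component with the higher chrY signal "male". The feature choice, initialization, variance floor, and 0.5 posterior cut-off are all assumptions made for illustration.

```python
import math


def _log_pdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at point x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def fit_gmm2(data, iters=100, min_var=1e-4):
    """Fit a two-component, diagonal-covariance Gaussian mixture by EM."""
    n, d = len(data), len(data[0])
    # Deterministic start: seed the two components at the samples with the
    # lowest and highest value of the first feature (hypothetical y_frac).
    means = [list(min(data, key=lambda s: s[0])), list(max(data, key=lambda s: s[0]))]
    mu = [sum(s[j] for s in data) / n for j in range(d)]
    var0 = [max(sum((s[j] - mu[j]) ** 2 for s in data) / n, min_var) for j in range(d)]
    variances = [var0[:], var0[:]]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: posterior probability of each component for each sample,
        # normalized in log space (log-sum-exp) to avoid underflow.
        resp = []
        for x in data:
            logs = [math.log(weights[k]) + _log_pdf(x, means[k], variances[k]) for k in range(2)]
            top = max(logs)
            probs = [math.exp(v - top) for v in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate mixing weights, means and variances;
        # min_var keeps a tight cluster from collapsing to zero variance.
        for k in range(2):
            nk = sum(r[k] for r in resp) + 1e-12
            weights[k] = nk / n
            means[k] = [sum(r[k] * x[j] for r, x in zip(resp, data)) / nk for j in range(d)]
            variances[k] = [
                max(sum(r[k] * (x[j] - means[k][j]) ** 2 for r, x in zip(resp, data)) / nk, min_var)
                for j in range(d)
            ]
    return weights, means, variances, resp


def infer_sex(features):
    """Return (call, posterior) per sample; the component with the higher
    mean chrY feature is labeled male (an assumption for this sketch)."""
    _, means, _, resp = fit_gmm2(features)
    male = 0 if means[0][0] > means[1][0] else 1
    return [("male" if r[male] >= 0.5 else "female", max(r)) for r in resp]


if __name__ == "__main__":
    # Made-up (y_frac, x_het) values for illustration; not data from the study.
    samples = [(0.95, 0.02), (1.02, 0.01), (0.98, 0.03),
               (0.01, 0.35), (0.02, 0.40), (0.00, 0.38)]
    for s, (call, post) in zip(samples, infer_sex(samples)):
        print(s, call, round(post, 3))
```

The mixture is fitted without any labels, which is what makes the approach unsupervised. The two clusters come from the data alone. Prior knowledge enters only at the final naming step, which relies on the fact that chrY signal marks the male cluster. This toy version does not address how seGMM handles sex chromosome abnormalities.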

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/991e/8930203/410dd17ec772/fgene-13-850804-g001.jpg
