seGMM: A New Tool for Gender Determination From Massively Parallel Sequencing Data.

Authors

Liu Sihan, Zeng Yuanyuan, Wang Chao, Zhang Qian, Chen Meilin, Wang Xiaolu, Wang Lanchen, Lu Yu, Guo Hui, Bu Fengxiao

Affiliations

Institute of Rare Diseases, West China Hospital of Sichuan University, Chengdu, China.

School of Medicine, National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, China.

Publication information

Front Genet. 2022 Mar 3;13:850804. doi: 10.3389/fgene.2022.850804. eCollection 2022.

DOI:10.3389/fgene.2022.850804
PMID:35309142
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8930203/
Abstract

In clinical genetic testing, checking the concordance between self-reported gender and genotype-inferred gender from genomic data is a significant quality control measure because mismatched gender due to sex chromosomal abnormalities or misregistration of clinical information can significantly affect molecular diagnosis and treatment decisions. Targeted gene sequencing (TGS) is widely recommended as a first-tier diagnostic step in clinical genetic testing. However, the existing gender-inference tools are optimized for whole genome and whole exome data and are neither adequate nor accurate for analyzing TGS data. In this study, we validated a new gender-inference tool, seGMM, which uses unsupervised clustering (Gaussian mixture model) to determine the gender of a sample. The seGMM tool can also identify sex chromosomal abnormalities in samples by aligning the sequencing reads from the genotype data. The seGMM tool consistently demonstrated >99% gender-inference accuracy in a publicly available 1000-gene panel dataset from the 1000 Genomes Project, an in-house 785-gene hearing loss panel dataset of 16,387 samples, and a 187-gene autism risk panel dataset from the Autism Clinical and Genetic Resources in China (ACGC) database. The performance and accuracy of seGMM were significantly higher for the targeted gene sequencing (TGS), whole exome sequencing (WES), and whole genome sequencing (WGS) datasets compared with other existing gender-inference tools such as PLINK, seXY, and XYalign. The results of seGMM were confirmed by the short tandem repeat analysis of the sex chromosome marker gene, amelogenin. Furthermore, our data showed that seGMM accurately identified sex chromosomal abnormalities in the samples. In conclusion, the seGMM tool shows great potential in clinical genetics by determining the sex chromosomal karyotypes of samples from massively parallel sequencing data with high accuracy.
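The abstract says only that seGMM assigns sex by unsupervised clustering with a Gaussian mixture model; it does not list the per-sample features here. The sketch below illustrates the general idea under stated assumptions: each sample is summarized by two hypothetical features, an X-chromosome heterozygosity rate (`x_het`) and a normalized Y-chromosome read fraction (`y_ratio`), and a two-component diagonal-covariance GMM is fitted by expectation-maximization in plain Python. The feature names, the deterministic initialization, the variance floor, and the toy input values are all assumptions for illustration; this is not seGMM's implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical per-sample features (assumption, not seGMM's feature set):
# (x_het, y_ratio) = X heterozygosity rate, normalized Y read fraction.
Point = Tuple[float, float]


def _log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def fit_gmm2(points: List[Point], n_iter: int = 100, var_floor: float = 1e-6):
    """Fit a 2-component diagonal GMM by EM; return (weights, means, variances, resp)."""
    n, d = len(points), len(points[0])
    # Deterministic init (an assumption): seed the two means at the samples
    # with the lowest and highest Y signal.
    lo = min(points, key=lambda p: p[1])
    hi = max(points, key=lambda p: p[1])
    means = [list(lo), list(hi)]
    overall = [sum(p[j] for p in points) / n for j in range(d)]
    base_var = [
        max(sum((p[j] - overall[j]) ** 2 for p in points) / n, var_floor)
        for j in range(d)
    ]
    variances = [base_var[:], base_var[:]]
    weights = [0.5, 0.5]
    resp = [[0.5, 0.5] for _ in range(n)]

    for _ in range(n_iter):
        # E-step: posterior probability of each component (log-sum-exp for stability).
        for i, p in enumerate(points):
            logs = [math.log(weights[k]) + _log_gauss_diag(p, means[k], variances[k])
                    for k in range(2)]
            top = max(logs)
            z = [math.exp(lk - top) for lk in logs]
            total = sum(z)
            resp[i] = [zk / total for zk in z]
        # M-step: re-estimate mixing weights, means and per-dimension variances.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)  # guard against an empty component
            weights[k] = nk / n
            means[k] = [sum(r[k] * p[j] for r, p in zip(resp, points)) / nk for j in range(d)]
            variances[k] = [
                max(sum(r[k] * (p[j] - means[k][j]) ** 2 for r, p in zip(resp, points)) / nk,
                    var_floor)  # floor keeps a tight cluster from collapsing
                for j in range(d)
            ]
    return weights, means, variances, resp


def infer_sex(points: List[Point]) -> List[str]:
    """Cluster samples and call the cluster with the higher mean Y signal 'male'."""
    _, means, _, resp = fit_gmm2(points)
    male_k = 0 if means[0][1] > means[1][1] else 1
    return ["male" if r[male_k] >= 0.5 else "female" for r in resp]


# Made-up toy values, chosen only to show the two expected clusters.
samples: List[Point] = [
    (0.31, 0.0002), (0.28, 0.0001), (0.35, 0.0003), (0.30, 0.0000),  # female-like
    (0.01, 0.0110), (0.02, 0.0125), (0.00, 0.0098), (0.03, 0.0117),  # male-like
]

if __name__ == "__main__":
    print(infer_sex(samples))
    # → ['female', 'female', 'female', 'female', 'male', 'male', 'male', 'male']
```

Because the clustering is unsupervised, no fixed cutoffs are needed; the clusters are named afterwards by their mean Y signal. As a hedged extension in the same spirit, a sample that fits neither cluster well (for example, female-like X heterozygosity together with a male-like Y signal, as one might expect for XXY) could be flagged for review, for instance by confirmation with an independent assay such as the amelogenin STR analysis mentioned above.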

Figures 1-5 (images hosted on PMC):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/991e/8930203/410dd17ec772/fgene-13-850804-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/991e/8930203/d3e389b6e081/fgene-13-850804-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/991e/8930203/239add05d347/fgene-13-850804-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/991e/8930203/ce551c54b93d/fgene-13-850804-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/991e/8930203/6ebb95632675/fgene-13-850804-g005.jpg