GRAF-pop: A Fast Distance-Based Method To Infer Subject Ancestry from Multiple Genotype Datasets Without Principal Components Analysis.

Affiliations

National Center for Biotechnology Information, National Institutes of Health, Department of Health and Human Services, Bethesda, Maryland 20894 and

Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Bethesda, Maryland 20892.

Publication information

G3 (Bethesda). 2019 Aug 8;9(8):2447-2461. doi: 10.1534/g3.118.200925.

DOI: 10.1534/g3.118.200925
PMID: 31151998
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6686921/

Abstract

Inferring subject ancestry using genetic data is an important step in genetic association studies, required for dealing with population stratification. It has become more challenging to infer subject ancestry quickly and accurately since large amounts of genotype data, collected from millions of subjects by thousands of studies using different methods, are accessible to researchers from repositories such as the database of Genotypes and Phenotypes (dbGaP) at the National Center for Biotechnology Information (NCBI). Study-reported populations submitted to dbGaP are often not harmonized across studies or may be missing. Widely-used methods for ancestry prediction assume that most markers are genotyped in all subjects, but this assumption is unrealistic if one wants to combine studies that used different genotyping platforms. To provide ancestry inference and visualization across studies, we developed a new method, GRAF-pop, of ancestry prediction that is robust to missing genotypes and allows researchers to visualize predicted population structure in color and in three dimensions. When genotypes are dense, GRAF-pop is comparable in quality and running time to existing ancestry inference methods EIGENSTRAT, FastPCA, and FlashPCA2, all of which rely on principal components analysis (PCA). When genotypes are not dense, GRAF-pop gives much better ancestry predictions than the PCA-based methods. GRAF-pop employs basic geometric and probabilistic methods; the visualized ancestry predictions have a natural geometric interpretation, which is lacking in PCA-based methods. Since February 2018, GRAF-pop has been successfully incorporated into the dbGaP quality control process to identify inconsistencies between study-reported and computationally predicted populations and to provide harmonized population values in all new dbGaP submissions amenable to population prediction, based on marker genotypes. 
Plots, produced by GRAF-pop, of summary population predictions are available on dbGaP study pages, and the software is available at https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/Software.cgi.
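
The abstract does not spell out GRAF-pop's distance statistics, so the following is only a rough sketch of the general idea of a distance-like score that tolerates missing genotypes. It is not GRAF-pop's actual method. It assumes hypothetical reference allele frequencies for two made-up populations ("A" and "B"), assumes Hardy-Weinberg proportions, and averages a negative log-likelihood over only the markers that were genotyped, so subjects typed on different platforms remain comparable. The function name `population_distances` is likewise an illustration, not part of any released software.

```python
import math

def population_distances(genotypes, ref_freqs, eps=0.01):
    """Score one subject against each reference population.

    genotypes: list of alternate-allele counts (0, 1, 2) or None if missing.
    ref_freqs: dict mapping a population label to a list of alternate-allele
               frequencies, one per marker (hypothetical values).
    Returns a dict: population -> mean negative log-likelihood per genotyped
    marker (smaller means closer). NaN if no marker was genotyped.
    """
    distances = {}
    for pop, freqs in ref_freqs.items():
        total, used = 0.0, 0
        for g, p in zip(genotypes, freqs):
            if g is None:
                continue  # missing genotype: skip instead of imputing
            p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
            q = 1.0 - p
            # Hardy-Weinberg genotype probabilities for 0, 1, 2 alt alleles
            prob = (q * q, 2.0 * p * q, p * p)[g]
            total += -math.log(prob)
            used += 1
        # Normalizing by the number of markers used keeps scores comparable
        # between subjects with different amounts of missing data.
        distances[pop] = total / used if used else float("nan")
    return distances

# Hypothetical reference frequencies for four markers in two populations.
ref = {"A": [0.9, 0.8, 0.1, 0.2], "B": [0.1, 0.2, 0.9, 0.8]}
subject = [2, None, 0, 0]  # second marker was not genotyped
scores = population_distances(subject, ref)
closest = min(scores, key=scores.get)
```

Here the subject is scored on three of the four markers, and `closest` picks the population with the smaller score. A PCA-based approach would instead need the missing genotype to be filled in or the marker to be dropped for every subject.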

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc71/6686921/f3afbbe6c0e1/2447f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc71/6686921/f37e689303ff/2447f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc71/6686921/294c9ea62306/2447f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc71/6686921/b389f6050505/2447f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc71/6686921/26043898eec3/2447f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc71/6686921/9db3b59102ab/2447f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/fc71/6686921/36e48236121b/2447f7.jpg
