StructHDP: automatic inference of number of clusters and population structure from admixed genotype data.

Affiliation

School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.

Publication information

Bioinformatics. 2011 Jul 1;27(13):i324-32. doi: 10.1093/bioinformatics/btr242.

DOI:10.1093/bioinformatics/btr242
PMID:21685088
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3117349/
Abstract

MOTIVATION

Clustering of genotype data is an important way of understanding similarities and differences between populations. A summary of populations through clustering allows us to make inferences about the evolutionary history of the populations. Many methods have been proposed to perform clustering on multilocus genotype data. However, most of these methods do not directly address the question of how many clusters the data should be divided into and leave that choice to the user.

METHODS

We present StructHDP, a method for automatically inferring the number of clusters from genotype data in the presence of admixture. Our method extends two existing methods, Structure and Structurama. Using a Hierarchical Dirichlet Process (HDP), we model admixture among an unknown number of ancestral populations in a given sample of genotype data. We use a Gibbs sampler to perform inference on the resulting model and infer the ancestry proportions and the number of clusters that best explain the data.
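
The generative side of the model described above can be sketched roughly as follows. This is a minimal illustration, not the authors' C++ implementation: it assumes biallelic loci, diploid individuals, and a finite truncation of the HDP (a fixed upper bound on the number of ancestral populations), none of which is stated in the abstract. All function and parameter names (`stick_breaking`, `dirichlet`, `simulate_admixed_sample`, `gamma`, `alpha`, `truncation`) are hypothetical. The sketch only samples from the prior; it does not include the Gibbs sampler used for inference.

```python
import random


def stick_breaking(gamma, truncation, rng):
    """Truncated stick-breaking weights shared by all individuals (top level of the HDP)."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        frac = rng.betavariate(1.0, gamma)
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)  # leftover mass goes to the last component
    return weights


def dirichlet(concentrations, rng):
    """Draw from a Dirichlet distribution by normalizing gamma variates."""
    draws = [rng.gammavariate(max(a, 1e-12), 1.0) for a in concentrations]
    total = sum(draws)
    if total == 0.0:
        # Underflow guard: put all mass on the largest concentration.
        best = max(range(len(draws)), key=concentrations.__getitem__)
        return [1.0 if k == best else 0.0 for k in range(len(draws))]
    return [d / total for d in draws]


def simulate_admixed_sample(n_individuals, n_loci, gamma=1.0, alpha=1.0,
                            truncation=10, seed=0):
    """Sample ancestry proportions and genotypes from a truncated HDP admixture prior."""
    rng = random.Random(seed)
    beta = stick_breaking(gamma, truncation, rng)
    # Assumption: one reference-allele frequency per population and locus, uniform prior.
    allele_freqs = [[rng.betavariate(1.0, 1.0) for _ in range(n_loci)]
                    for _ in range(truncation)]
    ancestry, genotypes = [], []
    for _ in range(n_individuals):
        # Finite approximation of an individual-level DP centered on the shared weights.
        pi = dirichlet([alpha * b for b in beta], rng)
        genotype = []
        for locus in range(n_loci):
            copies = 0
            for _ in range(2):  # two allele copies per diploid individual
                z = rng.choices(range(truncation), weights=pi)[0]
                copies += int(rng.random() < allele_freqs[z][locus])
            genotype.append(copies)
        ancestry.append(pi)
        genotypes.append(genotype)
    return beta, ancestry, genotypes
```

In a sample drawn this way, the populations that actually contribute alleles are usually fewer than `truncation`, which illustrates how a prior of this kind lets the data, rather than the user, determine the effective number of clusters.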

RESULTS

To demonstrate our method, we simulated data from an island model using the neutral coalescent. Comparing the results of StructHDP with those of Structurama shows the utility of combining HDPs with the Structure model. We used StructHDP to analyze a dataset of 155 Taita thrushes (Turdus helleri) that had previously been analyzed using Structure and Structurama. StructHDP correctly picks the optimal number of populations into which to cluster the data. The clustering based on the inferred ancestry proportions also agrees with that inferred using Structure for the optimal number of populations. We also analyzed data from 1048 individuals from 53 world populations in the Human Genome Diversity Project. We found that the clusters obtained correspond to major geographical divisions of the world, in agreement with previous analyses of the dataset.
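
One way to quantify agreement between two clusterings derived from ancestry proportions is sketched below. The abstract does not say which agreement measure was used; the Rand index and the rule of assigning each individual to its largest ancestry component are assumptions made here for illustration, and the function names are hypothetical.

```python
from itertools import combinations


def hard_assignments(ancestry):
    """Assign each individual to the population with its largest ancestry proportion."""
    return [max(range(len(row)), key=row.__getitem__) for row in ancestry]


def rand_index(labels_a, labels_b):
    """Fraction of individual pairs on which two clusterings agree.

    A pair counts as agreeing when both clusterings place the two individuals
    together, or both place them apart, so the score ignores label names.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must cover the same individuals")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        return 1.0
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)
```

Because the measure compares pairs rather than labels, two methods that number their populations differently can still score 1.0 when they group the individuals identically.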

AVAILABILITY

StructHDP is written in C++. The code will be available for download at http://www.sailing.cs.cmu.edu/structhdp.

CONTACT

suyash@cs.cmu.edu; epxing@cs.cmu.edu.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/222b/3117349/4c77f5d8ac20/btr242f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/222b/3117349/0e9310d37cd8/btr242f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/222b/3117349/452eff5c622c/btr242f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/222b/3117349/f9e87659f1e4/btr242f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/222b/3117349/677391ced670/btr242f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/222b/3117349/a04bd826fc24/btr242f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/222b/3117349/cf97790550e1/btr242f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/222b/3117349/4f9b3eda49ec/btr242f8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/222b/3117349/235967bbe97d/btr242f9.jpg
