
Ancestral informative marker selection and population structure visualization using sparse Laplacian eigenfunctions.

Affiliation

Department of Radiology, The University of Chicago, Chicago, Illinois, United States of America.

Publication information

PLoS One. 2010 Nov 4;5(11):e13734. doi: 10.1371/journal.pone.0013734.

DOI:10.1371/journal.pone.0013734
PMID:21079796
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2973949/
Abstract

Identification of a small panel of population structure informative markers can reduce genotyping cost and is useful in various applications, such as ancestry inference in association mapping, forensics and evolutionary theory in population genetics. Traditional methods to ascertain ancestral informative markers usually require prior knowledge of individual ancestry and have difficulty with admixed populations. Recently, Principal Components Analysis (PCA) has been employed with success to select SNPs that are highly correlated with the top significant principal components (PCs) without using individual ancestral information. The approach is also applicable to admixed populations. Here we propose a novel approach based on our recent result on summarizing population structure by graph Laplacian eigenfunctions, which differs from PCA in that it is geometric and robust to outliers. Our approach also takes advantage of the a priori sparseness of informative markers in the genome. Through simulation of a ring population and analysis of the real global population sample HGDP (650K SNPs genotyped in 940 unrelated individuals), we validate that the proposed algorithm selects the most informative markers, a small fraction of which efficiently recovers a similar underlying population structure. Employing a standard Support Vector Machine (SVM) to predict individuals' continental memberships on the HGDP dataset of seven continents, we demonstrate that the SNPs selected by our method are more informative but less redundant than those selected by PCA. Our algorithm is a promising tool in genome-wide association studies and population genetics, facilitating the selection of structure informative markers, efficient detection of population substructure and ancestry inference.
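The abstract describes the method only at a high level. As a minimal sketch, assuming NumPy and a genotype matrix `G` of 0/1/2 allele counts (individuals × SNPs), the code below builds a heat-kernel graph over individuals, takes the leading non-trivial eigenvectors of its normalized graph Laplacian, and ranks SNPs by absolute correlation with them. This is not the authors' algorithm: the fully connected graph, the median-distance bandwidth and the correlation ranking (borrowed from the PCA-correlated-SNP idea the abstract compares against) are assumptions, and the paper's sparsity-exploiting selection step is replaced here by a plain top-k ranking. All function names are hypothetical; `pca_components` provides the PCA baseline so both embeddings can feed the same ranking step.

```python
import numpy as np


def standardize(G):
    """Center each SNP column and scale to unit variance (monomorphic SNPs stay 0)."""
    X = G - G.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid dividing by zero for monomorphic SNPs
    return X / sd


def laplacian_eigenfunctions(G, n_components=2):
    """Leading non-trivial eigenvectors of a normalized graph Laplacian over individuals.

    Assumption: a fully connected heat-kernel graph whose bandwidth is the median
    squared distance; the paper's exact graph construction may differ.
    """
    X = standardize(G)
    n = X.shape[0]
    sq = (X ** 2).sum(axis=1)
    # Pairwise squared Euclidean distances between individuals
    D2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    sigma2 = np.median(D2[~np.eye(n, dtype=bool)])
    W = np.exp(-D2 / sigma2)
    np.fill_diagonal(W, 0.0)  # no self-loops
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L_sym = np.eye(n) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    # Column 0 is the trivial eigenvector (eigenvalue 0); rescaling by D^{-1/2}
    # turns the remaining ones into eigenvectors of the random-walk Laplacian.
    return vecs[:, 1:n_components + 1] * d_inv_sqrt[:, None]


def pca_components(G, n_components=2):
    """Top principal-component scores, for the PCA-correlated-SNP baseline."""
    U, s, _ = np.linalg.svd(standardize(G), full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def select_markers(G, F, n_markers):
    """Rank SNPs by their largest |Pearson correlation| with any column of F.

    Assumption: plain correlation ranking stands in for the paper's sparse selection.
    """
    X = G - G.mean(axis=0)
    Fc = F - F.mean(axis=0)
    num = X.T @ Fc  # shape (n_snps, n_components)
    den = np.outer(np.linalg.norm(X, axis=0), np.linalg.norm(Fc, axis=0))
    r = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    score = np.abs(r).max(axis=1)
    return np.argsort(score)[::-1][:n_markers]


def simulate_two_populations(rng, n_per_pop=30, n_informative=30, n_null=70):
    """Toy data (not the paper's ring simulation): the first n_informative SNPs
    differ sharply in allele frequency between two populations."""
    p_a = np.r_[np.full(n_informative, 0.05), np.full(n_null, 0.5)]
    p_b = np.r_[np.full(n_informative, 0.95), np.full(n_null, 0.5)]
    m = n_informative + n_null
    return np.vstack([
        rng.binomial(2, p_a, size=(n_per_pop, m)),
        rng.binomial(2, p_b, size=(n_per_pop, m)),
    ]).astype(float)
```

On the toy data, `select_markers(G, laplacian_eigenfunctions(G, n_components=1), 30)` and the same call with `pca_components` should both return mostly indices below 30, the differentiated SNPs. Using more components than there are real axes of structure lets noise-only eigenvectors contribute to the ranking, so `n_components` is worth choosing deliberately.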


Figures
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/352c/2973949/ff934d5185a1/pone.0013734.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/352c/2973949/724d3c272e77/pone.0013734.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/352c/2973949/199776c4427a/pone.0013734.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/352c/2973949/bd89ebf6a082/pone.0013734.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/352c/2973949/4d70c6023c91/pone.0013734.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/352c/2973949/d57502f8c6b5/pone.0013734.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/352c/2973949/31a79d1fa28b/pone.0013734.g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/352c/2973949/24019507fc6d/pone.0013734.g008.jpg

Similar articles

1
Ancestral informative marker selection and population structure visualization using sparse Laplacian eigenfunctions.
PLoS One. 2010 Nov 4;5(11):e13734. doi: 10.1371/journal.pone.0013734.
2
MI-MAAP: marker informativeness for multi-ancestry admixed populations.
BMC Bioinformatics. 2020 Apr 3;21(1):131. doi: 10.1186/s12859-020-3462-5.
3
PCA-correlated SNPs for structure identification in worldwide human populations.
PLoS Genet. 2007 Sep;3(9):1672-86. doi: 10.1371/journal.pgen.0030160.
4
A PCA-based method for ancestral informative markers selection in structured populations.
Sci China C Life Sci. 2009 Oct;52(10):972-6. doi: 10.1007/s11427-009-0128-y. Epub 2009 Nov 13.
5
Sparse principal component analysis for identifying ancestry-informative markers in genome-wide association studies.
Genet Epidemiol. 2012 May;36(4):293-302. doi: 10.1002/gepi.21621. Epub 2012 Apr 16.
6
Tracing sub-structure in the European American population with PCA-informative markers.
PLoS Genet. 2008 Jul 4;4(7):e1000114. doi: 10.1371/journal.pgen.1000114.
7
Straightforward inference of ancestry and admixture proportions through ancestry-informative insertion deletion multiplexing.
PLoS One. 2012;7(1):e29684. doi: 10.1371/journal.pone.0029684. Epub 2012 Jan 17.
8
Global analysis of population stratification using a smart panel of 27 continental ancestry-informative SNPs.
Forensic Sci Int Genet. 2018 Jul;35:e10-e12. doi: 10.1016/j.fsigen.2018.05.006. Epub 2018 May 18.
9
FastPop: a rapid principal component derived method to infer intercontinental ancestry using genetic data.
BMC Bioinformatics. 2016 Mar 9;17:122. doi: 10.1186/s12859-016-0965-1.
10
Choice of population structure informative principal components for adjustment in a case-control study.
BMC Genet. 2011 Jul 19;12:64. doi: 10.1186/1471-2156-12-64.

Cited by

1
The Association of Childhood Allergic Diseases with Prenatal Exposure to Pollen Grains Through At-Birth DNA Methylation.
Epigenomes. 2025 Mar 11;9(1):9. doi: 10.3390/epigenomes9010009.
2
Including diverse and admixed populations in genetic epidemiology research.
Genet Epidemiol. 2022 Oct;46(7):347-371. doi: 10.1002/gepi.22492. Epub 2022 Jul 16.
3
Genome-wide mapping of quantitative trait loci in admixed populations using mixed linear model and Bayesian multiple regression analysis.
Genet Sel Evol. 2018 Jun 19;50(1):32. doi: 10.1186/s12711-018-0402-1.
4
Clustering of 770,000 genomes reveals post-colonial population structure of North America.
Nat Commun. 2017 Feb 7;8:14238. doi: 10.1038/ncomms14238.
5
Manifold learning for human population structure studies.
PLoS One. 2012;7(1):e29901. doi: 10.1371/journal.pone.0029901. Epub 2012 Jan 17.

References

1
High-Dimensional Sparse Factor Modeling: Applications in Gene Expression Genomics.
J Am Stat Assoc. 2008 Dec 1;103(484):1438-1456. doi: 10.1198/016214508000000869.
2
Graphic analysis of population structure on genome-wide rheumatoid arthritis data.
BMC Proc. 2009 Dec 15;3 Suppl 7(Suppl 7):S110. doi: 10.1186/1753-6561-3-s7-s110.
3
Laplacian eigenfunctions learn population structure.
PLoS One. 2009 Dec 1;4(12):e7928. doi: 10.1371/journal.pone.0007928.
4
The role of geography in human adaptation.
PLoS Genet. 2009 Jun;5(6):e1000500. doi: 10.1371/journal.pgen.1000500. Epub 2009 Jun 5.
5
Discovering genetic ancestry using spectral graph theory.
Genet Epidemiol. 2010 Jan;34(1):51-9. doi: 10.1002/gepi.20434.
6
Genome-wide insights into the patterns and determinants of fine-scale population structure in humans.
Am J Hum Genet. 2009 May;84(5):641-50. doi: 10.1016/j.ajhg.2009.04.015.
7
Tracing sub-structure in the European American population with PCA-informative markers.
PLoS Genet. 2008 Jul 4;4(7):e1000114. doi: 10.1371/journal.pgen.1000114.
8
Genes mirror geography within Europe.
Nature. 2008 Nov 6;456(7218):98-101. doi: 10.1038/nature07331. Epub 2008 Aug 31.
9
An overview of statistical learning theory.
IEEE Trans Neural Netw. 1999;10(5):988-99. doi: 10.1109/72.788640.
10
Analysis and application of European genetic substructure using 300 K SNP information.
PLoS Genet. 2008 Jan;4(1):e4. doi: 10.1371/journal.pgen.0040004.