

A novel and fast approach for population structure inference using kernel-PCA and optimization.

Author information

Popescu Andrei-Alin, Harper Andrea L, Trick Martin, Bancroft Ian, Huber Katharina T

Affiliations

School of Computing Sciences, University of East Anglia, Norwich Research Park, Norwich, Norfolk NR4 7TJ, United Kingdom.

Centre for Novel Agricultural Products, Department of Biology, University of York, York YO10 5DD, United Kingdom.

Publication information

Genetics. 2014 Dec;198(4):1421-31. doi: 10.1534/genetics.114.171314. Epub 2014 Oct 16.

DOI: 10.1534/genetics.114.171314
PMID: 25326237
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4256762/
Abstract

Population structure is a confounding factor in genome-wide association studies, increasing the rate of false positive associations. To correct for it, several model-based algorithms such as ADMIXTURE and STRUCTURE have been proposed. These tend to suffer from the fact that they have a considerable computational burden, limiting their applicability when used with large datasets, such as those produced by next generation sequencing techniques. To address this, nonmodel based approaches such as sparse nonnegative matrix factorization (sNMF) and EIGENSTRAT have been proposed, which scale better with larger data. Here we present a novel nonmodel-based approach, population structure inference using kernel-PCA and optimization (PSIKO), which is based on a unique combination of linear kernel-PCA and least-squares optimization and allows for the inference of admixture coefficients, principal components, and number of founder populations of a dataset. PSIKO has been compared against existing leading methods on a variety of simulation scenarios, as well as on real biological data. We found that in addition to producing results of the same quality as other tested methods, PSIKO scales extremely well with dataset size, being considerably (up to 30 times) faster for longer sequences than even state-of-the-art methods such as sNMF. PSIKO and accompanying manual are freely available at https://www.uea.ac.uk/computing/psiko.
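The abstract names two ingredients: a linear kernel-PCA and a least-squares fit of admixture coefficients. Below is a minimal, self-contained Python sketch of how these two ideas can work together on a toy genotype matrix. It is not PSIKO's implementation, and every function name in it is hypothetical.

The sketch rests on several assumptions:

- Genotypes are coded 0/1/2, with individuals as rows.
- The leading eigenvector is found by plain power iteration.
- Only K = 2 founder populations are modelled.
- The individuals with the most extreme PC1 scores are treated as unadmixed founders.

Centering each SNP column first means the linear kernel is already centred (its rows sum to zero), so no separate double-centering step is needed.

```python
import math
import random


def center_columns(genotypes):
    """Subtract each SNP's (column's) mean so every column sums to zero."""
    n = len(genotypes)
    m = len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    return [[row[j] - means[j] for j in range(m)] for row in genotypes]


def linear_kernel(x):
    """Gram matrix K[i][k] = <x_i, x_k> between individuals (linear kernel)."""
    return [[sum(a * b for a, b in zip(xi, xk)) for xk in x] for xi in x]


def top_kernel_pc(kernel, iterations=200, seed=0):
    """Leading eigenpair of a symmetric PSD kernel matrix by power iteration.

    Returns the PC1 scores sqrt(lambda) * v, i.e. the kernel-PCA projection
    of the individuals onto the first component (hypothetical helper).
    """
    n = len(kernel)
    rng = random.Random(seed)
    # Random start: an all-ones vector would lie in the null space of a
    # centred kernel, whose rows sum to zero.
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        w = [sum(k * x for k, x in zip(row, v)) for row in kernel]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("kernel has no nonzero eigenvalue")
        v = [x / norm for x in w]
    kv = [sum(k * x for k, x in zip(row, v)) for row in kernel]
    lam = sum(a * b for a, b in zip(v, kv))  # Rayleigh quotient
    return [math.sqrt(max(lam, 0.0)) * x for x in v]


def two_way_admixture(scores):
    """Least-squares admixture proportions for K = 2 founder populations.

    Assumes (hypothetically) that the extreme PC1 scores a0 = min and
    a1 = max mark unadmixed founders. For a score s, the q minimising
    (s - (q * a1 + (1 - q) * a0))**2 is (s - a0) / (a1 - a0). Clipping keeps
    q a valid proportion; with min/max anchors it only absorbs rounding.
    """
    a0, a1 = min(scores), max(scores)
    if a1 == a0:
        raise ValueError("all individuals coincide on PC1")
    return [min(1.0, max(0.0, (s - a0) / (a1 - a0))) for s in scores]


# Toy 0/1/2 genotype matrix: two pure populations and one 50/50 individual.
genotypes = [
    [2, 2, 2, 0, 0, 0],
    [2, 2, 2, 0, 0, 0],
    [0, 0, 0, 2, 2, 2],
    [0, 0, 0, 2, 2, 2],
    [1, 1, 1, 1, 1, 1],
]
scores = top_kernel_pc(linear_kernel(center_columns(genotypes)))
q = two_way_admixture(scores)
print([round(x, 3) for x in q])
```

On this toy matrix, the four pure individuals receive proportions of 0 and 1, and the last individual receives 0.5. Which population maps to 0 depends on the arbitrary sign of the eigenvector. Handling K > 2 would need further components (for example, via deflation) and a constrained least-squares solve over K coefficients. This sketch deliberately leaves both out.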


Figures:
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c65e/4256762/70f435987f08/1421fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c65e/4256762/f80c5df1931c/1421fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c65e/4256762/a89212506ff5/1421fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c65e/4256762/4fe9fe3f4a8e/1421fig4.jpg

Similar articles

1
A novel and fast approach for population structure inference using kernel-PCA and optimization.
Genetics. 2014 Dec;198(4):1421-31. doi: 10.1534/genetics.114.171314. Epub 2014 Oct 16.
2
PSIKO2: a fast and versatile tool to infer population stratification on various levels in GWAS.
Bioinformatics. 2015 Nov 1;31(21):3552-4. doi: 10.1093/bioinformatics/btv396. Epub 2015 Jul 2.
3
Ancestral informative marker selection and population structure visualization using sparse Laplacian eigenfunctions.
PLoS One. 2010 Nov 4;5(11):e13734. doi: 10.1371/journal.pone.0013734.
4
A fast least-squares algorithm for population inference.
BMC Bioinformatics. 2013 Jan 23;14:28. doi: 10.1186/1471-2105-14-28.
5
Fast and efficient estimation of individual ancestry coefficients.
Genetics. 2014 Apr;196(4):973-83. doi: 10.1534/genetics.113.160572. Epub 2014 Feb 4.
6
Inferring Population Structure and Admixture Proportions in Low-Depth NGS Data.
Genetics. 2018 Oct;210(2):719-731. doi: 10.1534/genetics.118.301336. Epub 2018 Aug 21.
7
Fast and efficient correction for population stratification in multi-locus genome-wide association studies.
Genetica. 2021 Dec;149(5-6):313-325. doi: 10.1007/s10709-021-00129-3. Epub 2021 Sep 4.
8
GRAF-pop: A Fast Distance-Based Method To Infer Subject Ancestry from Multiple Genotype Datasets Without Principal Components Analysis.
G3 (Bethesda). 2019 Aug 8;9(8):2447-2461. doi: 10.1534/g3.118.200925.
9
Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness.
Genet Epidemiol. 2015 May;39(4):276-93. doi: 10.1002/gepi.21896. Epub 2015 Mar 23.
10
Sparse principal component analysis for identifying ancestry-informative markers in genome-wide association studies.
Genet Epidemiol. 2012 May;36(4):293-302. doi: 10.1002/gepi.21621. Epub 2012 Apr 16.

Cited by

1
Genotypes of informative loci from 1000 Genomes data allude evolution and mixing of human populations.
Sci Rep. 2021 Sep 7;11(1):17741. doi: 10.1038/s41598-021-97129-2.
2
Genomic signatures of vegetable and oilseed allopolyploid Brassica juncea and genetic loci controlling the accumulation of glucosinolates.
Plant Biotechnol J. 2021 Dec;19(12):2619-2628. doi: 10.1111/pbi.13687. Epub 2021 Oct 1.
3
Data in support of genetic architecture of glucosinolate variations in .
Data Brief. 2019 Aug 14;25:104402. doi: 10.1016/j.dib.2019.104402. eCollection 2019 Aug.
4
Species-Wide Variation in Shoot Nitrate Concentration, and Genetic Loci Controlling Nitrate, Phosphorus and Potassium Accumulation in L.
Front Plant Sci. 2018 Oct 16;9:1487. doi: 10.3389/fpls.2018.01487. eCollection 2018.
5
Genotyping by sequencing reveals contrasting patterns of population structure, ecologically mediated divergence, and long-distance dispersal in North American palms.
Ecol Evol. 2018 May 8;8(11):5873-5890. doi: 10.1002/ece3.4125. eCollection 2018 Jun.
6
Identification of Candidate Genes for Calcium and Magnesium Accumulation in L. by Association Genetics.
Front Plant Sci. 2017 Nov 15;8:1968. doi: 10.3389/fpls.2017.01968. eCollection 2017.
7
Validation of an updated Associative Transcriptomics platform for the polyploid crop species Brassica napus by dissection of the genetic architecture of erucic acid and tocopherol isoform variation in seeds.
Plant J. 2018 Jan;93(1):181-192. doi: 10.1111/tpj.13767. Epub 2017 Dec 2.
8
Genome sequence and genetic diversity of European ash trees.
Nature. 2017 Jan 12;541(7636):212-216. doi: 10.1038/nature20786. Epub 2016 Dec 26.
9
Molecular markers for tolerance of European ash (Fraxinus excelsior) to dieback disease identified using Associative Transcriptomics.
Sci Rep. 2016 Jan 13;6:19335. doi: 10.1038/srep19335.
10
Using neutral, selected, and hitchhiker loci to assess connectivity of marine populations in the genomic era.
Evol Appl. 2015 Sep;8(8):769-86. doi: 10.1111/eva.12288. Epub 2015 Jul 28.

References

1
fastSTRUCTURE: variational inference of population structure in large SNP data sets.
Genetics. 2014 Jun;197(2):573-89. doi: 10.1534/genetics.114.164350. Epub 2014 Apr 2.
2
Fast and efficient estimation of individual ancestry coefficients.
Genetics. 2014 Apr;196(4):973-83. doi: 10.1534/genetics.113.160572. Epub 2014 Feb 4.
3
Associative transcriptomics of traits in the polyploid crop species Brassica napus.
Nat Biotechnol. 2012 Aug;30(8):798-802. doi: 10.1038/nbt.2302.
4
Principal components analysis of population admixture.
PLoS One. 2012;7(7):e40115. doi: 10.1371/journal.pone.0040115. Epub 2012 Jul 9.
5
Dissecting the genome of the polyploid crop oilseed rape by transcriptome sequencing.
Nat Biotechnol. 2011 Jul 31;29(8):762-6. doi: 10.1038/nbt.1926.
6
Analysis of population structure: a unifying framework and novel methods based on sparse factor analysis.
PLoS Genet. 2010 Sep 16;6(9):e1001117. doi: 10.1371/journal.pgen.1001117.
7
Integrating common and rare genetic variation in diverse human populations.
Nature. 2010 Sep 2;467(7311):52-8. doi: 10.1038/nature09298.
8
MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus.
Bioinformatics. 2010 Aug 15;26(16):2064-5. doi: 10.1093/bioinformatics/btq322. Epub 2010 Jun 30.
9
Fast model-based estimation of ancestry in unrelated individuals.
Genome Res. 2009 Sep;19(9):1655-64. doi: 10.1101/gr.094052.109. Epub 2009 Jul 31.
10
Worldwide human relationships inferred from genome-wide patterns of variation.
Science. 2008 Feb 22;319(5866):1100-4. doi: 10.1126/science.1153717.