PLINK: a tool set for whole-genome association and population-based linkage analyses.

Authors

Purcell Shaun, Neale Benjamin, Todd-Brown Kathe, Thomas Lori, Ferreira Manuel A R, Bender David, Maller Julian, Sklar Pamela, de Bakker Paul I W, Daly Mark J, Sham Pak C

Affiliation

Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA 02114, USA.

Publication information

Am J Hum Genet. 2007 Sep;81(3):559-75. doi: 10.1086/519795. Epub 2007 Jul 25.

DOI:10.1086/519795

PMID:17701901

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC1950838/

Abstract

Whole-genome association studies (WGAS) bring new computational, as well as analytic, challenges to researchers. Many existing genetic-analysis tools are not designed to handle such large data sets in a convenient manner and do not necessarily exploit the new opportunities that whole-genome data bring. To address these issues, we developed PLINK, an open-source C/C++ WGAS tool set. With PLINK, large data sets comprising hundreds of thousands of markers genotyped for thousands of individuals can be rapidly manipulated and analyzed in their entirety. As well as providing tools to make the basic analytic steps computationally efficient, PLINK also supports some novel approaches to whole-genome data that take advantage of whole-genome coverage. We introduce PLINK and describe the five main domains of function: data management, summary statistics, population stratification, association analysis, and identity-by-descent estimation. In particular, we focus on the estimation and use of identity-by-state and identity-by-descent information in the context of population-based whole-genome studies. This information can be used to detect and correct for population stratification and to identify extended chromosomal segments that are shared identical by descent between very distantly related individuals. Analysis of the patterns of segmental sharing has the potential to map disease loci that contain multiple rare variants in a population-based linkage analysis.
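As a rough illustration of the identity-by-state (IBS) notion the abstract refers to, the sketch below computes, for each pair of individuals, the average proportion of alleles shared IBS across markers. It is a minimal example under stated assumptions, not PLINK's implementation: genotypes are assumed to be coded as minor-allele counts (0, 1, or 2) with `None` for missing calls, and the names `ibs_proportion`, `pairwise_ibs`, and `toy` are hypothetical.

```python
from itertools import combinations


def ibs_proportion(g1, g2):
    """Proportion of alleles shared identical by state between two individuals.

    Assumption: genotypes are minor-allele counts (0, 1, or 2); None marks a
    missing call. Markers missing in either individual are skipped.
    """
    shared = 0
    markers = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        # Unphased genotypes share 2, 1, or 0 alleles IBS at one marker.
        shared += 2 - abs(a - b)
        markers += 1
    if markers == 0:
        raise ValueError("no markers with both genotypes observed")
    return shared / (2 * markers)


def pairwise_ibs(genotypes):
    """Return {(id_a, id_b): IBS proportion} for every pair of individuals."""
    return {
        (i, j): ibs_proportion(genotypes[i], genotypes[j])
        for i, j in combinations(sorted(genotypes), 2)
    }


# Hypothetical toy data: three individuals genotyped at five markers.
toy = {
    "ind1": [0, 1, 2, 1, 0],
    "ind2": [0, 1, 2, 1, 0],
    "ind3": [2, 1, 0, None, 0],
}
ibs = pairwise_ibs(toy)
for pair, value in sorted(ibs.items()):
    print(pair, round(value, 3))
```

Pairs with unusually high IBS sharing are candidates for relatedness or sample duplication, and a full matrix of such values is the kind of input that stratification clustering can start from. Estimating identity by descent additionally requires allele frequencies and a population model, which this sketch does not attempt.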

Similar articles

PLINK: Key Functions for Data Analysis.

Curr Protoc Hum Genet. 2018 Apr;97(1):e59. doi: 10.1002/cphg.59.

Inference of relationships in population data using identity-by-descent and identity-by-state.

PLoS Genet. 2011 Sep;7(9):e1002287. doi: 10.1371/journal.pgen.1002287. Epub 2011 Sep 22.

Second-generation PLINK: rising to the challenge of larger and richer datasets.

Gigascience. 2015 Feb 25;4:7. doi: 10.1186/s13742-015-0047-8. eCollection 2015.

AUTOGSCAN: powerful tools for automated genome-wide linkage and linkage disequilibrium analysis.

Twin Res Hum Genet. 2005 Feb;8(1):16-21. doi: 10.1375/1832427053435382.

Genome-wide linkage analysis with clustered SNP markers.

J Biomol Screen. 2009 Jan;14(1):92-6. doi: 10.1177/1087057108327327.

PSReliP: an integrated pipeline for analysis and visualization of population structure and relatedness based on genome-wide genetic variant data.

BMC Bioinformatics. 2023 Apr 5;24(1):135. doi: 10.1186/s12859-023-05169-4.

High-resolution detection of identity by descent in unrelated individuals.

Am J Hum Genet. 2010 Apr 9;86(4):526-39. doi: 10.1016/j.ajhg.2010.02.021. Epub 2010 Mar 18.

Lep-MAP3: robust linkage mapping even for low-coverage whole genome sequencing data.

Bioinformatics. 2017 Dec 1;33(23):3726-3732. doi: 10.1093/bioinformatics/btx494.

Detection of identity by descent using next-generation whole genome sequencing data.

BMC Bioinformatics. 2012 Jun 6;13:121. doi: 10.1186/1471-2105-13-121.

Cited by

Genome-wide association analysis highlights genomic regions and genes potentially associated with anestrus in crossbred gilts.

Mamm Genome. 2025 Sep 11. doi: 10.1007/s00335-025-10159-3.

Genetic and epigenetic analysis of plasma glial fibrillary acidic protein (GFAP) levels in PTSD.

Mol Psychiatry. 2025 Sep 10. doi: 10.1038/s41380-025-03232-5.

Construction of a Core Germplasm and Identification of Candidate SNPs Associated with Growth Performance of Epinephelus tukula by Whole-Genome Resequencing.

Mar Biotechnol (NY). 2025 Sep 10;27(5):136. doi: 10.1007/s10126-025-10502-4.

Whole genome sequence analysis of low-density lipoprotein cholesterol across 246 K individuals.

Genome Biol. 2025 Sep 9;26(1):273. doi: 10.1186/s13059-025-03698-0.

Genomic exploration of durable wheat rust resistance by integrating data from multiple worldwide populations.

Plant Genome. 2025 Sep;18(3):e70093. doi: 10.1002/tpg2.70093.

Lineage-specific targets of positive selection in three leaf beetles correspond with defence capacity against their shared parasitoid wasp.

Heredity (Edinb). 2025 Sep 8. doi: 10.1038/s41437-025-00794-6.

Multiancestry brain pQTL fine-mapping and integration with genome-wide association studies of 21 neurologic and psychiatric conditions.

Nat Genet. 2025 Sep 8. doi: 10.1038/s41588-025-02291-2.

Identification of pathogenic cell types and shared genetic loci and genes for Alzheimer's disease and inflammatory bowel disease.

Brief Funct Genomics. 2025 Jan 15;24. doi: 10.1093/bfgp/elaf013.

RAD-Seq-derived SNPs reveal no local population structure in the commercially important deep-sea queen snapper in Puerto Rico.

Mar Life Sci Technol. 2025 May 12;7(3):594-605. doi: 10.1007/s42995-025-00289-7. eCollection 2025 Aug.

Comparative genomic insights into adaptation, selection signatures, and population dynamics in indigenous Indian sheep and foreign breeds.

Front Genet. 2025 Aug 21;16:1621960. doi: 10.3389/fgene.2025.1621960. eCollection 2025.
