Viability of in-house datamarting approaches for population genetics analysis of SNP genotypes.

Authors

Amigo Jorge, Phillips Christopher, Salas Antonio, Carracedo Angel

Affiliation

Spanish National Genotyping Center (CeGen), Genomic Medicine Group, CIBERER, University of Santiago de Compostela, Galicia, Spain.

Publication

BMC Bioinformatics. 2009 Mar 19;10 Suppl 3(Suppl 3):S5. doi: 10.1186/1471-2105-10-S3-S5.

DOI:10.1186/1471-2105-10-S3-S5
PMID:19344481
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2665053/
Abstract

BACKGROUND

Databases containing very large amounts of SNP (Single Nucleotide Polymorphism) data are now freely available for researchers interested in medical and/or population genetics applications. While many of these SNP repositories have implemented data retrieval tools for general-purpose mining, these alone cannot cover the broad spectrum of needs of most medical and population genetics studies.

RESULTS

To address this limitation, we have built in-house customized data marts from the raw data provided by the largest public databases. In particular, for population genetics analysis based on genotypes, we have built a set of data processing scripts that take raw data from the major SNP variation databases (e.g. HapMap, Perlegen), strip them into single genotypes, group them into populations, and then merge them with complementary descriptive information extracted from dbSNP. This allows not only in-house standardization and normalization of the genotyping data retrieved from different repositories, but also the calculation of statistical indices ranging from simple allele frequency estimates to more elaborate tests of genetic differentiation within populations, together with the ability to combine population samples from different databases.
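The processing steps described in the Results (strip raw rows into single genotypes, group them into populations, then compute allele frequencies and a differentiation index) can be sketched in Python as follows. This is a minimal illustration, not the authors' scripts: the input layout (one SNP per whitespace-separated row, an rs identifier followed by one two-letter genotype per individual, `NN` for a missing call), the sample-to-population map, the sample IDs and all function names are hypothetical. The dbSNP merge step is omitted, and the differentiation index is a Nei-style `(H_T - H_S) / H_T` computed from expected heterozygosities, chosen here only for simplicity; the paper's own tests may differ.

```python
from collections import defaultdict

MISSING = "NN"  # assumption: 'NN' marks a missing genotype call


def parse_genotype_rows(lines, samples):
    """Strip each raw row into single genotypes, yielding (snp_id, sample_id, genotype).

    Hypothetical row layout: 'rs_id GT1 GT2 ...', one genotype per entry in `samples`.
    """
    for line in lines:
        fields = line.split()
        snp_id, calls = fields[0], fields[1:]
        for sample_id, genotype in zip(samples, calls):
            if genotype != MISSING:  # skip failed calls instead of counting them
                yield snp_id, sample_id, genotype


def allele_counts_by_population(records, population_of):
    """Group single genotypes by population: counts[snp][population][allele]."""
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for snp_id, sample_id, genotype in records:
        population = population_of[sample_id]
        for allele in genotype:  # a diploid call such as 'AG' contributes two alleles
            counts[snp_id][population][allele] += 1
    return counts


def allele_frequency(allele_counts, allele):
    """Relative frequency of `allele` among the called alleles (NaN if none were called)."""
    total = sum(allele_counts.values())
    return allele_counts.get(allele, 0) / total if total else float("nan")


def nei_differentiation(freqs):
    """(H_T - H_S) / H_T for one biallelic SNP, from the frequency of the same
    allele in each population (unweighted across populations)."""
    k = len(freqs)
    h_s = sum(2 * p * (1 - p) for p in freqs) / k  # mean within-population heterozygosity
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)  # heterozygosity at the mean allele frequency
    return (h_t - h_s) / h_t if h_t > 0 else 0.0


# Toy example with made-up sample IDs and genotypes.
samples = ["S1", "S2", "S3", "S4"]
population_of = {"S1": "POP_A", "S2": "POP_A", "S3": "POP_B", "S4": "POP_B"}
rows = ["rs0001 AA AG GG GG", "rs0002 CT NN CC TT"]

counts = allele_counts_by_population(parse_genotype_rows(rows, samples), population_of)
freqs = [allele_frequency(counts["rs0001"][pop], "A") for pop in ("POP_A", "POP_B")]
print(freqs)                       # [0.75, 0.0]
print(nei_differentiation(freqs))  # 0.6
```

Keeping each step as a separate generator or function mirrors the strip, group and compute sequence, so a new statistical index can be added by writing one more function over the same per-population counts.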

CONCLUSION

The present study demonstrates the viability of implementing scripts that handle extensive datasets of SNP genotypes at low computational cost, while dealing with certain complex issues that arise from the divergent nature and configuration of the most popular SNP repositories. The information contained in these databases can also be enriched with information obtained from other complementary databases to build a dedicated data mart. Updating the data structure is straightforward, and the approach permits easy incorporation of new external data and the computation of supplementary statistical indices of interest.
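One issue of this kind that commonly arises when merging SNP data from different sources (assumed here for illustration; the abstract does not name it) is allele orientation: two repositories may report the same SNP on opposite DNA strands, so their population samples cannot be pooled until the alleles are harmonized. The Python sketch below assumes a hypothetical per-source strand flag (`'+'` or `'-'`) and per-population allele counts; it complements alleles reported on the minus strand, pools the counts, and flags A/T and C/G SNPs, whose orientation cannot be checked from the alleles alone. Which repository uses which orientation is not stated in the abstract and is not encoded here.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def to_plus_strand(allele_counts, strand):
    """Express allele counts on the '+' strand; `strand` is a hypothetical per-source flag."""
    if strand == "+":
        return dict(allele_counts)
    if strand == "-":
        return {COMPLEMENT[allele]: n for allele, n in allele_counts.items()}
    raise ValueError(f"unknown strand flag {strand!r}")


def is_strand_ambiguous(alleles):
    """A/T and C/G SNPs read the same on both strands, so a wrong strand flag
    cannot be detected from the alleles themselves."""
    return set(alleles) in ({"A", "T"}, {"C", "G"})


def pool_samples(*sources):
    """Combine one population's allele counts for the same SNP from several sources.

    Each source is an (allele_counts, strand) pair. Raises if the harmonized data
    still disagree (more than two alleles), which points to a strand or mapping error.
    """
    pooled = {}
    for allele_counts, strand in sources:
        for allele, n in to_plus_strand(allele_counts, strand).items():
            pooled[allele] = pooled.get(allele, 0) + n
    if len(pooled) > 2:
        raise ValueError(f"allele mismatch after harmonization: {sorted(pooled)}")
    return pooled


# Toy example: hypothetical source B reports the SNP on the minus strand.
source_a = ({"A": 30, "G": 90}, "+")
source_b = ({"T": 10, "C": 38}, "-")
print(pool_samples(source_a, source_b))  # {'A': 40, 'G': 128}
print(is_strand_ambiguous("AG"))         # False
```

Pooling raw allele counts rather than frequencies keeps the merge cheap and lets frequencies be recomputed for the combined sample; SNPs flagged by `is_strand_ambiguous` would need extra evidence, such as orientation annotations, before being combined.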

Similar articles

1
Viability of in-house datamarting approaches for population genetics analysis of SNP genotypes.
BMC Bioinformatics. 2009 Mar 19;10 Suppl 3(Suppl 3):S5. doi: 10.1186/1471-2105-10-S3-S5.
2
ENGINES: exploring single nucleotide variation in entire human genomes.
BMC Bioinformatics. 2011 Apr 19;12:105. doi: 10.1186/1471-2105-12-105.
3
SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access.
BMC Bioinformatics. 2008 Oct 10;9:428. doi: 10.1186/1471-2105-9-428.
4
Next generation tools for the annotation of human SNPs.
Brief Bioinform. 2009 Jan;10(1):35-52. doi: 10.1093/bib/bbn047.
5
Ferret: a user-friendly Java tool to extract data from the 1000 Genomes Project.
Bioinformatics. 2016 Jul 15;32(14):2224-6. doi: 10.1093/bioinformatics/btw147. Epub 2016 Mar 18.
6
FunctSNP: an R package to link SNPs to functional knowledge and dbAutoMaker: a suite of Perl scripts to build SNP databases.
BMC Bioinformatics. 2010 Jun 9;11:311. doi: 10.1186/1471-2105-11-311.
7
An integrated analysis tool for analyzing hybridization intensities and genotypes using new-generation population-optimized human arrays.
BMC Genomics. 2016 Mar 31;17:266. doi: 10.1186/s12864-016-2478-8.
8
SNPchiMp v.3: integrating and standardizing single nucleotide polymorphism data for livestock species.
BMC Genomics. 2015 Apr 10;16(1):283. doi: 10.1186/s12864-015-1497-1.
9
LD2SNPing: linkage disequilibrium plotter and RFLP enzyme mining for tag SNPs.
BMC Genet. 2009 Jun 6;10:26. doi: 10.1186/1471-2156-10-26.
10
A multi-array multi-SNP genotyping algorithm for Affymetrix SNP microarrays.
Bioinformatics. 2007 Jun 15;23(12):1459-67. doi: 10.1093/bioinformatics/btm131. Epub 2007 Apr 25.

Cited by

1
Inter-laboratory study on standardized MPS libraries: evaluation of performance, concordance, and sensitivity using mixtures and degraded DNA.
Int J Legal Med. 2020 Jan;134(1):185-198. doi: 10.1007/s00414-019-02201-2. Epub 2019 Nov 19.
2
Inferring population structure and relationship using minimal independent evolutionary markers in Y-chromosome: a hybrid approach of recursive feature selection for hierarchical clustering.
Nucleic Acids Res. 2014 Sep;42(15):e122. doi: 10.1093/nar/gku585. Epub 2014 Jul 16.
