Discrepancies between human DNA, mRNA and protein reference sequences and their relation to single nucleotide variants in the human population.

Authors

Shirota Matsuyuki, Kinoshita Kengo

Affiliations

Graduate School of Medicine, Tohoku University, Sendai, Miyagi 9808575, Japan; Tohoku Medical Megabank Organization, Tohoku University, Sendai, Miyagi 9808575, Japan; Graduate School of Information Sciences, Tohoku University, Sendai, Miyagi 9808579, Japan.

Tohoku Medical Megabank Organization, Tohoku University, Sendai, Miyagi 9808575, Japan; Graduate School of Information Sciences, Tohoku University, Sendai, Miyagi 9808579, Japan; Institute for Development, Aging and Cancer, Tohoku University, Sendai, Miyagi 9808575, Japan.

Publication information

Database (Oxford). 2016 Sep 1;2016. doi: 10.1093/database/baw124. Print 2016.

DOI:10.1093/database/baw124
PMID:27589963
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5009343/
Abstract

The protein coding sequences of the human reference genome GRCh38, RefSeq mRNA and UniProt protein databases are sometimes inconsistent with each other, due to polymorphisms in the human population, but the overall landscape of the discordant sequences has not been clarified. In this study, we comprehensively listed the discordant bases and regions between the GRCh38, RefSeq and UniProt reference sequences, based on the genomic coordinates of GRCh38. We observed that the RefSeq sequences are more likely to represent the major alleles than GRCh38 and UniProt, by assigning the alternative allele frequencies of the discordant bases. Since some reference sequences have minor alleles, functional and structural annotations may be performed based on rare alleles in the human population, thereby biasing these analyses. Some of the differences between the RefSeq and GRCh38 account for biological differences due to known RNA-editing sites. The definitions of the coding regions are frequently complicated by possible micro-exons within introns and by SNVs with large alternative allele frequencies near exon-intron boundaries. The mRNA or protein regions missing from GRCh38 were mainly due to small deletions, and these sequences need to be identified. Taken together, our results clarify overall consistency and remaining inconsistency between the reference sequences.
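The major-allele comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It assumes a biallelic site whose alternative allele frequency is reported relative to the GRCh38 base, as in a VCF called against GRCh38. The `DiscordantBase` record, its field names, and the `major_allele_holder` function are hypothetical names introduced here for illustration, and the example coordinates and frequencies are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscordantBase:
    """One position where the genome and mRNA references disagree (hypothetical record)."""
    chrom: str
    pos: int          # 1-based GRCh38 coordinate
    genome_base: str  # base in GRCh38 (assumed to be the VCF reference allele)
    mrna_base: str    # base in the aligned mRNA reference
    alt_allele: str   # alternative allele reported for the site
    alt_af: float     # alternative allele frequency, between 0.0 and 1.0


def major_allele_holder(site: DiscordantBase) -> str:
    """Report which reference carries the major allele at a biallelic site."""
    if not 0.0 <= site.alt_af <= 1.0:
        raise ValueError("alt_af must be between 0 and 1")
    if site.genome_base == site.mrna_base:
        return "concordant"
    if site.alt_af == 0.5:
        return "tie"
    # With alt_af above 0.5 the alternative allele is the major allele;
    # otherwise the GRCh38 base is.
    major = site.alt_allele if site.alt_af > 0.5 else site.genome_base
    if site.mrna_base == major:
        return "mrna"
    if site.genome_base == major:
        return "genome"
    return "neither"


# Invented example values, for illustration only.
common_alt = DiscordantBase("chr1", 1000, "A", "G", "G", 0.9)
rare_alt = DiscordantBase("chr1", 2000, "A", "G", "G", 0.1)
print(major_allele_holder(common_alt))  # mrna
print(major_allele_holder(rare_alt))    # genome
```

Counting the `"mrna"` and `"genome"` outcomes over all discordant sites would give the kind of summary the abstract reports, under the stated assumptions.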

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c1a/5009343/9bcdfea56e30/baw124f1p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c1a/5009343/9c01190f68d1/baw124f2p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c1a/5009343/970967534d34/baw124f3p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c1a/5009343/d79ea319cbca/baw124f4p.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7c1a/5009343/f99428bac7be/baw124f5p.jpg
