Roche Pharmaceutical Research and Early Development, Pharmaceutical Sciences, Roche Innovation Center Basel, F. Hoffmann-La Roche Ltd., Grenzacherstr. 124, 4070, Basel, Switzerland.
Group Systems Biology of Motor Proteins, Department of NMR-based Structural Biology, Max-Planck-Institute for Biophysical Chemistry, Am Fassberg 11, 37077, Göttingen, Germany.
Bioessays. 2019 Nov;41(11):e1900066. doi: 10.1002/bies.201900066. Epub 2019 Sep 23.
The major transcript variants of human protein-coding genes are annotated to a certain degree of accuracy by combining manual curation, transcript data, and proteomics evidence. However, there is considerable disagreement on the annotation of about 2000 genes (which may be protein-coding, noncoding, or pseudogenes) and on the annotation of most of the predicted alternative transcripts. Pure transcriptome mapping approaches seem to be limited in discriminating functional expression from noise. These limitations have been partially overcome by dedicated algorithms that detect alternatively spliced micro-exons and wobble splice variants. Recently, knowledge about splicing mechanisms and protein structure has been incorporated into an algorithm that predicts neighboring homologous exons, which are often spliced in a mutually exclusive manner. Predicted exons are evaluated using transcript data, structural compatibility, and evolutionary conservation, revealing hundreds of novel coding exons and splice mechanism re-assignments. The emerging human pan-genome necessitates distinct annotations that incorporate differences between individuals and between populations.