Patterns of diverse gene functions in genomic neighborhoods predict gene function and phenotype.

Author information

Ruđer Bošković Institute, Bijenička cesta 54, Zagreb, Croatia.

Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology, C/Baldiri Reixac 10, 08028, Barcelona, Spain.

Publication information

Sci Rep. 2019 Dec 20;9(1):19537. doi: 10.1038/s41598-019-55984-0.

Abstract

Genes with similar roles in the cell cluster on chromosomes, thus benefiting from coordinated regulation. This allows gene function to be inferred by transferring annotations from genomic neighbors, following the guilt-by-association principle. We performed a systematic search for co-occurrence of >1000 gene functions in genomic neighborhoods across 1669 prokaryotic, 49 fungal and 80 metazoan genomes, revealing prevalent patterns that cannot be explained by clustering of functionally similar genes. It is a very common occurrence that pairs of dissimilar gene functions - corresponding to semantically distant Gene Ontology terms - are significantly co-located on chromosomes. These neighborhood associations are often as conserved across genomes as the known associations between similar functions, suggesting selective benefits from clustering of certain diverse functions, which may conceivably play complementary roles in the cell. We propose a simple encoding of chromosomal gene order, the neighborhood function profiles (NFP), which draws on diverse gene clustering patterns to predict gene function and phenotype. NFPs yield a 26-46% increase in predictive power over state-of-the-art approaches that propagate function across neighborhoods, thus providing hundreds of novel, high-confidence gene function inferences per genome. Furthermore, we demonstrate that copy number-neutral structural variation that shapes gene function distribution across chromosomes can predict phenotype of individuals from their genome sequence.
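
The abstract does not spell out how a neighborhood function profile is computed. As a minimal sketch only, assuming an NFP can be approximated by counting the function labels of the genes that sit within a fixed number of positions of a focal gene, the idea might look like the following. The function name, the `window` parameter, the linear (non-circular) treatment of the chromosome, and the toy gene and label identifiers are all illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def neighborhood_function_profile(gene_order, annotations, index, window=2):
    """Count function labels among genes near the gene at position `index`.

    gene_order  -- gene IDs in chromosomal order (a single chromosome)
    annotations -- dict mapping gene ID -> set of function labels (e.g. GO terms)
    window      -- neighbors considered on each side (hypothetical parameter)

    Assumption: the chromosome is treated as linear, so genes near either
    end simply have fewer neighbors (no wrap-around for circular genomes).
    """
    start = max(0, index - window)
    end = min(len(gene_order), index + window + 1)
    profile = Counter()
    for pos in range(start, end):
        if pos == index:
            continue  # the focal gene's own annotations are excluded
        # Unannotated neighbors contribute nothing to the profile.
        profile.update(annotations.get(gene_order[pos], set()))
    return profile


# Toy data, invented for illustration; "GO:A" etc. are placeholder labels.
gene_order = ["g1", "g2", "g3", "g4", "g5"]
annotations = {
    "g1": {"GO:A"},
    "g2": {"GO:A", "GO:B"},
    "g4": {"GO:C"},
    "g5": {"GO:B"},
}

# Profile of the unannotated gene g3, built from its four neighbors.
nfp_g3 = neighborhood_function_profile(gene_order, annotations, index=2, window=2)
```

In this sketch the profile of `g3` mixes dissimilar labels rather than copying a single neighbor's annotation. Such a count vector could then serve as a feature vector for a classifier of gene function; the classifier itself is outside the scope of this example.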

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f68b/6925100/25624130ab8a/41598_2019_55984_Fig1_HTML.jpg
