National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA.
Genome Res. 2022 Jan;32(1):175-188. doi: 10.1101/gr.275819.121. Epub 2021 Dec 7.
Eukaryotic genomes contain many nongenic elements that function in gene regulation, chromosome organization, recombination, repair, or replication, and mutation of those elements can affect genome function and cause disease. Although numerous epigenomic studies provide high coverage of gene regulatory regions, those data are not usually exposed in traditional genome annotation and can be difficult to access and interpret without field-specific expertise. The National Center for Biotechnology Information (NCBI) therefore provides RefSeq Functional Elements (RefSeqFEs), which represent experimentally validated human and mouse nongenic elements derived from the literature. The curated data set comprises richly annotated sequence records, descriptive records in the NCBI Gene database, reference genome feature annotation, and activity-based interactions among nongenic regions and between those regions and their target genes. The data set provides succinct functional details and transparent experimental evidence, leverages data from multiple experimental sources, is readily accessible and adaptable, and uses a flexible data model. The data have multiple uses: for basic functional discovery, bioinformatics studies, and genetic variant interpretation; as known positive controls for epigenomic data evaluation; and as reference standards for functional interactions. Comparisons to other gene regulatory data sets show that the RefSeqFE data set includes a wider range of feature types representing more areas of biology, but it is comparatively smaller and subject to data selection biases. RefSeqFEs thus provide an alternative and complementary resource for experimentally assayed functional elements, with future data set growth expected.