Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), Matrix, 138671, Singapore.
Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark.
Proteomics. 2018 Nov;18(21-22):e1800093. doi: 10.1002/pmic.201800093. Epub 2018 Oct 30.
Mentions of gene names in the body of the scientific literature from 1901 to 2017, counted fractionally, are used as a proxy for the level of biological function discovery. A literature score of one is defined as a full publication equivalent (FPE), the amount of literature that equates to one publication dedicated solely to a gene. Fewer than 5000 human genes each have at least 100 FPEs in the available literature corpus. This group of elite genes (4817 protein-coding genes, 119 non-coding RNAs) attracts the overwhelming majority of the scientific literature about genes. Yet thousands of proteins have never been mentioned at all, ≈2000 further proteins have less than one FPE of literature and, for ≈4600 additional proteins, the FPE count is below 10. The protein function discovery rate, measured as the number of proteins first mentioned or crossing a threshold of accumulated FPEs in a given year, grew until 2000 and has declined since. This drop is partially offset by function discoveries for non-coding RNAs. The full sequencing of the human genome has not boosted the function discovery rate. Since 2000, the fastest growing group in the literature has been the one with at least 500 FPEs per gene.
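The two measures described above, fractional counting and the year in which a gene crosses an accumulated-FPE threshold, can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme, so the code assumes, purely for illustration, that each publication carries a total weight of one FPE split equally among the distinct genes it mentions. The function names `fractional_counts` and `crossing_year` and the toy records are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def fractional_counts(publications):
    """Return {gene: {year: FPE share}} for an iterable of (year, gene_names).

    Assumption (not from the source): every publication is worth exactly
    one FPE, divided equally among the distinct genes it mentions.
    """
    per_gene = defaultdict(lambda: defaultdict(float))
    for year, genes in publications:
        distinct = set(genes)
        if not distinct:
            continue  # a publication without gene mentions contributes nothing
        share = 1.0 / len(distinct)
        for gene in distinct:
            per_gene[gene][year] += share
    return per_gene


def crossing_year(yearly_fpe, threshold):
    """Return the first year in which the cumulative FPE reaches the threshold.

    Returns None if the gene never accumulates that much literature.
    """
    total = 0.0
    for year in sorted(yearly_fpe):
        total += yearly_fpe[year]
        if total >= threshold:
            return year
    return None


# Toy corpus: (publication year, genes mentioned). Invented for illustration.
toy_corpus = [
    (1998, ["TP53"]),
    (1999, ["TP53", "BRCA1"]),
    (2001, ["BRCA1", "TP53"]),
    (2003, ["BRCA1"]),
]

counts = fractional_counts(toy_corpus)
total_fpe = {gene: sum(years.values()) for gene, years in counts.items()}
first_full_fpe = {gene: crossing_year(years, 1.0) for gene, years in counts.items()}
```

In this toy corpus both genes end with 2.0 FPEs, but TP53 reaches one FPE in 1998, while BRCA1 does so only in 2001, once its two half-shares add up. Counting how many genes cross a given threshold in each year yields a discovery-rate curve of the kind the abstract describes.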