Institut Curie, PSL Research University, F-75005 Paris, France.
INSERM, U900, F-75005 Paris, France.
PLoS Comput Biol. 2020 Feb 18;16(2):e1007652. doi: 10.1371/journal.pcbi.1007652. eCollection 2020 Feb.
English Wikipedia, containing more than five million articles, has approximately eleven thousand web pages devoted to proteins or genes, most of which were generated by the Gene Wiki project. These pages contain information about interactions between proteins and their functional relationships. At the same time, they are interconnected with other Wikipedia pages describing biological functions, diseases, drugs and other topics curated by independent, uncoordinated collective efforts. Therefore, Wikipedia contains a directed network of protein functional relations or physical interactions embedded into the global network of encyclopedia terms, which defines a hidden (indirect) functional proximity between proteins. We applied the recently developed reduced Google Matrix (REGOMAX) algorithm to extract the network of hidden functional connections between proteins in Wikipedia. In this network we discovered tight communities which reflect areas of interest in molecular biology or medicine and can be considered as definitions of biological functions shaped by collective intelligence. Moreover, by comparing two snapshots of the Wikipedia graph (from 2013 and 2017), we studied the evolution of the network of direct and hidden protein connections. We concluded that the hidden connections are more dynamic than the direct ones and that the size of the hidden interaction communities grows with time. We summarize the results of the Wikipedia protein community analysis and annotation in the form of an interactive online map, which can serve as a portal to the Gene Wiki project.
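The abstract names the reduced Google Matrix but does not define it. The Python sketch below shows the general idea on a small dense toy graph. It uses the construction commonly given in the REGOMAX literature. For a chosen subset r of nodes (for example, protein pages) with complement s, the reduced matrix is G_R = G_rr + G_rs (1 - G_ss)^(-1) G_sr.

Several choices here are assumptions rather than details taken from this abstract:
- the damping factor alpha = 0.85;
- the column convention, in which G[i, j] is the probability of moving from page j to page i;
- all function names, which are hypothetical.

The sketch returns G_R and its indirect part. The published method splits this indirect part further into a projector component and the hidden component; the sketch does not perform that split. A graph the size of Wikipedia would also need sparse matrices and iterative solvers instead of dense linear algebra.

```python
import numpy as np


def google_matrix(adj: np.ndarray, alpha: float = 0.85) -> np.ndarray:
    """Column-stochastic Google matrix G = alpha*S + (1 - alpha)/N * E.

    adj[i, j] = 1 encodes a link from page j to page i (column = source).
    Columns with no outgoing links (dangling pages) become uniform 1/N.
    """
    n = adj.shape[0]
    adj = adj.astype(float)
    out_deg = adj.sum(axis=0)
    safe_deg = np.where(out_deg > 0, out_deg, 1.0)  # avoid division by zero
    S = np.where(out_deg > 0, adj / safe_deg, 1.0 / n)
    return alpha * S + (1.0 - alpha) / n * np.ones((n, n))


def reduced_google_matrix(G: np.ndarray, r):
    """Return (G_R, G_rr, G_ind) for the node subset r (hypothetical helper).

    G_R = G_rr + G_rs (1 - G_ss)^(-1) G_sr, where s is the complement of r.
    G_ind = G_rs (1 - G_ss)^(-1) G_sr collects contributions of paths that
    pass through nodes outside r (projector and hidden parts, not split here).
    """
    n = G.shape[0]
    r = np.asarray(r)
    s = np.setdiff1d(np.arange(n), r)
    G_rr = G[np.ix_(r, r)]
    G_rs = G[np.ix_(r, s)]
    G_sr = G[np.ix_(s, r)]
    G_ss = G[np.ix_(s, s)]
    # Solve (1 - G_ss) X = G_sr instead of forming the inverse explicitly.
    X = np.linalg.solve(np.eye(len(s)) - G_ss, G_sr)
    G_ind = G_rs @ X
    return G_rr + G_ind, G_rr, G_ind


def pagerank(G: np.ndarray, tol: float = 1e-12, max_iter: int = 10_000) -> np.ndarray:
    """Power iteration for the stationary vector of a column-stochastic G."""
    n = G.shape[0]
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        p_next = G @ p
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    adj = (rng.random((30, 30)) < 0.1).astype(float)  # toy random graph
    np.fill_diagonal(adj, 0.0)
    G = google_matrix(adj)
    r = [0, 3, 7, 12]  # hypothetical "protein" pages
    G_R, G_rr, G_ind = reduced_google_matrix(G, r)
    print(np.round(G_R, 3))
```

This construction preserves two properties that can serve as sanity checks. First, the columns of G_R still sum to one. Second, the PageRank vector of G_R equals the PageRank of G restricted to r, after renormalization. Large entries of G_ind point to pairs of selected pages that are linked mainly through paths across the rest of the encyclopedia.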