Wang Pei
School of Mathematics and Statistics, Institute of Applied Mathematics, Laboratory of Data Analysis Technology, Henan University, Kaifeng, 475004 China.
J Syst Sci Complex. 2021 Jan 12:1-17. doi: 10.1007/s11424-021-0001-2.
Biological systems can be modeled and described by biological networks, which are typical complex networks with wide real-world applications. Many problems arising in biological systems reduce to the identification of important nodes. For example, biomedical researchers frequently need to identify important genes that potentially lead to disease phenotypes in animals, and to explore crucial genes that are responsible for stress responsiveness in plants. To identify important nodes in a biological system, one needs to know either the network structure or behavioral data of the nodes (such as gene expression data). If the network topology is known, various centrality measures can be developed to solve the problem; if only behavioral data of the nodes are given, more sophisticated statistical methods can be employed. This paper reviews some recent work on the statistical identification of important nodes in biological systems from three aspects: 1) in general complex networks, based on complex network theory and epidemic dynamic models; 2) in biological networks, based on network motifs; and 3) in plants, based on RNA-seq data. The identification of important nodes in a complex system can be seen as a mapping from the system to a ranking score vector of the nodes; this mapping does not necessarily have an explicit form. The three aspects reflect three typical approaches to ranking nodes in biological systems and can be integrated into one general framework. This paper also outlines some challenges and directions for future work on the related topics. The associated investigations have potential real-world applications in the control of biological systems, network medicine, and the breeding of new crop varieties.
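To make the idea of a mapping from a system to a ranking score vector concrete, the following is a minimal sketch for the case where the network topology is known. It uses normalized degree centrality, which is only one of many possible centrality measures and is not necessarily one of those reviewed in the paper. The function names, the toy edge list, and the gene labels are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Map an undirected edge list to a score per node (normalized degree)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops in this sketch
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    # Divide by the maximum possible degree so scores lie in [0, 1].
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


def rank_nodes(scores):
    """Order nodes by descending score, breaking ties by node name."""
    return sorted(scores, key=lambda node: (-scores[node], node))


# Hypothetical toy network; the gene names are placeholders, not real data.
toy_edges = [
    ("geneA", "geneB"),
    ("geneA", "geneC"),
    ("geneA", "geneD"),
    ("geneB", "geneC"),
    ("geneD", "geneE"),
]

scores = degree_centrality(toy_edges)   # the ranking score vector
ranking = rank_nodes(scores)            # nodes from most to least important
print(ranking)
```

Here the mapping has an explicit form (a node's score is its degree divided by the number of other nodes). When only behavioral data such as expression levels are available, the scores would instead come from a statistical procedure, and the mapping need not be explicit.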