Afxenti Sotiroula, Tomazou Marios, Tsouloupas George, Lambrianides Anastasia, Pantzaris Marios, Spyrou George M
Bioinformatics Department, The Cyprus Institute of Neurology and Genetics, 6 International Airport Avenue, 2370 Nicosia, Cyprus.
HPC Facility, The Cyprus Institute, 20 Constantinou Kavafi Street, 2121 Nicosia, Cyprus.
Comput Struct Biotechnol J. 2023 Nov 14;23:10-21. doi: 10.1016/j.csbj.2023.11.020. eCollection 2024 Dec.
A common task in scientific research is the comparison of lists or sets of diverse biological entities such as biomolecules, ontologies, sequences and expression profiles. Such comparisons rely, in one way or another, on calculating a measure of similarity, whether by means of vector correlation metrics, set operations such as union and intersection, or specific measures that capture, for example, sequence homology. Depending on the data type, the results are then often visualized using heatmaps or Venn, Euler, or alluvial diagrams. While most of these representations offer simplicity and interpretability, they remain effective only for a limited number of lists and for specific data types. Network representations, by contrast, provide a more versatile approach in which data lists are viewed as interconnected nodes, with edges representing pairwise commonality, correlation, or any other similarity metric. Networks can represent an arbitrary number of lists of any data type, offering a holistic perspective and, most importantly, enabling analytics that characterize such networks and reveal novel insights in terms of the centralities, clusters and motifs that can exist in them. While several tools have been developed that translate lists into commonly used diagrams such as Venn and Euler diagrams, a comparable tool that can parse an arbitrary number of lists of the same or heterogeneous content, analyze their commonalities and generate networks from them does not exist.
To address this gap, we introduce List2Net, a web-based tool that can rapidly process and represent lists in a network context, in either single-layer or multi-layer mode, facilitating network analysis on multi-source/multi-layer data. Specifically, List2Net can seamlessly handle lists encompassing a wide variety of biological data types, such as named entities or ontologies (e.g., lists containing gene symbols), sequences (e.g., protein/peptide sequences), and numeric data types (e.g., omics-based expression or abundance profiles). Once the data is imported, the tool (i) calculates the commonalities or correlations (edges) between the lists (nodes) of interest, (ii) generates and renders the network for visualization and analysis and (iii) provides a range of exporting options, including vector and raster visualizations as well as the calculated edge lists and metrics in tabular format for further analysis in other tools. List2Net is a fast, lightweight, yet informative application that provides network-based holistic insights into the conditions represented by the lists of interest (e.g., disease-to-disease, gene-to-phenotype, drug-to-disease, etc.). As a case study, we demonstrate the utility of this tool applied to publicly available datasets related to Multiple Sclerosis (MS). Using the tool, we showcase the translation of various ontologies characterizing this specific condition into disease-to-disease subnetworks of neurodegenerative, autoimmune and infectious diseases generated from various levels of information such as genetic variation, genes, proteins, metabolites and phenotypic terms.
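The list-to-network idea in step (i) can be illustrated with a minimal Python sketch. This is not List2Net's implementation: the function names (`jaccard`, `lists_to_edges`), the `min_shared` threshold, the choice of the Jaccard index as the commonality measure, and the toy list contents are all assumptions made for illustration. Each list becomes a node, and an edge is emitted for every pair of lists that share at least one item.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def lists_to_edges(named_lists, min_shared=1):
    """Build an edge list from a dict mapping list name -> iterable of items.

    Hypothetical helper: one edge per pair of lists that share at least
    `min_shared` items, weighted by shared-item count and Jaccard index.
    """
    sets = {name: set(items) for name, items in named_lists.items()}
    edges = []
    for u, v in combinations(sorted(sets), 2):
        shared = sets[u] & sets[v]
        if len(shared) >= min_shared:
            edges.append({
                "source": u,
                "target": v,
                "shared": len(shared),
                "jaccard": jaccard(sets[u], sets[v]),
            })
    return edges


# Toy placeholder identifiers, not real biological data.
example = {
    "list_A": ["g1", "g2", "g3", "g4"],
    "list_B": ["g3", "g4", "g5"],
    "list_C": ["g6", "g7"],
}

edges = lists_to_edges(example)
for edge in edges:
    print(edge)
```

In this toy input only `list_A` and `list_B` overlap, so the resulting network has a single edge and `list_C` remains an isolated node. For numeric profiles, the set overlap would be replaced by a correlation coefficient, and the edge list could be exported as a table or loaded into a network analysis library.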