Department of Controls and Computer Engineering, Politecnico di Torino, 10129 Torino, Italy; Consorzio Interuniversitario Nazionale per l'Informatica, 11029 Verres, Italy.
Biomed Res Int. 2013;2013:676328. doi: 10.1155/2013/676328. Epub 2013 Oct 7.
Undirected gene coexpression networks, obtained from experimental expression data and coupled with efficient computational procedures, are increasingly used to identify potentially relevant biological information (e.g., biomarkers) for a particular disease. However, coexpression networks built from experimental expression data are generally large, highly connected networks with a high number of false-positive interactions (nodes and edges). To infer relevant information, the network must be properly filtered and its complexity reduced. Given the complexity and multivariate nature of the information contained in the network, this requires developing and applying efficient feature selection algorithms that exploit the topological characteristics of the network to identify relevant nodes and edges. This paper proposes an efficient multivariate filtering algorithm designed to analyze the topological properties of a coexpression network in order to identify potentially relevant genes for a given disease. The algorithm has been tested on three datasets for three well-known and well-studied diseases: acute myeloid leukemia, breast cancer, and diffuse large B-cell lymphoma. Results have been validated against bibliographic data automatically mined using the ProteinQuest literature mining tool.
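The abstract does not spell out the filtering algorithm itself, so the following is only a minimal sketch of the general idea it describes: build an undirected coexpression network from expression profiles, then use a topological property to keep candidate genes. Everything specific here is an assumption made for illustration, not the paper's method: Pearson correlation as the coexpression measure, the 0.9 threshold, node degree as the topological score, the function names, and the toy gene names and values.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no coexpression signal
    return cov / sqrt(var_x * var_y)


def coexpression_network(expression, threshold=0.9):
    """Build an undirected network: an edge joins two genes when |r| >= threshold.

    The threshold is an assumed, illustrative cut-off.
    """
    neighbors = {gene: set() for gene in expression}
    for g1, g2 in combinations(expression, 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            neighbors[g1].add(g2)
            neighbors[g2].add(g1)
    return neighbors


def rank_by_degree(neighbors, min_degree=1):
    """Drop genes below min_degree and rank the rest by degree, then by name.

    Degree stands in for whatever topological score a real filter would use.
    """
    kept = [(gene, len(adj)) for gene, adj in neighbors.items()
            if len(adj) >= min_degree]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Toy expression profiles: made-up values, one list of samples per hypothetical gene.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.0, 8.2],
    "geneC": [4.0, 3.0, 2.1, 1.0],
    "geneD": [1.0, 3.0, 1.0, 3.0],
}
network = coexpression_network(expression, threshold=0.9)
ranking = rank_by_degree(network, min_degree=1)
print(ranking)
```

In this toy input, geneA, geneB, and geneC are strongly (anti)correlated and form a connected triangle, while geneD ends up isolated and is filtered out. A realistic pipeline would need a more careful edge criterion and a richer topological score to deal with the false-positive interactions the abstract mentions.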