Sun Jiangwen, Bi Jinbo, Kranzler Henry R
Department of Computer Science and Engineering, University of Connecticut, 371 Fairfield Way, Storrs, CT 06269, USA.
BMC Genet. 2014 Jun 17;15:73. doi: 10.1186/1471-2156-15-73.
Accurate classification of patients with a complex disease into subtypes has important implications for medicine and healthcare. Using more homogeneous disease subtypes in genetic association analysis will facilitate the detection of new genetic variants that are not detectable using the undifferentiated disease phenotype. Subtype differentiation can also improve diagnostic classification, which can in turn inform clinical decision making and treatment matching. Currently, the most sophisticated methods for disease subtyping perform cluster analysis using patients' clinical features. Without guidance from genetic information, the resultant subtypes are likely to be suboptimal, and subsequent efforts at genetic association may fail.
We propose a multi-view matrix decomposition approach that integrates clinical features with genetic markers to detect confirmatory evidence for a disease subtype. This approach groups patients into clusters that are consistent between the clinical and genetic dimensions of data; it simultaneously identifies the clinical features that define the subtype and the genotypes associated with the subtype. A simulation study validated the proposed approach, showing that it identified hypothesized subtypes and associated features. In comparison to the latest biclustering and multi-view data analytics using real-life disease data, the proposed approach identified clinical subtypes of a disease that differed from each other more significantly in the genetic markers, thus demonstrating the superior performance of the proposed approach.
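The idea of grouping patients consistently across the clinical and genetic views can be illustrated with a small sketch. The abstract does not specify the algorithm, so the code below is a simplified stand-in rather than the authors' method: it assumes a rank-one model in which both views share one sparse patient vector `u`, and it uses alternating soft thresholding to obtain sparse feature vectors `v` (clinical) and `w` (genetic). The names `shared_rank_one`, `soft_threshold`, and `frac`, and the toy data, are hypothetical and chosen only for illustration.

```python
import numpy as np

def soft_threshold(a, frac):
    """Shrink entries toward zero by frac * max|a|; small entries become exactly 0."""
    lam = frac * np.max(np.abs(a))
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def shared_rank_one(X, Z, frac=0.5, n_iter=50):
    """Approximate X ~ u v^T and Z ~ u w^T with one shared sparse patient vector u.

    X: patients x clinical features, Z: patients x genetic markers.
    This is an illustrative assumption, not the published algorithm.
    """
    # Start from the leading left singular vector of the concatenated views.
    U, _, _ = np.linalg.svd(np.hstack([X, Z]), full_matrices=False)
    u = U[:, 0]
    v = np.zeros(X.shape[1])
    w = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        # Feature updates: project each view onto u, then sparsify.
        v = soft_threshold(X.T @ u, frac)
        w = soft_threshold(Z.T @ u, frac)
        # Patient update: combine evidence from both views, then sparsify.
        u_new = soft_threshold(X @ v + Z @ w, frac)
        norm = np.linalg.norm(u_new)
        if norm == 0.0:
            break  # nothing survived thresholding; keep the previous u
        u = u_new / norm
    return u, v, w

# Synthetic toy data (not from the paper): 60 patients, of whom the first 20
# form a planted subtype marked by 5 clinical features and 3 genetic markers.
rng = np.random.default_rng(0)
X = 0.3 * rng.standard_normal((60, 15))
Z = 0.3 * rng.standard_normal((60, 30))
X[:20, :5] += 2.0
Z[:20, :3] += 2.0

u, v, w = shared_rank_one(X, Z)
print("patients in subtype:", np.flatnonzero(u))
print("clinical features:  ", np.flatnonzero(v))
print("genetic markers:    ", np.flatnonzero(w))
```

The nonzero entries of `u` mark the patients assigned to the subtype, while the nonzero entries of `v` and `w` mark the clinical features and genetic markers associated with it. Sharing `u` across the two views is what forces the grouping to be supported by both data types in this sketch.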
The proposed algorithm is an effective and superior alternative to the disease subtyping methods employed to date. Integrating phenotypic features with genetic markers in the subtyping analysis is a promising approach for concurrently identifying disease subtypes and their genetic associations.