Zhang Dabao, Zhang Min
Department of Statistics, Purdue University, West Lafayette, Indiana 47907-2067, USA.
Theor Biol Med Model. 2007 Jan 19;4:3. doi: 10.1186/1742-4682-4-3.
It is of particular interest to identify cancer-specific molecular signatures for early diagnosis, for monitoring the effects of treatment, and for predicting patient survival time. Molecular information about patients is usually generated by high-throughput technologies such as microarrays and mass spectrometry. Statistically, the challenge is that the number of candidate features is large while the number of patients in a study is small, and right-censored clinical data further complicate the analysis.
We present a two-stage procedure to profile molecular signatures for survival outcomes. First, we group closely related molecular features into linkage clusters, each comprising features that have similar or opposite functions and play similar roles in prognosis. Second, we develop a Bayesian approach to rank the centroids of these linkage clusters and provide a list of the main molecular features closely related to the outcome of interest. In a simulation study, our approach showed superior performance. Applied to data on diffuse large B-cell lymphoma (DLBCL), it identified some new candidate signatures for disease prognosis.
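The two-stage idea can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions. Features are grouped when their absolute Pearson correlation reaches a hypothetical threshold of 0.8, so that features with opposite behavior land in the same cluster. Centroids are sign-aligned averages of the centered members. Because the abstract does not specify the Bayesian ranking step, the sketch substitutes Harrell's concordance index, which handles right-censored event times, as a simple stand-in. All function names and the toy data are invented for illustration, and the code assumes no constant features and no tied event times.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def linkage_clusters(features, threshold=0.8):
    """Stage 1 (assumed rule): join features whose |correlation| >= threshold."""
    parent = list(range(len(features)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(features)), 2):
        if abs(pearson(features[i], features[j])) >= threshold:
            parent[find(j)] = find(i)
    groups = {}
    for i in range(len(features)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def centroid(features, members):
    """Average of centered members, each sign-flipped to agree with the first."""
    ref = features[members[0]]
    aligned = []
    for m in members:
        sign = 1.0 if pearson(ref, features[m]) >= 0 else -1.0
        mu = mean(features[m])
        aligned.append([sign * (v - mu) for v in features[m]])
    return [mean(vals) for vals in zip(*aligned)]


def concordance(score, time, event):
    """Harrell's C: a higher score is expected to mean a shorter survival time."""
    concordant = ties = usable = 0
    for i, j in combinations(range(len(time)), 2):
        if time[j] < time[i]:
            i, j = j, i  # make i the patient with the shorter follow-up
        if time[i] == time[j] or not event[i]:
            continue  # pair is not comparable under right censoring
        usable += 1
        if score[i] > score[j]:
            concordant += 1
        elif score[i] == score[j]:
            ties += 1
    return (concordant + 0.5 * ties) / usable


# Toy data (invented): 4 features measured on 6 patients.
features = [
    [6, 5, 4, 3, 2, 1],                    # high value, short survival
    [-6.1, -4.9, -4.2, -2.8, -2.1, -0.9],  # mirrors feature 0 (opposite sign)
    [1, 3, 2, 3, 1, 2],                    # unrelated
    [2, 1, 3, 1, 3, 2],                    # unrelated
]
time = [2, 4, 6, 8, 10, 12]
event = [1, 1, 0, 1, 1, 0]  # 1 = event observed, 0 = right-censored

clusters = linkage_clusters(features)
# Stage 2 (stand-in for the Bayesian ranking): sort clusters by |C - 0.5|.
ranked = sorted(
    ((members, concordance(centroid(features, members), time, event))
     for members in clusters),
    key=lambda pair: abs(pair[1] - 0.5),
    reverse=True,
)
for members, c_index in ranked:
    print(members, round(c_index, 3))
```

Using the absolute correlation in the first stage is what lets features with opposite functions share a cluster. Ranking by the distance of the concordance index from 0.5 treats protective and harmful centroids symmetrically.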
This multivariate approach provides researchers with a more reliable list of molecular features profiled in terms of their prognostic relationship to the event times, and generates dependable information for subsequent identification of prognostic molecular signatures through either biological procedures or further data analysis.