Huang Hongtai, Fava Andrea, Guhr Tara, Cimbro Raffaello, Rosen Antony, Boin Francesco, Ellis Hugh
Department of Geography and Environmental Engineering, GWC Whiting School of Engineering, The Johns Hopkins University, Baltimore, MD, USA.
Division of Rheumatology, Department of Medicine, Johns Hopkins School of Medicine, Baltimore, MD, USA.
BMC Bioinformatics. 2015 Sep 15;16:293. doi: 10.1186/s12859-015-0722-x.
This work seeks to develop a methodology for identifying reliable biomarkers of disease activity, progression and outcome by detecting significant associations between high-throughput flow cytometry (FC) data and interstitial lung disease (ILD), a clinical phenotype of systemic sclerosis (SSc, or scleroderma) and the leading cause of morbidity and mortality in SSc. A specific aim is to develop a clinically useful screening tool that could yield accurate assessments of disease state, such as the risk or presence of SSc-ILD, the activity of lung involvement and the likelihood of response to therapeutic intervention. Ultimately, this instrument could facilitate a refined stratification of SSc patients into clinically relevant subsets at the time of diagnosis and subsequently during the course of the disease, and thus help prevent adverse outcomes from disease progression or unnecessary treatment side effects. The methods involve: (1) clinical and peripheral blood flow cytometry data (Immune Response In Scleroderma, IRIS) from consented patients followed at the Johns Hopkins Scleroderma Center; (2) machine learning (Conditional Random Forests, CRF) coupled with Gene Set Enrichment Analysis (GSEA) to identify subsets of FC variables that are highly effective in classifying ILD patients; and (3) stochastic simulation to design, train and validate ILD risk screening tools.
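To make the GSEA component of step (2) concrete, the following is a minimal sketch of the standard GSEA running-sum enrichment score applied to a ranked list of variables. It is not the authors' implementation: the abstract does not state how the CRF output feeds into GSEA, so ranking by an importance score is an assumption, and the variable names, scores and the `enrichment_score` helper are hypothetical.

```python
def enrichment_score(ranked, variable_set, weight=1.0):
    """Running-sum enrichment score for one set of variables.

    ranked: list of (name, score) pairs, sorted by score in descending order.
            Here the score is ASSUMED to be a variable-importance value.
    variable_set: names of the variables whose enrichment is being tested.
    weight: exponent applied to |score| at each hit (0 gives the unweighted form).
    Returns the signed maximum deviation of the running sum from zero.
    """
    hits = [name in variable_set for name, _ in ranked]
    n_hits = sum(hits)
    n_miss = len(ranked) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("the set must contain some, but not all, ranked variables")

    # Normalizer so that the hit increments sum to 1.
    hit_norm = sum(abs(s) ** weight for (_, s), h in zip(ranked, hits) if h)
    if hit_norm == 0:
        raise ValueError("all scores inside the set are zero")

    running, best = 0.0, 0.0
    for (_, score), is_hit in zip(ranked, hits):
        if is_hit:
            running += abs(score) ** weight / hit_norm  # step up at a set member
        else:
            running -= 1.0 / n_miss                     # step down otherwise
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical importance ranking of six flow cytometry variables.
ranked = [("var_a", 0.9), ("var_b", 0.7), ("var_c", 0.4),
          ("var_d", 0.2), ("var_e", 0.1), ("var_f", 0.05)]

es_top = enrichment_score(ranked, {"var_a", "var_b"})     # set clustered at the top
es_bottom = enrichment_score(ranked, {"var_e", "var_f"})  # set clustered at the bottom
print(es_top, es_bottom)
```

A set concentrated at the top of the ranking gives a score near +1 and one concentrated at the bottom gives a score near -1. In practice the score would be compared against a permutation-based null distribution before a set is called significant; that step is omitted here.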
Our hybrid analysis approach (CRF-GSEA) predicted the ILD status of SSc patients with >82% correct classification in validation (79 patients in the training data set, 40 patients in the validation data set).
IRIS flow cytometry data provide useful information for assessing the ILD status of SSc patients. Our new approach, combining Conditional Random Forests and Gene Set Enrichment Analysis, identified a subset of flow cytometry variables from which we built a screening tool that correctly identified ILD patients in both the training and validation data sets. More broadly, the identification of subsets of flow cytometry variables that exhibit coordinated movement (i.e., multi-variable up- or down-regulation) may lead to insights into possible effector pathways and thereby improve understanding of systemic sclerosis pathogenesis.