Scientific Product Assessment Center, R&D Group, Japan Tobacco Inc., 6-2 Umegaoka, Aoba-ku, Yokohama, Kanagawa, 227-8512, Japan.
BMC Pulm Med. 2020 Feb 3;20(1):29. doi: 10.1186/s12890-020-1062-9.
Chronic obstructive pulmonary disease (COPD) is a combination of progressive lung diseases. Diagnosis of COPD is generally based on pulmonary function testing; however, prognosis for smokers and for patients with early-stage COPD remains difficult because of the complexity and heterogeneity of the pathogenesis. Computational analysis of omics data is expected to be one way to resolve such complexity.
We obtained transcriptomic data from in vitro tests in which human bronchial epithelial cells were exposed to inducers of early COPD events, in order to identify potential descriptive marker genes. Using the identified genes, we applied a machine learning technique to publicly available transcriptome data obtained from lung specimens of COPD and non-COPD patients, to develop a model that can reflect the risk continuum across smoking and COPD.
The expression levels of 15 genes were commonly altered among in vitro tissues exposed to known inducers of early COPD events (cigarette smoke, DNA damage, oxidative stress, and inflammation), and 10 of these genes and their corresponding proteins have not previously been reported as COPD biomarkers. Although these genes predicted each group with 65% accuracy, they discriminated COPD subjects from smokers with only 29% accuracy. Furthermore, logistic regression enabled the conversion of gene expression levels to a numerical index, which we named the "potential risk factor (PRF)" index. The highest significant index value was recorded in COPD subjects (median 0.56), followed by smokers (0.30) and non-smokers (0.02). In vitro tissues exposed to cigarette smoke displayed dose-dependent increases in PRF, suggesting its utility for prospective risk estimation of tobacco products.
Our experiment-based transcriptomic analysis identified novel genes associated with COPD, and the 15 genes could distinguish smokers and COPD subjects from non-smokers via machine-learning classification with remarkable accuracy. We also propose a PRF index that can quantitatively reflect the risk continuum across smoking and COPD pathogenesis, and we believe it will provide an improved understanding of smoking effects and new insights into COPD.
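The conversion of gene expression levels into a numerical index by logistic regression can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the expression values are synthetic, the group mean shifts are arbitrary, the model is trained on non-smokers versus COPD subjects only, and the names `fit_logistic` and `risk_index` are hypothetical. It is not the authors' pipeline, and the abstract does not specify how their model was trained.

```python
import math
import random
from statistics import median

N_GENES = 15  # the abstract reports a 15-gene panel; the values below are synthetic


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=500, l2=0.01):
    """Fit L2-regularised logistic regression by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def risk_index(x, w, b):
    """Map one expression profile to an index between 0 and 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def simulate(n_samples, shift, rng):
    """Synthetic standardised expression: each gene shifted by `shift` (assumed)."""
    return [[rng.gauss(shift, 1.0) for _ in range(N_GENES)] for _ in range(n_samples)]


rng = random.Random(0)
groups = {
    "non-smoker": simulate(40, 0.0, rng),  # assumed baseline
    "smoker": simulate(40, 0.5, rng),      # assumed intermediate shift
    "COPD": simulate(40, 1.0, rng),        # assumed largest shift
}

# Assumption: train on the two extremes, then score every group.
X_train = groups["non-smoker"] + groups["COPD"]
y_train = [0] * len(groups["non-smoker"]) + [1] * len(groups["COPD"])
w, b = fit_logistic(X_train, y_train)

medians = {
    name: median(risk_index(x, w, b) for x in samples)
    for name, samples in groups.items()
}
for name, value in medians.items():
    print(f"{name}: median index = {value:.2f}")
```

Because the index is the model's predicted probability, it lies between 0 and 1, and groups that were not used for training (here, smokers) can still be placed along the same scale.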