University of Oxford, Department of Zoology, Oxford, UK.
University College London, Division of Infection and Immunity, London, UK.
Sci Rep. 2019 Mar 11;9(1):4049. doi: 10.1038/s41598-019-40346-7.
Streptococcus pneumoniae, a normal commensal of the upper respiratory tract, is a major public health concern, responsible for substantial global morbidity and mortality due to pneumonia, meningitis and sepsis. Why some pneumococci invade the bloodstream or cerebrospinal fluid (CSF), causing so-called invasive pneumococcal disease (IPD), is uncertain. In this study we identify genes associated with IPD. We transform whole genome sequence (WGS) data into a sequence typing scheme, using a constructed pangenome as the reference to avoid the pitfalls of choosing an arbitrary reference genome. We then apply a random forest machine-learning algorithm to the transformed data and find 43 genes consistently associated with IPD across three geographically distinct WGS data sets of pneumococcal carriage isolates. Among these genes, 23 were previously shown to be directly relevant to IPD, and 18 are uncharacterized. We suggest that these uncharacterized genes are also likely to be relevant for IPD.
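The abstract does not spell out how WGS data become a typing scheme, so the following is only a minimal sketch of one plausible reading: for every gene in the pangenome, each distinct sequence variant receives an integer allele id, and an absent gene is coded as 0. The function name `allele_typing`, the input layout, and the toy gene names `geneX` and `geneY` are hypothetical and not taken from the study.

```python
from typing import Dict, List

# Hypothetical input layout: isolate name -> {gene name -> gene sequence}.
# A gene missing from an isolate's dict is treated as absent in that isolate.
Isolates = Dict[str, Dict[str, str]]


def allele_typing(isolates: Isolates, genes: List[str]) -> Dict[str, List[int]]:
    """Encode each isolate as a list of integer allele ids, one per pangenome gene.

    Id 0 means the gene is absent. Ids 1, 2, ... are assigned separately for
    each gene, in order of first appearance of each distinct sequence.
    """
    allele_ids: Dict[str, Dict[str, int]] = {g: {} for g in genes}
    profiles: Dict[str, List[int]] = {}
    for name, seqs in isolates.items():
        row: List[int] = []
        for g in genes:
            seq = seqs.get(g)
            if seq is None:
                row.append(0)  # gene absent from this isolate
            else:
                known = allele_ids[g]
                # Reuse the id of a previously seen sequence, else assign the next id.
                row.append(known.setdefault(seq, len(known) + 1))
        profiles[name] = row
    return profiles


# Toy data (invented for illustration only).
toy = {
    "isolate_A": {"geneX": "ATGAAA", "geneY": "ATGCCC"},
    "isolate_B": {"geneX": "ATGAAG", "geneY": "ATGCCC"},
    "isolate_C": {"geneX": "ATGAAA"},
}
profiles = allele_typing(toy, ["geneX", "geneY"])
```

Each row of `profiles`, paired with a label for the isolate's disease or carriage status, is the kind of categorical feature matrix a random forest implementation (for example scikit-learn's `RandomForestClassifier`) could take as input. Whether the study used this exact encoding or that library is an assumption here.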