Kargarfard Fatemeh, Sami Ashkan, Mohammadi-Dehcheshmeh Manijeh, Ebrahimie Esmaeil
Department of Computer Science and Engineering, School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran.
School of Animal and Veterinary Sciences, The University of Adelaide, Adelaide, Australia.
BMC Genomics. 2016 Nov 16;17(1):925. doi: 10.1186/s12864-016-3250-9.
Recent zoonotic transmissions of avian and swine influenza to humans (2009 and 2013) highlight how these viruses can expand their host range by crossing species barriers. Gene reassortment, or antigenic shift, between viruses from two or more hosts can generate a new life-threatening virus when the reassorted virus is no longer recognized by antibodies already present in human populations. No large-scale study has yet addressed the underlying mechanisms of host transmission. Furthermore, there is no clear understanding of how the different segments of the influenza genome contribute to the final determination of host range.
To gain insight into the rules underpinning host range determination, we employed several supervised machine learning algorithms to mine reassortment changes in different viral segments across a range of hosts. Our multi-host dataset contained whole segments of 674 influenza strains organized into three host categories: avian, human, and swine. Because some sequences were assigned to more than one host, the dataset is multi-labeled, and we used a multi-label learning method to identify discriminative sequence sites. Algorithms such as CBA, Ripper, and decision trees were then applied to extract informative and descriptive association rules for each viral protein segment.
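As a rough illustration of what a multi-labeled dataset looks like at a single aligned site, the sketch below splits the host labels into one yes/no target per host (a binary-relevance view) and tabulates how often each residue co-occurs with each host. The abstract does not name the multi-label method that was used, so binary relevance is an assumption here, and the toy records, `binary_relevance`, and `residue_host_rate` are hypothetical names introduced only for this example.

```python
from collections import Counter

# Hypothetical toy records: (residue at one aligned site, set of host labels).
# A strain may carry more than one host label, which makes the data multi-labeled.
strains = [
    ("V", {"avian"}),
    ("V", {"avian", "swine"}),
    ("I", {"human"}),
    ("I", {"human", "swine"}),
    ("V", {"avian"}),
]
HOSTS = ("avian", "human", "swine")


def binary_relevance(records, host):
    """Turn multi-label records into (residue, is_host) pairs for one host."""
    return [(residue, host in hosts) for residue, hosts in records]


def residue_host_rate(records, host):
    """For each residue, the fraction of strains carrying it that infect `host`."""
    total, positive = Counter(), Counter()
    for residue, is_host in binary_relevance(records, host):
        total[residue] += 1
        positive[residue] += is_host  # True counts as 1, False as 0
    return {residue: positive[residue] / total[residue] for residue in total}


rates = {host: residue_host_rate(strains, host) for host in HOSTS}
for host, by_residue in rates.items():
    print(host, by_residue)
```

A site whose residues show very different rates across hosts is a candidate discriminative site; a real analysis would replace these raw rates with a proper feature-selection score.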
We found informative rules in all segments; these rules were shared within a host class but differed between hosts. For example, for infection of an avian host, HA14V and NS1230S were the most important discriminative and combinatorial positions.
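The support and confidence of a combined rule of this kind can be sketched as follows. This is a minimal illustration on made-up records, not the CBA or Ripper procedure itself. It assumes that HA14V denotes valine at HA position 14 and that NS1230S denotes serine at NS1 position 230; the site keys, the records, and `rule_stats` are hypothetical.

```python
# Hypothetical toy records; the residues and host labels are invented for illustration.
records = [
    {"sites": {"HA:14": "V", "NS1:230": "S"}, "hosts": {"avian"}},
    {"sites": {"HA:14": "V", "NS1:230": "S"}, "hosts": {"avian", "swine"}},
    {"sites": {"HA:14": "V", "NS1:230": "N"}, "hosts": {"human"}},
    {"sites": {"HA:14": "I", "NS1:230": "S"}, "hosts": {"human"}},
]


def rule_stats(records, conditions, host):
    """Support and confidence of the rule 'all conditions hold -> strain infects host'."""
    matches = [
        r for r in records
        if all(r["sites"].get(site) == residue for site, residue in conditions.items())
    ]
    hits = [r for r in matches if host in r["hosts"]]
    support = len(hits) / len(records)  # share of all strains that satisfy the whole rule
    confidence = len(hits) / len(matches) if matches else 0.0
    return support, confidence


combined = rule_stats(records, {"HA:14": "V", "NS1:230": "S"}, "avian")
single = rule_stats(records, {"HA:14": "V"}, "avian")
print("combined rule:", combined)
print("single-site rule:", single)
```

In this toy data the combined rule has the same support as the single-site rule but a higher confidence, which is the sense in which combining positions can make a rule more discriminative.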
The high-support combined rules found in this study facilitate host range identification. Our major goal was to detect discriminative genomic positions that can identify multi-host viruses, because such viruses are likely to cause pandemics or disastrous epidemics.