Ford Colby T, Janies Daniel
Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, North Carolina, 28223, USA.
School of Data Science, University of North Carolina at Charlotte, Charlotte, North Carolina, 28223, USA.
F1000Res. 2020 Jan 29;9:62. doi: 10.12688/f1000research.21539.5. eCollection 2020.
Drug resistance in malaria is a growing concern affecting many areas of Sub-Saharan Africa and Southeast Asia. Since the emergence of artemisinin resistance in Cambodia in the late 2000s, research into the underlying mechanisms has been underway. The 2019 Malaria Challenge posed the task of developing computational models that address important problems in advancing the fight against malaria. The first goal was to accurately predict the artemisinin resistance levels of isolates, as quantified by the IC50 (half-maximal inhibitory concentration). The second goal was to predict the parasite clearance rate of malaria parasite isolates from their transcriptional profiles. In this work, we develop machine learning models using novel methods for transforming isolate data and for handling the tens of thousands of variables that these transformations produce. We demonstrate this by using massively parallel processing to vectorize the data for scalable machine learning. In addition, we show the utility of ensemble machine learning for highly effective predictions on both goals of this challenge, using multiple machine learning algorithms combined with various scaling and normalization preprocessing steps. A voting ensemble then combines the individual models to generate a final prediction.
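The voting-ensemble idea described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the k-nearest-neighbours base model, the z-score and min-max scalers, and all names (`zscore_fit`, `minmax_fit`, `KNNRegressor`, `AveragingEnsemble`) are hypothetical stand-ins for the paper's algorithms and preprocessing steps. Each base model pairs one preprocessing step with a learner, and the ensemble averages the base models' continuous predictions (such as an IC50-style value), which is also how scikit-learn's `VotingRegressor` combines regressors.

```python
import math
from statistics import mean, pstdev


def zscore_fit(X):
    """Hypothetical scaler: fit per-column z-scores on rows X; return a row transform."""
    cols = list(zip(*X))
    mus = [mean(c) for c in cols]
    # Guard against constant columns (std of 0) to avoid division by zero.
    sds = [pstdev(c) or 1.0 for c in cols]
    return lambda row: [(v - m) / s for v, m, s in zip(row, mus, sds)]


def minmax_fit(X):
    """Hypothetical scaler: fit per-column min-max scaling on rows X; return a row transform."""
    cols = list(zip(*X))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return lambda row: [(v - lo) / sp for v, lo, sp in zip(row, lows, spans)]


class KNNRegressor:
    """Hypothetical base model: a preprocessing step followed by k-NN regression."""

    def __init__(self, k, scaler_fit):
        self.k = k
        self.scaler_fit = scaler_fit

    def fit(self, X, y):
        # The scaler is fitted on training data only, then reused at prediction time.
        self.scale = self.scaler_fit(X)
        self.X_scaled = [self.scale(row) for row in X]
        self.y = list(y)
        return self

    def predict_one(self, row):
        q = self.scale(row)
        # Rank training points by Euclidean distance to the scaled query.
        ranked = sorted((math.dist(q, x), t) for x, t in zip(self.X_scaled, self.y))
        # Predict the mean target of the k nearest training points.
        return mean(t for _, t in ranked[: self.k])


class AveragingEnsemble:
    """Hypothetical voting ensemble for regression: the mean of base-model predictions."""

    def __init__(self, models):
        self.models = models

    def fit(self, X, y):
        for model in self.models:
            model.fit(X, y)
        return self

    def predict_one(self, row):
        return mean(model.predict_one(row) for model in self.models)


if __name__ == "__main__":
    # Toy data standing in for vectorized isolate features and a continuous target.
    X = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    y = [0.0, 1.0, 2.0, 3.0]
    ensemble = AveragingEnsemble([
        KNNRegressor(1, zscore_fit),
        KNNRegressor(1, minmax_fit),
        KNNRegressor(3, zscore_fit),
    ]).fit(X, y)
    print(ensemble.predict_one([0.4, 0.4]))  # prints 0.3333333333333333
```

Averaging suits a continuous target; for categorical labels, a majority (hard) vote over the base models' predicted classes would replace the mean.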