Matlock Kevin, Rahman Raziur, Ghosh Souparno, Pal Ranadip
Department of Electrical and Computer Engineering.
Department of Mathematics and Statistics, Texas Tech University, Lubbock, TX, USA.
Bioinformatics. 2019 Sep 1;35(17):3143-3145. doi: 10.1093/bioinformatics/btz010.
Biological processes are characterized by a variety of genomic feature sets. When building models, however, some of these feature sets are often missing for a subset of the samples. We provide a modeling framework that integrates this type of heterogeneous data to improve prediction accuracy. To test our methodology, we stacked data from the Cancer Cell Line Encyclopedia to increase the accuracy of drug sensitivity prediction. The package addresses the dynamic regime of information integration, in which features and samples are added sequentially.
The framework has been implemented as an R package, Sstack, which can be downloaded from https://cran.r-project.org/web/packages/Sstack/index.html, where further documentation of the package is available.
Supplementary data are available at Bioinformatics online.
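The general idea of stacking models across feature sets that are only partly observed can be illustrated with a minimal sketch. This is not the Sstack API or the authors' algorithm: the data are synthetic, the base learner is a plain ridge regression chosen for brevity, and every name below (fit_ridge, predict_ridge, set_a, set_b, w_stack) is hypothetical. Each base model is trained on all samples that have its feature set, and a linear combiner is then fitted on the samples where both sets are present.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    """Fit ridge regression with an intercept column; returns the weight vector."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + lam * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict_ridge(w, X):
    """Predict with weights produced by fit_ridge."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

# Synthetic stand-in data (assumption): two feature sets and one response.
rng = np.random.default_rng(0)
n = 200
set_a = rng.normal(size=(n, 5))   # hypothetical feature set A
set_b = rng.normal(size=(n, 3))   # hypothetical feature set B
beta_a = np.array([1.0, -2.0, 0.5, 0.0, 1.5])
beta_b = np.array([2.0, -1.0, 1.0])
y = set_a @ beta_a + set_b @ beta_b + 0.1 * rng.normal(size=n)

idx = np.arange(n)
has_a = idx < 150                 # samples 0..149 have feature set A
has_b = idx >= 50                 # samples 50..199 have feature set B
test = (idx >= 120) & (idx < 150) # held-out samples that have both sets
train_a = has_a & ~test
train_b = has_b & ~test
overlap = has_a & has_b & ~test   # training samples with both sets

# Base models: each uses every training sample that has its feature set.
w_a = fit_ridge(set_a[train_a], y[train_a])
w_b = fit_ridge(set_b[train_b], y[train_b])

# Stacking step: fit a linear combiner on base predictions over the overlap.
# Simplification: in-sample base predictions are used here; out-of-fold
# predictions would give a less optimistic combiner.
P_overlap = np.column_stack([
    predict_ridge(w_a, set_a[overlap]),
    predict_ridge(w_b, set_b[overlap]),
])
w_stack = fit_ridge(P_overlap, y[overlap])

# Compare the base models and the stacked model on the held-out samples.
pred_a = predict_ridge(w_a, set_a[test])
pred_b = predict_ridge(w_b, set_b[test])
pred_stack = predict_ridge(w_stack, np.column_stack([pred_a, pred_b]))

mse_a = float(np.mean((y[test] - pred_a) ** 2))
mse_b = float(np.mean((y[test] - pred_b) ** 2))
stacked_mse = float(np.mean((y[test] - pred_stack) ** 2))
print(f"A only: {mse_a:.3f}  B only: {mse_b:.3f}  stacked: {stacked_mse:.3f}")
```

In this toy setting the response depends on both feature sets, so either base model alone misses part of the signal and the combiner recovers most of it. Real data would call for a stronger base learner and proper cross-validation.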