Webber James W, Elias Kevin M
Department of Oncology and Gynecology, Brigham and Women's Hospital, Boston, MA, USA.
BMC Bioinformatics. 2022 Apr 22;23(1):145. doi: 10.1186/s12859-022-04656-4.
High-dimensional transcriptome profiling, whether through next-generation sequencing techniques or high-throughput arrays, may produce datasets in which scattered variables have missing values. Data imputation is a common strategy to maximize the inclusion of samples by using statistical techniques to fill in missing values. However, many data imputation methods are cumbersome and risk introducing systematic bias.
We present a new data imputation method using constrained least squares and algorithms from the inverse problems literature, and present applications of this technique in miRNA expression analysis. The proposed technique is shown to perform imputation orders of magnitude faster than similar methods from the literature, with equal or greater accuracy.
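To make the idea of imputation by constrained least squares concrete, here is a minimal sketch under stated assumptions. It is not the authors' formulation. It assumes one simple variant: each incomplete sample is approximated as a non-negative combination of the fully observed samples, fitted on its observed features only, and that combination then predicts the missing features. The function names `nnls_projected_gradient` and `impute_constrained_ls` are hypothetical. Projected gradient descent stands in for whatever solver the paper uses.

```python
import numpy as np


def nnls_projected_gradient(A, b, n_iter=2000):
    """Approximately solve min_w 0.5*||A w - b||^2 subject to w >= 0.

    Projected gradient descent with step 1/L, where L = ||A||_2^2 is the
    Lipschitz constant of the gradient A^T (A w - b).
    """
    L = np.linalg.norm(A, 2) ** 2  # largest singular value squared
    w = np.zeros(A.shape[1])
    if L == 0:
        return w
    for _ in range(n_iter):
        grad = A.T @ (A @ w - b)
        w = np.maximum(0.0, w - grad / L)  # gradient step, then project onto w >= 0
    return w


def impute_constrained_ls(X, n_iter=2000):
    """Fill NaNs in X (samples x features) with a hypothetical NNLS scheme.

    Complete samples form a dictionary D (features x n_complete). Each
    incomplete sample is fitted as D_obs @ w with w >= 0 on its observed
    features, and its missing features are set to D_mis @ w.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    complete_rows = ~missing.any(axis=1)
    if not complete_rows.any():
        raise ValueError("need at least one sample with no missing values")
    D = X[complete_rows].T
    X_filled = X.copy()
    for i in np.where(missing.any(axis=1))[0]:
        obs = ~missing[i]
        if not obs.any():
            # Nothing observed to fit against: fall back to the mean complete sample.
            X_filled[i] = D.mean(axis=1)
            continue
        w = nnls_projected_gradient(D[obs], X[i, obs], n_iter)
        X_filled[i, ~obs] = D[~obs] @ w
    return X_filled
```

As a usage example, take `X = [[1, 0, 2, 1], [0, 1, 1, 3], [1, 1, nan, 4]]`. The third sample equals the sum of the first two on its observed features, so the fitted weights approach `[1, 1]` and the imputed entry should come out at approximately 3.0. The non-negativity constraint is one illustrative choice of constraint. It keeps each imputed profile a conic combination of observed profiles, which also keeps non-negative expression values non-negative.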
This study offers a robust and efficient algorithm for data imputation, which can be used, e.g., to improve cancer prediction accuracy in the presence of missing data.