Wuhan National Laboratory for Optoelectronics, Huazhong University of Science & Technology, Wuhan, Hubei, China.
BGI PathoGenesis Pharmaceutical Technology, BGI-Shenzhen, Shenzhen 518083, China.
RNA Biol. 2021 Oct 15;18(sup1):172-181. doi: 10.1080/15476286.2021.1960688. Epub 2021 Aug 30.
The high resolution of single-cell transcriptome sequencing allows researchers to observe gene expression profiles at the level of individual cells, opening many possibilities for subsequent biomedical investigation. However, gene-cell expression matrices contain a high proportion of missing values, an unavoidable technical consequence of insufficient RNA input, and this severely hampers the accuracy of downstream analysis. To address this problem, a faster, more stable and more accurate imputation method is needed. Such a method should not only recover the missing data but also effectively support subsequent analysis of biological mechanisms. Existing imputation methods each have drawbacks and limitations: some require a pre-assumed data distribution, some cannot distinguish between technical and biological zeros, and some have poor computational performance. In this paper, we present FRMC, a novel imputation tool for single-cell RNA-Seq data built on a new, fast and accurate approximation of singular value thresholding. Experiments demonstrated that FRMC not only precisely distinguishes 'true zeros' from dropout events and correctly imputes missing values attributable to technical noise, but also effectively enhances intracellular and intergenic connections and achieves accurate clustering of cells in biological applications. In summary, FRMC can be a powerful tool for analysing single-cell data because it simultaneously ensures biological significance, accuracy and speed. FRMC is implemented in Python and is freely accessible to non-commercial users on GitHub: https://github.com/HUST-DataMan/FRMC.
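The abstract names singular value thresholding (SVT) as the core of FRMC but does not describe FRMC's fast approximation. As a point of reference only, the following Python sketch implements the classic SVT iteration for matrix completion of Cai, Candès and Shen (2010), not FRMC's own method. The update alternates $X_k = D_\tau(Y_{k-1})$, which shrinks every singular value of $Y_{k-1}$ by $\tau$, with $Y_k = Y_{k-1} + \delta P_\Omega(M - X_k)$, where $P_\Omega$ keeps only observed entries. The function names `svt` and `svt_complete`, the default `tau`, the cap of 1.9 on the step size `delta`, and the toy rank-3 matrix are illustrative assumptions.

```python
import numpy as np


def svt(Y, tau):
    """Singular value thresholding operator D_tau(Y) = U diag(max(s - tau, 0)) V^T."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def svt_complete(M, mask, tau=None, delta=None, tol=1e-4, max_iter=2000):
    """Complete M from the entries where mask is True using the SVT iteration.

    Iterates X_k = D_tau(Y_{k-1}) and Y_k = Y_{k-1} + delta * P_Omega(M - X_k),
    where P_Omega keeps observed entries and zeroes the rest.
    Returns the completed matrix and the number of iterations run.
    """
    M = np.asarray(M, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    n1, n2 = M.shape
    p = mask.mean()                           # fraction of observed entries
    if tau is None:
        tau = 5.0 * np.sqrt(n1 * n2)          # heuristic scale in the spirit of Cai et al.
    if delta is None:
        # Step size 1.2 / p as in Cai et al., capped below 2 (a choice made here)
        # so that the step stays inside the range covered by their convergence theorem.
        delta = min(1.2 / p, 1.9)
    PM = np.where(mask, M, 0.0)               # P_Omega(M)
    norm_PM = np.linalg.norm(PM)
    if norm_PM == 0.0:
        return np.zeros_like(M), 0
    # Warm start: skip the iterations in which D_tau(Y) would still be all zeros.
    k0 = np.ceil(tau / (delta * np.linalg.norm(PM, 2)))
    Y = k0 * delta * PM
    X = np.zeros_like(M)
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        X = svt(Y, tau)
        residual = np.where(mask, M - X, 0.0)  # P_Omega(M - X)
        if np.linalg.norm(residual) / norm_PM < tol:
            break
        Y = Y + delta * residual
    return X, n_iter


# Toy check: a random rank-3 matrix with roughly half of its entries hidden.
rng = np.random.default_rng(0)
M = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 80))
mask = rng.random(M.shape) < 0.5
X, n_iter = svt_complete(M, mask)
rel_err = np.linalg.norm((X - M)[~mask]) / np.linalg.norm(M[~mask])
print(f"iterations: {n_iter}, relative error on hidden entries: {rel_err:.2e}")
```

For scRNA-seq data one might pass a log-transformed gene-by-cell matrix with `mask = counts > 0`, but that treats every zero as a dropout, which is exactly the simplification FRMC claims to avoid by separating true zeros from dropout events. The full `np.linalg.svd` call in each iteration is the cost that a fast SVT approximation would target; a truncated or randomized SVD is one common substitute, though whether FRMC uses one is not stated here.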