Machine Learning-Based Determination of Sampling Depth for Complex Environmental Systems: Case Study with Single-Cell Raman Spectroscopy Data in EBPR Systems.

Affiliations

Department of Civil and Environmental Engineering, Northeastern University, Boston, Massachusetts 02115-5026, United States.

School of Civil and Environmental Engineering, Cornell University, Ithaca, New York 14853-0001, United States.

Publication information

Environ Sci Technol. 2022 Sep 20;56(18):13473-13484. doi: 10.1021/acs.est.1c08768. Epub 2022 Sep 1.

Abstract

Rapid progress in advanced analytical methods, such as single-cell technologies, enables an unprecedented and deeper understanding of microbial ecology beyond the resolution of conventional approaches. A major application challenge lies in determining a sufficient sample size without adequate prior knowledge of the community complexity, and in balancing statistical power against limited time or resources. This hinders the desired standardization and wider application of these technologies. Here, we proposed, tested, and validated a computational sampling size assessment protocol that takes advantage of a metric named kernel divergence. This metric has two advantages. First, it directly compares distributional differences between data sets, with no requirement for human intervention or preclassification based on prior knowledge. Second, data processing makes minimal assumptions about the distribution and sample space, which broadens the metric's application domain. This enables appropriate handling, verified by testing, of data sets with both linear and nonlinear relationships. The model was then validated in a case study with single-cell Raman spectroscopy (SCRS) phenotyping data sets from eight different enhanced biological phosphorus removal (EBPR) activated sludge communities located across North America. The model allows a sufficient sampling size to be determined for any targeted or customized information capture capacity or resolution level. Given its flexibility and minimal restrictions on input data types, the proposed method is expected to serve as a standardized approach for sampling size optimization, enabling more comparable and reproducible experiments and analyses on complex environmental samples. Finally, these advantages allow the approach to be extended to other single-cell technologies or environmental applications with data sets exhibiting continuous features.
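The abstract does not define kernel divergence, so the following is only an illustrative sketch of the general idea of a kernel-based sampling size check, not the authors' metric or protocol. It substitutes a different, well-known kernel distance, the squared maximum mean discrepancy (MMD) with a Gaussian kernel, and compares random subsamples of increasing size against the full pool of measurements. All function names, the `gamma` bandwidth, the tolerance, and the synthetic data are assumptions made for illustration.

```python
import math
import random

def rbf(x, y, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def mean_kernel(A, B, gamma):
    """Average kernel value over all pairs (a, b) with a in A and b in B."""
    total = sum(rbf(a, b, gamma) for a in A for b in B)
    return total / (len(A) * len(B))

def mmd2(A, B, gamma=0.5):
    """Biased estimate of the squared MMD between samples A and B.

    This is a stand-in for the paper's kernel divergence (assumption).
    """
    return (mean_kernel(A, A, gamma) + mean_kernel(B, B, gamma)
            - 2.0 * mean_kernel(A, B, gamma))

def divergence_curve(pool, sizes, gamma=0.5, repeats=3, seed=0):
    """Mean MMD^2 between random subsamples of each size and the full pool."""
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        vals = [mmd2(rng.sample(pool, n), pool, gamma) for _ in range(repeats)]
        curve[n] = sum(vals) / repeats
    return curve

def sufficient_size(curve, tol):
    """Smallest tested size whose mean divergence is at most tol, else None."""
    for n in sorted(curve):
        if curve[n] <= tol:
            return n
    return None

# Synthetic stand-in for per-cell spectra: two hypothetical "phenotypes"
# drawn from Gaussians in 3 dimensions (not real SCRS data).
data_rng = random.Random(42)
pool = ([[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(60)]
        + [[data_rng.gauss(3.0, 1.0) for _ in range(3)] for _ in range(60)])

curve = divergence_curve(pool, sizes=[5, 20, 60, 120])
for n in sorted(curve):
    print(n, curve[n])
```

In this sketch the divergence shrinks as the subsample approaches the full pool, and `sufficient_size(curve, tol)` returns the smallest tested size that meets a chosen tolerance. Choosing that tolerance corresponds loosely to the "targeted or customized information capture capacity" mentioned in the abstract, though how the paper maps one to the other is not stated here.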
