Qu Evan B, Baker Jacob S, Markey Laura, Khadka Veda, Mancuso Chris, Tripp A Delphine, Lieberman Tami D
Institute for Medical Engineering and Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
Institute for Medical Engineering and Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Department of Systems Biology, Harvard University, Cambridge, MA 02138, USA.
Cell Rep. 2025 Aug 26;44(8):116134. doi: 10.1016/j.celrep.2025.116134. Epub 2025 Aug 12.
Genetically distinct strains of a species can vary widely in phenotype, reducing the utility of species-resolved microbiome measurements for detecting associations with health or disease. While metagenomics theoretically provides information on all strains in a sample, current strain-resolved analysis methods face a tradeoff: de novo genotyping approaches can detect novel strains but struggle when applied to strain-rich or low-coverage samples, while reference database methods work robustly across sample types but are insensitive to novel diversity. We present PHLAME, a method that bridges this divide by combining the advantages of reference database approaches with novelty awareness. PHLAME explicitly defines clades at multiple phylogenetic levels and introduces a probabilistic, mutation-based framework to quantify novelty from the nearest reference. By applying PHLAME to publicly available human skin and vaginal metagenomes, we find clade associations with coexisting species, geography, and host age. The ability to characterize intraspecies associations and dynamics in previously inaccessible environments will enable strain-level insights from accumulating metagenomic data.