Gao Chen-Yi, Cecconi Fabio, Vulpiani Angelo, Zhou Hai-Jun, Aurell Erik
Key Laboratory of Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, People's Republic of China. School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, People's Republic of China.
Phys Biol. 2019 Jan 29;16(2):026002. doi: 10.1088/1478-3975/aafbe0.
Direct coupling analysis (DCA) is now a widely used method that leverages statistical information from many similar biological systems to draw meaningful conclusions about each system separately. DCA has been applied with great success to sequences of homologous proteins and, more recently, to whole-genome, population-wide sequencing data. We argue here that the use of DCA on the genome scale is contingent on fundamental issues of population genetics. DCA can be expected to yield meaningful results when a population is in the quasi-linkage equilibrium (QLE) phase studied by Kimura and others, but not, for instance, in a phase of clonal competition. We discuss how the exponential (Potts model) distributions emerge in QLE, and compare couplings to correlations obtained in a study of about 3000 genomes of the human pathogen Streptococcus pneumoniae.
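The distinction between couplings and correlations drawn above can be illustrated with a minimal sketch. It assumes biallelic loci coded as 0/1 and uses the naive mean-field approximation, in which couplings are estimated as the negated off-diagonal entries of the inverse covariance matrix. This is an illustrative stand-in, not a claim about the inference procedure used in the study. The function names, the `ridge` regulariser, and the synthetic three-locus chain are all hypothetical.

```python
import numpy as np


def correlations_and_couplings(alleles, ridge=1e-6):
    """Return (correlation matrix, naive mean-field couplings).

    alleles: array of shape (n_genomes, n_loci) with 0/1 entries.
    ridge:   small diagonal term (an assumption of this sketch) that keeps
             the covariance matrix invertible.
    """
    x = np.asarray(alleles, dtype=float)
    cov = np.cov(x, rowvar=False)        # covariances between loci
    corr = np.corrcoef(x, rowvar=False)  # Pearson correlations between loci
    precision = np.linalg.inv(cov + ridge * np.eye(cov.shape[0]))
    couplings = -precision               # mean-field estimate J = -C^{-1}
    np.fill_diagonal(couplings, 0.0)     # keep only pairwise terms
    return corr, couplings


def demo_chain(n_genomes=20000, flip=0.1, seed=0):
    """Synthetic chain A - B - C: A and C interact only through B."""
    rng = np.random.default_rng(seed)
    a = rng.integers(0, 2, size=n_genomes)
    # Each locus copies its neighbour, with an independent flip probability.
    b = a ^ (rng.random(n_genomes) < flip).astype(int)
    c = b ^ (rng.random(n_genomes) < flip).astype(int)
    return correlations_and_couplings(np.column_stack([a, b, c]))


if __name__ == "__main__":
    corr, couplings = demo_chain()
    print("correlation A-C:", corr[0, 2])
    print("coupling    A-B:", couplings[0, 1])
    print("coupling    A-C:", couplings[0, 2])
```

In this toy chain the first and third loci are clearly correlated, yet their estimated coupling is small compared with the coupling between neighbouring loci: the correlation is indirect, transmitted through the middle locus.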