Bioproducts Sciences and Engineering Laboratory, Washington State University, 2710 Crimson Way, Richland, WA, 99354, USA.
Voiland School of Chemical Engineering and Bioengineering, Washington State University, Richland, WA, 99354, USA.
Sci Data. 2022 Oct 22;9(1):647. doi: 10.1038/s41597-022-01709-4.
Lignin is one of the most abundant biopolymers in nature and has great potential to be transformed into high-value chemicals. However, the limited availability of molecular structure data hinders its industrial application. Herein, we present the Lignin Structural (LGS) Dataset, which contains molecular structures of milled wood lignin built from two major monomeric units (coniferyl and syringyl) and the six most common interunit linkages (phenylpropane β-aryl ether, resinol, phenylcoumaran, biphenyl, dibenzodioxocin, and diaryl ether). The dataset is a unique resource that covers the part of lignin's chemical space characterized by polymer chains of 3 to 25 monomer units. Structural data were generated using a sequence-controlled polymer generation approach that was calibrated to match experimental lignin properties. The LGS dataset includes 60K newly generated lignin structures that match the experimentally determined structural compositions available in the literature with high accuracy (~90%). The LGS dataset is a valuable resource for advancing lignin chemistry research, including computational simulation approaches and predictive modelling.
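As a rough illustration of what a sequence-controlled generation step could look like, the following Python sketch samples chains of 3 to 25 units from the two monomers and six linkage types named above. It is not the authors' method. All weights are hypothetical placeholders rather than calibrated values, the function names are invented for this example, chains are treated as linear, and chemical constraints on which units can form which linkage are ignored.

```python
import random
from collections import Counter

# Monomer and linkage names come from the abstract. Every weight below is a
# made-up placeholder (assumption), not a calibrated value from the dataset.
MONOMER_WEIGHTS = {"coniferyl": 0.5, "syringyl": 0.5}
LINKAGE_WEIGHTS = {
    "beta-aryl ether": 0.60,
    "resinol": 0.10,
    "phenylcoumaran": 0.10,
    "biphenyl": 0.10,
    "dibenzodioxocin": 0.05,
    "diaryl ether": 0.05,
}
MIN_UNITS, MAX_UNITS = 3, 25  # chain-length range stated in the abstract


def weighted_choice(rng, weights):
    """Pick one key of `weights` with probability proportional to its value."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def generate_chain(rng):
    """Return (monomers, linkages) for one linear chain.

    Simplification: a chain of n units gets n - 1 linkages, and no check is
    made that a sampled linkage is chemically possible between its neighbours.
    """
    n_units = rng.randint(MIN_UNITS, MAX_UNITS)  # inclusive on both ends
    monomers = [weighted_choice(rng, MONOMER_WEIGHTS) for _ in range(n_units)]
    linkages = [weighted_choice(rng, LINKAGE_WEIGHTS) for _ in range(n_units - 1)]
    return monomers, linkages


def linkage_fractions(chains):
    """Fraction of each linkage type over all generated chains."""
    counts = Counter(link for _, links in chains for link in links)
    total = sum(counts.values())
    return {name: counts[name] / total for name in LINKAGE_WEIGHTS}


rng = random.Random(0)  # fixed seed so the run is repeatable
chains = [generate_chain(rng) for _ in range(1000)]
fractions = linkage_fractions(chains)

for name, value in fractions.items():
    print(f"{name:>16}: {value:.3f}")
```

In a calibration loop, the observed fractions would be compared with experimentally reported compositions and the weights adjusted until the two agree; the target values would have to come from the literature and are deliberately not filled in here.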