Laboratory for Atomistic and Molecular Mechanics (LAMM), Department of Civil and Environmental Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue 1-290, Cambridge, Massachusetts 02139, United States.
ACS Nano. 2019 Jul 23;13(7):7471-7482. doi: 10.1021/acsnano.9b02180. Epub 2019 Jun 26.
We report a self-consistent method to translate amino acid sequences into audible sound, use the representation in the musical space to train a neural network, and then apply it to generate protein designs using artificial intelligence (AI). The sonification method proposed here uses the normal mode vibrations of the amino acid building blocks of proteins to compute an audible representation of each of the 20 natural amino acids, which is fully defined by the overlay of its respective natural vibrations. The vibrational frequencies are transposed to the audible spectrum following the musical concept of transpositional equivalence, that is, playing or writing music in a way that makes it sound higher or lower in pitch while retaining the relationships between the tones or chords played. This transposition method ensures that the relative values of the vibrational frequencies within each amino acid and among different amino acids are retained. The characteristic frequency spectra and sounds associated with the amino acids represent a type of musical scale that consists of 20 tones, the "amino acid scale". To create a playable instrument, each tone associated with the amino acids is assigned to a specific key on a piano roll, which allows us to map the sequence of amino acids in proteins into a musical score. To reflect higher-order structural details of proteins, the volume and duration of the notes associated with each amino acid are defined by the secondary structure of proteins, computed using DSSP, thereby introducing musical rhythm. We then train a recurrent neural network on a large set of musical scores generated by this sonification method and use AI to generate musical compositions, capturing the innate relationships between amino acid sequence and protein structure. Finally, we translate the musical data generated by AI into protein sequences, thereby obtaining protein designs that feature specific design characteristics.
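The transposition step described above can be illustrated with a minimal sketch. It assumes that transposition is applied as one common multiplicative factor to every vibrational frequency, which is one straightforward way to keep all frequency ratios unchanged; the abstract does not state the exact procedure. The function name `transpose_to_audible`, the target pitch of 220 Hz, and the frequency values are hypothetical placeholders, not data from the paper.

```python
import math

def transpose_to_audible(freqs_by_residue, target_lowest_hz=220.0):
    """Scale every frequency by one common factor.

    Assumption: a single shared factor is used, so ratios within each
    amino acid and among different amino acids are preserved.
    """
    all_freqs = [f for freqs in freqs_by_residue.values() for f in freqs]
    if not all_freqs or min(all_freqs) <= 0:
        raise ValueError("need at least one positive frequency")
    # The lowest input frequency is mapped onto target_lowest_hz.
    factor = target_lowest_hz / min(all_freqs)
    return {
        residue: [f * factor for f in freqs]
        for residue, freqs in freqs_by_residue.items()
    }

# Placeholder mode frequencies in Hz (illustrative only, NOT measured values).
example = {
    "A": [1.0e12, 2.5e12],
    "G": [2.0e12, 3.0e12],
}

audible = transpose_to_audible(example)

# Ratios survive the transposition, both within and across residues.
ratio_within = audible["A"][1] / audible["A"][0]
ratio_across = audible["G"][0] / audible["A"][0]
print(math.isclose(ratio_within, 2.5), math.isclose(ratio_across, 2.0))
```

Because only a shared factor is applied, the overlay of modes for each amino acid keeps its internal intervals, which is the property the abstract attributes to transpositional equivalence.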
We illustrate the approach in several examples that reflect the sonification of protein sequences, including multihour audible representations of natural proteins and protein-based musical compositions solely generated by AI. The approach proposed here may provide an avenue for understanding sequence patterns, variations, and mutations and offers an outreach mechanism to explain the significance of protein sequences. The method may also offer insight into protein folding and into how the context of the amino acid sequence defines the secondary and higher-order folded structure of proteins, and could hence be used to detect the effects of mutations through sound.
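The mapping from a sequence to a score, its inverse, and the idea of hearing a mutation as a changed note can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the key numbers (consecutive MIDI-style keys starting at 48), the alphabetical ordering of residues, and the duration and volume values per secondary-structure code are all hypothetical. Only the one-letter amino acid codes and the DSSP codes `H` (alpha helix) and `E` (extended strand) are standard.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 natural amino acids

# Hypothetical key assignment: one consecutive key per amino acid.
BASE_KEY = 48
KEY_OF = {aa: BASE_KEY + i for i, aa in enumerate(AMINO_ACIDS)}
AA_OF = {key: aa for aa, key in KEY_OF.items()}

# Hypothetical rhythm table: DSSP code -> (duration in beats, volume 0-127).
RHYTHM = {"H": (0.5, 100), "E": (1.0, 80)}
DEFAULT_RHYTHM = (2.0, 60)  # used for every other DSSP code

def to_score(sequence, dssp):
    """Turn a sequence and its per-residue DSSP string into notes."""
    if len(sequence) != len(dssp):
        raise ValueError("sequence and DSSP string must have equal length")
    return [
        (KEY_OF[aa], *RHYTHM.get(ss, DEFAULT_RHYTHM))
        for aa, ss in zip(sequence, dssp)
    ]

def to_sequence(score):
    """Recover the amino acid sequence from the keys of a score."""
    return "".join(AA_OF[key] for key, _duration, _volume in score)

def changed_notes(score_a, score_b):
    """Indices at which two equally long scores differ."""
    return [i for i, (a, b) in enumerate(zip(score_a, score_b)) if a != b]

wild_type = to_score("MKV", "HHE")
mutant = to_score("MRV", "HHE")   # K -> R substitution at index 1

print(to_sequence(wild_type))            # round trip back to the sequence
print(changed_notes(wild_type, mutant))  # position of the altered note
```

Keeping the key-to-residue table invertible is what allows generated scores to be read back as sequences; the rhythm fields carry structural context but are ignored when decoding.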