Department of Bioinformatics, Pirogov Russian National Research Medical University, 117997 Moscow, Russia.
Department of Bioinformatics, Institute of Biomedical Chemistry, 119121 Moscow, Russia.
Int J Mol Sci. 2023 Jan 27;24(3):2463. doi: 10.3390/ijms24032463.
Next-generation sequencing (NGS) technologies are rapidly entering clinical practice, and newborn screening is a promising area for their use. Mass screening of newborns with NGS uncovers a large number of new missense variants that must be assessed for association with hereditary diseases. Currently, the primary analysis and identification of pathogenic variants is carried out using bioinformatic tools. Although extensive effort has gone into computational approaches to variant interpretation, there is still no generally accepted pathogenicity predictor. In this study, we used the sequence-structure-property relationships (SSPR) approach, which represents protein fragments by their molecular structural formulae. The approach predicts the pathogenic effect of single amino acid substitutions in proteins associated with twenty-five monogenic heritable diseases from the Uniform Screening Panel for Major Conditions recommended by the Advisory Committee on Heritable Disorders in Newborns and Children. To build the SSPR classification models, we modified a piece of cheminformatics software, MultiPASS, that was originally developed for predicting the activity spectra of drug-like substances. The resulting SSPR models were compared with traditional bioinformatic tools (SIFT 4G, PolyPhen-2 HDIV, MutationAssessor, PROVEAN and FATHMM). The average AUC of our approach was 0.804 ± 0.040. Better quality scores were achieved for 15 of the 25 proteins, with significantly higher accuracy for some proteins (, , ). The best SSPR classification models are freely available in the online resource SAV-Pred (Single Amino acid Variants Predictor).
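As an illustration of how a per-protein AUC and its average across proteins might be computed, the following is a minimal sketch, not the authors' pipeline. The protein names, labels and scores are hypothetical toy values, and the use of the sample standard deviation for the "±" term is an assumption, since the abstract does not state how the spread was calculated.

```python
from statistics import mean, stdev


def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen pathogenic
    variant (label 1) scores higher than a randomly chosen benign
    variant (label 0); ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pathogenic and one benign variant")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: protein -> (true labels, predicted scores).
# These names and numbers are placeholders, not values from the study.
toy = {
    "PROTEIN_A": ([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.2]),
    "PROTEIN_B": ([1, 0, 1, 0], [0.8, 0.3, 0.5, 0.1]),
}

# One AUC per protein, then the mean and sample standard deviation
# across proteins (assumed summary statistic).
per_protein = {name: roc_auc(y, s) for name, (y, s) in toy.items()}
summary = (mean(per_protein.values()), stdev(per_protein.values()))

for name, auc in per_protein.items():
    print(f"{name}: AUC = {auc:.3f}")
print(f"average AUC = {summary[0]:.3f} ± {summary[1]:.3f}")
```

The pairwise formulation is quadratic in the number of variants, which is acceptable for small per-protein sets; a rank-based implementation would be preferable for larger data.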