College of Chemistry, Sichuan University, Chengdu, Sichuan, China.
Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA.
Hum Mutat. 2021 Jun;42(6):667-684. doi: 10.1002/humu.24203. Epub 2021 Apr 23.
One of the greatest challenges in human genetics is deciphering the link between functional variants in noncoding sequences and the pathophysiology of complex diseases. To address this issue, many methods have been developed to distinguish functional single-nucleotide variants (SNVs) from neutral SNVs in noncoding regions. In this study, we integrated well-established features and commonly used datasets into large-scale datasets, on which we built a random forest model that yielded promising performance and outperformed some cutting-edge approaches. Our analyses of feature importance and data coverage also provide clues for future research on improving the prediction of functional noncoding SNVs.