Campos Tulio L, Korhonen Pasi K, Hofmann Andreas, Gasser Robin B, Young Neil D
Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia; Bioinformatics Core Facility, Instituto Aggeu Magalhães, Fundação Oswaldo Cruz (IAM-Fiocruz), Recife, Pernambuco, Brazil.
Department of Veterinary Biosciences, Melbourne Veterinary School, The University of Melbourne, Parkville, Victoria 3010, Australia.
Biotechnol Adv. 2022 Jan-Feb;54:107822. doi: 10.1016/j.biotechadv.2021.107822. Epub 2021 Aug 27.
The availability of high-quality genomes and advances in functional genomics have enabled large-scale studies of essential genes in model eukaryotes, including the 'elegant worm' (Caenorhabditis elegans; Nematoda) and the 'vinegar fly' (Drosophila melanogaster; Arthropoda). However, this is not the case for other, much less studied organisms, such as socioeconomically important parasites, for which functional genomic platforms usually do not exist. Thus, there is a need to develop innovative techniques or approaches for the prediction, identification and investigation of essential genes. A key approach that could enable the prediction of such genes is machine learning (ML). Here, we undertake a historical review of experimental and computational approaches employed for the characterisation of essential genes in eukaryotes, with a particular focus on model ecdysozoans (C. elegans and D. melanogaster), and discuss the possible applicability of ML approaches to organisms such as socioeconomically important parasites. We highlight some recent results showing that high-performance ML, combined with feature engineering, allows reliable prediction of essential genes from extensive, publicly available 'omic data sets, with major potential to prioritise such genes (with statistical confidence) for subsequent functional genomic validation. These findings could 'open the door' to fundamental and applied research areas. Evidence of some commonality in the essential gene complement between these two organisms indicates that an ML-engineering approach could find broader applicability to ecdysozoans such as parasitic nematodes or arthropods, provided that suitably large and informative data sets are or become available for proper feature engineering, and for the robust training and validation of algorithms.
This area warrants detailed exploration, for example, to facilitate the identification and characterisation of essential molecules as novel targets for drugs and vaccines against parasitic diseases. This focus is particularly important, given the substantial impact that such diseases have worldwide, and the current challenges associated with their prevention and control, and with drug resistance in parasite populations.