IEEE J Biomed Health Inform. 2023 Jun;27(6):3093-3103. doi: 10.1109/JBHI.2023.3257727. Epub 2023 Jun 5.
Data-driven approaches for molecular diagnostics are emerging as an alternative for accurate and inexpensive multi-pathogen detection. A novel technique called Amplification Curve Analysis (ACA) has recently been developed by coupling machine learning with real-time Polymerase Chain Reaction (qPCR) to enable the simultaneous detection of multiple targets in a single reaction well. However, target classification that relies purely on amplification curve shapes faces several challenges, such as distribution discrepancies between different data sources (i.e., training vs. testing). Computational models must be optimised to reduce those discrepancies and thereby improve ACA classification performance in multiplex qPCR. Here, we propose a novel transformer-based conditional domain adversarial network (T-CDAN) to eliminate data distribution differences between the source domain (synthetic DNA data) and the target domain (clinical isolate data). Labelled training data from the source domain and unlabelled testing data from the target domain are fed into the T-CDAN, which learns from both domains simultaneously. By mapping the inputs into a domain-invariant space, T-CDAN removes the feature distribution differences and provides a clearer decision boundary for the classifier, resulting in more accurate pathogen identification. Evaluation on 198 clinical isolates containing three types of carbapenem-resistant genes (bla, bla and bla) shows a curve-level accuracy of 93.1% and a sample-level accuracy of 97.0% using T-CDAN, an accuracy improvement of 20.9% and 4.9%, respectively. This research emphasises the importance of deep domain adaptation for enabling high-level multiplexing in a single qPCR reaction, providing a solid approach to extending the capabilities of qPCR instruments in real-world clinical applications.
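The abstract does not spell out how the domain discriminator is conditioned, so the following is only a minimal sketch of the general conditional domain adversarial (CDAN) idea, not the authors' T-CDAN implementation. It assumes the common formulation in which the discriminator sees the outer product of a curve's feature vector and its predicted class probabilities, and it replaces the transformer feature extractor and the trained discriminator with hypothetical toy values. All function names and numbers below are illustrative assumptions.

```python
import math


def multilinear_map(features, class_probs):
    """Flattened outer product of a feature vector and class probabilities.

    Assumption: this is the conditioning step of a generic CDAN; the
    paper's exact conditioning is not given in the abstract.
    """
    return [f * p for f in features for p in class_probs]


def discriminator_prob(conditioned, weights, bias=0.0):
    """Toy logistic discriminator: estimated P(input came from the source domain)."""
    z = sum(w * x for w, x in zip(weights, conditioned)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def domain_loss(prob_source, is_source):
    """Binary cross-entropy of the domain prediction for a single curve."""
    eps = 1e-12  # guards against log(0)
    if is_source:
        return -math.log(prob_source + eps)
    return -math.log(1.0 - prob_source + eps)


# Hypothetical values standing in for one synthetic-DNA (source) curve:
# a 2-dimensional feature vector and a 3-class prediction.
source_features = [0.5, -1.0]
source_probs = [0.7, 0.2, 0.1]

conditioned = multilinear_map(source_features, source_probs)

# An untrained discriminator (all-zero weights) cannot tell the domains apart.
weights = [0.0] * len(conditioned)
p_source = discriminator_prob(conditioned, weights)
loss = domain_loss(p_source, is_source=True)
```

In adversarial training of this kind, the discriminator is updated to lower this loss while the feature extractor is updated to raise it, which pushes source and target curves towards indistinguishable features. The class labels needed for the classifier come only from the source domain; target-domain curves contribute through their predicted probabilities.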