Auton Lab, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
AMIA Annu Symp Proc. 2022 Feb 21;2021:536-545. eCollection 2021.
Analysing electrocardiograms (ECGs) is an inexpensive and non-invasive, yet powerful way to diagnose heart disease. ECG studies that use machine learning to automatically detect abnormal heartbeats have so far depended on large, manually annotated datasets. While collecting vast amounts of unlabeled data can be straightforward, the point-by-point annotation of abnormal heartbeats is tedious and expensive. We explore the use of multiple weak supervision sources, in the form of human-designed heuristics, to learn diagnostic models of abnormal heartbeats without using ground truth labels on individual data points. Our work is among the first to define weak supervision sources directly on time series data. Results show that with as few as six intuitive time series heuristics, we are able to infer high-quality probabilistic label estimates for over 100,000 heartbeats with little human effort, and to use the estimated labels to train competitive classifiers evaluated on held-out test data.
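To make the idea of weak supervision on heartbeats concrete, the following is a minimal sketch, not the authors' method. The abstract does not list the six heuristics or the model used to combine them, so everything here is an assumption for illustration: the three heuristic functions, their thresholds, the per-beat features (preceding RR interval and QRS width in seconds), and the plain vote average that stands in for a learned label model. Each heuristic either votes on a beat or abstains, and the votes are turned into a probability that the beat is abnormal.

```python
import numpy as np

# Vote encoding (an assumption for this sketch).
ABSTAIN, NORMAL, ABNORMAL = -1, 0, 1


def lf_short_rr(rr_prev, rr_median):
    """Hypothetical heuristic: a beat arriving much earlier than usual is flagged."""
    return ABNORMAL if rr_prev < 0.8 * rr_median else ABSTAIN


def lf_regular_rr(rr_prev, rr_median):
    """Hypothetical heuristic: a beat arriving right on schedule is voted normal."""
    return NORMAL if abs(rr_prev - rr_median) < 0.05 * rr_median else ABSTAIN


def lf_wide_qrs(qrs_width_s):
    """Hypothetical heuristic: a wide QRS complex is flagged (illustrative threshold)."""
    return ABNORMAL if qrs_width_s > 0.12 else ABSTAIN


def apply_heuristics(rr_intervals, qrs_widths):
    """Return an (n_beats, n_heuristics) matrix of votes."""
    rr = np.asarray(rr_intervals, dtype=float)
    qrs = np.asarray(qrs_widths, dtype=float)
    rr_median = float(np.median(rr))
    return np.array([
        [lf_short_rr(r, rr_median), lf_regular_rr(r, rr_median), lf_wide_qrs(q)]
        for r, q in zip(rr, qrs)
    ])


def probabilistic_labels(votes):
    """Estimate P(abnormal) per beat as the share of non-abstaining votes
    that say ABNORMAL. Beats on which every heuristic abstains get 0.5.
    This simple average is a stand-in for a learned label model."""
    votes = np.asarray(votes)
    n_voted = (votes != ABSTAIN).sum(axis=1)
    n_abnormal = (votes == ABNORMAL).sum(axis=1)
    probs = np.full(len(votes), 0.5)
    has_vote = n_voted > 0
    probs[has_vote] = n_abnormal[has_vote] / n_voted[has_vote]
    return probs


if __name__ == "__main__":
    # Toy features for five beats; not real ECG data.
    rr = [0.80, 0.81, 0.50, 0.80, 0.90]
    qrs = [0.08, 0.09, 0.14, 0.08, 0.10]
    print(probabilistic_labels(apply_heuristics(rr, qrs)))
```

The resulting probabilities could then serve as soft training targets for a downstream classifier, which is the role the estimated labels play in the abstract. A plain vote average treats every heuristic as equally reliable; approaches that learn per-heuristic accuracies from the pattern of agreements and disagreements would replace `probabilistic_labels` in this sketch.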