Bianchi Matt T, Russo Kathryn, Gabbidon Harriett, Smith Tiaundra, Goparaju Balaji, Westover M Brandon
Neurology Department, Massachusetts General Hospital; Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA.
Neurology Department, Massachusetts General Hospital.
Nat Sci Sleep. 2017 Feb 16;9:11-29. doi: 10.2147/NSS.S130141. eCollection 2017.
Clinical polysomnography (PSG) databases are a rich resource in the era of "big data" analytics. We explore the uses and potential pitfalls of clinical data mining of PSG using statistical principles and analysis of clinical data from our sleep center. We performed a retrospective analysis of self-reported and objective PSG data from adults who underwent overnight PSG (diagnostic tests, n=1835). Self-reported symptoms overlapped markedly between the two most common categories, insomnia and sleep apnea, with the majority reporting symptoms of both disorders. Standard clinical metrics routinely reported on objective data were analyzed for basic properties (missing values, distributions), pairwise correlations, and descriptive phenotyping. Of 41 continuous variables, including both clinical and PSG-derived variables, none passed testing for normality. Objective findings of sleep apnea and periodic limb movements were common, with 51% having an apnea-hypopnea index (AHI) >5 per hour and 25% having a leg movement index >15 per hour. Different visualization methods are shown for common variables to explore population distributions. Phenotyping methods based on clinical databases are discussed for sleep architecture, sleep apnea, and insomnia. Inferential pitfalls are discussed using the current dataset and case examples from the literature. The increasing availability of clinical databases for large-scale analytics holds important promise in sleep medicine, especially as it becomes increasingly important to demonstrate the utility of clinical testing methods in the management of sleep disorders. Awareness of the strengths, as well as caution regarding the limitations, will maximize the productive use of big data analytics in sleep medicine.
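The basic property checks mentioned in the abstract (missing values, the fraction of patients above a clinical threshold, and a normality test) can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say which normality test the authors used, so the Jarque-Bera test is substituted here because it needs nothing beyond the standard library. The function name `summarize` and the sample AHI values are hypothetical and are not data from the study.

```python
import math
from statistics import mean


def summarize(values, threshold):
    """Count missing values, the fraction above a threshold, and a
    Jarque-Bera normality p-value (an assumed choice of test)."""
    present = [v for v in values if v is not None]
    n = len(present)
    m = mean(present)
    # Central moments in population form, as the Jarque-Bera statistic uses.
    # Note: a constant column gives m2 == 0 and is not handled in this sketch.
    m2 = sum((v - m) ** 2 for v in present) / n
    m3 = sum((v - m) ** 3 for v in present) / n
    m4 = sum((v - m) ** 4 for v in present) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurtosis ** 2 / 4.0)
    # Under normality, JB is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function is exactly exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return {
        "n_missing": len(values) - n,
        "frac_above": sum(v > threshold for v in present) / n,
        "jb_stat": jb,
        "jb_p": p_value,
    }


# Made-up AHI values (events per hour); None marks a missing entry.
ahi = [0.5, 1.2, 2.0, 3.1, 4.0, 6.5, 9.0, 15.0, 32.0, 60.0, None, None]
report = summarize(ahi, threshold=5.0)
print(report["n_missing"], report["frac_above"])
```

The Jarque-Bera p-value relies on a large-sample approximation, so it is more trustworthy at the scale of a clinical database (n=1835 here) than on a toy list like the one above.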