Emre Kiciman, Armando Fox
Department of Computer Science, Stanford University, Stanford, CA 94305, USA.
IEEE Trans Neural Netw. 2005 Sep;16(5):1027-41. doi: 10.1109/TNN.2005.853411.
Most Internet services (e-commerce, search engines, etc.) suffer faults. Quickly detecting these faults can be the largest bottleneck in improving the system's availability. We present Pinpoint, a methodology for automating fault detection in Internet services by: 1) observing low-level internal structural behaviors of the service; 2) modeling the majority behavior of the system as correct; and 3) detecting anomalies in these behaviors as possible symptoms of failures. Without requiring any a priori application-specific information, Pinpoint correctly detected 89%-96% of major failures in our experiments, compared with 20%-70% detected by current application-generic techniques.
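The three steps above (observe structural behavior, treat the majority as correct, flag anomalies) can be illustrated with a toy sketch. It assumes that each instance of a component is summarized by counts of the other components it calls, which stands in for "low-level internal structural behaviors". It also assumes that the pooled behavior of an instance's peers is the correct majority, and that instances whose profile lies far from it are flagged. The names `distribution`, `flag_anomalies`, `threshold`, and the sample data are hypothetical. This is not Pinpoint's implementation, and the plain L1 distance is a simple substitute for whatever statistical test the paper actually uses.

```python
from collections import Counter


def distribution(counts: Counter, keys: list[str]) -> list[float]:
    """Normalize raw interaction counts into a probability vector over `keys`."""
    total = sum(counts[k] for k in keys)
    return [counts[k] / total if total else 0.0 for k in keys]


def flag_anomalies(observations: dict[str, Counter], threshold: float) -> list[str]:
    """Return the names of instances whose interaction profile deviates from their peers.

    Hypothetical sketch: each instance's "majority" baseline is the pooled
    profile of all *other* instances (leave-one-out), assumed to be mostly
    correct. Deviation is a plain L1 distance between probability vectors.
    """
    # Every component that any instance interacted with.
    keys = sorted({k for c in observations.values() for k in c})

    # Pool all observations once; each instance's own counts are subtracted below.
    pooled: Counter = Counter()
    for counts in observations.values():
        pooled.update(counts)

    flagged = []
    for name, counts in sorted(observations.items()):
        baseline = distribution(pooled - counts, keys)  # leave this instance out
        profile = distribution(counts, keys)
        deviation = sum(abs(p - b) for p, b in zip(profile, baseline))
        if deviation > threshold:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical data: calls made by four instances of one component.
    observations = {
        "app-1": Counter({"db": 50, "cache": 50}),
        "app-2": Counter({"db": 48, "cache": 52}),
        "app-3": Counter({"db": 52, "cache": 48}),
        "app-4": Counter({"db": 10, "error_page": 90}),  # misbehaving instance
    }
    print(flag_anomalies(observations, threshold=1.0))  # ['app-4']
```

The leave-one-out baseline keeps a single misbehaving instance from pulling its own reference profile toward itself. With many healthy peers, a fully pooled baseline would behave similarly, because the majority dominates the aggregate either way.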