Latency Outliers Root Cause Analysis in the Field by Combining Aggregation and Tracing Tools - Mathieu Desnoyers & Julien Desfossez, EfficiOS Inc.


Presented at LinuxCon 2016.

We present how to combine tracing and aggregation tools to automatically detect latency outliers in production systems, capture detailed context around each occurrence, and perform offline root cause analysis. We introduce the LTTng (lttng.org) ecosystem, which includes tracing with the LTTng kernel and user-space tracers, online aggregation with the latency-tracker, and graphical and batch-mode post-processing analyses with Trace Compass (tracecompass.org) and lttng-analyses. The run-time portion of these tools is designed to have low impact on the system, which allows tracing and aggregation to be enabled in production. We demonstrate how to use these tools in embedded, real-time, and server environments through realistic user stories.
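
To make the idea of online aggregation concrete, here is a minimal sketch of threshold-based outlier detection. It pairs "begin" and "end" events by key and invokes a callback when the elapsed time exceeds a threshold. This is an illustration only, not the latency-tracker API: the names `OutlierTracker`, `detect_outliers`, `threshold_ns`, and the event tuple layout are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, Iterable, List, Tuple


@dataclass
class OutlierTracker:
    """Toy online aggregator (hypothetical, not the latency-tracker API)."""

    threshold_ns: int
    on_outlier: Callable[[Hashable, int], None]
    _pending: Dict[Hashable, int] = field(default_factory=dict)

    def begin(self, key: Hashable, timestamp_ns: int) -> None:
        # Remember when the tracked operation started.
        self._pending[key] = timestamp_ns

    def end(self, key: Hashable, timestamp_ns: int) -> None:
        start = self._pending.pop(key, None)
        if start is None:
            return  # "end" without a matching "begin": ignore it
        latency = timestamp_ns - start
        if latency > self.threshold_ns:
            # In a real deployment, this is where detailed context
            # around the occurrence would be captured.
            self.on_outlier(key, latency)


def detect_outliers(
    events: Iterable[Tuple[str, Hashable, int]], threshold_ns: int
) -> List[Tuple[Hashable, int]]:
    """Feed (kind, key, timestamp_ns) events through the tracker."""
    outliers: List[Tuple[Hashable, int]] = []
    tracker = OutlierTracker(
        threshold_ns, lambda key, latency: outliers.append((key, latency))
    )
    for kind, key, timestamp_ns in events:
        if kind == "begin":
            tracker.begin(key, timestamp_ns)
        elif kind == "end":
            tracker.end(key, timestamp_ns)
    return outliers
```

Only the pending start timestamps are kept in memory, and work beyond a dictionary lookup happens only when the threshold is crossed, which is the property that makes this style of aggregation plausible to leave enabled in production.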