We show how to combine tracing and aggregation tools to automatically detect latency outliers in production systems, capture detailed context around each occurrence, and perform offline root cause analysis. We introduce the LTTng (lttng.org) ecosystem, which includes tracing with the LTTng kernel and user-space tracers, online aggregation with the latency-tracker, and graphical and batch-mode post-processing analyses with Trace Compass (tracecompass.org) and lttng-analyses. The run-time portion of these tools is designed to have a low impact on the traced system, which allows tracing and aggregation to be enabled in production. We demonstrate how to use these tools in embedded, real-time and server environments through realistic user stories.