Is your root cause really “human error”? How did your environment let the human make the error? How did their error take down the service? How many outages did humans prevent? Can your dev teams’ priorities be aligned with reliability, instead of only with churning out features? At Heroku, we do ops as a service -- reliability is our product. If we go down, we take thousands of businesses with us. In SRE, we push for reliability and resiliency in designs, sure, but it’s more than that. We iterate on process, automation, tooling, and incident response, because people are at the heart of everything we do.