found drama


canary releases and adaptive LIFO queues

by !undefined

Facebook’s Ben Maurer makes some great points in his article “Fail at Scale.” I didn’t watch the accompanying video presentation, but the article is an extremely interesting read about how Facebook tries to anticipate and manage failures. The observation that failures so often trace back to configuration changes is a striking one. I also enjoyed the sections on canary releases and adaptive LIFO queues.

Being the Allspaw fan that I am, I always cringe a bit when I see someone so cavalierly throw out the phrases “human error” and “root cause,” no matter what their data say. But their “DERP” methodology (detection, escalation, remediation, prevention) softens the blow a bit. If you’re not running your incident reviews (not “post mortems”) using something like that, then there’s a good chance that yours are toxic.

About !undefined

Syndicated content from the !undefined Tumblr blog where Rob Friesel posts items related to software engineering, user interface/experience design, and Agile software development. Lots of JavaScript here.


