Good write-up from the eBay team about how they’re using Hystrix to put circuit breakers around problematic methods (read: “remote service calls”). Their experience seems to match my own pretty closely (though they’ve been using it longer, and at a larger scale).
Also, let’s take a moment to sanity check that closing sentiment:
…but Hystrix has proven to be a sound and mature library for maintaining a resilient environment for our critical applications, providing high availability during any time period.
One thing to remember when working with circuit breakers in the context of a service-oriented architecture (i.e., micro-services) is that you need to differentiate between being “available” and providing the desired level of functionality. Ask yourself which is worse: quickly getting back an error that you can handle, or waiting a (potentially) really long time for a successful response? Obviously there’s no one-size-fits-all answer to this question, as it depends on who your consumers are, what your SLAs are, etc. But it’s probably a good idea to get in the habit of preferring fast responses, coping well with errors and failures, and being the good citizen who doesn’t cause cascading failures by letting all of your request threads back up while they wait on long-running queries that may never complete.
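To make the “fail fast” idea concrete, here is a minimal sketch of a circuit breaker written against plain java.util.concurrent. It is not Hystrix’s API and not what the eBay team runs; the class name SimpleCircuitBreaker, its constructor parameters, and the simplified half-open handling are all assumptions made for illustration. The point is only to show the shape of the trade-off: a bounded pool, a hard timeout on each call, and a fallback that returns immediately once too many calls in a row have failed.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

/** Hypothetical, minimal circuit breaker for illustration. NOT Hystrix's API. */
final class SimpleCircuitBreaker {
    private final ExecutorService pool;   // bounded pool isolates the remote calls
    private final long timeoutMillis;     // max time we are willing to wait per call
    private final int failureThreshold;   // consecutive failures before opening
    private final long openMillis;        // how long to stay open before retrying

    private int consecutiveFailures = 0;
    private boolean open = false;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int poolSize, long timeoutMillis,
                         int failureThreshold, long openMillis) {
        // Daemon threads so a stuck call cannot keep the JVM alive.
        this.pool = Executors.newFixedThreadPool(poolSize, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });
        this.timeoutMillis = timeoutMillis;
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Runs remoteCall with a timeout; returns fallback.get() on any failure. */
    <T> T call(Callable<T> remoteCall, Supplier<T> fallback) {
        if (isOpen()) {
            return fallback.get();        // fail fast: do not touch the remote service
        }
        Future<T> future = pool.submit(remoteCall);
        try {
            T result = future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            recordSuccess();
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            future.cancel(true);
            recordFailure();
            return fallback.get();
        } catch (ExecutionException | TimeoutException e) {
            future.cancel(true);          // interrupt the worker so the thread is freed
            recordFailure();
            return fallback.get();
        }
    }

    /** True while calls are being short-circuited. */
    synchronized boolean isOpen() {
        if (open && System.currentTimeMillis() - openedAt >= openMillis) {
            // Simplified "half-open": let calls through again, but a single
            // further failure reopens the circuit immediately.
            open = false;
            consecutiveFailures = failureThreshold - 1;
        }
        return open;
    }

    private synchronized void recordSuccess() {
        consecutiveFailures = 0;
    }

    private synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            open = true;
            openedAt = System.currentTimeMillis();
        }
    }
}
```

A caller would wrap each remote call, for example breaker.call(() -> client.fetch(id), () -> cachedValue), where client and cachedValue are placeholders for whatever your service actually uses. Once the circuit is open, callers get the fallback right away instead of tying up a request thread until the timeout fires. A production library such as Hystrix layers metrics, rolling error-rate windows, and proper half-open probing on top of this basic idea.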