Thursday, March 30, 2023

Let Unsustainable Things Fail

I read a post by Max Countryman recently titled Let It Fail (Hacker News discussion), where he talks about the importance of failure in software engineering organizations. He recounts an experience he had where his boss gave him surprising advice to let a legacy system that was not getting enough attention fail visibly rather than paper over the cracks.

I've written before on this blog about how heroics cover up systemic issues. Organizations become dependent on heroics. Heroics beget more heroics. Heroics are normalized. Eventually the heroes burn out.

That one long-serving, loyal employee who had become a silo of critical knowledge up and quit. That manual, error-prone process we had been using to deploy our software for way too long finally caused an outage. We've been rushing code changes right before big releases, and one finally bit us hard.

It's important to talk openly about unsustainable engineering practices and the failure they portend. But issues of technical debt are hard to understand when placed alongside new feature development and other business priorities. The value of under-the-covers work is hard to understand. Sometimes you need to pull the covers away.