Legacy Systems Age in Reverse

I feel like every job I've ever had in software there has been some old legacy system hanging around that everyone denigrated and perennially spoke about replacing with a newer, shinier system. That replacement was always coming any day now. By the time I'd moved on to a new job, that legacy system was still running, quietly doing some important function for the business, having somehow survived the demise everyone predicted.

The first release of jQuery was in 2006. Within a few years--and every year after--web developers would talk continuously about how outmoded jQuery was, and that no self-respecting developer would still be using it when so many newer, better JavaScript libraries had come along to replace it. Well, guess what. In 2024 (18 years later), jQuery is still in active development, and according to the official jQuery blog, 90% of all websites use jQuery currently. As someone who's been working in the web development space since roughly the dawn of jQuery, I can say that it is still extremely rare that I see a codebase that doesn't use jQuery somewhere, whether as a direct dependency, a transitive dependency, or as part of some 3rd-party tool integrated into the product. jQuery will outlast the human species.

Software systems exhibit the Lindy effect:

The Lindy effect (also known as Lindy's Law) is a theorized phenomenon by which the future life expectancy of some non-perishable things, like a technology or an idea, is proportional to their current age. Thus, the Lindy effect proposes the longer a period something has survived to exist or be used in the present, the longer its remaining life expectancy. Longevity implies a resistance to change, obsolescence, or competition, and greater odds of continued existence into the future. Where the Lindy effect applies, mortality rate decreases with time.

They actually age in reverse. Every year they exist doubles their additional life expectancy. That old system that everyone thinks will be replaced any day now is gaining strength before your eyes, becoming every day less likely to be replaced.

New technologies are the least likely to survive another day, another year. Just as a business that's existed for 100 years is more likely to survive to its 101st year than a 2-year-old business is likely to see its 3rd year.

The implications of the Lindy effect to any of us working in "technology" are so widespread that it's hard to overstate. We're in an industry obsessed with the new, but the new things are constantly passing away as the old geezers are running laps around them. Any new programming language you're excited about right now will be dead long before C is. VS Code will kick the bucket before Vim. Be kind to those elderly technologies around you, because they'll be here long after you're gone.

Scrum Tensions: Code Review

There are tensions in Scrum anywhere you have intra-sprint cycles that must resolve by the end of the sprint. In my last post I wrote about manual quality assurance and the tension that causes in a Scrum setting. Another one of these intra-sprint cycles that most Scrums teams include in their process is code review.

You could, in a way, consider code review to be another form of "QA". In both cases we're initiating a cycle-within-a-sprint with a loop of approval, rejection, re-work, and acceptance.

In the sense that Scrum teams strive to get every sprint backlog item to "done" by the end of the sprint, back-and-forth cycles kill sprints. But they're also essential. Therein lies the tension.

Most Scrum teams conduct estimation sessions prior to starting a sprint. The estimates that a team assigns to product backlog items are typically based on a gut feel of how long it will take or how complicated it will be to "complete" a backlog item. In most teams I've worked with, that estimate is implied to mean how long it will take a software engineer on the team to submit a code review (via a pull request or something similar), i.e., when the engineer thinks their part is "done". Some teams also build into the estimate a duration or complexity estimate based on the work necessary by the QA people.

What estimates don't ever capture, in my experience, is how much effort/time/complexity is required to complete the code review or QA cycles on a backlog item. How can we know ahead of time how many pieces of feedback are going to be left on a pull request? How many of those will necessitate re-work on the item before the end of the sprint? How long will the re-work take? What if the re-work is still not up to snuff? Etc., etc., etc. Each code review starts a cycle of unknown duration.

And all the while everyone who cares about the completion of the sprint is tap-tap-tapping their fingers, nervously wondering if these intra-sprint cycles are going to complete on time.

Even though people mean well, what happens in practice is that thorough code review is tacitly disincentivized. Most product owners are not engineers, and although they understand intuitively that code review, like any form of quality assurance, is necessary, it also adds delay to the immediate "done-ness" of work.

Engineers also know that code reviews are important, but there is always pressure to get to them faster. When bugs crop up later, or the codebase slowly accumulates technical debt, no one in management is going to track down the specific pull request that introduced the problem, read the list of names in the approvers list and assign blame to those individuals. A swift click on the "Approve" button satisfies everyone in the moment.

What do we do about this tension? We can't have code review without introducing an unpredictable amount of delay to each sprint backlog item. You can never fully resolve this tension, in my opinion. But there are some ways to ease the tension.

Be Your Own Reviewer

I'm going to put this one on the engineers. And, no, I'm not suggesting that an engineer who submits a pull request should be clicking "Approve" on their own pull request. What I am suggesting is that before an engineer submits a pull request, it should only be after they have examined their own code to the degree that a reviewer would.

I can't tell you how many times over my career that I've reviewed a pull request where it was clear that the submitter had not even looked at what they were submitting for review. You're not ready to submit a pull request until you've looked line-by-line at the diff you're about to submit and pre-corrected every issue you can find. We're not talking about architectural judgment calls here, we're talking about misspelling method names, pushing blank files, including huge sections of commented-out code from various abandoned local experiments.

It should be rare that a reviewer needs to point out an obvious mistake in a review. I personally feel embarrassed when a reviewer points out something that I know I could have found myself. And I feel doubly embarrassed if it's a mistake they've pointed out multiple times.

Automate, Automate, Automate

Take advantage of every opportunity to automate the tasks associated with code review. We have linters, pre-commit hooks in version control systems, IDE integrations. There are ways to automate basically any tedious, repetitive aspects of code review. If reviewers are repeatedly finding the same kinds of mistakes, it's time to automate the checks for those mistakes, so submitters can't submit them in the first place.

Back-and-forth intra-sprint cycles are what kill sprints. Code review is a cycle, but what can we do to get down to one iteration per cycle as often as possible?

The tension between any kind of quality assurance (including code review) and sprint-based methodologies is impossible to erase, but we have techniques to make it less tense.

Scrum Tensions: Manual QA

After working for many years in the software industry, almost always using a methodology resembling Scrum, there are certain tensions that I've come to believe have no good solution--at least--I've never seen them solved elegantly.

One of those tensions comes from manual quality assurance. When sprint backlog items need to be manually tested by dedicated QA people, there is simply no clean way to do it in my experience.

Here are some of the issues that I've seen repeatedly:

  • QA does not have time to test items where the development work is completed late in the sprint
  • QA people are sitting idle for the first few days of the sprint with no completed development work to test yet
  • If a bug is found, there is no time in the current sprint to fix it

Here are some attempted remedies that I've seen:

  • Let's build some slack into the sprint for the developers so they can complete their development work with X days to spare at the end to allow time for QA
  • Let's have the developers "work ahead" of the QA people, as in, the development work that we say is "done" within one sprint is actually tested in the next sprint

And here's where those remedies fall down:

  • Inevitably, there is pressure to work more items into successive sprints, and development work starts creeping over the false "deadline"
  • We end each sprint not knowing what is actually "done", and any bugs that are found have probably already been merged into whatever common branch the developers work from

QA people are in a really tough position as they are always sort of passively pressured to not "hold up" the sprint. They know that everyone wants to see a clean sprint where every item is signed off on by QA by the last day. And the double whammy is that they get heat when bugs are found in production.

This is one of those essential tensions in Scrum-based development that I've come to accept is not resolvable. I have never seen manual QA fit neatly into Scrum. 

As I see it, there are two options to break the tension:

  • Do manual QA, and abandon sprint-based methodologies (try Kanban, for example)
  • Use a sprint-based methodology, and avoid manual QA (some teams fully automate testing)
In my experience, organizations do neither of the above, and the tension continues.

Continuity of Leadership

There's a Kafkaesque situation that develops in companies that cannot retain senior employees. A team feels a sense of urgency about shipping software, but no one can tell them what to build.

Engineers are responsible for making the products that sustain the company, but there's no one around who can say what the products should do exactly.

It's like a movie where the protagonist wakes up every morning with no knowledge of what happened the day before, and tries to piece together his identity from artifacts scattered about.

Big organizational initiatives fail repeatedly because there's no one around from the beginning to the end of the initiative. People working on the initiative find that they can't even remember why the initiative was necessary. The leader who championed the initiative isn't around to take credit for its completion, so there's no one working on the initiative at any given time who cares about its completion.

Cross-team relationships barely exist because whoever is leading one team at the moment doesn't know the leaders of the other teams and has no history with them.

Leaders are continually "backfilling" for other leaders that left, being asked questions that they can't answer.

Morale stays at a low simmer. People show up to work each Monday morning knowing they won’t be productive. Another week where they won’t know if they did good work or not, because no one can define the goal.

An organization without continuity of leadership is an organization with amnesia.

Daring to Care

The most effective technical leader I ever worked with had a track record for coming onto a project and whipping it into shape. His ideas were not groundbreaking. He was not a genius engineer. He was a smart guy, but not necessarily the smartest guy in the room. He wasn't an expert politician, or a charmer. His superpower was that he simply cared more than anyone else around him about the project's success, and he would not back down when implementing improvements. When he noticed an area for improvement, he just did whatever it took to fix it. He would calmly but with absolute persistence run through his argument with whoever he needed to convince that it was the right thing to do. It didn't matter how many people he had to convince, how high up they were, how difficult it was to convince them. He just wouldn't stop if he knew he was right. No improvement was too small or too big. I remember him saying to me once that he didn't understand why people around him would acknowledge obvious problems, but not fix them themselves. What I didn't say, but was thinking silently, was "because no one else here cares as much as you do."

There's risk that comes with caring about something. If I take the initiative to fix the slow build process, then I'm now responsible for it. If I break something, everyone's going to look at me. I might have to talk to the Infrastructure team. Jeez, those guys take so long to get back to you. I'll have to open tickets. I might have to bother people I don't know well to make my task their priority. The build process works now, right? Yeah, it takes longer than it probably needs to, but it works. Do I really care enough to take all this on?

The individuals who rise to the top aren't always the smartest, the most creative, the most charming. They just give a damn. They care more than the people around them. When someone truly cares, they will find no shortage of problems to solve around them. They will find solutions. They will push through objections. They will argue with people. They will take on responsibility for things they don't strictly need to, things no one asked them to take responsibility for.

The open question is: How do you get someone to care? Why do some people care so much more about a project than those around them? Those people are worth their weight in gold.

Movin' Tickets

Recently I was re-reading Joel Spolsky's classic blog post The Joel Test: 12 Steps to Better Code. I hadn't read that post in many years. Although a lot of the advice in that post seems almost quaint now, as many of the practices it encourages are ubiquitous and taken for granted in 2024 (Joel wrote that post in 2000), there is one passage that seems just as fresh as ever to me...

...project managers had been so insistent on keeping to the “schedule” that programmers simply rushed through the coding process, writing extremely bad code, because the bug fixing phase was not a part of the formal schedule. There was no attempt to keep the bug-count down. Quite the opposite. The story goes that one programmer, who had to write the code to calculate the height of a line of text, simply wrote “return 12;” and waited for the bug report to come in about how his function is not always correct. The schedule was merely a checklist of features waiting to be turned into bugs.

One of my frustrations with Scrum, or with many teams who say they are "doing Scrum" is the obsession with the Sprint. Teams develop a short-term mindset, where all things begin and end within a two-week period.

We have some sort of "Board" where all the tickets (or PBIs, or cards, or whatever you call them) are shown in one of several different columns each representing a "status" of the ticket. A ticket starts in the far-left column on the first day of the sprint, and by the last day of the sprint, it must be in the last column. That's how we know we had a "good sprint".

Obviously the tickets on the board are just an abstraction representing work. But it's easy to get obsessed with this abstraction. Instead of making our users' lives easier and adding valuable features to the product they use, we're just moving tickets across a virtual board, sprint after sprint.

I always think it's fascinating to hear the language people use to talk about a team's work. In standup, people will say they plan to "have that ticket moved over" today. The team's manager might talk about how "good the board looks" today. In a retrospective meeting at the end of a sprint, the team might talk positively about how quickly tickets were "moving across the board" during that sprint.

The people on the team actually doing the work know they're doing well when they've moved a ticket from one column to a column to the right of that column. This is what they optimize for: efficient ticket-moving.

The necessary work of software engineering that doesn't have a ticket on the board feels downward pressure. A thorough code review for one ticket takes ticket-moving time away from the reviewer. If there are issues to be corrected, then the ticket being reviewed is stalled in its own rightward journey.

QA people on the team are in a difficult position of doing their quality assurance on tickets that are just to the left of the ticket's final destination--the place we all want it to be.

The whole team is incentivized to make sure all the tickets on the board are in the right-most column on the final day of the sprint. As in Joel's anecdote above, bugs found later merely become new tickets to move from left-to-right in a future sprint. Long-term concerns like sound architecture don't have a place on the board. 

When a team judges its effectiveness based on the movement of virtual tickets from one status to another, it can lose sight of the big picture. Who is ultimately benefiting from these ticket movements? Why are we moving them exactly? Where do they come from?

I think it's important that teams talk about their work, at least occasionally, without mentioning tickets. What are we accomplishing at a higher level? What are our users saying about our work? How is the business that pays our salaries benefiting from our work?

Surely we're not just movin' tickets.

Vision

Vision is so important in software development. Without the engineers understanding the overall vision, they can't resolve ambiguity in their daily work without consulting someone who holds the vision.

If the engineers don't understand why they're doing any of these things, then they can't fill in the gaps logically. They can't suggest improvements, improvise, or have confidence that they're moving the organization closer to the vision. Teammates talk past each other. One person has more of the vision than another, but doesn't know that. Misunderstandings are common. The track being laid from each end doesn't meet up in the middle.

Every little bit of vision transmission compounds in value. The decisions we make today form the foundation for work that comes later. A misunderstanding in vision today requires re-work tomorrow, a week from now, a month from now.

One of the things that can get left behind in the just-in-time fashion of Agile sprints is that the team can get lost in the weeds. We have to remember that we're building toward a significant milestone of some kind for the business, not just a random sequence of tasks.

It's difficult when backlog items are being entered by one person or a small group of people separate from the engineers and QAs who will be actually building and testing the stuff. They get queued up and drip-fed every two weeks to the broader team. But often there's no shared context transmitted to the whole team about what broad goal we're doing all these tasks for.

Sprint goals are tricky to set. But if a team can't ever seem to define a clear sprint goal, that's almost certainly a sign that the team is lacking a vision.

People like to make fun of heavyweight methodologies like SAFe, but doing a Program Increment Planning event--although tedious--sure does get everyone on the same page with a shared vision for the next few months of work.

Most of us are not working on the Manhattan Project--the end goal should not be obscured from the individuals making the parts. In fact, the end goal should be so clear to everyone that any person on the team could describe it clearly in their own words.