Estimating Effort – An Explicitly Implicit Approach

It is difficult to make predictions, especially about the future.Unknown

Sage advice.

So why bother estimating the amount of work needed to complete a product backlog item? After all, since estimates are about the future the probability is high that they will be wrong. Actually, they may very well be guaranteed to be wrong. It’s just that some of the guesses happened to match what the effort ended up to be and just look like they were “right.”

I’ve written in the past expressing my thoughts about estimating the effort needed to complete product backlog items, particularly with respect to story points. I believe working to find a relative gauge to how well teams are estimating work is important. Without them, cognitive biases such as the optimism bias and planning fallacy can significantly distort a project delivery timeline. However, the phrase “story point” is burdened with a lot of baggage. It has been abused and misused such that invoking the phrase often causes more harm than good.

I’ve been experimenting recently with a different approach to estimating effort. The method I’ll describe in this post got a bit of a boost after listening to a recent interview with Psychologist and Nobel laureate Daniel Kahneman. In this interview, Kahneman describes an experience he had while serving in the Israeli army some sixty years ago. He was assigned the job of setting up an interview process that would determine how well a recruit would do as a combat soldier. For this process, he selected six traits and instructed the interviewers to ask questions designed to evaluate each trait independently and score them. The interviewers were not happy with this approach. As a compromise, Kahneman instructed the interviewers, when they were finished asking about the six traits, to close their eyes and just jot down a number they felt matched how good a soldier the recruit might be. What he discovered:

When we validated the results of the interview, it was a big improvement on what had gone on before. But the other surprise was that the final intuitive judgments added, it was good. It was as good as the average of the six traits, and not the same. It added information, so actually we ended up with a score that was half determined by the specific ratings, and the intuition got half the weight. That, by the way, stayed in the Israeli army for well over 50 years.Daniel Kahneman

This intuitive evaluation made by the interviewers is similar to what Agile methods ask of development teams when determining a value for “story points.” T-shirt sizes, planning poker, dot voting, affinity mapping and many similar techniques are all designed to elicit an intuitive sense of the effort involved. If there is a dependency between team members, than a dialog follows to understand what that discrepancy is all about. This continues until there is alignment on what the team believes the effort to be. When it works, it works well.

So on to the details of the approach I’ve been experimenting with. (It doesn’t have a name yet.) The result of this approach is a number I call the “effort value.” The word “value” is a reference to the actual elementary mathematics value being derived. Much like the answer to the question “What value results from adding 2 and 2?” Answer: 4. The word “value” also suggests an intrinsic worth, something beyond a hard number. My theory is that this will help teams think beyond the mere number and think also about the value they are delivering to stakeholders. The word “point” correlates to a hard number and lacks any association to intrinsic worth or value.

Changing the words introduces a simple and small shift that nonetheless has a significant impact. With the change, teams are more open to considering a different approach to determining estimates.

So how is the effort value derived?

I begin by having the team define 4-5 characteristics or attributes that, to them, describe what they mean by “effort.” It is important for the team to define these attributes. By doing so, they own the definition and it becomes much harder for them to dismiss the attributes as “someone else’s” and thereby object to their use in deriving an effort value. These attributes can be anything that is meaningful to the team. Examples:

  • Complexity – Is the work straightforward (e.g. code a bubble sort function) or does it involve interrelated systems (e.g. code a predictive inventory control algorithm)?
  • Dependencies – How dependent is the product backlog item on other backlog items or other teams?
  • Familiarity – Is this work very similar to work the team has done in the past or something quite new?
  • Information – Is the detail in the product backlog item complete? Are the acceptance criteria and definition of done clear?
  • Technical Debt Risk – Does the PBI require any refactoring of related code? Is any technical debt being incurred with the PBI?
  • Design Stability – Is there a lot of discovery and exploration needed to complete the PBI?
  • Confidence for Completing PBI within the Sprint – This category may roll up several categories.

The team can define any attribute they wish. However, there are a few criteria to consider:

  • Keep the list limited to 4-6 attributes. More than that risks turning the derivation of an effort value into the equivalent of a product backlog item navel-gazing exercise.
  • Time cannot be one of the attributes.
  • The attributes should be reasonable. Assessing a product backlog item’s effort value by evaluating it’s “aura” or the current position of the stars are generally not useful attributes. On the other hand, I’ve listened to arguments against evaluating estimates in terms of “complexity” as being similarly useless. I see the point of those arguments, but my view is that the attributes must first and foremost be meaningful to the entire team. In the end, it’s an educated guess and arguments about the definition of terms like “complexity” are counterproductive to the overall intent of deriving an effort value.

Each of these attributes is then given a scale, the same scale for each attribute – 1 to 10, 1 to 15 – whatever the team feels is most appropriate. The team then goes through each of these attributes and evaluates the product backlog item attribute on the scale. The low number on the scale represents very little impact. If dependency, for example, is one of the attributes then a 1 might mean that the product backlog item is entirely self-contained. A 10 might represent a case where the product backlog item is dependent on several other product backlog items or perhaps the output from other teams.

When this is done, ask the team where on the modified Fibonacci scale they think this particular product backlog item’s effort value should be. If they’re struggling you can do the math: find the average for all the attributes and match that number in the modified Fibonacci scale. If the average is a decimal, for example 3.1, match the value to the next highest modified Fibonacci scale number. In this case the value would be 5. Then ask the team if they feel that number it’s a good representation of the effort value for the product backlog item.

This may seem like a lot of unnecessary gyrations, but for technical people it’s a simple process they can understand. The bonus is a number they can calculate. The number isn’t what’s important here. What’s important is the conversation that happens around the attributes and what the team feels about the number that results from the conversation. This exercise is meant to develop their intuitive muscles for considering multiple aspects and dimensions behind the “effort” needed for them to get the work done.

Use this process enough times and eventually calculating the average can be dropped from the process. Continue using this process and eventually calculating the numbers for the individual attributes can be dropped from the process. I don’t know if it’s a good idea to drop the use of the attributes for generating the needed conversation around the effort needed, but it will certainly be valuable to reconsider the list of attributes from time to time so as to fine tune the list to match what the team feels is important.

With this approach I’m turning the estimation process on its head (or back on its feet, if Kahneman is right.) Rather than seek the intuitive response first (e.g. t-shirt size) and elicit details later if there is a mismatch between team members, this method seeks to better prime and develop the team’s intuition about the effort value by having them explicitly consider a list of self-selected attributes (or traits) for effort first and then include an intuitive evaluation for effort.

Don’t try to form an intuition quickly, which was what we normally do. Focus on the separate points, and then when you have the whole profile, then you can have an intuition and it’s going to be better. Because people form intuitions too quickly, and the rapid intuitions are not particularly good. If you delay intuition until you have more information, it’s going to be better.Daniel Kahneman

Show your work

A presentation I gave last week sparked the need to reach back into personal history and ask when I first programed a computer. That would be high school. On an HP 9320 using HP Educational Basic and an optical card reader. The cards looked like this:

(Click to enlarge)

What occurred to me was that in the early days – before persistent storage like cassette tapes, floppy disks, and hard drives – a software developer could actually hold their program in their hands. Much like a woodworker or a glass blower or a baker or a candlestick maker, we could actually show something to friends and family! Woe to the student who literally dropped their program in the hallway.

Then that went away. Keyboards soaked up our coding thoughts and stored them in places impossible to see. We could only tell people about what we had created, often using lots of hand waving and so much jargon that it undoubtedly must have seemed as if we were speaking a foreign language. In fact, the effort pretty much resembled the same fish-that-got-away story told by Uncle Bert every Thanksgiving. “I had to parse a data file THIIIIIIIIIS BIG using nothing but Python as an ETL tool!”

Yawn.

This is at the heart of why it is I burned out on writing code as a profession. There was no longer anything satisfying about it. At least, not in the way one gets satisfaction from working with wood or clay or fabric or cooking ingredients. The first time I created a predictive inventory control algorithm was a lot of fun and satisfying. But there were only 4-5 people on the planet who could appreciate what I’d done and since it was proprietary, I couldn’t share it. And just how many JavaScript-based menu systems can you write before the challenge becomes a task and eventually a tedious chore.

Way bigger yawn.

I’ve found my way back into coding. A little. Python, several JavaScript libraries, and SQL are where I spend most of my time. What I code is what serves me. Tools for my use only. Tools that free up my time or help me achieve greater things in other areas of my life.

I can compare this to woodworking. (Something I very much enjoy and from which I derive a great deal of satisfaction.) If I’m making something for someone else, I put in extra effort to make it beautiful and functional. To do that, I may need to make a number of tools to support the effort – saw fences, jigs, and clamps. These hand-made tools certainly don’t look very pretty. They may not even be distinguishable from scrap wood to anybody but myself. But they do a great job of helping me achieve greater things. Things I can actually show and handle. And if the power goes down in the neighborhood, they’ll still be there when the lights come back on.

Heroes

A bit of a break from what could be considered the usual theme of this blog to recognize several amazing heroes from World War II – Major General Maurice Rose and Corporal Clarence Smoyer. Both have a connection to Colorado, at least for today.

General Rose was educated in Denver and graduated from East High School in 1916. He lied about his age so that he could join the Colorado National Guard after graduating high school. Seventy four years ago today, General Rose was killed in action. He was the highest-ranking American killed by enemy fire in the European Theater of Operations during World War II. Rose Medical Center in Denver is named in his honor.

Corporal Smoyer, at age 95, is still with us. He served under General Rose and was in Denver today for a book signing – Adam Makos’ latest book, “Spearhead.” It was well worth the two hour wait to shake the man’s hand, thank him for his service, and – a distant third on the list – receive an autographed copy of the book.

(click to enlarge)

The plan was for Corporal Smoyer to ride in on a tank and stop in front of Union Station, the site of uncounted final “goodbye’s” during the war. Eighty trains a day, the Union Station historian said, arrived and departed from Denver in the early 1940’s carrying many young men on their way to war. Given it was a cold, wet, rainy-snowy day in Denver, the turnout was actually quite good.

Corporal Smoyer had ample escort!

(click to enlarge)

At long last, the Corporal arrived, mounted atop a WWII era Stewart tank. Although not the tank in which Corporal Smoyer went to war, it is the only civilian owned operable tank in Colorado.

(click to enlarge)

It took a little time to help the 95 year old soldier down from his mount. With his feet on solid ground, Corporal Smoyer stepped over to the Stewart tank and hung his handicap parking placard on the cannon barrel. Well play, sir. Well played.

(click to enlarge)

After a brief recounting of several stories and the unveiling of a commemorative painting to be hung in Union Station, he was off to begin signing copies of the book.

Today I shook the hand of a hero and am feeling profoundly grateful to Corporal Smoyer and all the men and women from his time that defeated a fearsome evil and preserved the Freedom of which I am a direct beneficiary.

(click to enlarge)