In a recent conversation with colleagues we were debating the merits of using story point velocity as a metric for team performance and, more specifically, how it relates to determining a team’s predictability. That is to say, how reliable the team is at completing the work they have promised to complete. At one point, the question of what is a story point came up and we hit on the idea of story points not being “points” at all. Rather, they are more like currency. This solved a number of issues for us.
First, it interrupts the all too common assumption that story points (and by extension, velocities) can be compared between teams. Experienced scrum practitioners know this isn’t true and that nothing good can come from normalizing story points and sprint velocities between teams. And yet this is something management types who aren’t agile-savvy are wont to do. Thinking of a story’s effort in terms of currency carries with it the implicit assumption that one team’s “dollars” are not another team’s “rubles” or another team’s “euros.” At the very least, an exchange evaluation would need to occur. Nonetheless, dollars, rubles, and euros convey an agreement of value, a store of value that serves as a reliable predictor of exchange. X number of story points will deliver Y value from the product backlog.
The second thing thinking about effort as currency accomplished was to clarify the consequences of populating the product backlog with a lot of busy work or non-value-adding tasks. Such work reduces the value of the story currency: the measure of the level of effort becomes inflated and the ability of the story currency to function as a store of value is diminished.
There are a host of other interesting economics-derived thought experiments that can be played out with this frame around story effort. What’s the effect of supply and demand on available story currency (points)? What’s the state of the currency supply (resource availability)? Is there such a thing as counterfeit story currency? If so, what does that look like? How might this mesh with the idea of technical or dark debt?
Try this out at your next backlog refinement session (or whenever it is you plan to size story efforts): Ask the team what you would have to pay them in order to complete the work. Choose whatever measure you wish – dollars, chickens, cookies – and use that as a basis for determining the effort needed to complete the story. You might also include in the conversation the consequences to the team – using the same measures – if they do not deliver on their promise.
An experienced scrum master describes their work cycles as going “from being very busy during sprint end/start weeks to be [sic] very bored.” While this scrum master works very hard to fill in the gaps with 1:1’s with the team members and providing regular training opportunities, they nonetheless ask the question, “Does anyone have any suggestions of things I am maybe not doing that I should be doing?” One response included the following:
“Now, it could be that you have worked to create a hyper-performing team and there is no further room for improvement. A measure of this is that velocity (or similar metric) has increased by an order of magnitude in the last year.
However, the most likely scenario is that you and your team have become ‘comfortable’ and velocity has not increased significantly in the last few Sprints and/or there is a high variance in velocity.”
This reflects a common misunderstanding of “velocity” and its confusion with “acceleration.” (It also reflects the “more is better” and “winners vs losers” thinking derived from the scrum sports metaphor and points as a way of keeping score. I’ve written about that elsewhere.) Neither does the commenter understand what “order of magnitude” means. A velocity that increases by an order of magnitude in a year isn’t a velocity, it’s an acceleration. That’s a bad thing. This wouldn’t be a “hyper-performing” team. This would be a team headed for a crash as a continual acceleration in story points completed is untenable. More and more points each sprint isn’t the goal of scrum. A product owner cannot predict when their team might complete a feature or a project if the delivery of work is accelerating throughout the project.
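To put a rough number on what an order of magnitude increase in a year would demand, here is a minimal sketch. It assumes two-week sprints (about 26 a year) and steady compounding from sprint to sprint; the commenter specified neither, and the starting velocity of 21 is a made-up figure for illustration.

```python
def per_sprint_growth(total_factor, sprints):
    """Compound growth rate per sprint needed to multiply velocity by total_factor."""
    return total_factor ** (1 / sprints) - 1

# Assumption: two-week sprints, so roughly 26 sprints in a year.
sprints_per_year = 26

# An "order of magnitude" means a factor of 10.
growth = per_sprint_growth(10, sprints_per_year)
print(f"Required growth per sprint: {growth:.1%}")  # about 9.3%

# Hypothetical starting velocity, compounded over the year.
start_velocity = 21
end_velocity = start_velocity * (1 + growth) ** sprints_per_year
print(f"Velocity after a year: {end_velocity:.0f} points per sprint")
```

Under those assumptions the team would have to complete roughly 9% more points every single sprint, all year, with no plateau.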
Assuming a typical project, something that continues for a year or more, the team and the project will eventually crash as they’ve been pressured to work more and more hours and cut more and more corners in the interests of completing more and more points. The accumulation of bugs, small and large, will slow progress. Team fatigue will increase and morale decrease, resulting in turnover and further delays. In common parlance, this is referred to as a “death march.”
Strictly speaking, velocity is some displacement over time. In the case of scrum, it is the number of story points completed in a sprint. We’ve “displaced” some number of story points from being “not done” to “done.” By itself, a single sprint’s velocity isn’t particularly useful. Looking at the velocity of a number of successive sprints, however, is useful. There are two pieces of information from looking at successive sprint velocities that, when considered together, can reveal useful aspects of how well a team is performing or not. The first is the average over the previous 5 to 8 sprints, a rolling average. As a yardstick, this can provide a measure of predictability. Using this average, a product owner can make a rough calculation for how many sprints remain before completing components or the project based on the story point information in the product backlog.
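The rough calculation might look something like the sketch below. The function names, the sprint velocities, and the 130 points of remaining backlog are all hypothetical, chosen only to illustrate the arithmetic.

```python
import math

def rolling_average(velocities, window=5):
    """Mean of the most recent `window` sprint velocities."""
    recent = velocities[-window:]
    if not recent:
        raise ValueError("need at least one completed sprint")
    return sum(recent) / len(recent)

def sprints_remaining(backlog_points, velocities, window=5):
    """Rough count of sprints needed to finish the backlog at the rolling average."""
    average = rolling_average(velocities, window)
    if average <= 0:
        raise ValueError("average velocity must be positive to forecast")
    # Round up: a partially used sprint is still a sprint on the calendar.
    return math.ceil(backlog_points / average)

# Hypothetical data: eight completed sprints and 130 points left in the backlog.
past_velocities = [19, 22, 21, 20, 23, 21, 20, 22]
print(rolling_average(past_velocities, window=8))         # 21.0
print(sprints_remaining(130, past_velocities, window=8))  # 7
```

This is a point estimate only. It says nothing yet about how much to trust the number.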
The measure of confidence for this prediction would come from an analysis of the variance demonstrated in the sprint velocity values over time. Figures 1 and 2 show the distinction between the value provided by a rolling average and the value provided by the variance in values over time.
In both cases the respective teams have an average velocity of 21 points per sprint. However, the variability in the values over time shows that the team in Figure 1 would have a much higher level of confidence in any predictions based on their past performance than the team shown in Figure 2.
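The same comparison can be sketched in a few lines. The two velocity series below are invented for illustration and are not the data behind the figures; they simply share an average of 21 while differing widely in spread. Standard deviation is used here as one convenient measure of variability, not the only one.

```python
import statistics

def velocity_summary(velocities):
    """Return (mean, population standard deviation) for a series of sprint velocities."""
    return statistics.mean(velocities), statistics.pstdev(velocities)

# Hypothetical series, both averaging 21 points per sprint.
steady_team = [20, 22, 21, 20, 22, 21, 21, 21]
erratic_team = [8, 35, 14, 30, 10, 33, 12, 26]

steady_mean, steady_spread = velocity_summary(steady_team)
erratic_mean, erratic_spread = velocity_summary(erratic_team)

print(steady_mean, round(steady_spread, 1))    # mean 21, spread roughly 0.7
print(erratic_mean, round(erratic_spread, 1))  # mean 21, spread roughly 10.4
```

A forecast built on the first series can be off by a point or so per sprint; one built on the second can easily be off by ten.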
What matters is the trend, each sprint’s velocity over a number of sprints. The steady completion of story points (i.e. work) sprint to sprint is the desirable goal. Another way to say this is that a steady velocity makes it possible to predict project delivery dates. In real life, there will be a variance (up and down) of sprint velocity over time and the goal is to guide the project such that this variance is within a manageable range.
If a team were to set as its goal an increase in the number of story points completed from sprint to sprint then their performance chart might initially look like Figure 3.
Such a pace is unsustainable and eventually the team burns out. Fatigue, decreased morale, and overall dissatisfaction with the project cause team members to quit and progress grinds to a halt. The fallout of such a collapse is likely to include the buildup of significant technical debt and code errors as the run-up to the crescendo forced team members to cut corners, take shortcuts, and otherwise compromise the quality of their effort. The resulting performance chart would look something like Figure 4.
All that said, I grant that there is merit in coaching teams to make reasonable improvements in their overall sprint performance. An increase in the overall average velocity might be one way to measure this. However, to press the team into achieving an order of magnitude increase in performance is a fool’s errand and more than likely to end in disaster for the team and the project.
Lyneis, J. M., & Ford, D. N. (2007). System dynamics applied to project management: a survey, assessment, and directions for future research. System Dynamics Review, 23(2/3), 157-189.
The scrum framework is forever tied to the language of sports in general and rugby in particular. We organize our project work around goals, sprints, points, and daily scrums. An unfortunate consequence of organizing projects around a sports metaphor is that the language of gaming ends up driving behavior. For example, people have a natural inclination to associate the idea of story points to a measure of success rather than an indicator of the effort required to complete the story. The more points you have, the more successful you are. This is reflected in an actual quote from a retrospective on things a team did well:
We completed the highest number of points in this sprint than in any other sprint so far.
This was a team that lost sight of the fact they were the only team on the field. They were certain to be the winning team. They were also destined to be the losing team. They were focused on story point acceleration rather than a constant, predictable velocity.
More and more I’m finding less and less value in using story points as an indicator for level of effort estimation. If Atlassian made it easy to change the label on JIRA’s story point field, I’d change it to “Fuzzy Bunnies” just to drive this idea home. You don’t want more and more fuzzy bunnies, you want no more than the number you can commit to taking care of in a certain span of time typically referred to as a “sprint.” A team that decides to take on the care and feeding of 50 fuzzy bunnies over the next two weeks but has demonstrated – sprint after sprint – they can only keep 25 alive is going to lose a lot of fuzzy bunnies over the course of the project.
It is difficult for people new to scrum or Agile to grasp the purpose behind an abstract idea like story points. Consequently, they are unskilled in how to use them as a measure of performance and improvement. Developing this skill can take considerable time and effort. The care and feeding of fuzzy bunnies, however, they get. Particularly with teams that include non-technical domains of expertise, such as content development or learning strategy.
A note here for scrum masters. Unless you want to exchange your scrum master stripes for a saddle and spurs, be wary of your team turning story pointing into an animal farm. Sizing story cards to match the exact size and temperament from all manner of animals would be just as cumbersome as the sporting method of story points. So, watch where you throw your rope, Agile cowboys and cowgirls.
In Parkinson’s Law of Triviality and Story Sizing, I touched on the issue of relative expertise among team members during collaborative efforts to size story cards. I’d like to expand on that idea by considering several types of team compositions.
Team 1 is a tight knit band of four software developers represented in Figure 1.
Team 1 represents a near-ideal team composition for a typical software related project. However, the real world isn’t so generous in its allocation of near-ideal, let alone ideal, teams. A typical team for a software related project is more likely to resemble Team 2, as represented in Figure 2.
As Agile practices become more ubiquitous in the business world, team composition begins to resemble Team 3, as shown in Figure 3.
The mix now includes non-technical people – content developers and editors, strategists, and designers. Even assuming an equal level of experience in their respective domains, the company, and the business environment, there is very little overlap. Arriving at a consensus during a story sizing exercise now becomes a significant challenge. But again, the real world isn’t even so kind as this. We are increasingly more likely to encounter teams that resemble Team 4 as shown in Figure 4.
As before, the relative circle of expertise among team members can vary quite a bit. When a team resembles the composition of Team 4, the software developers (HTML5/CSS and C#) will have trouble understanding what the Learning Strategist is asking for while the Learning Strategist may not understand why what he wants the software developers to deliver isn’t possible.
When I’ve attempted to facilitate story sizing sessions with teams that resemble Team 4 they either become quite contentious (and therefore time consuming) or team members that don’t have the expertise to understand a particular card simply accept the opinion of the stronger voices. Neither one of these situations is desirable.
To counteract these possibilities, I’ve found it much more effective to have the card assignee determine the card size (points and time estimate) and work to have the other team members ask questions about the work described on the card such that the assignee and the team better understand the context in which the card is positioned. The team members that lack domain expertise, it turns out, are in a good position to help craft good acceptance criteria.
Who will consume the work product that results from the card? (dependencies)
What cards need to be completed before a particular card can be worked on? (dependencies)
Is everything known about what a particular card needs before it can be completed? (dependencies, discovery, exploration)
At the end of a brief conversation where the entire team is working to evaluate the card for anything other than level of effort (time) and complexity (points), it is not uncommon for the assignee to reconsider their sizing, break the card into multiple cards, or determine the card shouldn’t be included in the sprint backlog. In short, it ends up being a much more productive conversation if teammates aren’t haggling over point distinctions or passively accepting what more experienced teammates are advocating. The benefit to the product owner is that they now have additional information that will undoubtedly influence the product backlog prioritization.