Risk Kanban Board

A couple of days ago I was talking with a friend about the talk I gave last year during the Lean Kanban Europe tour, and I realised that my unusual Kanban board might solve a recent problem of mine: I want to visualize and easily manage risks. The board is sliced into four horizontal lanes and was originally designed for an operations team (note: for risk management I won’t need WIP limits):

Each lane represents a certain severity using the visualization idea from Jeff Anderson. The severity (risk) comes from the relationship between time and impact: the later the team handles an issue, the higher the impact. Moreover, the cost or effort to “handle” an item on a certain level is proportional to the area under the curve:

A level 1 item has an “infinite impact” and an unknown cost of delay. The problem with this level is that it is completely unpredictable: the costs can be extremely low or extremely high, and this uncertainty forces us to take care of the item immediately. For example, the lead developer of the project can no longer come to work. We can gamble and say that we’ll survive without him, or we can try to promote somebody else and train that person very quickly. Level 1 doesn’t allow us to do proper risk management and mitigation; it forces us to react. So it is better to act before a risk turns into level 1, but once it is on level 1, we have to take care of it immediately.

A level 2 item means that we know something is going to happen and we are aware of the consequences of acting late. For example, we know that the lead developer is leaving the company and we have 30 days to find a new lead. The longer we wait with the promotion or the hire, the higher the cost of delay will be, because the new lead developer has less time to build up domain knowledge. In other words: the longer we wait, the higher the risk of reduced quality and delays in the project caused by the new lead’s insufficient domain knowledge.

As you can see, if a level 2 item is not taken care of in time, it will turn into a level 1 item:

Level 3 is similar to level 2, but it has a free delay before turning into level 2:

This means that there is a period of time during which we can ignore the risk, but we cannot do so forever. Following our example, we know that the lead developer will leave the country later this year, so finding a replacement is not an immediate issue, but it eventually will be.

Level 4 is a collection of nice-to-have issues or low-risk items. As I used to say: “this is the place where all the refactoring and code improvement tasks come to die”.
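To make the four levels and the transitions between them a bit more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not part of the original board: the `Risk` class and its `free_until` and `deadline` fields are hypothetical names, and I assume that a risk without any deadline belongs to level 4 and that a risk whose deadline has passed has turned into level 1.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Risk:
    title: str
    # Hypothetical field: last day of the "free delay" period (level 3).
    free_until: Optional[date] = None
    # Hypothetical field: day after which the impact becomes unpredictable.
    deadline: Optional[date] = None


def level(risk: Risk, today: date) -> int:
    """Return the lane (1-4) a risk belongs to on a given day."""
    if risk.deadline is None:
        return 4  # assumption: no deadline means nice-to-have / low risk
    if today >= risk.deadline:
        return 1  # not handled in time: act immediately
    if risk.free_until is not None and today <= risk.free_until:
        return 3  # still inside the free delay, can wait for now
    return 2      # known consequences, cost of delay is growing


# The lead developer example with made-up dates.
lead = Risk("Lead developer leaves",
            free_until=date(2024, 5, 31),
            deadline=date(2024, 6, 30))
refactoring = Risk("Refactor the billing module")
```

Calling `level(lead, today)` with different dates shows the item moving from level 3 to level 2 and finally to level 1, while `refactoring` stays on level 4.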

Like other boards, this one needs daily updates (the operations team mentioned earlier did it twice a day). The risks (work items) have to be reviewed from top-right to bottom-left so that the team talks about the most “painful” risk first. The remaining risks should then be revisited to check whether they have moved closer to a level transition or need to change level:
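The top-right to bottom-left review order can be expressed as a simple sort. The sketch below is an assumption-laden illustration: the column names in `COLUMNS` are hypothetical, and I assume the level 1 lane is at the top of the board and that finished items are skipped during the review.

```python
# Hypothetical columns, ordered from left to right on the board.
COLUMNS = ["To do", "In progress", "Done"]


def review_order(cards):
    """Sort (title, level, column) cards from top-right to bottom-left.

    Assumes the level 1 lane is the top lane; cards in "Done" are skipped.
    """
    active = [card for card in cards if card[2] != "Done"]
    # Lower level number first (top lane), then rightmost column first.
    return sorted(active, key=lambda card: (card[1], -COLUMNS.index(card[2])))


cards = [
    ("Refactor billing module", 4, "To do"),
    ("Lead developer leaves", 2, "In progress"),
    ("Licence expires", 2, "To do"),
    ("Server room flooded", 1, "To do"),
    ("Old backup replaced", 3, "Done"),
]
agenda = [title for title, _, _ in review_order(cards)]
```

With these made-up cards the agenda starts with the level 1 item, continues with the level 2 item that is already in progress, and ends with the level 4 refactoring task.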

And finally, when a risk is sorted out (or a work item is done), it moves into the done column. As in other Kanban systems, this approach provides measures such as lead time (the time needed to mitigate a risk on a certain level) and throughput (the number of risks mitigated), and it needs to be continuously improved in order to remain an effective system.
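Both measures are easy to compute once the board records when a risk was raised and when it reached the done column. The following sketch assumes a hypothetical record format of `(level, opened, closed)` with made-up dates; averaging the lead time per level is my choice for the example, not something the board prescribes.

```python
from datetime import date
from statistics import mean

# Hypothetical records of mitigated risks: (level, date opened, date done).
done_items = [
    (2, date(2024, 3, 1), date(2024, 3, 11)),
    (2, date(2024, 3, 5), date(2024, 3, 9)),
    (3, date(2024, 2, 1), date(2024, 3, 2)),
]


def lead_time_by_level(items):
    """Average number of days from opening to done, grouped by level."""
    days_by_level = {}
    for level, opened, closed in items:
        days_by_level.setdefault(level, []).append((closed - opened).days)
    return {level: mean(days) for level, days in days_by_level.items()}


def throughput(items, start, end):
    """Number of risks that reached done between start and end (inclusive)."""
    return sum(1 for _, _, closed in items if start <= closed <= end)


lead_times = lead_time_by_level(done_items)
march_throughput = throughput(done_items, date(2024, 3, 1), date(2024, 3, 31))
```

Tracking these numbers per level over time gives the team something concrete to look at when it discusses how to improve the system.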

