Dave Snowden

Statistical tails wagging the dogs of truth


2: Confusing correlation with causation and the paucity of good data

This seems to have been an innate feature of management science and, to a lesser extent, of social science considered more broadly. It is a growing problem in health care, where statistics seem to dominate over biology in respect of evidence. I'll write more on the health care issue in the new year, but for the moment I want to focus on management science. The general pattern of serious textbooks is to study a group of organisations that have a quality we desire or fear (and the boundary between those is more blurred than we think) and identify those aspects of the organisations which can be clearly identified by the researcher. Statistical engines are then run and correlations identified, which are then distilled into prescriptive or advisory recipes for managers seeking to know outcome before action. There seems to me to be a correlation on the prescriptive-advisory scale between snake oil and integrity, but I will leave that for another day. There is a variation on this which involves misappropriating examples to fit a pet theory, but that will be my subject for tomorrow.

Now there are a range of problems with this approach, both in theory and in practice, before we even come to the subject of this post. They include, but are not limited to:

  1. Reliance on self-reported data from responsible managers who have a vested interest in reporting success, or may (and this is oh so common) genuinely think that things are fine because the informal networks make things work despite management action and design. Over the years I've seen over a hundred knowledge management articles for peer review which depend for their data on the reports of people in the knowledge management role. Few if any go deeper and do the basic ethnographic work.
  2. Regardless of that issue, we also have the wider problem that the way people report history depends on whether their actions in that history are perceived as a success or a failure. When I was in IBM we did a series of projects on outsourcing contracts which demonstrated clear differences in how facts were reported before and after the customer's decision was known. In simulation environments, teams told they had succeeded in a fictional case reported different facts (not just different perspectives) from teams told they had failed with identical source data.
  3. Ignoring the wider context within which the data capture has taken place. Again, I have reviewed articles for journals where the researcher clearly stated that they had done this to reduce the number of variables. That, as they say, is the tail of a statistical restriction wagging the dog of truth. The reality of a complex system (and most of the time these are the systems being studied) is that the same thing will only happen the same way twice by accident.

So we can't trust the data, but neither can we trust the correlation other than as an indicator of something we should pay attention to. Part of the problem here is an attempt to create objectivity that will reduce dependence on human judgement, when such an attempt is wrong in both theory and practice. But even if we have a correlation it does not mean we have a causal link. I frequently satirise this by pointing out that a correlation between regular bowel movements and good leadership should not involve diverting your personnel department to toilet monitoring of new recruits. There may be a link to lack of stress, but there is not a causal link. There are often simpler explanations than the correlations anyway, which someone with a multi-disciplinary background will see.

When I read Good to Great I got a lot of learning from what is a well researched book. But I don't buy the recipes implicit in the text: if you look at the examples chosen for the study they are all dominant predators, so the rest of the ecology self-organised around them. Some of the examples have got a little embarrassing of late as well. I frequently say that if you don't understand the why you should not replicate the what.

That is not to say we can't learn from correlations; in particular they often allow us to ask interesting questions. But they are not causal, and statistics alone do not provide an explanation of why. It's why I don't trust most health care evidence, as it is in the main statistics with only vague hypotheses as to the underlying biology. In the development sector, which is inherently culturally complex (a level over and above ordinary complexity), it is plain bloody dangerous. In industry and government it is often harmless, as people have learnt to parrot the language of the new initiative while carrying on as normal. In practice that is more dangerous, as tension builds up in the system.
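The statistical point is easy to demonstrate for yourself. The sketch below (mine, not from the post; the sample sizes and threshold are illustrative assumptions) mimics the textbook recipe: measure many attributes across a small sample of organisations and screen for correlations with a desired outcome. Every attribute here is pure random noise, so by construction nothing causes anything, yet the screen will typically still surface a few "significant" correlations.

```python
import random
import statistics

def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(42)
n_firms = 30        # a typical "sample of successful companies"
n_attributes = 40   # traits the researcher happens to measure

# Every attribute is independent noise: none has any causal link to anything.
attributes = [[random.gauss(0, 1) for _ in range(n_firms)]
              for _ in range(n_attributes)]
outcome = [random.gauss(0, 1) for _ in range(n_firms)]

# Screen for "notable" correlations, as the statistical engine would.
strong = [i for i, a in enumerate(attributes)
          if abs(pearson(a, outcome)) > 0.35]
print(f"{len(strong)} of {n_attributes} pure-noise attributes correlate "
      f"with the outcome at |r| > 0.35")
```

Run it with different seeds and a handful of noise attributes will usually clear the threshold, which is exactly why a correlation found this way is at best a prompt to ask a question, never a recipe.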
