The case for full frontal analysis – thoughts on Claesson et al. 2012

One of the tricky things about complex statistical methods is displaying and interpreting the results eloquently. Multivariate analysis of high-throughput microbial community data generally requires several layers of dimensional compression before a clean answer emerges (e.g., relativize community abundances, calculate beta-diversity using one of many different indices, ordinate, cluster, discriminate between groups, and then compare to a myriad of environmental data that have been similarly compressed). Add to this that each layer of analysis carries assumptions that the analyst should consider and confirm. The visualization of such results might end up as points floating in undefined multidimensional space, pushed around by arrow vectors, each with a cryptic environmental variable hanging from the tip, and some floating factor-level centroids if you’re lucky. Immerse all of these points in thousands of OTU weightings, and voilà: an unreadable but groundbreaking result (see the figure below, the default constrained ordination graphical output for the very simple dune sample dataset, then scale that up to a high-throughput sequencing (HTS) dataset!). I assume I am not alone in occasionally giving in to overly simplified ways of presenting a subset of what I thought was a comprehensive set of results, simply because the results and visualizations were too difficult to present (or too complex for reviewers to review!). Being good at extensive analytical techniques entails being good at displaying the results in a way that shows why you chose to go to such lengths.

Poor form: default constrained ordination output. This is readable by no one, ever, though figures not much better than this do end up in publication.
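
To make the "layers of dimensional compression" concrete, here is a minimal sketch of the first three: relativize, calculate beta-diversity, ordinate. It is an illustration under assumptions, not anyone's published pipeline: the count table is invented, Bray-Curtis stands in for "one of many different indices", the ordination is a plain unconstrained PCoA (not the constrained ordination in the figure above), and it is written in Python with NumPy rather than R.

```python
import numpy as np

# Hypothetical toy table: 4 samples (rows) x 5 OTUs (columns), raw read counts.
counts = np.array([
    [30, 10,  5,  0, 5],
    [28, 12,  4,  1, 5],
    [ 2,  3, 20, 25, 0],
    [ 1,  5, 22, 20, 2],
])

def relative_abundance(table):
    """Layer 1: divide each sample by its total so rows sum to 1."""
    table = np.asarray(table, dtype=float)
    return table / table.sum(axis=1, keepdims=True)

def bray_curtis_matrix(rel):
    """Layer 2: pairwise Bray-Curtis dissimilarity between samples."""
    n = rel.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(rel[i] - rel[j]).sum()
            den = (rel[i] + rel[j]).sum()
            dist[i, j] = dist[j, i] = num / den
    return dist

def pcoa(dist, n_axes=2):
    """Layer 3: principal coordinates analysis (classical scaling)."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gower = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gower)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Bray-Curtis is not Euclidean, so small negative eigenvalues can occur;
    # they are clipped to zero here rather than corrected.
    kept = np.clip(eigvals[:n_axes], 0.0, None)
    coords = eigvecs[:, :n_axes] * np.sqrt(kept)
    explained = kept / eigvals[eigvals > 0].sum()
    return coords, explained

rel = relative_abundance(counts)
dist = bray_curtis_matrix(rel)
coords, explained = pcoa(dist)

print(np.round(dist, 3))
print(np.round(coords, 3))
print(np.round(explained, 3))
```

Each function hides a choice worth checking (how to normalize, which index, how to handle negative eigenvalues), which is the point made above: every layer carries its own assumptions.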

For this reason I am ecstatic when I see complex community analysis performed unabashedly, in a high-profile venue, and done well. Such is the case with a paper that came out last summer in Nature. Our lab has been discussing this paper as ‘analysis porn’ – in a good way. Claesson et al. approached some very complex questions and used very complex methods to answer them well. One reason I really like their flashy approach to analysis: it got me (and many others) to read a paper about elderly people’s stool samples, and to get a lot out of it! Health-related results aside, I wanted to focus purely on the analytical choices made in this paper, and specifically their R mastery and data visualization chops – with some critique thrown in for good measure.

Their figures are complex and densely packed with facets and caveats, the way microbial data really exist. They also pull off a few visuals that others often attempt and fail at. An example: the stacked relative abundance bar plots in Figure 1. These often go terribly wrong, and yet many groups love to include them in place of more important figures. In this case the plots sit as a complementary piece of a much larger and more telling figure, where they give the results context. The colors are also reasonably easy to distinguish (even for color-blind readers) and are phylogenetically coherent (e.g., bluish colors are Bacteroidetes and reddish colors are Firmicutes). The ordination plots in the same figure are shown for both weighted and unweighted UniFrac, a pairing that is generally omitted but informative when included: comparing the two helps show how abundance influences the results. On the downside, I can never understand why anyone would want to project points against a black background. This is the default in some software, but since these figures appear to have been produced in R, I assume it was intentional. While this choice might help to distinguish the lighter colors, it is to the detriment of the other colors, and equally hard on toner cartridges and eyes.

Figure 1 from Claesson et al. 2012
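
Why showing both weighted and unweighted versions is informative can be seen with a toy example. UniFrac needs a phylogenetic tree, so as a stated substitution this sketch uses two non-phylogenetic analogues: Bray-Curtis (abundance-weighted) and Jaccard (presence/absence only). The three samples are invented, and the sketch only illustrates the weighted-versus-unweighted contrast, not UniFrac itself.

```python
def bray_curtis(a, b):
    """Abundance-weighted dissimilarity between two count vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    return num / (sum(a) + sum(b))

def jaccard(a, b):
    """Presence/absence dissimilarity: 1 - shared taxa / total taxa."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    return 1 - len(present_a & present_b) / len(present_a | present_b)

# Hypothetical samples over the same five taxa.
sample_a = [90, 5, 5, 0, 0]
sample_b = [5, 5, 90, 0, 0]   # same taxa as a, different dominant taxon
sample_c = [90, 5, 5, 1, 1]   # same dominant taxon as a, two extra rare taxa

# a vs b: identical membership, very different abundances.
print(jaccard(sample_a, sample_b), bray_curtis(sample_a, sample_b))

# a vs c: different membership, nearly identical abundances.
print(jaccard(sample_a, sample_c), bray_curtis(sample_a, sample_c))
```

The two indices rank these pairs in opposite order: the unweighted measure calls a and b identical, while the weighted measure calls a and c nearly identical. An ordination built on only one of them would tell only half of that story.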

The part I like most, though, is the presentation of Figure 2: both the inclusion of Procrustes analysis and the nice, adjacent comparisons of food weights to gut samples. By default these might all be put on the same biplot, which would be a mess and would require stripping out important complexity. Separated, they lend depth and context to the results. Points in (a) are positioned by multivariate diet, with the strongest food drivers shown in (b) for comparison. Then the (black again) PCoA plots show quite well that diet, living situation, and microbiota vary together, and that sample dissimilarity depends heavily on whether abundance is considered.

Figure 2 from Claesson et al. 2012
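
For readers who have not met Procrustes analysis: it asks how well one ordination can be translated, scaled, and rotated onto another, which is how a diet ordination can be compared with a microbiota ordination of the same subjects. Below is a minimal sketch of the symmetric Procrustes statistic (m², where 0 means a perfect fit) with a simple permutation test. Everything here is an assumption for illustration: the two "ordinations" are simulated, the function names are my own, and I am not claiming this is the exact procedure the authors ran.

```python
import numpy as np

def procrustes_m2(x, y):
    """Symmetric Procrustes statistic between two configurations.

    Both configurations are centered and scaled to unit sum of squares,
    then y is optimally rotated (reflection allowed) onto x.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x0 = x - x.mean(axis=0)
    y0 = y - y.mean(axis=0)
    x0 = x0 / np.linalg.norm(x0)
    y0 = y0 / np.linalg.norm(y0)
    singular_values = np.linalg.svd(x0.T @ y0, compute_uv=False)
    return 1.0 - singular_values.sum() ** 2

def permutation_p_value(x, y, n_perm=999, seed=0):
    """Share of row-shuffled fits that are at least as good as the observed one."""
    rng = np.random.default_rng(seed)
    observed = procrustes_m2(x, y)
    as_good = 0
    for _ in range(n_perm):
        shuffled = y[rng.permutation(len(y))]
        if procrustes_m2(x, shuffled) <= observed:
            as_good += 1
    return (as_good + 1) / (n_perm + 1)

# Simulated data: 30 subjects, 2 ordination axes each.
rng = np.random.default_rng(42)
diet = rng.normal(size=(30, 2))

# Pretend microbiota ordination: the diet configuration rotated, rescaled,
# shifted, and blurred with a little noise.
theta = np.pi / 3
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
microbiota = 2.5 * diet @ rotation + 4.0 + rng.normal(scale=0.1, size=diet.shape)

m2_obs = procrustes_m2(diet, microbiota)
p_value = permutation_p_value(diet, microbiota)
print(m2_obs, p_value)
```

Because the statistic ignores rotation, scale, and position, the two ordinations do not need to share axes or units. That is what makes the adjacent, separated panels in Figure 2 comparable at all.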

Figure 4 tops it all off with nicely displayed and nuanced Wiggum plots surrounding more ordinations, this time shown as dispersion groups, all layered with health and diet correlates. There is a LOT to digest in this figure, and I like that it is laid out for the reader to take in slowly, instead of being boiled down to a simplified figure that can easily be explained in the legend. Though it is an attractive figure, it is decorated construction rather than constructed decoration, to paraphrase the architect Pugin. And the thrust of the figure is a really important revelation: that we can perhaps arrive at a definition of a ‘healthy’ microbiome and an ‘unhealthy’ one, or at least indicators along the spectrum. This figure is complex, and necessarily so, to explain this important result.

Figure 4 from Claesson et al. 2012

It is curious that the authors spent considerable time refining the visual appeal of most of the graphics, but decided that rectangular, pixelesque data points would suffice for some ordinations and NMR results. Against the backdrop of the rest of the well-manicured figures, these seem distractingly unfinished.

The last thing I’ll mention about this paper is its extensive use of supplemental methods. A paper like this simply cannot leave details of the analysis to be guessed at. The authors did a nice job of including lots of details and analytical caveats for those patient enough to read them. It’s a nice touch and one I certainly look for when reading a paper. This work by Claesson and colleagues certainly gives me something to think about the next time I’m tempted to water down the complexity in a study simply for an easier story to tell.

___________________________

citation:

Claesson MJ, Jeffery IB, Conde S, Power SE, O’Connor EM, Cusack S, Harris HMB, Coakley M, Lakshminarayanan B, O’Sullivan O, Fitzgerald GF, Deane J, O’Connor M, Harnedy N, O’Connor K, O’Mahony D, van Sinderen D, Wallace M, Brennan L, Stanton C, Marchesi JR, Fitzgerald AP, Shanahan F, Hill C, Ross RP, O’Toole PW. 2012. Gut microbiota composition correlates with diet and health in the elderly. Nature 488:178–184. http://dx.doi.org/10.1038/nature11319