[ Editor's note: this article is not an easy one to follow, because the topic is not an easy one to get your head around; intrepid readers will want to stick with it, though, because I think the conclusions are fodder for some amazing discussion about their implications for wine criticism. ]
The world of wine critique is fraught with logical contradictions.
Case in point: take this excerpt from a recent interview with critic James Suckling on Liv-Ex.com on the topic of evaluating wines while they are still in the barrel, as is often done during En Primeur in Bordeaux (emphasis mine):
“The key thing to remember is that the nose isn’t important at all. I learnt that from Daniel Lawton, one of the great negociants of Bordeaux. The important thing is the texture – the quality of the tannins and how they relate to the acidity and alcohol – and then the finish. Wines with long seamless finishes are really the great wines. It’s not all about power. It takes a long time before you can taste En Primeur properly. There’s a hierarchy in Bordeaux that helps as you can kind of figure out what should taste good. But to really understand how wines evolve you need a good 10 years of tasting.”
The logical problem here is that we know scientifically that the vast majority of our sensory experience in tasting wine comes from aroma, both through the nose directly and retro-nasally. So one (but not the only!) interpretation of the above quote is that En Primeur ratings are meaningless, or at least of limited value to consumers, because the aromas – and therefore the majority of the wine’s sensory experience – cannot be fully evaluated. The contradiction is that the wine world largely treats those ratings as if they had no such limitation.
Issues like that one crop up all over the place in the wine world, if you’re willing to look hard enough. And so it should come as little surprise to many of you when I tell you that the act of rating wines falls squarely into what is commonly called “bad science” in the scientific world…
To explain this, we first need to explore what’s meant by “bad science.” David Deutsch lays out a compelling definition in his excellent book The Beginning of Infinity: bad science offers explanations that are easy to vary, and therefore are not actually explanations of how things truly work.
For example, saying that someone lost a vintage of grapes to hail because of bad luck is bad science. It’s bad science even if that grape grower has had many random unfavorable circumstances befall him before, a string of seemingly convincing “data” to bolster this claim. The data in this case are irrelevant because a) those random events cannot directly predict or explain the subsequent unpleasant events and b) we can substitute any manner of explanation in place of “bad luck” (“the will of the gods,” for example), all of which are equally incorrect. In other words, the explanation is too easy to vary – accepting it is no better than acting on blind faith. That’s an over-simplification, but you get the drift.
Now, if we were able to trace back the meteorological events leading to that hail storm, and were able to gather enough data and causal links from the formation of the clouds to the weather events that triggered the hail above the poor guy’s vineyard, we would end up with an explanation and evidence (the formation of hail) that’s pretty difficult to vary. This is “good science” – it explains what happened, and the results could be used to accurately predict future similar events.
What does this have to do with criticism?
Wine ratings are most often presented via scales that imply scientific precision; however, they are measuring something for which we have no scientifically reliable calibration: how people sense (mostly) qualitative aspects of wine. Yes, there may be objective qualities about a wine that can indeed be somewhat calibrated (the presence of faults, for example), but even with these, thresholds of detection vary between critics. That’s important because it means that the objective (i.e., measurable) quantities of those elements are not perceived the same way by two different reviewers, and so their perception of the levels of those elements cannot reliably be calibrated.
But it’s the subjective stuff that really throws the monkey wrench into the works here. How we perceive those elements – and measure our enjoyment of them – will likely not be fully explainable by science in our lifetimes. That is because they are what is known as qualia: like happiness, depression, pain, and pleasure, those sensations can be described but cannot be measured across individuals in any scientifically meaningful way.
Yes, we can come to pretty good agreement on a wine’s color, and on the fact that it smells like, say, strawberries. After that, however, qualia perception gets pretty tricky: how vibrant that strawberry aroma seems to me might be quite different from how vibrant it seems to you. Once that factors into how you and I would “rate” that wine’s aroma, we start to diverge, potentially quite dramatically.
It’s a bit like you and me accidentally cutting our fingers while trying to open a wine bottle: ask us to rate the pain on a scale of one to ten, and we might both say “2,” but there is NO calibration possible to ensure that our twos are meaningfully comparable – it only appears that way because we arbitrarily chose to report our perceptions of the pain on a numeric scale. Substitute our perception of a wine’s balance for finger pain in that example and you can see we have a problem.
I am grossly oversimplifying this, by the way, in order to prevent this piece from ending up as ten thousand words (the qualia rabbit hole is very, very deep). The point is that the qualia can directly impact the rating, and so are not irrelevant.
Treating qualia perception as accurately measurable, or dismissing it as scientifically irrelevant, has led to a lot of “bad” science in the wine world, in some cases built on what would otherwise qualify as “good” scientific data.
Interestingly, fine wine is acutely susceptible to this problem of differing qualia perceptions precisely because of how wonderful it is: its complexity offers more qualia (styles, flavors, primary and secondary aromas, textures, tannins, acidic structure, etc., and the senses of balance between them) and therefore a lot more room for error (of the scientific kind, I mean) between individuals.
A relatively recent example of this can be found in the Journal of Wine Economics’ publication of an article by Robert T. Hodgson titled An Analysis of the Concordance Among 13 U.S. Wine Competitions. In that article, Hodgson gathered data on wines submitted to medal competitions and concluded that “the probability of winning a Gold medal at one competition is stochastically independent of the probability of receiving a Gold at another competition, indicating that winning a Gold medal is greatly influenced by chance alone.” The trouble was that Hodgson a) removed all of the factors that led to how the judges arrived at their ratings, on the grounds that they are statistically irrelevant, which led him to b) treat the qualia perception differences between judges and competitions as fungible when in reality they are not – they simply cannot yet be measured accurately (and may not be in our lifetimes, given the scientific complexity of that task). So in effect Hodgson made a critical error in assuming that the ratings in each competition are reached in a way that can be measured scientifically and interchangeably. But they aren’t – the qualia rule the day when it comes to wine reviewing. Good approach, but the conclusion is bad science.
What does this mean for wine ratings?
Wine ratings (mine included!) are all at least partially “bad” science, and are not meaningfully comparable between reviewers, at least not scientifically, because every reviewer will perceive the qualia in those wines at least partially differently, in ways that cannot be measured. This is true even though the scales are often comparable, and even though the wine world occasionally sports a serious hard-on for comparing ratings from different critics. It is true even though the wine business does not want it to be true. And if wine ratings cannot be reliably compared in a scientifically meaningful way, then they should NOT be treated as fungible (even if everyone happens to be treating them that way now). They are NOT accurate science.
“So what?” you might ask, “aren’t they all close enough for government work? Don’t lots of wine critics appear to agree at least somewhat on wine ratings for many wines?”
The answer is that while many critics appear to agree, that agreement is NOT close enough for any meaningful comparison, and we cannot safely assume that any two critics reach equivalent scores in exactly the same way (or even objectively). Why? Because we have no way to accurately measure their qualia perceptions, and so we have no way of estimating how close to objective accuracy those ratings are individually, let alone across critics. That is assuming there is some objective accuracy to those qualia perceptions at all (interestingly, The Beginning of Infinity does suggest that there might be objective qualia like beauty – but we are likely many generations away from getting a handle on that scientifically).
What does this mean for everyday imbibers? It means that you shouldn’t assume that different reviewers are rating wines in the same way or even on the same scale, even if those scales appear to be identical superficially.
Accepting any wine review, or even a collection of such reviews on the same wines, as an ultimate determinant of that wine’s present and future enjoyment across individuals is “bad” science, and no better than accepting them on blind faith.
There is a light at the end of this tunnel, though.
If you have found that a reviewer’s ratings and descriptors and – importantly! – their relative perception of the levels of those elements in a given wine at least seem to closely mirror your own, then you may have found someone who likely perceives fine wine qualia similarly to you. Seek out those reviewers and listen to them. Just don’t take their 95, A, 5 puffs, or any other rating to be interchangeable with anyone else’s.