There’s been a good bit of discussion lately on the Global Interwebs over a recent blog post by the wine-data-focused David Morrison (to which I was alerted by intrepid friend-of-1WD Bob Henry).
In that post, Morrison puts the scores of two of Wine Spectator’s then-critics, both named James (James Laube and James Suckling), through the data-analysis wringer, focusing on scores they gave to wines as part of WS’s “Cabernet Challenge” of 1996.
Generally speaking, Morrison’s blog post, while enviably thorough, can justifiably be criticized as much ado about nothing, considering that no one in their right mind could draw any statistically meaningful conclusions from such a small data set. The summary version is that he found a high level of disagreement in the scores that the two Jameses gave to the same wines. Morrison draws out some interesting suggestions from this finding, though, primarily about the use of numbers when evaluating wine quality; to wit (emphasis is mine):
“The formal explanation for the degree of disagreement is this: the tasters are not using the same scoring scheme to make their assessments, even though they are expressing those assessments using the same scale. This is not just a minor semantic distinction, but is instead a fundamental and important property of anything expressed mathematically. As an example, it means that when two tasters produce a score of 85 it does not necessarily imply that they have a similar opinion about the wine; and if one produces 85 points and the other 90 then they do not necessarily differ in their opinion.”
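For the numerically inclined, Morrison’s point can be illustrated with a toy sketch (entirely hypothetical, not his actual data or analysis): give two critics the same 100-point scale but different internal mappings from perceived quality to a score, and identical numbers stop meaning identical opinions.

```python
# Hypothetical illustration of "same scale, different scoring scheme."
# Perceived quality is an internal 0.0-1.0 judgment; each critic maps
# it onto the same 100-point scale, but differently.

def critic_a_score(quality: float) -> int:
    """Critic A spreads scores across 80-100."""
    return round(80 + 20 * quality)

def critic_b_score(quality: float) -> int:
    """Critic B compresses scores into 88-98."""
    return round(88 + 10 * quality)

def critic_a_quality(score: int) -> float:
    """Invert Critic A's mapping: what opinion does a score imply?"""
    return (score - 80) / 20

def critic_b_quality(score: int) -> float:
    """Invert Critic B's mapping."""
    return (score - 88) / 10

# The same internal opinion (quality 0.5) yields different numbers:
print(critic_a_score(0.5), critic_b_score(0.5))   # 90 vs. 93

# And the same number (90 points) implies different opinions:
print(critic_a_quality(90), critic_b_quality(90)) # 0.5 vs. 0.2
```

So a 90 from one critic and a 90 from another can mask a real disagreement, while an 85 and a 90 can mask agreement, exactly as Morrison describes.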
So… where have we heard that before?
Morrison gets to his point a different way than I did (and by that, I mean not only via data analysis, but also more eloquently and in about one-third as many words), but the point remains the same: specific numeric values are just a sucky way to talk about subjective experiences (something that the medical field has known for a long, long time), and wine criticism will always have large subjective elements baked into it.
Here’s a recap of my version of a similar conclusion (with newly-added emphasis):
“Wine ratings are most often presented via scales that imply scientific precision, however they are measuring something for which we have no scientifically reliable calibration: how people sense (mostly) qualitative aspects of wine. Yes, there may be objective qualities about a wine that can indeed be somewhat calibrated (the presence of faults, for example) but even with these we have varying thresholds of detection between critics. That’s important because it means that the objective (i.e., measurable) quantities of those elements are not perceived the same way by two different reviewers, and so their perception of the levels of those elements cannot reliably be calibrated.
But it’s the subjective stuff that really throws the monkey wrench into the works here. How we perceive those – and measure our enjoyment of them – will likely not be fully explainable in our lifetimes by science. That is because they are what is known as qualia: like happiness, depression, pain, and pleasure, those sensations can be described but cannot effectively be measured across individuals in any meaningful way scientifically.
Yes, we can come to pretty good agreement on a wine’s color, and on the fact that it smells like, say, strawberries. After that, the qualia perception gets pretty tricky, however: my sense of how vibrant that strawberry aroma is might be quite different from yours. Once that factors into how you and I would “rate” that wine’s aroma, we start to diverge, and potentially quite dramatically at that.”
Add to this quagmire the penchant of humans to treat numeric values as fungible (see Morrison article quote above), and you have a recipe for a not-so-great consumer experience when using specific numbers to rate a wine, and then comparing those specific numbers across critics, particularly when those numbers are stripped of their original context (which is, oh, just about every time they are presented…).